Explanation Types in Explain Data

Each time you select a new mark in a viz or dashboard and run Explain Data, Tableau runs a new statistical analysis considering that mark and the underlying data in the workbook. Possible explanations are displayed in expandable sections in the Data Guide pane. For information about how Explain Data analyses and evaluates explanations, see How Explain Data Works.

Explore underlying values

This section lists explanations for each measure that can be explained (referred to as target measures). Each explanation listed here describes a relationship with the values of the target measure that are tested on the analysed mark. Use your real-world, practical understanding of the data to determine if the relationships found by Explain Data are meaningful and worth exploring.

In this example, Trip Distance is the target measure.

Underlying Characteristics

These explanations describe how underlying records of the marks in the view may be contributing to the aggregated value of the measure being explained. Mark attributes can include Extreme Values, Null Values, Number of Records or the Average Value of the mark.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

Extreme Values

This explanation type indicates if one or more records have values that are significantly higher or lower than most records. If the explanation is supported by a model, it indicates the extreme value is affecting the target measure of the analysed mark.

When a mark has extreme values, it doesn't automatically mean it has outliers or that you should exclude those records from the view. That choice is up to you depending on your analysis. The explanation is simply pointing out an extreme value in the mark. For example, it could reveal a mistyped value in a record where a banana cost 10 pounds instead of 10 pence. Or, it could reveal that a particular sales person had a great quarter.

Note: This explanation must be enabled by the author to be visible in viewing mode for a published workbook. For more information, see Control Access to Explain Data.

This explanation shows:

  • The number of underlying records in the analysed mark.
  • The extreme value or values contributing to the value of the target measure.
  • The distribution of values in the mark.
  • The record details that correspond to each distribution value.

Exploration options:

  • Hover over a circle in the chart to see its corresponding value.
  • Select the left or right arrow below the details list to scroll through record details.
  • If available, select View Full Data, and then select the Full Data tab to see all records in a table.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • If the number of records is low, examine these values compared to the extreme value.
  • If the extreme value is significantly higher or lower than the other record values, exclude it and consider how it changes the value of the analysed mark.
  • When considering the data with and without the extreme value, use this as an opportunity to apply your practical knowledge about the data.

 

In this example, a single extreme value of 463 hours rented is contributing to the higher-than-expected sum of Total Time Rented of 613 hours.

A likely reason for this high value could be that someone forgot to dock the bike when they returned it. In this case, the author might want to exclude this value for future analysis.
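The effect of a single extreme value on a sum can be sketched in a few lines of Python. This is only an illustration, not how Explain Data detects extreme values: the individual record values below are hypothetical, and only the 463-hour record and the 613-hour total are taken from the example above.

```python
# Hypothetical rental durations (hours) for the analysed mark.
# Only the 463-hour record and the 613-hour total come from the example;
# the other values are invented so that the sum matches.
hours_rented = [12, 30, 8, 463, 45, 55]

total_with = sum(hours_rented)
extreme = max(hours_rented)

# Exclude the extreme record and recompute the aggregate.
remaining = [h for h in hours_rented if h != extreme]
total_without = sum(remaining)

print(f"Sum with extreme value:    {total_with}")     # 613
print(f"Sum without extreme value: {total_without}")  # 150
print(f"Share of sum from extreme: {extreme / total_with:.1%}")
```

Comparing the two sums shows how much of the mark's value depends on one record, which is the comparison to make before deciding whether to exclude it.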

 

Visualise the Difference

This section shows:

  • How the analysed mark value changes when the extreme value is excluded.

Exploration options:

  • Select the Open icon to see a larger version of the visualisation.
  • Explore the difference with and without the extreme value (or values).
  • Authors can open the view as a new sheet and apply a filter to exclude the extreme value.

Next steps for analysis:

  • If the extreme value is significantly higher or lower than the other record values, exclude it and see how it changes the value of the analysed mark.
  • When considering the data with and without the extreme value, use this as an opportunity to apply your practical knowledge about the data.
 

In this example, when the extreme value is excluded, the analysed mark is no longer high compared to other marks in the view. Other marks now stand out. The author might want to explore the other marks to consider why these other locations have higher hours for bike rentals.

Null Values

The Null Values explanation type calls out situations where there is a higher-than-expected amount of missing data in a mark. It indicates the fraction of target measure values that are null and how the null values might be contributing to the aggregate value of that measure.

This explanation shows:

  • The percentage of values that are null in the target measure for the analysed mark (blue circle).

Exploration options:

  • Hover over each circle in the scatter plot to see its details.
  • Scroll to see more of the chart.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Optionally exclude null values in the mark for further analysis.
 

In this example, the percentage of null values in the target measure is shown as a blue circle.
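As a rough sketch of the quantity this explanation reports, the following Python computes the percentage of null values in a measure for one mark. The data and the function name `null_percentage` are hypothetical, and `None` stands in for a null value; this is not Tableau's implementation.

```python
def null_percentage(values):
    """Return the percentage of entries that are None (treated as null)."""
    if not values:
        return 0.0
    nulls = sum(1 for v in values if v is None)
    return 100.0 * nulls / len(values)

# Hypothetical Trip Distance records for two marks.
analysed_mark = [2.5, None, 3.1, None, None, 1.8, 4.0, None]
other_mark = [1.2, 2.2, None, 3.4, 2.9, 1.7, 2.0, 3.3]

print(null_percentage(analysed_mark))  # 50.0
print(null_percentage(other_mark))     # 12.5
```

A mark whose percentage is much higher than that of the other marks is the kind of case this explanation calls out.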

Number of Records

This explanation type describes when the count of the underlying records is correlated with the sum. The analysis found a relationship between the number of records that are being aggregated in a mark and the mark's actual value.

While this might seem obvious, this explanation type helps you explore whether the mark's value is being affected by the magnitude of the values in its records or simply by the number of records in the analysed mark.

This explanation shows:

  • The number of records in the target measure for the analysed mark (dark blue bar).
  • The number of records in the target measure for other marks in the source visualisation (light blue bar).

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Compare whether the individual values of records are low or high, or the number of records in the analysed mark is low or high.
  • Authors, if you are surprised by a high number of records, you might need to normalise the data.
 

In this example, the number of records for Trip Distance is listed for each value of Ride Month, which is a dimension in the original visualisation. August has the highest total trip distance value.

You might explore whether August has the highest value for trip distance because more rides occurred in August, or if it has the highest trip distance because some rides were longer.
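One way to reason about this question is to split each mark's total into the number of records and the average value per record, since total = number of records × average. The sketch below uses hypothetical trip distances and a hypothetical helper, `summarise`; it illustrates the comparison only and is not the analysis Explain Data runs.

```python
# Hypothetical trip distances (miles) grouped by Ride Month.
trips_by_month = {
    "July": [2.0, 3.0, 1.0, 2.0],
    "August": [2.0, 3.0, 1.0, 2.0, 2.5, 1.5, 2.0, 2.0],
    "September": [6.0, 5.0],
}

def summarise(records):
    """Return (count, average, total) for one mark's records."""
    count = len(records)
    total = sum(records)
    return count, total / count, total

for month, records in trips_by_month.items():
    count, average, total = summarise(records)
    print(f"{month:<10} records={count:<3} average={average:.2f} total={total:.1f}")
```

In this invented data, August has the highest total because it has the most records, while its average matches July's. September has a high average but few records.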

Average Value of Mark

This explanation type describes when the average of a measure is correlated with the sum. Compare whether the average value is low or high, or the number of records is low or high.

This explanation shows:

  • The average of the target measure for each value of a dimension used in the source visualisation.

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Compare whether the average value is low or high, or the number of records is low or high. For example, are profits high because you sold a lot of items or because you sold expensive items?
  • Try to figure out why the analysed mark has a significantly higher or lower average value.

 

 

In this example, the average trip distance for August is not significantly higher or lower than most months. This suggests that trip distance is higher for August because there were more rides in August, rather than because people took longer rides.

 

Contributing Single Value

Use this explanation to understand the composition of the record values that make up the analysed mark.

This explanation type identifies when a single value in an unvisualised dimension may be contributing to the aggregate value of the analysed mark. An unvisualised dimension is a dimension that exists in the data source but isn't currently being used in the view.

This explanation indicates when every underlying record of a dimension has the same value, or when a dimension value stands out because either many or few of the records have the same single value for the analysed mark.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percentage of the number of records for a single value of a dimension for the analysed mark (blue bar) versus all marks (grey bar) in the source visualisation.
  • The percentage of the number of records for all other values of a dimension for the analysed mark (blue bar) versus all marks (grey bar) in the source visualisation.
  • The average of the target measure for the single value of a dimension in the analysed mark (blue bar) versus all marks (grey bar).
  • The average of the target measure for all other values of a dimension for the analysed mark (blue bar) versus all marks (grey bar) in the source visualisation.

Exploration options:

  • Hover over each bar to see its details.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the analysed mark.
  • Authors might want to create a new visualisation to explore any unvisualised dimension surfaced in this explanation.
 

In this example, the statistical analysis has exposed that many of the rides come from the station neighbourhood of Back Bay. Note that Station Neighbourhood is an unvisualised dimension that has some relationship to Trip Distance in the underlying data for the source visualisation.

 

Top Contributors

Use this explanation to see the values that make up the largest fraction of the analysed mark.

For a COUNT aggregation, the top contributors show dimension values with the most records. For SUM, this explanation shows dimension values with the largest partial sum.
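The difference between the two rankings can be illustrated with a short Python sketch. The records and the function `top_contributors` are hypothetical and only mirror the description above: rank dimension values by record count for COUNT, or by partial sum for SUM.

```python
from collections import defaultdict

# Hypothetical underlying records for one analysed mark:
# (station neighbourhood, trip distance in miles).
records = [
    ("Back Bay", 1.0), ("Back Bay", 1.5), ("Back Bay", 0.5),
    ("Cambridge", 4.0), ("Cambridge", 5.0),
    ("Downtown", 2.0),
]

def top_contributors(rows, aggregation):
    """Rank dimension values by record count (COUNT) or partial sum (SUM)."""
    if aggregation not in ("COUNT", "SUM"):
        raise ValueError("aggregation must be 'COUNT' or 'SUM'")
    totals = defaultdict(float)
    for dimension_value, measure in rows:
        totals[dimension_value] += 1 if aggregation == "COUNT" else measure
    # Largest contribution first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(top_contributors(records, "COUNT"))
print(top_contributors(records, "SUM"))
```

With this invented data, Back Bay is the top contributor by count (three records), while Cambridge is the top contributor by sum (9.0 miles), so the same mark can have different top contributors depending on the aggregation.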

 

Contributing Dimensions

Use this explanation to understand the composition of the record values that make up the analysed mark.

This explanation type shows that the distribution of an unvisualised dimension may be contributing to the aggregate value of the analysed mark. This type of explanation is used for target measure sums, counts and averages. An unvisualised dimension is a dimension that exists in the data source but isn't currently being used in the view.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percentage of the number of records for all values of a dimension for the analysed mark (blue bar) versus all values of a dimension for all marks (grey bar) in the source visualisation.
  • The average of the target measure for all values of a dimension for the analysed mark (blue bar) versus all values of a dimension for all marks (grey bar).

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the analysed mark.
  • Authors might want to create a new visualisation to explore any unvisualised dimensions surfaced in this explanation.
 

In this example, the statistical analysis has exposed that more rides were taken from South Station and MIT and fewer rides were taken from Charles Circle and Kendall, compared to rides taken for marks overall.

Note that Station Name is an unvisualised dimension that has some relationship to Trip Distance in the underlying data for the source visualisation.
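The comparison behind this kind of chart can be approximated by computing each dimension value's share of records for the analysed mark and for all marks. The sketch below uses invented record counts; only the station names come from the example, and `share_by_value` is a hypothetical helper rather than anything Tableau exposes.

```python
from collections import Counter

def share_by_value(values):
    """Return each dimension value's percentage of the records."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * n / total for value, n in counts.items()}

# Hypothetical Station Name values for the analysed mark and for all marks.
analysed_mark = (["South Station"] * 5 + ["MIT"] * 3
                 + ["Charles Circle"] + ["Kendall"])
all_marks = (["South Station"] * 10 + ["MIT"] * 6
             + ["Charles Circle"] * 12 + ["Kendall"] * 12)

mark_share = share_by_value(analysed_mark)
overall_share = share_by_value(all_marks)

for station in sorted(overall_share):
    in_mark = mark_share.get(station, 0.0)
    difference = in_mark - overall_share[station]
    print(f"{station:<15} mark={in_mark:5.1f}% "
          f"all={overall_share[station]:5.1f}% diff={difference:+.1f}")
```

Positive differences correspond to values that are over-represented in the analysed mark, and negative differences to values that are under-represented.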

 

Contributing Measures

This explanation type shows that the average of an unvisualised measure may be contributing to the aggregate value of the analysed mark. An unvisualised measure is a measure that exists in the data source but isn't currently being used in the view.

This explanation can reveal a linear or quadratic relationship between the unvisualised measure and the target measure.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The relationship between the sum of the target measure and the average of an unvisualised measure for the analysed mark (blue circle) and all marks (grey circles) in the view.
  • Whether the sum of the target measure is high or low because the average value of the unvisualised measure is high or low.

Exploration options:

  • Hover over each circle to see its details.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Authors might want to create a new visualisation to explore any unvisualised measures surfaced in this explanation.
 

In this example, one possible reason trip distance is high is that the average total time rented is also high.
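To get a feel for what a linear relationship between the two quantities looks like, the sketch below fits a straight line by ordinary least squares to hypothetical per-mark values. It covers only the linear case, not the quadratic one, and it is an illustration under invented data rather than the model Explain Data uses; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical marks: average Total Time Rented (hours) per mark and
# the corresponding sum of Trip Distance.
avg_time_rented = [1.0, 2.0, 3.0, 4.0, 5.0]
sum_trip_distance = [12.0, 19.0, 32.0, 41.0, 51.0]

slope, intercept = fit_line(avg_time_rented, sum_trip_distance)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A clearly positive slope means that marks with a higher average of the unvisualised measure also tend to have a higher sum of the target measure, which is the pattern this explanation surfaces.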

Other things to explore

This section provides possible reasons why the analysed mark is unique or unusual. These explanations:

  • Do not explain why the value of this mark is what it is.
  • Are not related in any way to the value of the measures in the source visualisation.
  • Do not take any target measures into account.

Other Dimensions of Interest

Use this explanation to understand the composition of the record values that make up the analysed mark.

The distribution of an unvisualised dimension in the analysed mark is unusual compared to the distribution of values for all other marks in the view. An unvisualised dimension is a dimension that exists in the data source but isn't currently being used in the view.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percentage of the number of records for all values of a dimension for the analysed mark (blue bar) versus all values of a dimension for all marks (grey bar) in the source visualisation.

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Select the Open icon to see a larger version of the visualisation.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the analysed mark.
  • Authors might want to create a new visualisation to explore any unvisualised dimensions surfaced in this explanation.
 

In this example, a high percentage of records are associated with overcast weather. Because the data is about bike rentals in Boston, and the analysed mark is Trip Distance for August, we can assume that the weather is typically warm and humid. People might have rented bikes more often on overcast days to avoid the heat. It's also possible there were more overcast days in August.

 
