Explanation Types in Explain Data

Each time you select a new mark in a viz or dashboard and run Explain Data, Tableau runs a new statistical analysis that considers that mark and the underlying data in the workbook. Possible explanations are displayed in expandable sections in the Explain Data pane. For information about how Explain Data analyzes and evaluates explanations, see How Explain Data Works.

Contributing to the value of the mark

The Contributing to the value of section of the Explain Data pane lists explanations for each measure that can be explained (referred to as target measures). Each explanation listed here describes a relationship with the values of the target measure that are tested on the explained mark. Use your real-world, practical understanding of the data to determine if the relationships found by Explain Data are meaningful and worth exploring.

In this example, Trip Distance is the target measure.

Mark Attributes

These explanations describe how underlying records of the marks in the view may be contributing to the aggregated value of the measure being explained. Mark attributes can include Extreme Values, Null Values, Number of Records, or the Average Value of the mark.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

Extreme Values

This explanation type indicates if one or more records have values that are significantly higher or lower than most records. If the explanation is supported by a model, it indicates the extreme value is affecting the target measure of the explained mark.

When a mark has extreme values, it doesn't automatically mean it has outliers or that you should exclude those records from the view. That choice is up to you depending on your analysis. The explanation is simply pointing out an extreme value in the mark. For example, it could reveal a mistyped value in a record where a banana cost 10 dollars instead of 10 cents. Or, it could reveal that a particular sales person had a great quarter.

Note: This explanation must be enabled by the author to be visible in viewing mode for a published workbook. For more information, see Control Access to Explain Data.

This explanation shows:

  • The number of underlying records in the explained mark.
  • The extreme value or values contributing to the value of the target measure.
  • The distribution of values in the mark.
  • The record details that correspond to each distribution value.

Exploration options:

  • Hover over a circle in the chart to see its corresponding value.
  • Click the left or right arrow below the details list to scroll through record details.
  • If available, click View Full Data, and then click the Full Data tab to see all records in a table.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • If the number of records is low, examine these values compared to the extreme value.
  • If the extreme value is significantly higher or lower than the other record values, exclude it and consider how it changes the value of the explained mark.
  • When considering the data with and without the extreme value, use this as an opportunity to apply your practical knowledge about the data.

 

In this example, a single extreme value of 463 hours rented is contributing to the higher than expected sum of Total Time Rented of 613 hours.

A likely reason for this high value could be that someone forgot to dock the bike when they returned it. In this case, the author might want to exclude this value for future analysis.
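The comparison in this example can be sketched in a few lines of Python. This is a minimal illustration, not the method Explain Data uses: the extreme_values helper and its median-absolute-deviation rule are assumptions chosen for simplicity, and every record value other than the 463-hour rental and the 613-hour total is made up.

```python
from statistics import median

def extreme_values(values, k=3.0):
    """Flag values more than k scaled MADs from the median.

    Hypothetical rule for illustration; Explain Data's own test may differ.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [v for v in values if abs(v - med) / scale > k]

# Hypothetical hours rented for the records behind one mark.
hours = [12, 18, 9, 25, 14, 22, 16, 20, 14, 463]

flagged = extreme_values(hours)
total_with = sum(hours)
total_without = sum(v for v in hours if v not in flagged)

print("extreme values:", flagged)
print("sum with extreme values:", total_with)
print("sum without extreme values:", total_without)
```

Comparing the two sums shows how much of the mark's value depends on the flagged record, which is the same question the explanation asks you to consider before deciding whether to exclude it.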

 

This section shows:

  • How the explained mark value changes when the extreme value is excluded.

Exploration options:

  • Click the Open icon to see a larger version of the visualization.
  • Explore the difference with and without the extreme value (or values).
  • Authors can open the view as a new sheet and apply a filter to exclude the extreme value.

Next steps for analysis:

  • If the extreme value is significantly higher or lower than the other record values, exclude it and see how it changes the value of the explained mark.
  • When considering the data with and without the extreme value, use this as an opportunity to apply your practical knowledge about the data.
 

 

In this example, when the extreme value is excluded, the explained mark is no longer high compared to other marks in the view. Other marks now stand out. The author might want to explore the other marks to consider why these other locations have higher hours for bike rentals.

Null Values

The Null Values explanation type calls out situations where there is a higher than expected amount of missing data in a mark. It indicates the fraction of target measure values that are null and how the null values might be contributing to the aggregate value of that measure.

This explanation shows:

  • The percent of values that are null in the target measure for the explained mark (blue circle).

Exploration options:

  • Hover over each circle in the scatter plot to see its details.
  • Scroll to see more of the chart.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Optionally exclude null values in the mark for further analysis.
 

In this example, the percent of null values in the target measure is shown as a blue circle.
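To make the idea of a null fraction concrete, here is a small Python sketch. The helper names and the data are hypothetical, and the snippet only shows the arithmetic, not how Explain Data decides that a fraction is higher than expected.

```python
def null_fraction(values):
    """Share of records whose target measure is missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def sum_ignoring_nulls(values):
    """Aggregate only the records that actually carry a value."""
    return sum(v for v in values if v is not None)

# Hypothetical Trip Distance records per mark; None stands for a null value.
trip_distance = {
    "July": [2.1, 3.4, None, 1.8],
    "August": [2.5, None, None, 3.0, None, 1.5, None, None],
}

fractions = {mark: null_fraction(v) for mark, v in trip_distance.items()}
totals = {mark: sum_ignoring_nulls(v) for mark, v in trip_distance.items()}

for mark in trip_distance:
    print(f"{mark}: {fractions[mark]:.1%} null, sum of non-null values = {totals[mark]:.1f}")
```

In this made-up data, August has twice as many records as July but a similar sum, because most of its records are null and contribute nothing to the aggregate.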

Number of Records

This explanation type describes when the count of the underlying records is correlated with the sum. The analysis found a relationship between the number of records that are being aggregated in a mark and the mark's actual value.

While this might seem obvious, this explanation type helps you explore whether the mark's value is being affected by the magnitude of the values in its records or simply because of the number of records in the explained mark.

This explanation shows:

  • The number of records in the target measure for the explained mark (dark blue bar).
  • The number of records in the target measure for other marks in the source visualization (light blue bar).

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Compare whether the individual values of records are low or high, or the number of records in the explained mark is low or high.
  • Authors, if you are surprised by a high number of records, you might need to normalize the data.
 

In this example, the number of records for Trip Distance is listed for each value of Ride Month, which is a dimension in the original visualization. August has the highest total trip distance value.

You might explore whether August has the highest value for trip distance because more rides occurred in August, or if it has the highest trip distance because some rides were longer.
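One way to check whether record counts and sums move together across marks is a Pearson correlation. The sketch below is an assumption about how such a check could look, not a description of Explain Data's model, and the monthly figures are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values for five Ride Month marks.
record_counts = [310, 355, 420, 480, 560]               # number of rides
distance_sums = [640.0, 700.0, 880.0, 990.0, 1150.0]    # SUM(Trip Distance)

r = pearson(record_counts, distance_sums)
print(f"correlation between record count and sum: {r:.3f}")
```

A coefficient close to 1 suggests that marks with more records tend to have larger sums, so a high sum may simply reflect a high number of records.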

 

Average Value of Mark

This explanation type describes when the average of a measure is correlated with the sum. Compare whether the average value is low or high, or the number of records is low or high.

This explanation shows:

  • The average of the target measure for each value of a dimension used in the source visualization.

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Compare whether the average value is low or high, or the number of records is low or high. For example, are profits high because you sold a lot of items or because you sold expensive items?
  • Try to figure out why the explained mark has a significantly higher or lower average value.

 

 

In this example, the average trip distance for August is not significantly higher or lower than most months. This suggests that trip distance is higher for August because there were more rides in August, rather than from people taking longer rides.
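The reasoning in this example rests on a simple identity: the sum of a measure equals the number of records multiplied by their average. The Python sketch below, using hypothetical ride distances, separates the two factors so you can see which one differs between marks.

```python
def summarize(values):
    """Return (record count, average, sum) for one mark's records."""
    count = len(values)
    total = sum(values)
    return count, total / count, total

# Hypothetical Trip Distance records for two Ride Month marks.
rides = {
    "July": [2.0, 3.0, 2.5, 2.5],
    "August": [2.5, 3.0, 2.0, 2.5, 3.0, 2.0],
}

summary = {month: summarize(v) for month, v in rides.items()}

for month, (count, average, total) in summary.items():
    print(f"{month}: {count} records x {average:.2f} average = {total:.1f}")
```

Here both months have the same average distance, so the higher August sum comes entirely from the larger number of records. If the averages differed instead, longer rides would be the more likely explanation.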

 

Relevant Single Value

Use this explanation to understand the composition of the record values that make up the explained mark.

This explanation type identifies when a single value in an unvisualized dimension may be contributing to the aggregate value of the explained mark. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.

This explanation indicates when every underlying record of a dimension has the same value, or when a dimension value stands out because either many or few of the records have the same single value for the explained mark.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percent of the number of records for a single value of a dimension for the explained mark (blue bar) versus all marks (gray bar) in the source visualization.
  • The percent of the number of records for all other values of a dimension for the explained mark (blue bar) versus all marks (gray bar) in the source visualization.
  • The average of the target measure for the single value of a dimension in the explained mark (blue bar) versus all marks (gray bar).
  • The average of the target measure for all other values of a dimension for the explained mark (blue bar) versus all marks (gray bar) in the source visualization.

Exploration options:

  • Hover over each bar to see its details.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the explained mark.
  • Authors might want to create a new visualization to explore any unvisualized dimension surfaced in this explanation.
 

In this example, the statistical analysis has exposed that many of the rides come from the station neighborhood of Back Bay. Note that Station Neighborhood is an unvisualized dimension that has some relationship to Trip Distance in the underlying data for the source visualization.
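The percentages compared in this explanation can be reproduced with a short Python sketch. The share_of_value helper and the ride records are hypothetical. Only the dimension name Station Neighborhood and the value Back Bay come from the example.

```python
def share_of_value(records, value):
    """Fraction of records whose dimension equals the given value."""
    return sum(r == value for r in records) / len(records)

# Hypothetical Station Neighborhood values, one entry per ride.
explained_mark = ["Back Bay"] * 6 + ["Fenway"] * 2 + ["Downtown"] * 2
all_marks = ["Back Bay"] * 12 + ["Fenway"] * 14 + ["Downtown"] * 14

share_explained = share_of_value(explained_mark, "Back Bay")
share_overall = share_of_value(all_marks, "Back Bay")

print(f"Back Bay share in explained mark: {share_explained:.0%}")
print(f"Back Bay share in all marks: {share_overall:.0%}")
```

A single value that accounts for a much larger (or smaller) share of the explained mark than of all marks is the kind of pattern this explanation is meant to surface.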

 

Relevant Dimensions

Use this explanation to understand the composition of the record values that make up the explained mark.

This explanation type shows that the distribution of an unvisualized dimension may be contributing to the aggregate value of the explained mark. This type of explanation is used for target measure sums, counts, and averages. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percent of the number of records for all values of a dimension for the explained mark (blue bar) versus all values of a dimension for all marks (gray bar) in the source visualization.
  • The average of the target measure for all values of a dimension for the explained mark (blue bar) versus all values of a dimension for all marks (gray bar).

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the explained mark.
  • Authors might want to create a new visualization to explore any unvisualized dimensions surfaced in this explanation.
 

In this example, the statistical analysis has exposed that more rides were taken from Boylston stations and fewer rides were taken from MIT and Kendall, compared to rides taken for marks overall.

Note that Station Name is an unvisualized dimension that has some relationship to Trip Distance in the underlying data for the source visualization.
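Comparing the distribution of a dimension for the explained mark against all marks can be sketched as follows. This is an illustrative assumption about the comparison, not Explain Data's algorithm. The station labels are shortened stand-ins for the stations in the example, and the record counts are invented.

```python
from collections import Counter

def distribution(records):
    """Map each dimension value to its share of the records."""
    counts = Counter(records)
    return {value: n / len(records) for value, n in counts.items()}

# Hypothetical Station Name values, one entry per ride.
explained_mark = ["Boylston"] * 5 + ["MIT"] * 1 + ["Kendall"] * 1 + ["Other"] * 3
all_marks = ["Boylston"] * 10 + ["MIT"] * 10 + ["Kendall"] * 8 + ["Other"] * 12

dist_explained = distribution(explained_mark)
dist_overall = distribution(all_marks)

# Positive difference: the value is over-represented in the explained mark.
difference = {
    value: dist_explained.get(value, 0.0) - share
    for value, share in dist_overall.items()
}

for value, diff in sorted(difference.items(), key=lambda item: -item[1]):
    print(f"{value}: {diff:+.0%} versus all marks")
```

Values with a large positive or negative difference are the ones worth exploring in a new visualization.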

 

Relevant Measures

This explanation type shows that the average of an unvisualized measure may be contributing to the aggregate value of the explained mark. An unvisualized measure is a measure that exists in the data source, but isn't currently being used in the view.

This explanation can reveal a linear or quadratic relationship between the unvisualized measure and the target measure.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The relationship between the sum of the target measure and the average of an unvisualized measure for the explained mark (blue circle) and all marks (gray circles) in the view.
  • If the sum of the target measure is high or low because the average value of the unvisualized measure is high or low.

Exploration options:

  • Hover over each circle to see its details.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Authors might want to create a new visualization to explore any unvisualized measures surfaced in this explanation.
 

In this example, one possible reason trip distance is high is that the average total time rented is also high.
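A linear relationship between an unvisualized measure and the target measure can be checked with an ordinary least squares fit. The sketch below covers only the linear case and is an assumption about how such a check might be done. The linear_fit helper and the per-mark values are hypothetical, and the quadratic case would need an additional squared term.

```python
def linear_fit(xs, ys):
    """Least squares fit of y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical values, one pair per mark in the view.
avg_time_rented = [10, 12, 15, 18, 20]      # AVG(Total Time Rented)
sum_trip_distance = [21, 25, 31, 37, 41]    # SUM(Trip Distance)

slope, intercept, r_squared = linear_fit(avg_time_rented, sum_trip_distance)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

An R squared value near 1 with a positive slope indicates that marks with a higher average of the unvisualized measure also tend to have a higher sum of the target measure. It does not show that one causes the other.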

Other things to explore (Unique attributes of a mark)

This section of the Explain Data pane shows possible reasons why the explained mark is unique or unusual. These explanations:

  • Do not explain why the value of this mark is what it is.
  • Are not related in any way to the value of the measures in the source visualization.
  • Do not take any target measures into account.

Relevant Single Value

The explanation type indicates when all records in the explained mark have the same single value in the unvisualized dimension, which is unusual compared to the distribution of values for all other marks in the view.

An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • When every underlying record has the same single value for a dimension.

Next steps for analysis:

  • If all records in the explained mark have the same single value, you might want to check the number of records in that mark.
  • You might check if the unvisualized dimension is a proxy for a dimension that is being used in the source visualization.
 

This example shows an explanation for data about incidents related to birds and other wildlife colliding with aircraft. In the unfortunate case of a wapiti (elk), three unvisualized dimensions each had the same single value in every underlying record: Aircraft, Indicated Damage, and Time of Day.
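Finding dimensions where every underlying record shares one value is straightforward to sketch in Python. The single_valued_dimensions helper, the record values, and the Airport dimension are hypothetical. Only the names Aircraft, Indicated Damage, and Time of Day come from the example.

```python
def single_valued_dimensions(records):
    """Return {dimension: value} for dimensions where all records agree."""
    if not records:
        return {}
    result = {}
    for dimension in records[0]:
        values = {record[dimension] for record in records}
        if len(values) == 1:
            result[dimension] = values.pop()
    return result

# Hypothetical underlying records for one explained mark.
records = [
    {"Aircraft": "Airplane", "Indicated Damage": "Yes", "Time of Day": "Night", "Airport": "A"},
    {"Aircraft": "Airplane", "Indicated Damage": "Yes", "Time of Day": "Night", "Airport": "B"},
]

constant = single_valued_dimensions(records)
print(f"{len(records)} records; single-valued dimensions: {constant}")
```

Printing the record count alongside the result follows the first next step above: with only a handful of records, every record sharing a value is far less surprising than it would be across thousands.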

 

Relevant Dimensions

Use this explanation to understand the composition of the record values that make up the explained mark.

The distribution of an unvisualized dimension in the explained mark is unusual compared to the distribution of values for all other marks in the view. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.

Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.

This explanation shows:
  • The percent of the number of records for all values of a dimension for the explained mark (blue bar) versus all values of a dimension for all marks (gray bar) in the source visualization.

Exploration options:

  • Hover over each bar to see its details.
  • Scroll to see more of the chart.
  • Click the Open icon to see a larger version of the visualization.

Next steps for analysis:

  • Use this explanation to understand the composition of the record values that make up the explained mark.
  • Authors might want to create a new visualization to explore any unvisualized dimensions surfaced in this explanation.
 

In this example, a high percentage of records are associated with overcast weather. Because the data is about bike rentals in Boston, and the explained mark is Trip Distance for August, we can assume that the weather is typically warm and humid. People might have rented bikes more often on overcast days to avoid the heat. It's also possible there were more overcast days in August.

 
