Explanation Types in Explain Data
Each time you select a new mark in a viz or dashboard and run Explain Data, Tableau runs a new statistical analysis that considers that mark and the underlying data in the workbook. Possible explanations are displayed in expandable sections in the Data Guide pane. For information about how Explain Data analyzes and evaluates explanations, see How Explain Data Works.
Explore underlying values
This section lists explanations for each measure that can be explained (referred to as target measures). Each explanation listed here describes a relationship with the values of the target measure that are tested on the analyzed mark. Use your real-world, practical understanding of the data to determine if the relationships found by Explain Data are meaningful and worth exploring.
Underlying Characteristics
These explanations describe how underlying records of the marks in the view may be contributing to the aggregated value of the measure being explained. Mark attributes can include Extreme Values, Null Values, Number of Records, or the Average Value of the mark.
Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.
Extreme Values
This explanation type indicates if one or more records have values that are significantly higher or lower than most records. If the explanation is supported by a model, it indicates the extreme value is affecting the target measure of the analyzed mark.
When a mark has extreme values, it doesn't automatically mean that it has outliers or that you should exclude those records from the view. That choice is up to you, depending on your analysis. The explanation simply points out an extreme value in the mark. For example, it could reveal a mistyped value in a record where a banana cost 10 dollars instead of 10 cents. Or it could reveal that a particular salesperson had a great quarter.
Note: This explanation must be enabled by the author to be visible in viewing mode for a published workbook. For more information, see Control Access to Explain Data.
In this example, a single extreme value of 463 hours rented is contributing to the higher than expected sum of Total Time Rented of 613 hours. A likely reason for this high value could be that someone forgot to dock the bike when they returned it. In this case, the author might want to exclude this value for future analysis.
Visualize the Difference
In this example, when the extreme value is excluded, the analyzed mark is no longer high compared to other marks in the view. Other marks now stand out. The author might want to explore the other marks to consider why these other locations have higher hours for bike rentals.
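To make the before-and-after comparison concrete, here is a minimal Python sketch that flags extreme records and recomputes a mark's sum without them. It uses Tukey's interquartile-range fences, which is an assumption chosen for illustration and not necessarily the test Explain Data applies. The 463-hour record mirrors the example above; the other values and the `split_extremes` helper are invented.

```python
from statistics import quantiles


def split_extremes(values, k=1.5):
    """Split values into (typical, extreme) using Tukey's IQR fences.

    This rule is an illustrative assumption, not Explain Data's own test.
    """
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    typical = [v for v in values if low <= v <= high]
    extreme = [v for v in values if v < low or v > high]
    return typical, extreme


# Hypothetical hours rented for the records behind one mark.
hours = [2, 3, 1, 4, 2, 3, 2, 463]

typical, extreme = split_extremes(hours)
print("extreme records:", extreme)
print("sum with extreme records:", sum(hours))
print("sum without extreme records:", sum(typical))
```

Comparing the two sums shows how much of the aggregate a single record accounts for. Whether to exclude that record remains an analysis decision.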
Null Values
The Null Values explanation type calls out situations where there is a higher than expected amount of missing data in a mark. It indicates the fraction of target measure values that are null and how the null values might be contributing to the aggregate value of that measure.
In this example, the percentage of null values in the target measure is shown as a blue circle.
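The fraction of null values in a mark can be sketched in a few lines of Python. The data and the `null_fraction` helper below are hypothetical, and missing values are represented as `None` for illustration.

```python
def null_fraction(values):
    """Return the share of records whose target measure is missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# Hypothetical target-measure values for two marks.
analyzed_mark = [12.0, None, 7.5, None, None, 3.0]
another_mark = [5.0, 6.0, None, 4.0, 8.0, 9.0, 2.0, 1.0]

print("analyzed mark:", null_fraction(analyzed_mark))
print("another mark:", null_fraction(another_mark))
```

A mark whose null fraction is much higher than that of other marks may have an aggregate that is lower than expected simply because data is missing.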
Number of Records
This explanation type describes when the count of the underlying records is correlated with the sum. The analysis found a relationship between the number of records that are being aggregated in a mark and the mark's actual value.
While this might seem obvious, this explanation type helps you explore whether the mark's value is affected by the magnitude of the values in its records or simply by the number of records in the analyzed mark.
In this example, the number of records for Trip Distance is listed for each value of Ride Month, which is a dimension in the original visualization. August has the highest total trip distance value. You might explore whether August has the highest value for trip distance because more rides occurred in August, or if it has the highest trip distance because some rides were longer.
Average Value of Mark
This explanation type describes when the average of a measure is correlated with the sum. Use it to compare whether the average value is low or high, or whether the number of records is low or high.
In this example, the average trip distance for August is not significantly higher or lower than most months. This suggests that trip distance is higher for August because there were more rides in August, rather than because people took longer rides.
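The reasoning behind the Number of Records and Average Value of Mark explanations rests on a simple identity: a sum equals the record count multiplied by the average. The sketch below uses invented trip distances and a hypothetical `summarize` helper to show a month whose sum is higher only because it has more records.

```python
from statistics import mean

# Hypothetical trip distances per ride, grouped by Ride Month.
rides = {
    "July": [2.0, 3.0, 2.5, 2.5],
    "August": [2.0, 3.0, 2.5, 2.5, 3.0, 2.0, 2.5, 2.5],
}


def summarize(values):
    """Return the count, average, and sum for one mark's records."""
    return {"count": len(values), "average": mean(values), "sum": sum(values)}


summary = {month: summarize(values) for month, values in rides.items()}

for month, stats in summary.items():
    # sum == count * average, so a high sum comes from one factor or both.
    print(month, stats)
```

Here both months have the same average, so the doubled sum for August is explained entirely by the doubled number of records.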
Contributing Single Value
Use this explanation to understand the composition of the record values that make up the analyzed mark.
This explanation type identifies when a single value in an unvisualized dimension may be contributing to the aggregate value of the analyzed mark. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.
This explanation indicates when every underlying record of a dimension has the same value, or when a dimension value stands out because either many or few of the records have the same single value for the analyzed mark.
Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.
In this example, the statistical analysis has exposed that many of the rides come from the station neighborhood of Back Bay. Note that Station Neighborhood is an unvisualized dimension that has some relationship to Trip Distance in the underlying data for the source visualization.
Top Contributors
Use this explanation to see the values that make up the largest fraction of the analyzed mark.
For a COUNT aggregation, the top contributors show dimension values with the most records. For SUM, this explanation shows dimension values with the largest partial sum.
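The difference between the COUNT and SUM cases can be sketched as follows. The records, dimension values, and helper names are hypothetical, and the ranking logic is a plain illustration and not Tableau's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (dimension value, measure value) pairs for one analyzed mark.
records = [
    ("Back Bay", 2.0), ("Back Bay", 1.0), ("Back Bay", 1.5),
    ("Cambridge", 6.0), ("Cambridge", 5.0),
    ("Downtown", 3.0),
]


def top_by_count(records, n=2):
    """COUNT case: dimension values with the most records."""
    return Counter(dim for dim, _ in records).most_common(n)


def top_by_sum(records, n=2):
    """SUM case: dimension values with the largest partial sum."""
    totals = defaultdict(float)
    for dim, value in records:
        totals[dim] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(top_by_count(records))
print(top_by_sum(records))
```

The two rankings can disagree: in this made-up data, Back Bay has the most records, while Cambridge has the largest partial sum.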
Contributing Dimensions
Use this explanation to understand the composition of the record values that make up the analyzed mark.
This explanation type shows that the distribution of an unvisualized dimension may be contributing to the aggregate value of the analyzed mark. This type of explanation is used for target measure sums, counts, and averages. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.
Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.
In this example, the statistical analysis has exposed that more rides were taken from South Station and MIT and fewer rides were taken from Charles Circle and Kendall, compared to rides taken for marks overall. Note that Station Name is an unvisualized dimension that has some relationship to Trip Distance in the underlying data for the source visualization.
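One way to picture this comparison is to compute each dimension value's share of records in the analyzed mark and subtract its share across all marks. The sketch below does that with invented ride counts. The helper names are hypothetical, and the real analysis involves statistical modeling that this does not reproduce.

```python
from collections import Counter


def shares(values):
    """Return each distinct value's fraction of the records."""
    counts = Counter(values)
    total = len(values)
    return {key: count / total for key, count in counts.items()}


def share_differences(mark_values, all_values):
    """Share in the analyzed mark minus share across all marks."""
    mark, overall = shares(mark_values), shares(all_values)
    return {key: mark.get(key, 0.0) - overall[key] for key in overall}


# Hypothetical Station Name values per ride record.
mark_values = ["South Station"] * 5 + ["MIT"] * 3 + ["Kendall"] * 2
all_values = ["South Station"] * 10 + ["MIT"] * 10 + ["Kendall"] * 20

diffs = share_differences(mark_values, all_values)
for station, diff in diffs.items():
    print(f"{station}: {diff:+.2f}")
```

Positive differences mark values that are over-represented in the analyzed mark, and negative differences mark values that are under-represented.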
Contributing Measures
This explanation type shows that the average of an unvisualized measure may be contributing to the aggregate value of the analyzed mark. An unvisualized measure is a measure that exists in the data source, but isn't currently being used in the view.
This explanation can reveal a linear or quadratic relationship between the unvisualized measure and the target measure.
Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.
In this example, one possible reason that trip distance is high is that the average total time rented is also high.
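To illustrate why a quadratic relationship is checked in addition to a linear one, the sketch below computes a Pearson correlation between a target measure and an unvisualized measure, then again using the squared values of that measure. This is a rough stand-in for illustration only. The data, the `pearson` helper, and the use of correlation are assumptions, not the model Explain Data fits.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-mark values: x is the average of an unvisualized measure
# (centered on zero), y is the target measure.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.0, 1.0, 0.0, 1.0, 4.0]

linear_r = pearson(x, y)
quadratic_r = pearson([v ** 2 for v in x], y)

print("linear correlation:", linear_r)
print("correlation with squared term:", quadratic_r)
```

In this made-up data the linear correlation is zero even though the target measure is fully determined by the square of the other measure, which is the kind of relationship a purely linear check would miss.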
Other things to explore
This section provides possible reasons why the analyzed mark is unique or unusual. These explanations:
- Do not explain why the value of this mark is what it is.
- Are not related in any way to the value of the measures in the source visualization.
- Do not take any target measures into account.
Other Dimensions of Interest
Use this explanation to understand the composition of the record values that make up the analyzed mark.
The distribution of an unvisualized dimension in the analyzed mark is unusual compared to the distribution of values for all other marks in the view. An unvisualized dimension is a dimension that exists in the data source, but isn't currently being used in the view.
Note: For definitions of common terms used in explanations, see Terms and concepts in explanations.
In this example, a high percentage of records are associated with overcast weather. Because the data is about bike rentals in Boston, and the analyzed mark is Trip Distance for August, we can assume that the weather is typically warm and humid. People might have rented bikes more often on overcast days to avoid the heat. It's also possible there were more overcast days in August.