Data Aggregation in Tableau
In Tableau, you can aggregate measures or dimensions, though it’s more common to aggregate measures. Whenever you add a measure to your view, an aggregation is applied to that measure by default. The type of aggregation applied varies depending on the context of the view.
Change the Aggregation of a Measure in the View
When you add a measure to the view, Tableau automatically aggregates its values. Sum, average, and median are common aggregations; for a complete list, see List of Predefined Aggregations in Tableau.
The current aggregation appears as part of the measure's name in the view. For example, Sales becomes SUM(Sales). Every measure has a default aggregation that is set by Tableau when you connect to a data source. You can view or change the default aggregation for a measure—see Set the Default Aggregation for a Measure.
You can aggregate measures using Tableau only for relational data sources. Multidimensional data sources contain data that is already aggregated. In Tableau, multidimensional data sources are supported only in Windows.
You can change the aggregation for a measure in the view from its context menu:
Aggregating Dimensions
You can aggregate a dimension in the view as Minimum, Maximum, Count, or Count (Distinct). When you aggregate a dimension, you create a new temporary measure column, so the dimension actually takes on the characteristics of a measure.
Another way to view a dimension is to treat it as an Attribute by choosing Attribute from the context menu for the dimension. The Attribute aggregation has several uses:
- It can ensure a consistent level of detail when blending multiple data sources.
- It can provide a way to aggregate dimensions when computing table calculations, which require an aggregate expression.
- It can improve query performance because it’s computed
Tableau computes Attribute using the following formula:
IF MIN([dimension]) = MAX([dimension]) THEN
MIN([dimension]) ELSE "*" END
The formula is computed in Tableau after the data is retrieved from the initial query. The asterisk (*) is actually a visual indicator of a special type of Null value that occurs when there are multiple values. See Troubleshoot Data Blending(Link opens in a new window) to learn more about the asterisk.
Here is an example of using Attribute in a table calculation. The table shows sales by market, market size, and state. Suppose you wanted to compute the percent of total sales each state contributed to the market. When you add a Percent of Total quick table calc (see Quick Table Calculations(Link opens in a new window)) that computes along State, the calculation computes within the red area. This is because the Market Size dimension is partitioning the data.
When you aggregate Market Size as an Attribute, the calculation is computed within the Market (East, in the following image), and Market Size is purely as a label.
List of Predefined Aggregations in Tableau
Sometimes it’s useful to look at numerical data in an aggregated form such as a sum or an average. The mathematical functions that produce aggregated data are called aggregation functions. Aggregation functions perform a calculation on a set of values and return a single value. For example, a measure that contains the values 1, 2, 3, 3, 4 aggregated as a sum returns a single value: 13. Or if you have 3,000 sales transactions from 50 products, you might want to view the sum of sales per product to determine which products have the highest revenue.
Note: Using floating-point values with aggregations can sometimes lead to unexpected results. For details, see Understanding data types in calculations(Link opens in a new window).
Tableau provides a set of predefined aggregations that are shown
in the table. You can set the default aggregation for any measure that isn’t a calculated field that itself contains an aggregation, such as AVG([Discount])
. See Set the Default Aggregation for a Measure. You can also set the aggregation for a field already in the view. For details, see Change the Aggregation of a Measure in the View.
Aggregation | Description | Result for measure that contains 1, 2, 2, 3 |
---|---|---|
Attribute | Returns the value of the given expression if it only has a single value for all rows in the group, otherwise it displays an asterisk (*) character. Null values are ignored. This aggregation is useful when aggregating a dimension. To set a measure in the view to this aggregation, right-click (control-click on Mac) the measure and choose Attribute. The field then changes to show the text ATTR: | * |
Dimension | Returns all unique values in a measure or dimension. | 3 values (1, 2, 3) |
Sum | Returns the sum of the numbers in a measure. Null values are ignored. | 1 value (8) |
Average | Returns the arithmetic mean of the numbers in a measure. Null values are ignored. | 1 value (2) |
Count (Distinct) | Returns the number of unique values in a measure or dimension. When applied to a dimension, Tableau creates a new temporary column that is a measure because the result of a count is a number. You can count numbers, dates, booleans, and strings. Null values are ignored in all cases. This aggregation isn’t available for the following types of workbooks:
If you’re connected to a workbook that uses of one of these types, Count (Distinct) is unavailable and Tableau shows the message "Requires extract." To use this aggregation, extract your data. See Extract Your Data. | 1 value (3) |
Minimum | Returns the smallest number in a measure or continuous dimension. Null values are ignored. | 1 value (1) |
Maximum | Returns the largest number in a measure or in the given expression based on a sample population. Null values are ignored. Returns a Null if there are fewer than 2 members in the sample that aren’t Null. Use this function if your data represents a sample of the population. | 1 value (3) |
Std. Dev (Pop.) | Returns the standard deviation of all values in the given expression based on a biased population. Assumes that its arguments consist of the entire population. Use this function for large sample sizes. | 1 value (0.7071) |
Variance | Returns the variance of all values in the given expression based on a sample. Null values are ignored. Returns a Null if there are fewer than 2 members in the sample that are not Null. Use this function if your data represents a sample of the population. | 1 value (0.6667) |
Variance (Pop.) | Returns the variance of all values in the given expression based on a biased population. Assumes that its arguments consist of the entire population. Use this function for large sample sizes. | 1 value (0.5000) |
Disaggregate | Returns all records in the underlying data source. To disaggregate all measures in the view, select Aggregate Measures from the Analysis menu (to clear the check mark). Tableau allows you to view data in disaggregated form (relational databases only). When data are disaggregated, you can view the individual rows of your data source. For example, after discovering that the sum of sales for rubber bands is $14,600, you might want to see the distribution of individual sales transactions. To answer this question, you must create a view that shows individual rows of data. That is, you must disaggregate the data (see How to Disaggregate Data). Another way to look at disaggregated data is to view the underlying data for all or part of a view. For more details, see View Underlying Data. | 4 values (1, 2, 2, 3) |
You can also define custom aggregations as described in Aggregate Functions in Tableau(Link opens in a new window). Depending on the type of data view you create, Tableau applies these aggregations at the appropriate level of detail. For example, Tableau applies the aggregation to individual dimension members (the average delivery time in the East region), all members in a given dimension (the average delivery time in the East, West, and Central regions), or groups of dimensions (the sum of sales for all regions and for all markets).
Set the Default Aggregation for a Measure
You can set the default aggregation for any measure that isn’t a calculated field that itself contains an aggregation, such as AVG([Discount])
. A default aggregation is a preferred
calculation for summarizing a continuous or discrete field. The
default aggregation is automatically used when you drag a measure to a view.
To change the default aggregation:
Right-click (control-click on Mac) a measure in the Data pane and select Default Properties > Aggregation, and then select one of the aggregation options.
Note: You can use Tableau to aggregate measures
only with relational data sources. Multidimensional data
sources contain aggregated data only.
You can’t set default aggregations for published data sources. The default aggregation is set when the data source is initially published. Create a Local Copy(Link opens in a new window) of the published data source to adjust the default aggregation.
How to Disaggregate Data
Whenever you add a measure to your view, an aggregation is applied to that measure by default. This default is controlled by the Aggregate Measures setting in the Analysis menu.
If you decide you want to see all the marks in the view at the most detailed level of granularity, you can disaggregate the view. Disaggregating your data means that Tableau displays a separate mark for every data value in every row of your data source.
To disaggregate all measures in the view:
-
Clear the Analysis >Aggregate Measures option. If it’s already selected, click Aggregate Measures one time to deselect it.
When Aggregate Measures is selected, Tableau attempts to aggregate measures in the view by default. Individual row values from your data source are aggregated into a single value (a single mark) at the level of detail in your view.
The different aggregations available for a measure determine how the individual values are collected: they can be added (SUM), averaged (AVG), or set to the maximum (MAX) or minimum (MIN) value from the individual row values.
For a complete list of the available aggregations, List of Predefined Aggregations in Tableau.
The level of detail is determined by the dimensions in your view—for information about the concept of level of detail, see How dimensions affect the level of detail in the view.
Disaggregating your data can be useful for analyzing measures that you may want to use both independently and dependently in the view. For example, you may be analyzing the results from a product satisfaction survey with the Age of participants along one axis. You can aggregate the Age field to determine the average age of participants or disaggregate the data to determine at what age participants were most satisfied with the product.
Disaggregating data can be useful when you’re viewing data as a scatter plot. See Example: Scatter Plots, Aggregation, and Granularity.
Note: If your data source is very large, disaggregating the data can result in a significant performance degradation.
Example: Scatter Plots, Aggregation, and Granularity
If you place one measure on the Rows shelf and another measure on the Columns shelf, you’re asking Tableau to compare two numerical values. Typically, Tableau chooses a scatter plot as the default visualization in such cases. The initial view will most likely be a single mark, showing the sum for all values for the two measures. This is because you must increase the level of detail in the view.
Start building the scatter plot
There are various ways to add detail to a basic scatter plot: you can use dimensions to add detail, you can add more fields to the Rows and Columns shelves, or you can disaggregate the data. You can also use any combination of these options. This topic looks at these alternatives using the Sample-Superstore data source.
To create the initial view, follow these steps:
-
Place the Sales measure on the Columns shelf.
-
Place the Profit measure on the Rows shelf.
The measures are automatically aggregated as sums. The default aggregation (SUM) is indicated in the field names. The values shown in the tooltip show the sum of sales and profit values across every row in the data source.
Follow these steps to use dimensions to add detail to the view and to disaggregate data.
Use dimensions to add detail
Follow these steps to develop the scatter plot view you created by adding dimensions to show more levels of detail.
-
Drag the Category dimension to Color on the Marks card.
This separates the data into three marks—one for each dimension member—and encodes the marks using color.
-
Drag the State dimension to Detail on the Marks card.
Now there are many more marks in the view. The number of marks is equal to the number of distinct states in the data source multiplied by the number of categories.
Although more marks are now displayed, the measures are still aggregated. So regardless of whether there’s one row in the data source where State = North Dakota and Category= Furniture, or 100 such rows, the result is always a single mark.
Maybe this process is developing the view in a direction you find useful, or maybe you prefer to go in a different direction. You decide what questions to ask.
Try adding more fields to the rows and columns shelves
Revert to the original one-mark view and follow these steps to develop the scatter plot view by adding fields to the Rows and Columns shelves.
-
Drag the State dimension to the Columns shelf.
Even if you drop Continent to the right of SUM(Sales), Tableau moves it to the left of SUM(Sales). This is because you can’t insert a dimension within a continuous axis. Instead, your view shows a separate axis for each member of the dimension.
-
Drag the Segment dimension to the Rows shelf.
You now have a view that provides an overview of Sales and Profit across states and customer segments. It can be interesting to hover over the marks in the view to see tooltip data for various segments:
Try disaggregating the data
Another way to modify your original one-mark scatter plot to display more marks is by disaggregating the data.
Clear the Analysis >Aggregate Measures option. If it’s already selected, click Aggregate Measures one time to deselect it.
What you have done is to dis-aggregate the data, because this command is a switch that was originally selected (check mark present). Tableau aggregates data in your view by default.
Now you see many marks—one for each row in your original data source:
When you disaggregate measures, you no longer are looking at the average or sum for the values in the rows in the data source. Instead, the view shows a mark for every row in the data source. Disaggregating data is a way to look at the entire surface area of the data. It's a quick way to understand the shape of your data and to identify outliers. In this case, the view showsthere’s a consistent relationship between sales and profit, indicated by the line of marks aligned at a forty-five degree angle.