Aggregate Functions in Tableau

This article introduces aggregate functions and their uses in Tableau. It also demonstrates how to create an aggregate calculation using an example.

Why use aggregate functions

Aggregate functions allow you to summarise or change the granularity of your data.

For example, you might want to know exactly how many orders your store had for a particular year. You can use the COUNTD function to summarise the exact number of orders your company had, and then break the visualisation down by year.

The calculation might look something like this:

COUNTD(Order ID)

The visualisation might look something like this:

Aggregate functions available in Tableau

Aggregations and floating-point arithmetic: The results of some aggregations may not always be exactly as expected. For example, you may find that the Sum function returns a value such as -1.42e-14 for a column of numbers that you know should sum to exactly 0. This happens because the Institute of Electrical and Electronics Engineers (IEEE) 754 floating-point standard requires that numbers be stored in binary format, which means that numbers are sometimes rounded at extremely fine levels of precision. You can eliminate this potential distraction by using the ROUND function (see Number Functions) or by formatting the number to show fewer decimal places.

ATTR

SyntaxATTR(expression)
DefinitionReturns the value of the expression if it has a single value for all rows. Otherwise returns an asterisk. Null values are ignored.

AVG

SyntaxAVG(expression)
DefinitionReturns the average of all the values in the expression. Null values are ignored.
NotesAVG can only be used with numeric fields.

COLLECT

SyntaxCOLLECT(spatial)
DefinitionAn aggregate calculation that combines the values in the argument field. Null values are ignored.
NotesCOLLECT can only be used with spatial fields.

CORR

SyntaxCORR(expression1, expression2)
OutputNumber from -1 to 1
DefinitionReturns the Pearson correlation coefficient of two expressions.
Example
example
Notes

The Pearson correlation measures the linear relationship between two variables. Results range from -1 to +1 inclusive, where 1 denotes an exact positive linear relationship, 0 denotes no linear relationship between the variance and −1 is an exact negative relationship.

The square of a CORR result is equivalent to the R-Squared value for a linear trend line model. See Trend Line Model Terms(Link opens in a new window).

Use with table-scoped LOD expressions:

You can use CORR to visualise correlation in a disaggregated scatter using a table-scoped level of detail expression(Link opens in a new window). For example:

{CORR(Sales, Profit)}

With a level of detail expression, the correlation is run over all rows. If you used a formula like CORR(Sales, Profit) (without the surrounding brackets to make it a level of detail expression), the view would show the correlation of each individual point in the scatter plot with each other point, which is undefined.

Database limitations

CORR is available with the following data sources: Tableau data extracts, Cloudera Hive, EXASolution, Firebird (version 3.0 and later), Google BigQuery, Hortonworks Hadoop Hive, IBM PDA (Netezza), Oracle, PostgreSQL, Presto, SybaseIQ, Teradata, Vertica.

For other data sources, consider either extracting the data or using WINDOW_CORR. See Table Calculation Functions(Link opens in a new window).

COUNT

SyntaxCOUNT(expression)
DefinitionReturns the number of items. Null values are not counted.

COUNTD

SyntaxCOUNTD(expression)
DefinitionReturns the number of distinct items in a group. Null values are not counted.

COVAR

SyntaxCOVAR(expression1, expression2)
DefinitionReturns the sample covariance of two expressions.
Notes

Covariance quantifies how two variables change together. A positive covariance indicates that the variables tend to move in the same direction, as when larger values of one variable tend to correspond to larger values of the other variable, on average. Sample covariance uses the number of non-null data points n-1 to normalise the covariance calculation, rather than n, which is used by the population covariance (available with the COVARP function). Sample covariance is the appropriate choice when the data is a random sample that is being used to estimate the covariance for a larger population.

If <expression1> and <expression2> are the same, for example COVAR([profit], [profit]), COVAR returns a value that indicates how widely values are distributed.

The value of COVAR(X, X) is equivalent to the value of VAR(X) and also to the value of STDEV(X)^2.

Database limitations

COVAR is available with the following data sources: Tableau data extracts, Cloudera Hive, EXASolution, Firebird (version 3.0 and later), Google BigQuery, Hortonworks Hadoop Hive, IBM PDA (Netezza), Oracle, PostgreSQL, Presto, SybaseIQ, Teradata, Vertica.

For other data sources, consider either extracting the data or using WINDOW_COVAR. See Table Calculation Functions(Link opens in a new window).

COVARP

SyntaxCOVARP(expression 1, expression2)
DefinitionReturns the population covariance of two expressions.
Notes

Covariance quantifies how two variables change together. A positive covariance indicates that the variables tend to move in the same direction, as when larger values of one variable tend to correspond to larger values of the other variable, on average. Population covariance is sample covariance multiplied by (n-1)/n, where n is the total number of non-null data points. Population covariance is the appropriate choice when there is data available for all items of interest as opposed to when there is only a random subset of items, in which case sample covariance (with the COVAR function) is appropriate.

If <expression1> and <expression2> are the same, for example COVARP([profit], [profit]), COVARP returns a value that indicates how widely values are distributed. Note: The value of COVARP(X, X) is equivalent to the value of VARP(X) and also to the value of STDEVP(X)^2.

Database limitations

COVARP is available with the following data sources: Tableau data extracts, Cloudera Hive, EXASolution, Firebird (version 3.0 and later), Google BigQuery, Hortonworks Hadoop Hive, IBM PDA (Netezza), Oracle, PostgreSQL, Presto, SybaseIQ, Teradata, Vertica

For other data sources, consider either extracting the data or using WINDOW_COVAR. See Table Calculation Functions(Link opens in a new window).

MAX

SyntaxMAX(expression) or MAX(expr1, expr2)
OutputSame data type as the argument, or NULL if any part of the argument is null.
Definition

Returns the maximum of the two arguments, which must be of the same data type.

MAX can also be applied to a single field as an aggregation.

Example
MAX(4,7) = 7
MAX(#3/25/1986#, #2/20/2021#) = #2/20/2021#
MAX([Name]) = "Zander"
Notes

For strings

MAX is usually the value that comes last in alphabetical order.

For database data sources, the MAX string value is highest in the sort sequence defined by the database for that column.

For dates

For dates, the MAX is the most recent date. If MAX is an aggregation, the result will not have a date hierarchy. If MAX is a comparison, the result will retain the date hierarchy.

As an aggregation

MAX(expression) is an aggregate function and returns a single aggregated result. This displays as AGG(expression) in the viz.

As a comparison

MAX(expr1, expr2) compares the two values and returns a row-level value.

See also MIN.

MEDIAN

SyntaxMEDIAN(expression)
DefinitionReturns the median of an expression across all records. Null values are ignored.
NotesMEDIAN can only be used with numeric fields.
Database limitations

MEDIAN is not available for the following data sources: Access, Amazon Redshift, Cloudera Hadoop, HP Vertica, IBM DB2, IBM PDA (Netezza), Microsoft SQL Server, MySQL, SAP HANA, Teradata.

For other data source types, you can extract your data into an extract file to use this function. See Extract Your Data(Link opens in a new window).

MIN

SyntaxMIN(expression) or MIN(expr1, expr2)
OutputSame data type as the argument, or NULL if any part of the argument is null.
Definition

Returns the maximum of the two arguments, which must be of the same data type.

MIN can also be applied to a single field as an aggregation.

Example
MIN(4,7) = 4
MIN(#3/25/1986#, #2/20/2021#) = #3/25/1986#
MIN([Name]) = "Abebi"
Notes

For strings

MIN is usually the value that comes first in alphabetical order.

For database data sources, the MIN string value is lowest in the sort sequence defined by the database for that column.

For dates

For dates, the MIN is the earliest date. If MIN is an aggregation, the result will not have a date hierarchy. If MIN is a comparison, the result will retain the date hierarchy.

As an aggregation

MIN(expression) is an aggregate function and returns a single aggregated result. This displays as AGG(expression) in the viz.

As a comparison

MIN(expr1, expr2) compares the two values and returns a row-level value.

See also MAX.

PERCENTILE

SyntaxPERCENTILE(expression, number)
DefinitionReturns the percentile value from the given expression corresponding to the specified <number>. The <number> must be between 0 and 1 (inclusive) and must be a numeric constant.
Example
PERCENTILE([Score], 0.9)
Notes 
Database limitations

This function is available for the following data sources: Non-legacy Microsoft Excel and Text File connections, Extracts and extract-only data source types (for example, Google Analytics, OData or Salesforce), Sybase IQ 15.1 and later data sources, Oracle 10 and later data sources, Cloudera Hive and Hortonworks Hadoop Hive data sources, EXASolution 4.2 and later data sources.

For other data source types, you can extract your data into an extract file to use this function. See Extract Your Data(Link opens in a new window).

STDEV

SyntaxSTDEV(expression)
DefinitionReturns the statistical standard deviation of all values in the given expression based on a sample of the population.

STDEVP

SyntaxSTDEVP(expression)
DefinitionReturns the statistical standard deviation of all values in the given expression based on a biased population.

SUM

SyntaxSUM(expression)
DefinitionReturns the sum of all values in the expression. Null values are ignored.
NotesSUM can only be used with numeric fields.

VAR

SyntaxVAR(expression)
DefinitionReturns the statistical variance of all values in the given expression based on a sample of the population.

VARP

SyntaxVARP(expression)
DefinitionReturns the statistical variance of all values in the given expression on the entire population.

Create an aggregate calculation

Follow along with the steps below to learn how to create an aggregate calculation.

  1. In Tableau Desktop, connect to the Sample - Superstore saved data source, which comes with Tableau.

  2. Navigate to a worksheet and select Analysis > Create Calculated Field.

  3. In the calculation editor that opens, do the following:

    • Name the calculated field Margin.

    • Enter the following formula:

      IIF(SUM([Sales]) !=0, SUM([Profit])/SUM([Sales]), 0)

      Note: You can use the function reference to find and add aggregate functions and other functions (like the logical IIF function in this example) to the calculation formula. For more information, see Use the functions reference in the calculation editor.

    • When finished, click OK.

    The new aggregate calculation appears under Measures in the Data pane. Just like your other fields, you can use it in one or more visualisations.

    Note: Aggregation calculations are always measures.

    When Margin is placed on a shelf or card in the worksheet, its name is changed to AGG(Margin), which indicates that it is an aggregate calculation and cannot be aggregated any further.

    A graphic depicting what happens when you place an aggregate calculation on a shelf. The field displays the AGG prefix.

Rules for aggregate calculations

The rules that apply to aggregate calculations are as follows:

  • For any aggregate calculation, you cannot combine an aggregated value and a disaggregated value. For example, SUM(Price)*[Items] is not a valid expression because SUM(Price) is aggregated and Items is not. However, SUM(Price*Items) and SUM(Price)*SUM(Items) are both valid.

  • Constant terms in an expression act as aggregated or disaggregated values as appropriate. For example: SUM(Price*7) and SUM(Price)*7 are both valid expressions.

  • All of the functions can be evaluated on aggregated values. However, the arguments to any given function must either all be aggregated or all disaggregated. For example: MAX(SUM(Sales),Profit) is not a valid expression because Sales is aggregated and Profit is not. However, MAX(SUM(Sales),SUM(Profit)) is a valid expression.

  • The result of an aggregate calculation is always a measure.

  • Like predefined aggregations, aggregate calculations are computed correctly for grand totals. Refer to Grand Totals for more information.

See also

Understanding Calculations: Aggregate Calculations(Link opens in a new window)

Data Aggregation in Tableau(Link opens in a new window)

Functions in Tableau

Tableau Functions (by Category)(Link opens in a new window)

Tableau Functions (Alphabetical)

Thanks for your feedback!Your feedback has been successfully submitted. Thank you!