Create Efficient Calculations

When your data doesn’t provide all the information you need to answer your questions, you can create calculated fields to help with your analysis.

Within a calculated field you can define a hardcoded constant (such as a tax rate), do very simple mathematical operations like subtraction or multiplication (for example, revenues minus cost), use more complex mathematical formulas, perform logical tests (IF/THEN, CASE), do type conversions, send expressions to external services such as R, and much more.

There are different calculation types in Tableau:

Basic and aggregate calculations: These types of calculations are generated as part of the query to the underlying data source and are calculated in the database. In general, basic and aggregate calculations scale very well, and there are many database tuning techniques that can improve their performance.

Table calculations: These calculations are calculated by Tableau on the query result set. While this means more work for Tableau, table calculations are generally done over a much smaller set of records than are in the original data source. If table calculation performance is a problem (possibly because the result set returned to Tableau is very large) consider pushing some aspects of the calculation back to the data source layer. One way to do this is to aggregate the data and then perform the calculation on the aggregated data.

Level of detail (LOD) expressions: LOD expressions are generated as part of the query to the underlying data source and are calculated in the database. They are expressed as a nested select, so they are dependent on database performance. A table calculation or blending might perform better than a LOD expression, or vice versa.

If you suspect performance is slow due to a LOD expression, you might try replacing it with a table calculation or a data blend to see if performance improves. For an example, see Example 2 in Tableau's Order of Operations.

LOD expressions can be affected by join culling, so look at Assuming Referential Integrity for Joins if your queries run slowly when you use LOD expressions.

For more information, see the Tableau White Paper Understanding Level of Detail (LOD) Expressions.

Use parameters for conditional calculations

A common technique in Tableau is to show a parameter control so users can select a value that determines how a calculation is performed. Typically, to give the user easy-to-understand options, it makes sense to create the parameter as a string type. But numerical calculations are much faster than string calculations, so take advantage of the Display As feature of parameters: that is, show text labels but use underlying integer values for the calculation logic. For more information, see Create Parameters.

Convert date fields

Users often have date data that is not stored in native date formats. For example, a date might be a string or a numeric timestamp. You can use the DATEPARSE function if your data supports it – this function is available for non-legacy Microsoft Excel and text file connections, MySQL, Oracle, PostgreSQL, and Tableau data extract data sources. Otherwise, parse the field into a date string such as “2012-01-01”. ISO strings are preferred because they are not locale-specific. Then pass the value to the DATE function. If the originating data is a numeric field, converting it first to a string and then to a date is very inefficient. It is much better to keep the data as numeric and use DATEADD and date literal values to perform the calculation. The performance gains can be significant with large data sets. See Date Functions for more information about these functions.

Use CASE logic statements

When working with complex logic statements, CASE statements may be faster than IF or ELSEIF statements.

Remember that ELSEIF is faster than ELSE IF, because a nested IF computes a second IF statement rather than being computed as part of the first.

For more information, see the whitepaper Designing Efficient Workbooks.

Aggregate measures

If the views you create are slow, make sure you are working with aggregated measures. With disaggregated data, you might be trying to view many rows of data at once. You can reduce the number of rows by aggregating the data. To do this, select Analysis > Aggregate Measures.

Calculation tips

There are lots of little things you can do that can improve calculation performance.

  • Distinct counting values is one of the slowest aggregation types in almost all data sources. Use the COUNTD aggregation sparingly.

  • Using parameters with a wide scope of impact (for example, in a custom SQL statement) can affect cache performance.

  • Filtering on complex calculations can potentially cause indexes to be missed in the underlying data.

  • Script functions like RAWSQL and SCRIPT_* for integrating with external services can be slow, particularly if there are lots of values that need to be passed back and forth from the DBMS/R server.

  • Use NOW only if you need the time stamp level of detail. Use TODAY for date level calculations.

  • Remember that all basic calculations are passed through to the underlying data—even literal calculations like label strings. If you need to create labels (for example, for column headers) and your data is very large, create a simple text/Excel file data source with just one record to hold them so they don't add overhead on the big data source.

Thanks for your feedback!Your feedback has been successfully submitted. Thank you!