How Analysis Works for Multi-table Data Sources that Use Relationships

Using a data source that has multiple, related tables affects how analysis works in Tableau. Because multiple, related tables have independent domains and retain their native level of detail, when you drag fields into the view:

  • Data is queried at its natural level of detail.
  • Only data that is relevant to a viz is queried.
  • The relationships between the tables affect the results of the query. The flow of building a viz can vary depending on how tables of fields are related to each other in the data model, or if they aren't related directly.

For an overview of data source enhancements and an introduction to using relationships, see this 5-minute video.

Also see Don’t Be Scared of Relationships and these Tableau blog posts:

Note: You can still create single-table data sources in Tableau. You can build a logical table using a combination of joins, unions, custom SQL, and so on. The behavior of single-table analysis in Tableau has not changed. Analysis over a single logical table that contains a mixture of dimensions and measures works just as in Tableau pre-2020.2.

Considerations for analysis

Validating relationships

You have several options for validating your data model for analysis. As you create the model for your data source, we recommend going to a sheet, selecting that data source, and then building a viz to explore record counts, expected data, unmatched values, nulls, or repeated measure values. Try working with fields across different tables to ensure everything looks how you expect it to.

What to look for:

  • Are your relationships in the data model using the correct matching fields for their tables?
  • Would adding multiple matching field pairs make the relationship more accurate?
  • What are the results of dragging different dimensions and measures into the view?
  • Are you seeing the expected number of rows?
  • If you changed any of the Performance Options settings from the default settings, are the values that you are seeing in the viz what you would expect? If they aren't, you might want to check the settings, or reset to the default.

Options for validating relationships and the data model:

  • Every table includes a count of its records, as a field named TableName(Count), at the level of detail for that table. To see the count for a table, drag its Count field into the view. To see the count for all tables, select the Count field for each table in the Data pane, and then click the Text Table in Show Me.
  • Click View Data in the Data pane to see the number of rows and data per table. Also, before you start creating relationships, viewing the data from the data source before or during analysis can be useful to give you a sense of the scope of each table. For more information, see View Underlying Data.
  • Drag dimensions onto rows to see the Number of Rows in the status bar. To see unmatched values, click the Analysis menu, and then select Table Layout > Show Empty Rows or Show Empty Columns. You can also drag different measures to the view, such as <YourTable>(Count) from one of the tables represented in your viz. This ensures that you will see all values of the dimensions from that table.

Tip: If you would like to see the queries that are being generated for relationships, you can use the Performance Recorder in Tableau Desktop.

  1. Click the Help menu, and then select Settings and Performance > Start Performance Recording.
  2. Drag fields into the view to build your viz.
  3. Click the Help menu, and then select Settings and Performance > Stop Performance Recording. The Performance Recording workbook will automatically open.
  4. In the Performance Summary dashboard, under Events Sorted By Time, click an "Executing Query" bar and view the query below.

Another more advanced option is to use the Tableau Log Viewer(Link opens in a new window) on GitHub. You can filter on a specific keyword using end-protocol.query. For more information, start with the Tableau Log Viewer wiki page(Link opens in a new window) in GitHub.

Dimension-only visualizations

When using a multi-table data source with related tables: If you build a dimension-only viz, Tableau uses inner joins and you won't see the full unmatched domain.

To see partial combinations of dimension values, you can:

  • Use Show Empty Rows/Columns to see all of the possible rows. Click the Analysis menu, and then select Table Layout > Show Empty Rows or Show Empty Columns. Be aware that this setting will also trigger densification for Date and Numeric Bin fields, which may be unwanted.
  • Add a measure to the view, such as <YourTable>(Count) from one of the tables represented in your viz. This ensures that you will see all values of the dimensions from that table.

For more information, also see Unmatched values behavior for dimensions that might surprise you and Troubleshooting multi-table analysis.

When to use LOD calculations and expressions

Because Tableau understands the level of detail (LOD) of your input tables, you shouldn't need to use LOD calcs to remove unwanted duplication due to joins.

You may still want to use LOD calcs to:

  • Handle unwanted duplication in your source tables.
  • Compute multi-level aggregations (e.g. an average over a sum)
  • To do cohort analysis (e.g. to compute the first order date for each customer)

If the dimensionality of an LOD calc includes fields from a single table, that LOD calculation will appear in its owning table in the Data pane.

Examples of multi-table analysis

The following set of examples show how data might be queried across multiple, related tables. This multi-table data source contains a short list of actors' appearances in different movie roles.

A row in the Appearances table means that an Actor played a particular Character in a specific Movie. In this data set, an Actor can have zero or more Appearances.

Learn more about how relationships work in these Tableau blog posts:

When you connected to data in Tableau before 2020.2, your data source could consist of a single table, or multiple tables that were joined or unioned together into a single, denormalized table. Starting with Tableau 2020.2, Tableau recognizes and preserves normalized data for multi-table data sources, where the table data remains separate, and each table maintains its native level of detail. The following examples show how analysis differs between single table and multi-table data sources.

This example shows three tables of movie data: Appearances, Actors, and Movies.

These tables can be joined together, joining the Appearances table with the Actors table on Actor = Actor and the Appearances table with the Movies table on Movie = Movie. If the joins are full outer joins so no rows are lost, the final output looks like this. Note that fields that are used in the join clauses appear twice.

Data that is brought together into a single table in this way is called denormalized or flattened.

This joined data is flattened data. Each row consists of an actor’s appearance as a character in a movie (so John Rhys-Davies has two rows for The Return of the King because he played two characters); therefore the granularity of the data is at the level of a character in a movie. Information that is relevant in multiple rows is duplicated. The Return of the King’s premiere date is present twice because there are two characters from that movie in the data set. John Rhys-Davies’ height is listed five times because there are 5 unique combinations of character/movie for him as an actor.

This flattened data therefore has some characteristics to be aware of. For example, if you wanted to plot actor height by the average gross of their movies, you might assume we could bring out Height to columns and Gross to Rows, then take the average of the Gross. But if you do that the default view doesn’t seem right. Here John Rhys-Davies is supposed to be 925cm tall, which is over 30 feet!

This is because the default aggregation is SUM. There are 5 rows in the data for him, so we get his true height of 185cm five times. You could fix this by changing the aggregation on Height, say to average or minimum. This effectively returns the value for only one row (since they are all the same).

When you change the aggregation the heights are much more realistic. But now you need to pay attention to average gross. Remember that Tableau considers all five rows for John Rhys-Davies’ height. When you think about the average gross for the movies he was in, it shouldn’t be the average of five rows, but rather the average of three movies. You don't want to count the gross for Return of the King twice just because he played two characters in it. But is that what is happening?

Doing some quick math, the Lord of the Rings movies should have an average of (869 + 923 + 1119)/3, or $970.3. However, the value in the scatterplot is $990.6. The current average is from the five rows (869 + 923 + 923 + 1119 + 1119)/5.

This isn’t as easy to fix as the height issue with changing the aggregation. You need to use a Level of Detail (LOD) expression to alter which level of detail Tableau looks at, going from the default—Appearance—up to the level of Movie. You could create a calculation for LOD Gross as {FIXED [Movie] : MIN([Gross (USD millions)])} and then take the average of that new LOD Gross field for our view.

The LOD expression can be read as “for each movie, return its minimum Gross”. This removes the duplication problem because the gross is always returned per movie, even though the view is constructed with movie and actor.

Now the numbers are correct. John Rhys-Davies is 185cm tall, and the average gross for his movies in this data set is 970.3. You would need to understand how data was being replicated and how Tableau aggregated the data to display it before you could ensure the correct values were returned.

Normalized data in multiple tables

Creating relationships between logical tables may seem similar to creating joins, but instead of flattening the data into a single table, with all the replications that can entail, Tableau stays aware of the relationships between the tables. Information is brought out of each table at the appropriate level of detail and related to other data.

On the Data Source page, you won't see the “full” grid view of the flattened table. It doesn’t exist. Tableau keeps all three tables as they are and merely establishes the relationships, pulling together the required data as needed by the view.

To create the same scatterplot, drag Height and Gross into the view, and set Gross to average. That’s it! Tableau looks at the way each table’s data is related to data from other related tables and infers how the height should be displayed (by actor) and how the average gross should be computed (by movie).

Example 2: Dimension from a single table

If the dimensions in your visualization come from a single table, Tableau queries just the one table and shows results for the entire domain. You can add measures and still see the entire domain.

For example, using the Movie Appearances data source introduced above, adding the Actor field to a visualization results in the following visualization:

Because the only dimension in the viz came from the Actors table, Tableau will run a query against just the Actors table. All actors that appear in the Actors table are displayed in the viz, regardless of whether they have any Appearances.

Bringing the Appearance Actor field into the view as a measure, and then applying a COUNT aggregation creates a view that shows the number of appearances by actors. Notice that Sigourney Weaver does not have any appearances, but her name is still in the view.

Example 3: Dimensions from multiple tables

If the dimensions in the view are from multiple tables, Tableau finds the table that relates all the dimensions and shows the domain from that table. As a result, some dimension values that you saw in Example 1 change.

For example, dragging a field from the Movies table into the viz changes the query. Because the Movies and Actors tables are related by the Appearances table, the query returns only the Actor/Movie pairs that exist in the Appearances table.

Because Sigourney Weaver does not have any Appearances in this data set (and, thus, is not related to any of the Movies in the data set), the viz of Actor/Movie pairs does not show her:

Example 4: Measures that can’t be split by a dimension

If a measure can't be split by a dimension, Tableau replicates the measure across that dimension.

The next visualization shows the total amount grossed by Movie. Since both fields come from the Movies table, Tableau will query only the Movies table.

The Movies table includes the already aggregated Gross receipts for each movie, as shown in the following visualization (the gross for Infinity War was not available at the time this data set was created and is shown as zero).

If you add Actor to this visualization, Tableau knows that it cannot break down the Movie Gross receipts by Actor, since this more granular information is not available in the data model. Instead, Tableau shows the Movie Gross receipts for each movie, replicated across actors.

Removing the Movie dimension from the previous visualization queries the sum of the Movie Gross measure for each Actor. The result is the aggregated Movie Gross for every movie the actor appeared in.

In this case there isn’t a hierarchical relationship between the dimension, Actor, and the measure, Movie Gross—multiple actors may have appeared in the same movie. For example, Benedict Cumberbatch and Chris Hemsworth both appeared in Dr. Strange. In this case, Tableau includes the Movie Gross for Dr. Strange in both actors’ totals.

Because the same Movie Gross value is included in multiple actors’ totals, Tableau won’t directly sum over these values.

However, when totals are shown for this visualization, note that Tableau correctly computes the total Movie Gross without including duplicate movies.

 

Troubleshooting multi-table analysis

The following situations are possible with multi-table, related tables. This table describes known scenarios and high-level descriptions for how to troubleshoot your analysis.

Analysis Situation   Description
Inner join for dimension-only vizzes   You may not immediately see all of the values you were expecting when you add multiple dimensions to the same view. Or, you might notice that adding a new dimension to the viz from a different table causes some values to disappear from the viz.

Tableau will use queries that preserve the combination of values that actually exist in the data. This means that you will see rows that would be produced by an inner join of the tables providing dimensions to the viz.

If you want to see partial combinations of the dimension values, you can either turn on Show Empty Rows/Columns to see all of the possible rows, or you can add a measure, such as <MyTable>(Count), from one of the tables represented in the view to ensure you see all values of the dimensions from that table.
Constants in calculations   In a multi-table datasource, constant values behave as though they come from their own table with a single row. If you aggregate a constant value, it will behave as though the aggregate was on a single row. Sum(10) will always equal 10. Avg(10) will also always equal 10. Count(10) will always equal 1.

To ensure backwards compatibility, constant values on a single-logical table data source will behave as though the constant value is replicated for each value in the table.

Constants in row level calculations do not change the row level of detail of the calculation. The calculation [Sales] + 10 will behave as though it came from the same table as the [Sales] field.
Forcing outer join   Tableau ensures that all measure values are represented in the viz (along with all combinations of dimension values that actually occur in the data) so if you want to ensure you are seeing all of the possible values in your data including "unmatched nulls" you can do so by bringing measures into the view from each table in the worksheet.
I'm not seeing the measure values I would expect from a cross-table calculation   The domain of a calculation is the inner join of its inputs. If there are not matching values from all of the inputs of the measure calculation they will not be included in the measure calculation.

Consider using an LOD calculation to move the measure values to the same object before creating the row-level calculation.
Calculation to switch between fields from different logical tables gives unexpected results   If you have a calculation that switches between row level fields, either using a case statement, if statement, or a function like "IFNULL", you may see unexpected results, because this calculation is being computed for every row, where row is the inner join between the inputs of the calculation.

A better approach is to switch between Aggregated values, rather than trying to switch within the row level calculation. This will also produce better performance in single table scenarios.

Alternately, this is only an issue for calculations that span between tables, so using LOD calculations to bring all of the fields to the same table will work as well.

Don't do this:
SUM( IF [Parameter] == "Foo" THEN [Field 1] ELSE [Field 2] END )

Do this:
IF [Parameter] == "Foo" THEN SUM([Field 1]) ELSE SUM([Field 2]) END
Unexpected unmatched nulls   You may see measure values associated with a null dimension value that you didn't expect. This can possibly be a sign that the relationships in the data source were incorrectly configured. It may also indicate that there are actually unmatched values from the table that contains the measure, that have no corresponding row in the dimension table.

In the past, this data might have been lost if the wrong join type was selected. When you use relationships, these unmatched values are preserved. If you don't want to see the unmatched values, you can exclude them using a filter.
Incorrect aggregate values   Are you using relationships or joins? With relationships, aggregates are computed correctly by default. When using joins, you may need to write LOD calcs to deduplicate values.

Have you incorrectly set Performance Options on relationships? Try resetting the Performance Options to the default and see if that produces the correct aggregations.
Dimensions replicate measure values rather than partitioning them.

Dimension filters don't subset measures.
  Check that the fields used to define relationships are correct.
Tableau is generating too many queries or queries with many left joins   Check the logs or performance recording to see how many queries are generated and how many left joins are used. With the new data modeling capabilities, Tableau generates queries with left joins and/or additional queries to make sure that unmatched measure values are always included in the visualization. If you don't need to see the unmatched values, use filters to remove the unmatched (NULL) values from your visualization. This should result in fewer queries.

If you know that your data doesn't have any unmatched values, you can set the Referential Integrity setting for each relationship to "All Values Match" in Performance Options. This will also result in fewer queries.

You can also reduce the complexity of your visualization to reduce the number of queries generated. Removing measures and hiding Filter controls are key ways to simplify queries multi-table, related data.
Queries have many subqueries   Check the logs or performance recording to see the complexity of the queries Tableau generates.

Tableau automatically generates subqueries to deduplicate data, when necessary, to generate correct aggregates. This is similar to queries generated by LOD calculations.

If you know the relationships between the logical tables in your data have many-to-one or one-to-one cardinality, then you can set this cardinality information in the relationship Performance Options. This will allow Tableau to eliminate unnecessary subqueries since it will know that no duplication can happen.
I was using joins to filter data   In 2020.2, Tableau will work hard to recover unmatched values. Sometimes that means it will use a left join where you might have specified an inner join to intentionally filter out data.

If you filter out the unmatched values that are brought in by this join, Tableau will be able to optimize the query back to an inner join.

Depending on your specific scenario, it might make sense to model this inner join as a physical join within a logical table. This is especially powerful if a table containing measures is being used to filter a dimension table, as it will not introduce additional measure replication.
Thanks for your feedback!