Aggregate, Join or Union Data
Aggregate, join, or union your data to group or combine data for analysis.
Note: Starting in version 2020.4.1, you can now create and edit flows in Tableau Server and Tableau Cloud. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web in the Tableau Server(Link opens in a new window) and Tableau Cloud(Link opens in a new window) help.
Sometimes you’ll need to adjust the granularity of some data, either to reduce the amount of data produced from the flow, or to align data with other data you might want to join or union together. For example, you might want to aggregate sales data by customer before joining a sales table with a customer table.
If you need to adjust the granularity of your data, use the Aggregate option to create a step to group and aggregate data. Whether data is aggregated or grouped depends on the data type (string, number or date).
In the Flow pane, click the plus icon, and select Aggregate. A new aggregation step displays in the Flow pane and the Profile pane updates to show the aggregate and group profile.
Drag fields from the left pane to the Grouped Fields pane (the fields that make the row) or to the Aggregated Fields pane (the data that will be aggregated and presented at the level of the grouped fields).
You can also:
Drag and drop fields between the two panes.
Search for fields in the list and select only the fields you want to include in your aggregation.
Double-click a field to add it to the left or right pane.
Change the function of the field to automatically add it to the appropriate pane.
Click Add All or Remove All to bulk apply or remove fields.
Apply certain cleaning operations to fields. For more information abut which cleaning options are available, see About cleaning operations(Link opens in a new window).
The following example would show the aggregated sum of profit and quantity, as well as average discount by region and year of sale.
Fields are distributed between the Grouped Fields and Aggregated Fields columns based on their data type. Click the group or aggregation type (for example, AVG or SUM) headings to change the group or aggregation type.
In the data grids below the aggregation and group profile, you can see a sample of the members of the group or aggregation.
Any cleaning operations that are made to the fields are tracked in the Changes pane.
The data that you want to analyse is often made up of a collection of tables that are related by specific fields. Joining is a method for combining the related data on those common fields. The result of combining data using a join is a table that’s typically extended horizontally by adding fields of data.
Joining is an operation you can do anywhere in the flow. Joining early in a flow can help you understand your data sets and expose areas that need attention right away.
Tableau Prep supports the following join types:
|For each row, includes all values from the left table and corresponding matches from the right table. When a value in the left table doesn't have a corresponding match in the right table, you see a null value in the join results.
|For each row, includes values that have matches in both tables.
|For each row, includes all values from the right table and corresponding matches from the left table. When a value in the right table doesn't have a corresponding match in the left table, you see a null value in the join results.
|For each row, includes only values from the left table that don't match any values from the right table. Field values from the right table show as null values in the join results.
|For each row, includes only values from the right table that don't match any values from the left table. Field values from the left table show as null values in the join results.
|For each row, includes all of the values from the right and the left table that don't match.
|For each row, includes all values from both tables. When a value from either table doesn't have a match with the other table, you see a null value in the join results.
To create a join, do the following:
Join two tables using one of the following methods:
- Add at least two tables to the Flow pane, then select and drag the related table to the other table until the Join option displays.
- Click the icon and select Join from the menu, then manually add the other input to the join and add the join clauses.
Note: If you connect to a table that has table relationships defined and includes related fields, you can select Join and select from a list of related tables. Tableau Prep creates the join based on the fields that make up the relationship between the two tables.
For more information about connectors with table relationships, see Join data in the Input step(Link opens in a new window).
A new join step is added to the flow and the profile pane updates to show the join profile.
To review and configure the join, do the following:
Review the Summary of Join Results to see the number of fields included and excluded as a result of the join type and join conditions.
Under Join Type, click in the Venn diagram to specify the type of join you want.
Under Applied Join Clauses, click the plus icon or, on the field chosen for the default join condition, specify or edit the join clause. The fields you selected in the join condition are the common fields between the tables in the join.
You can also click the recommended join clauses shown under Join Clause Recommendations to add the clause to the list of applied join clauses.
Inspect the results of the join
The summary in the join profile shows metadata about the join to help you validate that the join includes the data you expect.
Applied Join Clauses: By default, Tableau Prep defines the first join clause based on common field names in the tables being joined. Add or remove join clauses as needed.
Join Type: By default, when you create a join, Tableau Prep uses an inner join between the tables. Depending on the data that you connect to, you might be able to use left, inner, right, leftOnly, rightOnly, notInner or full joins.
Summary of Join Results: The Summary of Join Results shows you the distribution of values that are included and excluded from the tables in the join.
Click each Included bar to isolate and see the data in the join profile included in the join.
Click each Excluded bar to isolate and see the data in the join profile that are excluded from the join.
Click any combination of the Included and Excluded bars to see a cumulative perspective of the data.
Join Clause Recommendations: Click the plus icon next to the recommended join clause to add it to the Applied Join Clauses list.
Join Clauses pane: In the Join Clauses pane, you can see the values in each field in the join clause. The values that don't meet the criteria for the join clause are displayed in red text.
Join Results pane: If you see values in the Join Results pane that you want to change, you can edit the values in this pane.
Common join issues
If you don't see the results you expect after joining your data, you may need to do some additional cleaning on your field values. The following issues will result in Tableau Prep reading the values as not matching and exclude them from the join:
Different capitalisation: My Sales and my sales
Different spelling: Hawaii and Hawai'i
Mispelling or data entry errors: My Company Health and My Company Heath
Name changes: John Smith and John Smith Jr.
Abbreviations: My Company Limited and My Company Ltd
Extra separators: Honolulu and Honolulu (Hawaii)
Extra spaces: This includes extra space between characters, tabbed spaces or extra leading or trailing spaces
Inconsistent use of periods: Returned, not needed. and Returned, not needed.
The good news is that if your field values have any of these issues, you can fix the field values directly in the Join Clauses or work with excluded values by clicking in the Excluded bars in the Summary of Join Results and use the cleaning operations in the profile card menu.
For more information about the different cleaning options available in the Join step, see About cleaning operations(Link opens in a new window).
Fix mismatched fields and more
You can fix mismatched fields right in the join clause. Double-click or right-click on the value and select Edit Value from the context menu on the field that you want to fix and enter a new value. Your data changes are tracked and added to the Changes pane right in the Join step.
You can also select multiple values to keep, exclude or filter in the Join Clauses panes, or apply other cleaning operations in the Join Results pane. Depending on which fields you change and where they are in the join process, your change is applied either before or after the join to give you the corrected results.
For more information about cleaning fields see Apply cleaning operations(Link opens in a new window).
Union is a method for combining data by appending rows of one table onto another table. For example, you might want to add new transactions in one table to a list of past transactions in another table. Make sure the tables you union have the same number of fields, the same field names, and the fields are the same data type.
Tip: To maximise performance a single union can have a maximum of 10 inputs. If you need to union more than 10 files or tables, try unioning files in the Input step. For more information about this type of union, see Union files and database tables in the Input step(Link opens in a new window).
Similar to a join, you can use the union operation anywhere in the flow.
To create a union, do the following:
After you add at least two tables to the flow pane, select and drag a related table to the other table until you see the Union option. You can also click the icon and select Union from the menu. A new union step is added in the Flow pane, and the Profile pane updates to show the union profile.
Add additional tables to the union by dragging tables toward the unioned tables until you see the Add option.
In the union profile, review the metadata about the union. You can remove tables from the union as well as see details about any mismatched fields.
Inspect the results of the union
After you create a union, inspect the results of the union to validate that the data in the union is what you expect. To validate your unioned data, check the following areas:
Review the union metadata: The union profile shows some metadata about the union. Here you can see the tables that make up the union, the resulting number of fields and any mismatched fields.
Review the colours for each field: Next to each field listed in the Union summary and above each field in the union profile, is a set of colours. The colours correspond to each table in the union.
If all table colours show for that field, then the union performed correctly for that field. A missing table colour indicates that you have mismatched fields.
Mismatched fields are fields that might have similar data but are different in some way. You can see the list of fields that don't match in the Union summary and the tables where they came from. If you want to take a closer look at the data in the fields, tick the Show only mismatched fields box to isolate the mismatched fields in the Union profile.
To fix these field, follow one of the suggestions in the Fix fields that don’t match section below.
When tables in a union don’t match, the union produces extra fields. The extra fields are valid data being excluded from their appropriate context.
To resolve a field mismatch issue, you must merge the mismatched fields together.
There are a number of reasons why fields might not match.
Corresponding fields have different names: If corresponding fields between tables have different names, you can use union recommendations, manually merge fields in the Mismatched Fields list, or rename the field in the union profile to merge the mismatched fields together.
To use union recommendations, do the following:
in the Mismatched Fields list, click on a mismatched field. If a suggested match exists, the matching field is highlighted in yellow.
Suggested matches are based on fields with similar data types and field names.
Hover on the highlighted field and click the plus button to merge the fields.
To manually merge fields in the Mismatched Fields list, do the following:
Select one or more fields in the list.
Right-click or Ctrl-click (MacOS) on a selected field and if the merge is valid, the Merge Fields menu option appears.
If you see No options available when you right-click on the field, this is because the fields are not eligible to merge. For example trying to merge two fields from the same input.
Click Merge Fields to merge the selected fields.
To rename the field in the union profile pane, right-click on the field name and click Rename Field.
Corresponding fields have the same name but are a different type: By default, when the name of corresponding fields match but the data type of the fields don’t, Tableau Prep changes the data type of one of the fields so that they are compatible with each other. If Tableau Prep makes this change, it’s noted at the top of the merged field by the Change Data Type icon.
In some cases, Tableau Prep might not pick the correct data type. If that happens and you want to undo the merge, right-click or Ctrl-click (MacOS) the Change Data Type icon and select Separate Inputs with Different Types.
You can then merge the fields again by first changing the data type of one of the fields and then using the suggestions in Additional merge field options.
Corresponding tables have different number of fields: To union tables, each table in the union must contain the same number of fields. If a union results in extra fields, merge the field into an existing field.
In addition to the methods described in the above section for merging fields you can also use one of the following methods to merge fields. You can merge fields in any step, except for the Output step.
For information about how to merge fields in the same file, see Merge fields.
To merge fields, do one of the following:
Drag and drop one field onto another. A Drop to merge fields indicator displays.
Select multiple fields and right-click within the selection to open the context menu, and then click Merge Fields.
Select multiple fields, and then click Merge Fields on the context-sensitive toolbar.