Improve Performance for Cross-Database Joins
Important: This feature temporarily moves data outside of Tableau. Be sure that the database you’re connected to is from a trusted source.
When joining data between a single database and a single file, Tableau can improve performance by choosing to perform the join using the database instead of Hyper. This default setting allows Tableau to choose the fastest approach (Hyper or the connected database). If Tableau uses the connected database, the data from the file connection is moved into temporary tables in the database and the join is performed there.
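The database-side approach can be pictured with a minimal sketch. It is not Tableau's implementation: SQLite stands in for the connected database, and the table names, column names, and CSV contents are hypothetical. It shows only the general pattern of copying file rows into a temporary table and letting the database run the join.

```python
import csv
import io
import sqlite3

# Stand-in for the connected database (SQLite is used purely for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE listings (id INTEGER, name TEXT)")
db.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(1, "Loft"), (2, "Cabin"), (3, "Studio")],
)

# Stand-in for the file connection (hypothetical CSV contents).
csv_text = "listing_id,rating\n1,5\n1,4\n3,2\n"
file_rows = [
    (int(row["listing_id"]), int(row["rating"]))
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Move the file data into a temporary table inside the database...
db.execute("CREATE TEMP TABLE file_reviews (listing_id INTEGER, rating INTEGER)")
db.executemany("INSERT INTO file_reviews VALUES (?, ?)", file_rows)

# ...and let the database perform the inner join.
joined = db.execute(
    "SELECT l.name, r.rating FROM listings AS l "
    "JOIN file_reviews AS r ON r.listing_id = l.id "
    "ORDER BY l.id, r.rating"
).fetchall()
```

Because the file rows are copied into the database for the duration of the join, the trust warning at the top of this topic applies to this path.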
Feature conditions
The option to use the connected database for the join is only available if the following conditions are met:
- The data source consists of one or more file-based connections and a single SQL-based connection.
- The files to be joined must be one of the following file types: Microsoft Excel, PDF, or Text (.csv, .txt, .tsv or .tab).
- The connected database is one of the following:
- Microsoft SQL Server
- Oracle
- PostgreSQL
- Vertica
- Teradata
- In web authoring: The Allow users to use web authoring option is enabled.
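The conditions above can be summarized as a small check. This is an illustrative sketch only: the `Connection` class, the `kind` labels, and the function name are hypothetical and are not part of any Tableau API.

```python
from dataclasses import dataclass

# Values taken from the lists above; the short labels are hypothetical.
SUPPORTED_FILE_TYPES = {"excel", "pdf", "text"}
SUPPORTED_DATABASES = {
    "Microsoft SQL Server", "Oracle", "PostgreSQL", "Vertica", "Teradata",
}


@dataclass(frozen=True)
class Connection:
    kind: str       # "file" or "database" (hypothetical labels)
    type_name: str  # for example "excel" or "PostgreSQL"


def can_join_in_database(connections, web_authoring=False,
                         web_authoring_enabled=True):
    """Return True if the data source meets the listed conditions."""
    files = [c for c in connections if c.kind == "file"]
    databases = [c for c in connections if c.kind == "database"]
    # One or more file-based connections and exactly one database connection.
    if not files or len(databases) != 1:
        return False
    if any(f.type_name not in SUPPORTED_FILE_TYPES for f in files):
        return False
    if databases[0].type_name not in SUPPORTED_DATABASES:
        return False
    # In web authoring, the site setting must also be enabled.
    if web_authoring and not web_authoring_enabled:
        return False
    return True
```

For example, one Excel file plus one PostgreSQL connection passes the check, while the same file paired with an unlisted database does not.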
Changing the preferred option for cross-database joins
- Connect to the first data source.
- In Tableau Desktop: On the Start page, under Connect, connect to a supported file type or supported database type. This step creates the first connection in the Tableau data source.
- In web authoring: From the Home or Explore page, click Create > Workbook to start a new workbook and then connect to your data. This step creates the first connection in the Tableau data source.
- Select the file or database that you want to connect to, then double-click or drag a table to the canvas.
- In the left pane, under Connections, click the Add button (shown as an icon in web authoring) to add your second connection to the Tableau data source.
The Cross-database join option is displayed.
Note: If you don't see this option, check that you're using only supported data source types and that you have at least two data sources (one database and one or more files of supported types).
- To change how Tableau performs the join, next to the Cross-database join option, click Edit.
- In the Cross-Database Join dialog, select one of the following options, then click OK:
- Always perform joins in the database. This option forces Tableau to use the live database to perform the join.
- Let Tableau decide where to join. This option is the default and allows Tableau to choose the fastest option to perform the join: either Hyper or the database you’re connected to.
The Cross-database join option on the Multiple Connections panel will update to reflect your choice.
Important: If you select Let Tableau decide where to join, Tableau chooses the fastest option when performing the join. This behavior is predetermined by a set of criteria including join types. For instance, Tableau always chooses Hyper for non-inner joins.
If Tableau uses Hyper to perform the join, this process happens in the background and no indicator is shown to identify where the join was performed.
- Add one or more join clauses by selecting a field from one data source, a join operator, and a field from the added table. Inspect the join clause to make sure it reflects how you want to connect the tables.
About working with multi-connection data sources
Working with multi-connection data sources is just like working with any other data source, with a few caveats, discussed in this section.
Union data from within a connection
To union data, you must use text tables or Excel tables from the same connection. That is, you can't union tables from different databases. In Tableau Desktop, you can union tables across different Excel workbooks and files in different directories. For more information, see Union tables using wildcard search (Tableau Desktop).
If you need to union data from different databases, use Tableau Prep.
Collation
Collation refers to the rules of a database that determine how string values should be compared and sorted. Usually, the collation is handled by the database. However, when you work with cross-database joins, you might join columns that have different collations.
For example, suppose your cross-database join uses a join key composed of a case-sensitive column from SQL Server and a case-insensitive column from Oracle. In cases like this, Tableau maps certain collations to others to minimize the risk of interpreting values incorrectly.
The following rules are used in cross-database joins:
- If only one column uses collation standards of the International Components for Unicode (ICU), Tableau uses the collation of the other column.
- If all columns use collation standards of the ICU, Tableau uses the collation of the column of the left table.
- If no columns use collation standards of the ICU, Tableau uses a binary collation. A binary collation means the locale of the database and data type of the columns determine how string values should be compared and sorted.
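The three rules can be expressed as a short decision function for a two-column join key. This is a sketch of the rules as stated here, not Tableau's code: the `Column` class, the `uses_icu` flag, and the collation names are hypothetical.

```python
from dataclasses import dataclass

BINARY = "binary"  # placeholder name for the binary collation


@dataclass(frozen=True)
class Column:
    collation: str   # collation name (hypothetical values)
    uses_icu: bool   # True if the column follows ICU collation standards


def join_collation(left: Column, right: Column) -> str:
    """Pick the collation for a join key made of a left and a right column."""
    if left.uses_icu and right.uses_icu:
        # All columns use ICU: take the collation of the left table's column.
        return left.collation
    if left.uses_icu:
        # Only the left column uses ICU: take the other column's collation.
        return right.collation
    if right.uses_icu:
        # Only the right column uses ICU: take the other column's collation.
        return left.collation
    # No column uses ICU: fall back to a binary collation.
    return BINARY
```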
Maintain case sensitivity for Excel data
If you need to maintain case sensitivity for your Excel data when performing joins, enable the Maintain Character Case (Excel) option from the Data menu.
When this option is selected, Tableau maintains the casing and uniquely identifies values with different casing instead of combining them, resulting in a different number of rows.
For example, consider one worksheet with "House" and another with "house" and "HOUSE". By default, Tableau ignores the casing and considers all three variations of "house" as the same. With the Maintain Character Case (Excel) option enabled, when you join your tables, Tableau preserves the character casing differences. "House", "house", and "HOUSE" are treated as different values.
Note: This option is available for all Tableau supported languages and isn't dependent on the locale of your operating system. This option is only available for Microsoft Excel data sources.
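The effect on row counts can be illustrated with the "House" example. This sketch assumes that ignoring case behaves like comparing lowercased strings, which is a simplification; the `inner_join` helper is hypothetical.

```python
left = ["House"]             # values in the first worksheet
right = ["house", "HOUSE"]   # values in the second worksheet


def inner_join(left_values, right_values, maintain_case):
    """Inner-join two value lists, optionally ignoring character case."""
    # Assumption: case-insensitive matching is modeled with str.lower.
    key = (lambda s: s) if maintain_case else str.lower
    return [
        (l, r)
        for l in left_values
        for r in right_values
        if key(l) == key(r)
    ]


default_rows = inner_join(left, right, maintain_case=False)
case_rows = inner_join(left, right, maintain_case=True)
```

With case ignored, "House" matches both values on the other side and the join returns two rows. With case maintained, no values match exactly, so the same inner join returns none.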
Calculations and multi-connection data sources
Only a subset of calculations can be used in a multi-connection data source.
- In Tableau Desktop: You can use a specific calculation if it's both:
- Supported by all the connections in the multi-connection data source
- Supported by Tableau extracts.
- In web authoring (Tableau Cloud and Tableau Server): You can use a specific calculation if it's supported by all the connections in the multi-connection data source.
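One way to read these rules is as a set intersection. In the sketch below, the function names in each set are placeholders rather than real Tableau functions, and `usable_calculations` is a hypothetical helper.

```python
def usable_calculations(connection_support, extract_support=None):
    """Return the calculations that every connection supports.

    connection_support: one set of calculation names per connection.
    extract_support: calculations supported by Tableau extracts; pass it
    to model Tableau Desktop, or leave it as None to model web authoring.
    """
    if not connection_support:
        return set()
    usable = set.intersection(*(set(s) for s in connection_support))
    if extract_support is not None:
        usable &= set(extract_support)
    return usable


# Placeholder calculation names, for illustration only.
database_funcs = {"FUNC_A", "FUNC_B", "FUNC_C"}
file_funcs = {"FUNC_A", "FUNC_B"}
extract_funcs = {"FUNC_A", "FUNC_C"}

web_authoring = usable_calculations([database_funcs, file_funcs])
desktop = usable_calculations([database_funcs, file_funcs], extract_funcs)
```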
Stored procedures
Stored procedures aren't available for multi-connection data sources.
Pivot data from within a connection
To pivot data, you must use text columns or Excel columns from the same connection. That is, you can't include columns from different databases in a pivot.
Make extract files the first connection (Tableau Desktop only)
When connecting to extract files in a multi-connection data source, make sure that the connection to the extract (.hyper) file is the first connection. This preserves any customizations that might be a part of the extract, including changes to default properties, calculated fields, groups, aliases, and so on.
Note: If you must connect to multiple extract files in your multi-connection data source, only the customizations in the extract in the first connection are preserved.
Extracts of multi-connection data sources that contain connections to file-based data (Tableau Desktop only)
If you're publishing an extract of a multi-connection data source with file-based data such as Excel, selecting the Include external files option copies the file-based data as part of the data source. In this case, a copy of your file-based data can be downloaded and its contents accessed by other users. If there's sensitive information in the file-based data that you’ve intentionally excluded from your extract, don't select Include external files when you publish the data source.
For more information about publishing data sources, see Publish a Data Source.
About queries and cross-database joins
For each connection, Tableau sends independent queries to the databases in the join. The results are stored in a temporary table, in the format of an extract file.
Important: Cross-database joins may move data between databases. Be sure the databases you're joining are trusted sources.
For example, suppose you create connections to two tables, dbo.listings and reviews$. These tables are stored in two different databases, SQL Server and Excel. Tableau queries the database in each connection independently. The database performs the query and applies customizations such as filters and calculations, and Tableau stores the results for each connection in a temporary table. In this example, FQ_Temp_1 is the temporary table for the connection to SQL Server and FQ_Temp_2 is the temporary table for the connection to Excel.
[Figure: SQL Server table and Excel table]
When you perform a cross-database join, the temporary tables are joined by Tableau Desktop. These temporary tables are necessary for Tableau to perform cross-database joins.
After the tables have been joined, a Top N filter is applied to limit the number of values shown in the data grid to the first 1,000 rows. This filter is applied to help maintain responsiveness of the data grid and the overall performance of the Data Source page.
[Figure: Joined tables]
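The flow described in this section (independent queries, temporary tables, a local join, then a 1,000-row limit) can be sketched as follows. This is an illustration under stated assumptions, not Tableau's implementation: three in-memory SQLite databases stand in for SQL Server, Excel, and the local join engine, and the column names and generated data are hypothetical. Only the temporary table names FQ_Temp_1 and FQ_Temp_2 come from the example above.

```python
import sqlite3

# Stand-in for the SQL Server connection (hypothetical schema and data).
source_1 = sqlite3.connect(":memory:")
source_1.execute("CREATE TABLE listings (id INTEGER, name TEXT)")
source_1.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(i, f"listing-{i}") for i in range(1, 1501)],
)

# Stand-in for the Excel connection (hypothetical schema and data).
source_2 = sqlite3.connect(":memory:")
source_2.execute("CREATE TABLE reviews (listing_id INTEGER, score INTEGER)")
source_2.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(i, i % 5) for i in range(1, 1501)],
)

# Each connection is queried independently.
rows_1 = source_1.execute("SELECT id, name FROM listings").fetchall()
rows_2 = source_2.execute("SELECT listing_id, score FROM reviews").fetchall()

# The results are stored in temporary tables on the local side.
local = sqlite3.connect(":memory:")
local.execute("CREATE TEMP TABLE FQ_Temp_1 (id INTEGER, name TEXT)")
local.execute("CREATE TEMP TABLE FQ_Temp_2 (listing_id INTEGER, score INTEGER)")
local.executemany("INSERT INTO FQ_Temp_1 VALUES (?, ?)", rows_1)
local.executemany("INSERT INTO FQ_Temp_2 VALUES (?, ?)", rows_2)

# The temporary tables are joined, and the preview is capped at 1,000 rows.
preview = local.execute(
    "SELECT t1.id, t1.name, t2.score FROM FQ_Temp_1 AS t1 "
    "JOIN FQ_Temp_2 AS t2 ON t2.listing_id = t1.id "
    "ORDER BY t1.id LIMIT 1000"
).fetchall()
```

Here the join matches 1,500 rows, but the preview holds only the first 1,000, which mirrors how the data grid limits what it displays.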