Improve Performance for Cross-Database Joins
Important: This feature temporarily moves data outside of Tableau. Be sure that the database you are connected to is from a trusted source.
Beginning with Tableau Desktop and Tableau Server version 2019.3, you can improve performance when joining data from a single file connection and a single database connection by allowing Tableau to perform the join using the database that you are connected to instead of Hyper. When this option is enabled, Tableau chooses the fastest option (Hyper or the connected database). If Tableau uses the connected database, the data from the file connection is moved into temporary tables in the database and the join is performed there.
This option is only available if the following conditions are met:
- The data source consists of one or more file-based connections and a single SQL-based connection.
- The file is a Microsoft Excel, PDF or Text (.csv, .txt, .tsv or .tab) file type.
- The connected database is one of the following:
- Microsoft SQL Server
- The join type is an inner join.
In web authoring: The Allow users to use web authoring option is enabled.
Site Administrators can also configure the options available for users by setting the Cross-Database Joins options on the Settings > General tab in Tableau Server. For more information about these settings, see Set a Sites Web Authoring Access(Link opens in a new window) in the Tableau Server help.
Overriding feature conditions
Starting in version 2020.3, as an administrator, you can override the file size, join type and single file connection limitations and force Tableau to use the live database connection to perform the join. This enables you to experiment and determine optimal performance configurations. The single database connection requirement still applies.
To enable this option enter the following information from the command line:
Tableau Desktop: Enter the command
Tableau Server: Enter the tsm configuration command
tsm configuration set -k native_api.force_alternative_federation_engine -v true
For more information about setting configuration values in Tableau Server, see tsm configuration set Options(Link opens in a new window) in the Tableau Server help.
Enable the performance option for cross-database joins
In Tableau Desktop: On the Start page, under Connect, connect to a supported file type or supported database type. This step creates the first connection in the Tableau data source.
In web authoring: From the Home or Explore page, click Create> Workbook to start a new workbook and then connect to your data. This step creates the first connection in the Tableau data source.
Select the file or database that you want to connect to, then double-click or drag a table to the canvas.
In the left pane, under Connections, click the Add button ( in web authoring) to add your second connection to the Tableau data source.
The Cross-database join option is displayed.
Note: If you don't see this option, check that you are using only supported data sources and that you have only two data sources (one file and one database type). Otherwise, the Site Administrator may have set the Cross-Database Joins configuration option to Tableau only.
- To change how Tableau performs the join, next to the Cross-database join option, click Edit.
In the Cross-Database Join dialog, select one of the following options, then click OK:
Use Tableau or existing databases. This option allows Tableau to choose the fastest option to perform the join – either Hyper or the database you're connected to.
Use Tableau only. This option is the default and always uses Hyper to perform the join.
The Cross-database join option changes from the default option, Using Tableau (using Hyper), to the new option Using your database, depending on what you choose.
Important: If you select Use Tableau or existing databases, Tableau chooses the fastest option when performing the join. This behaviour is pre-determined by a set of criteria including join types. For instance, Tableau will always choose Hyper for non-inner joins.
If Tableau uses Hyper to perform the join, this process happens in the background and no indicator is shown to identify where the join was performed.
- Add one or more join clauses by selecting a field from one of the available tables used in the data source, a join operator and a field from the added table. Inspect the join clause to make sure it reflects how you want to connect the tables.
Working with multi-connection data sources is just like working with any other data source, with a few caveats, discussed in this section.
Union data from within a connection
To union data, you must use text tables or Excel tables from the same connection. That is, you cannot union tables from different databases. In Tableau Desktop, you can union tables across different Excel workbooks and files in different folders. For more information, see the Union tables using wildcard search (Tableau Desktop).
If you need to union data from different databases, use Tableau Prep(Link opens in a new window).
Collation refers to the rules of a database that determine how string values should be compared and sorted. In most cases, the collation is handled by the database. However, when you work with cross-database joins, you might join columns that have different collations.
For example, suppose your cross-database join used a join key comprised of a case-sensitive column from SQL Server and a case-insensitive column from Oracle. In cases like this, Tableau maps certain collations to others to minimise interpreting values incorrectly.
The following rules are used in cross-database joins:
If a column uses collation standards of the International Components for Unicode (ICU), Tableau uses the collation of the other column.
If all columns use collation standards of the ICU, Tableau uses the collation of the column of the left table.
If no columns use collation standards of the ICU, Tableau uses a binary collation. A binary collation means the locale of the database and data type of the columns determine how string values should be compared and sorted.
When working with Microsoft Excel data if you need to maintain case sensitivity for your data when performing joins, for example Japanese data that is Kana-sensitive, you can enable the Maintain Character Case (Excel) option from the Data menu.
When this option is selected, Tableau maintains the casing of the values from your Excel data source and uniquely identifies values with different casing instead of combining them, resulting in a different number of rows.
For example, you have one worksheet with the value "House" and another worksheet with the values "House", "house" and "houSe". By default, when you join the two tables, Tableau ignores the character casing differences in the second worksheet and joins "House", "house" and "houSe" from the second worksheet with "House" from the first worksheet.
With the Maintain Character Case (Excel) option enabled, when you join your tables, Tableau recognises the character casing differences. The join results show fewer results because the join will only match "House" from the first worksheet with the value "House from the second worksheet and read "house" and "houSe" as unique separate values.
Note: This option is available for all Tableau supported languages and is not dependent on the locale of your operating system. This option is only available for Microsoft Excel data sources.
Calculations and multi-connection data sources
Only a subset of calculations can be used in a multi-connection data source.
In Tableau Desktop: You can use a specific calculation if it is both:
Supported by all the connections in the multi-connection data source
Supported by Tableau extracts.
In web authoring (Tableau Online and Tableau Server): You can use a specific calculation if it is supported by all the connections in the multi-connection data source.
Stored procedures are not available for multi-connection data sources.
Pivot data from within a connection
To pivot data, you must use text columns or Excel columns from the same connection. That is, you cannot include columns from different databases in a pivot.
Make extract files the first connection (Tableau Desktop only)
When connecting to extract files in a multi-connection data source, make sure that the connection to the extract (.tde or .hyper) file is the first connection. This preserves any customisations that might be a part of the extract, including changes to default properties, calculated fields, groups, aliases, etc.
Note: If you need to connect to multiple extract files in your multi-connection data source, only the customisations in the extract in the first connection are preserved.
Extracts of multi-connection data sources that contain connections to file-based data (Tableau Desktop only)
If you're publishing an extract of a multi-connection data source that contains a connection to file-based data such as Excel, selecting the Include external files option puts a copy of the file-based data on the server as part of the data source. In this case, a copy of your file-based data can be downloaded and its contents accessed by other users. If there is sensitive information in the file-based data that you have intentionally excluded from your extract, do not select Include external files when you publish the data source.
For more information about publishing data sources, see Publish a Data Source.
For each connection, Tableau sends independent queries to the databases in the join. The results are stored in a temporary table, in the format of an extract file.
For example, suppose you create connections to two tables, dbo.listings and reviews$. These tables are stored in two different databases, SQL Server and Excel. Tableau queries the database in each connection independently. The database performs the query and applies customisations such as filters and calculations, and Tableau stores the results for each connection in a temporary table. In this example, FQ_Temp_1 is the temporary table for the connection to SQL Server and FQ_Temp_2 is the temporary table for the connection to Excel.
SQL Server table
When you perform a cross-database join, the temporary tables are joined together by Tableau Desktop. These temporary tables are necessary for Tableau to perform cross-database joins.
After the tables have been joined, ‘topn’ filter is applied to limit the number of values shown in the data grid to the first 1,000 rows. This filter is applied to help maintain responsiveness of the data grid and the overall performance of the Data Source page.