Connect to Data
To use Tableau Prep Builder to clean and prepare your data, start a new flow by connecting to your data, just like in Tableau Desktop. You can also open an existing flow and pick up where you left off.
You can see and access your most recent flows right on the Start page, so it's easy to find your work in progress. After you connect to your data, use the different options in the Input step to identify the data that you want to work with in your flow.
Tableau Prep Builder supports connections to popular types of data as well as Tableau data extracts (.tde or .hyper) and starting in version 2019.3.1, published data sources. Connectors that are supported in Tableau Desktop might not yet be supported in your version of Tableau Prep Builder. But new connectors are added with each new version, so check the Connections pane to see if your connector is available.
Note: If you open a flow in a version where the connector isn't supported, the flow may open but might have errors or won't run unless the data connections are removed.
To check whether you can connect to your data, open Tableau Prep Builder and click the Add connection button to see if a connector for your data is listed in the left pane under Connect.
Note: Some connectors might require you to download and install a driver before you can connect to your data. See the page on the Tableau website to get driver download links and installation instructions.
Start a new flow
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
From the Connections pane, do one of the following:
If you connected to a file, double-click or drag a table to the Flow pane to start your flow.
For single tables, Tableau Prep Builder automatically creates an Input step for you in the Flowpane when you add data to your flow.
If you connected to a database, select a database or schema, and then double-click or drag a table to the Flowpane to start your flow.
Note: If you connect to multiple files or database tables from a single data source, you can use a wildcard search and union the files or tables in the input step. For more information, see Union files and database tables in the Input step.
Open an existing flow
To open an existing flow, on the Start page do one of the following :
Under Recent Flows, select a flow.
Click Open a Flow to navigate to your flow file and open it.
Click File > New or File > Open to open a new workspace. You can open multiple Tableau Prep Builder workspaces using these option if you want to work on multiple flows at the same time. In Tableau Prep Builder version 2019.3.1 and earlier, if you select File > Open, Tableau Prep Builder replaces your current open flow with the new flow you select.
If you publish data sources to Tableau Server or Tableau Online from Tableau Desktop or as output from your flow, you can connect to these published data sources and use them as an input data source for your flow.
Note: When you publish a flow that includes a published data source as an input, the publisher is assigned as the default flow owner. When the flow runs, it uses the flow owner for the Run As account. For more information about the Run As account, see Run As Service Account. Only the Site or Server Administrator can change the flow owner in Tableau Server or Tableau Online.
When using this feature the following requirements apply:
- Connecting to published data sources that are available on Tableau Server version 2019.3 are fully supported. All features may not be available if connecting to published data sources on earlier version of Tableau Server .
- You must be assigned a role of Explorer or higher in the Tableau Server site you are signed into to connect to published data sources. For more information about site roles, see Set User's Site Roles in the Tableau Server help.
- Published data sources that include multi-dimensional (cube) data or includes user filters are not currently supported.
- Multi-server connections aren't supported. You must use the same server or site connection to do the following:
- Connect to the published data source.
- Publish flow output to Tableau Server or Tableau Online.
- Schedule the flow to run on Tableau Server or Tableau Online.
To connect to a published data source, do the following:
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select Tableau Server.
Sign in to connect to your server or site.
Select your data source or use the search option to find your data source and drag it to the flow pane to start your flow. The Tableau Server tab in the Input pane shows details about the published data source.
You can connect to cloud data sources in Tableau Prep Builder just like Tableau Desktop, but if you plan to publish flows that connect to cloud data sources and run them on a schedule in your server, you'll need to configure your credentials in Tableau Server or Tableau Online.
Set up your credentials in the Settings tab in the My Account Settings page and connect to your cloud connector input using these same credentials. When publishing the flow, on the Publish dialog, click Edit to edit the connection, then in the Authentication drop-down, select Embed <your credentials>.
If you don't have saved credentials set up and select Prompt user in the Authentication drop-down, after you publish the flow you must edit the connection and enter your credentials in the Connections tab in Tableau Server or Tableau Online or the flow will fail when run.
In Tableau Prep Builder version 2019.4.1, the following cloud connectors were added:
- Google Drive
For more information about how to connect to your data using these connectors, see the connector-specific help topic in the Tableau Desktop help.
Connect to Google BigQuery (version 2019.3.1 and later)
Tableau Prep Builder supports connecting to data using Google BigQuery just like Tableau Desktop. You must configure Oauth credentials to enable Tableau Prep Builder to communicate with Google BigQuery. If you plan to publish flows to Tableau Server or Tableau Online, Oauth connections must also be configured for those applications.
- Set up Oauth for Google - Configuring Oauth connections for Tableau Server.
- Oauth Connections - Configure Oauth connections for Tableau Online.
Note: Tableau Prep Builder doesn't currently support using Google BigQuery customization attributes.
The credentials that you use to connect to Google BigQuery in your Input step must match the credentials that are set up in the Settings tab in the My Account Settings page for Google BigQuery in Tableau Server or Tableau Online.
If you select different credentials or no credentials in your authentication setting when publishing your flow, the flow will fail with an authentication error until you edit the connection for the flow in Tableau Server or Tableau Online.
To edit the credentials do the following:
- In Tableau Server or Tableau Online, on the Connections tab, on the Google BigQuery connection, click More actions .
- Select Edit Connection.
- Select the saved credentials that are set up in the Settings tab in the My Account Settings page.
For more information about setting and managing your credentials, see the following topics:
Manage Your Account Settings in the Tableau Desktop and Web Authoring help.
Publish a flowfor information about setting authentication options when publishing a flow.
View and resolve errors for information about resolving connection errors in Tableau Server or Tableau Online.
Connect to SAP HANA (version 2019.2.1 and later)
Tableau Prep Builder supports connecting to data using SAP HANA just like Tableau Desktop but with a few differences. Connect to the database using the same procedure you would use in Tableau Desktop. For more information see SAP HANA. After you connect and search for your table, drag the table to the canvas to begin building your flow.
Prompting for variables and parameters when opening a flow isn't supported in Tableau Prep Builder. Instead, in the Input pane, click the Variables and Parameters tab and select the variables and operands you want to use, then select from a list of preset values or enter custom values to query your database and return the values you need.
Note: Starting in Tableau Prep Builder version 2019.2.2 you can use Initial SQL to query your connection and if you have multiple values for a variable, you can select the value you need from a drop-down list.
You can also add additional variables. Click the plus button in the Variables section and select a variable and operand, then enter a custom value.
Note: This connector isn't supported in Tableau Prep Conductor version 2019.1. If you create a flow that uses this connector, you can publish the flow but you won't be able to run the flow using the scheduling features in Tableau Server. Instead you can run the flow using the command line interface to keep your data fresh. For more information about running flows from the command line see Refresh flow output files from the command line.
If you need to connect to data sources that aren't listed in the Connections pane, you can connect to any data source using the Other Databases (ODBC) connector that supports the SQL standard and implements the ODBC API. Connecting to data using the Other Databases (ODBC) connector works similarly to how you might use it in Tableau Desktop, however there are a few differences:
You can only connect using the DSN (data source name ) option.
To publish and run your flow in Tableau Server, the server must be configured using a matching DSN.
Note: Running flows from the command line (Tableau Prep Builder) that include the Other Databases (ODBC) connector isn't currently supported.
There is a single connection experience for both Windows and MacOS. Prompting for connection attributes for ODBC drivers (Windows) isn't supported.
Only 64-bit drivers are supported by Tableau Prep Builder.
Before you connect
To connect to your data using the Other Databases (ODBC) connector, you must install the database driver and set up and configure your DSN (data source name). To publish and run flows to Tableau Server, the server must also be configured with a matching DSN.
Important: Tableau Prep Builder only supports 64-bit drivers. If you have a 32-bit driver already set up and configured, you may need to uninstall it and then install the 64-bit version if the driver doesn't allow both versions to be installed at the same time.
Create a DSN using either the ODBC Data Source Administrator (64-bit) (Windows) or the using an ODBC Manager utility (MacOS).
If you don't have the utility installed on your Mac, you can download one from (www.odbcmanager.net for example) or you can manually edit the odbc.ini file.
In the ODBC Data Source Administrator (64-bit) (Windows) or the ODBC Manager utility (MacOS), add a new data source then select the driver for the data source then click Finish.
In the ODBC Driver Setup dialog, enter the configuration information such as server name, port, user name and password. Click Test (if your dialog has that option) to verify that your connection is set up correctly, then save your configuration.
Note: Tableau Prep Builder doesn't support prompting for connection attributes so you must set this information when configuring the DNS.
This example shows the configuration dialog for a MySQL Connector.
Connect using Other Databases (ODBC)
Open Tableau Prep Builder and click the Add connection button.
From the list of connectors, select Other Databases (ODBC).
In the Other Databases (ODBC) dialog, select a DSN from the drop-down list and enter the user name and password. Then click Sign In.
From the Connections pane, select your database from the drop-down list.
You can specify an Initial SQL command when connecting to databases that support it, that will run when a connection is made to the database. For example when connecting to Amazon Redshift, you can enter a SQL statement to apply a filter when connecting to the database just like adding filters in the Input step. The SQL command will apply before data is sampled and loaded into Tableau Prep Builder.
To refresh your data and run the Initial SQL command again do one of the following:
•Change the Initial SQL command and refresh the Input step be re-establishing the connection.
• Run the flow. The initial SQL command is run before processing all of the data.
•Schedule the flow to run on Tableau Server. The initial SQL is run every time that the flow is run, as part of the data loading experience.
- In the Connections pane, select a connector in the list that supports Initial SQL.
- Click the Show Initial SQL link to expand the dialog and enter your SQL statements.
Important: Parameters are not yet supported for Initial SQL in Tableau Prep Builder. If you try and enter the parameters like you would in Tableau Desktop, you will receive errors.
Connect to Tableau data extracts
When you connect to a Tableau data extract, Tableau Prep Builder unpackages the extract and hyper expands, using a lot of temp space as it applies your flow operations to the resulting raw data.
This means you may need more RAM and disk space to accommodate a file that size. For example, an extract file with 18 columns and 1.2 million rows that is 360MB (8.5 GB uncompressed) may need up to 32GB RAM, 16-core, and 500GB of disk space available to support the file when it is unzipped.
When working with Microsoft Excel files, you can use Data Interpreter to detect sub-tables in your data as well as remove extraneous information to help prepare your data for analysis. When you turn on Data Interpreter, it detects these sub-tables and lists them as new tables in the Tables section of the Connections pane.
You can then drag them into the Flow pane. If you are using Tableau Prep Builder version 2018.1.2 or later, you can select the Wildcard union option in the Multiple Files tab to include all found sub-tables in your flow. For more information about using Wildcard union in the Input step see Union files and database tables in the Input step.
If you turn Data Interpreter off, these tables are removed from the Connectionspane. If these tables are already used in the flow, this will result in flow errors from the missing data.
Note: Currently, Data Interpreter only detects sub-tables in your Excel spreadsheets and doesn't support specifying the starting row for text files and spreadsheets.
The example below shows the results of using Data Interpreter on an Excel spreadsheet in the Connectionspane. Data Interpreter detected two additional sub-tables.
|Before Data Interpreter||After Data Interpreter|
To use Data Interpreter, complete the following steps:
Select Connect to Data then select Microsoft Excel.
Select your file and click Open.
Select the Use Data Interpreter check box.
Drag the new table to the Flow pane to include it in your flow. To remove the old table, right-click the Input step for the old table and select Remove.