Use Python scripts in your flow
Disclaimer: This topic includes information about a third-party product. Please note that while we make every effort to keep references to third-party content accurate, the information we provide here might change without notice as python changes. For the most up-to-date information, please consult the python documentation and support.
Python is a widely used high-level programming language for general-purpose programming. By sending Python commands to an external service through Tableau Prep Builder, you can easily extend your data preparation options by performing actions like adding row numbers, ranking fields, filling down fields and performing other cleaning operations that you might otherwise do using calculated fields.
To include Python scripts in your flow, you need to configure a connection between Tableau and a TabPy server. Then you can use Python scripts to apply supported functions to data from your flow using a pandas dataframe. When you add a script step to your flow and specify the configuration details, file and function that you want to use, data is securely passed to the TabPy server, the expressions in the script are applied, and the results are returned as a table that you can clean or output as needed.
You can run flows that include script steps in Tableau Server as long as you have configured a connection to your TabPy server. Running flows with script steps in Tableau Cloud, isn't currently supported. To configure Tableau Server, see Configure the Tableau Python (TabPy) server for Tableau Server.
For information about how to configure sites on Tableau Server with analytics extensions for workbooks, see Configure Connections with Analytics Extensions.
Prerequisites
To include Python scripts in your flow, complete the following setup. Creating or running flows with script steps in Tableau Cloud isn't currently supported.
Download and install Python(Link opens in a new window). Download and install the most current version of Python for Linux, Mac or Windows.
Download and install the Tableau Python server (TabPy(Link opens in a new window)). Follow the installation and configuration instructions for installing TabPy. Tableau Prep Builder uses TabPy to pass data from your flow through TabPy as the input, applies your script, then returns the results back to the flow.
- Install Pandas. Run
pip3 install pandas
. You must use a pandas data frame in your scripts to integrate with Tableau Prep Builder.
Configure the Tableau Python (TabPy) server for Tableau Server
If you plan to publish, create, edit and run flows that include script steps in Tableau Server, you will need to configure a connection between your TabPy server and Tableau Server.
- Version 2019.3 and later: You can run published flows that include script steps in Tableau Server.
- Version 2020.4.1 and later: You can create, edit and run flows that include script steps in Tableau Server.
- Tableau Cloud: Creating or running flows with script steps isn't currently supported.
- Open the TSM command line/shell .
Enter the following commands to set the host address, port values and connect timeout:
tsm security maestro-tabpy-ssl enable --connection-type {maestro-tabpy-secure/maestro-tabpy} --tabpy-host <TabPy IP address or host name> --tabpy-port <TabPy port> --tabpy-username <TabPy username> --tabpy-password <TabPy password> --tabpy-connect-timeout-ms <TabPy connect timeout>
- Select
{maestro-tabpy-secure}
to enable a secure connection or{maestro-tabpy}
to enable an unsecured connection. - If you select
{maestro-tabpy-secure}
, specify the certificate file-cf<certificate file path>
in the command line. - Specify the
--tabpy-connect-timeout-ms <TabPy connect timeout>
in milliseconds. For example--tabpy-connect-timeout-ms 900000
.
- Select
To disable the TabPy connection enter the following command
tsm security maestro-tabpy-ssl disable
Create your python script
When you create your script, include a function that specifies a pandas (pd.DataFrame) as an argument of the function. This will call your data from Tableau Prep Builder. You will also need to return the results in a pandas (pd.DataFrame) using supported data types.
For example to add encoding to a set of fields in a flow, you could write the following script:
def encode(input): le = preprocessing.LabelEncoder() Return pd.DataFrame({ 'Opportunity Number' : input['Opportunity Number'], 'Supplies Subgroup Encoded' : le.fit_transform(input['Supplies Subgroup']), 'Region Encoded' : le.fit_transform(input['Region']), 'Route To Market Encoded' : le.fit_transform(input['Route To Market']), 'Opportunity Result Encoded' : le.fit_transform(input['Opportunity Result']), 'Competitor Type Encoded' : le.fit_transform(input['Competitor Type']), 'Supplies Group Encoded' : le.fit_transform(input['Supplies Group']), })
The following data types are supported:
Data type in Tableau Prep Builder | Data type in Python |
---|---|
String | Standard UTF-8 string |
Decimal | Double |
Int | Integer |
Bool | Boolean |
Date | String in ISO_DATE format “YYYY-MM-DD” with optional zone offset. For example, “2011-12-03” is a valid date. |
DateTime | String in ISO_DATE_TIME format “YYYY-MM-DDT:HH:mm:ss” with optional zone offset. For example, “2011-12-03T10:15:30+01:00” is a vslid date. |
Note: Date and DateTime must always be returned as a valid string.
If you want to return different fields than what you input, you'll need to include a get_output_schema function in your script that defines the output and data types. Otherwise, the output will use the fields from the input data, which are taken from the step that is prior to the script step in the flow.
Use the following syntax when specifying the data types for your fields in the get_output_schema:
Function in Python | Resulting data type |
---|---|
prep_string() | String |
prep_decimal() | Decimal |
prep_int() | Integer |
prep_bool() | Boolean |
prep_date() | Date |
prep_datetime() | DateTime |
Square brackets can be used to access rows from a DataFrame. Single brackets output a Pandas Series, while a double bracket will output a Pandas DataFrame.
def create_key(df): return pd.DataFrame({ "Key": ['12345'] })
The following example shows the get_output_schema function added to the field encoding python script:
def get_output_schema(): return pd.DataFrame({ 'Opportunity Number' : prep_int(), 'Supplies Subgroup Encoded' : prep_int(), 'Region Encoded' : prep_int(), 'Route To Market Encoded' : prep_int(), 'Opportunity Result Encoded' : prep_int(), 'Competitor Type Encoded' : prep_int(), 'Supplies Group Encoded' : prep_int() })
Connect to your Tableau Python (TabPy) server
Important: Starting in Tableau Prep Builder version 2020.3.3, you can configure your server connection once from the top Help menu instead of setting up your connection per flow in the Script step by clicking Connect to Tableau Python (TabPy) Server and entering your connection details. You will need to reconfigure your connection using this new menu for any flows that were created in an older version of Tableau Prep Builder that you open in version 2020.3.3.
- Select Help > Settings and Performance > Manage Analytics Extension Connection.
In the Select an Analytics Extension drop-down list, select Tableau Python (TabPy) Server.
- Enter your credentials:
- Port 9004 is the default port for TabPy.
- If the server requires credentials, enter a username and password.
If the server uses SSL encryption, select the Require SSL tick box, then click the No custom configuration file specified link to select a certificate for the connection. This is your SSL server certificate file.
Note: Tableau Prep Builder doesn't provide a way to test the connection. If there is a problem with the connection an error message shows.
Add a script to your flow
Start your TabPy server then complete the following steps:
Note: TabPy requires tornado package version 5.1.1 to run. If you receive the error 'tornado.web' has no attribute 'asynchronous' when trying to start TabPy, from the command line run pip list
to check the version of tornado that was installed. If you have a different version installed, download the tornado package version 5.1.1(Link opens in a new window). Then run pip uninstall tornado
to uninstall your current version, then run pip install tornado==5.1.1
to install the required version.
Open Tableau Prep Builder and click the Add connection button.
In web authoring, from the Home page, click Create > Flow or from the Explore page, click New > Flow. Then click Connect to Data.
From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
Click the plus icon, and select Add Script from the context menu.
In the Script pane, in the Connection type section, select Tableau Python (TabPy) Server.
- In the File Name section, click Browse to select your script file.
Enter the Function Name then press Enter to run your script.