Use Python scripts in your flow

Disclaimer: This topic includes information about a third-party product. Please note that while we make every effort to keep references to third-party content accurate, the information we provide here might change without notice as python changes. For the most up-to-date information, please consult the python documentation and support.

Python is a widely used high-level programming language for general-purpose programming. By sending Python commands to an external service through Tableau Prep Builder, you can easily extend your data preparation options by performing actions like adding row numbers, ranking fields, filling down fields and performing other cleaning operations that you might otherwise do using calculated fields.

To include Python scripts in your flow, you need to configure a connection between Tableau Prep Builder and a TabPy server. Then you can use Python scripts to apply supported functions to data from your flow using a pandas dataframe. When you add a script step to your flow and specify the configuration details, file, and function that you want to use, data is securely passed to the TabPy server, the expressions in the script are applied, and the results are returned as a table that you can clean or output as needed.

Note: To run flows that include script steps on Tableau Server (version 2019.3 and later), Tableau Server must also have a connection to a TabPy server.

Prerequisites

To include Python scripts in your flow, complete the following setup.

  1. Download and install Python. Download and install the most current version of Python for Linux, Mac or Windows.

  2. Download and install the Tableau Python server (TabPy). Follow the installation and configuration instructions for installing TabPy. Tableau Prep Builder uses TabPy to pass data from your flow through TabPy as the input, applies your script, then returns the results back to the flow.

  3. Install Pandas. Run pip3 install pandas. ou must use a pandas data frame in your scripts to integrate with Tableau Prep Builder.

Configure the Tableau Python (TabPy) server for Tableau Server

If you intend to publish flows that include script steps and run them on Tableau Server you will need to configure a connection between your TabPy server and Tableau Server version 2019.3 or later. Running flows with script steps in Tableau Online isn't currently supported.

  1. Open the TSM command line/shell .
  2. Enter the following commands to set the host address, port values and connect timeout:

    tsm security maestro-tabpy-ssl enable --connection-type {maestro-tabpy-secure/maestro-tabpy} --tabpy-host <TabPy IP address or host name> --tabpy-port <TabPy port> --tabpy-username <TabPy username> --tabpy-password <TabPy password> --tabpy-connect-timeout-ms <TabPy connect timeout>

    • Select {maestro-tabpy-secure} to enable a secure connection or {maestro-tabpy} to enable an unsecured connection.
    • If you select {maestro-tabpy-secure}, specify the certificate file -cf<certificate file path> in the command line.
    • Specify the --tabpy-connect-timeout-ms <TabPy connect timeout> in milliseconds. For example --tabpy-connect-timeout-ms 900000.
  3. To disable the TabPy connection enter the following command

    tsm security maestro-tabpy-ssl disable

Create your python script

When you create your script, include a function that specifies a pandas (pd.DataFrame) as an argument of the function. This will call your data from Tableau Prep Builder. You will also need to return the results in a pandas (pd.DataFrame) using supported data types.

For example to add encoding to a set of fields in a flow, you could write the following script:

def encode(input):     
  le = preprocessing.LabelEncoder()
  Return pd.Data.Frame({
    'Opportunity Number' : input['Opportunity Number'],
    'Supplies Subgroup Encoded' : le.fit_transform(input['Supplies Subgroup']),
    'Region Encoded' : le.fit_transform(input['Region']),
    'Route To Market Encoded' : le.fit_transform(input['Route To Market']),
    'Opportunity Result Encoded' : le.fit_transform(input['Opportunity Result']),
    'Competitor Type Encoded' : le.fit_transform(input['Competitor Type']),
    'Supplies Group Encoded' : le.fit_transform(input['Supplies Group']),
})

The following data types are supported:

Data type in Tableau Prep Builder Data type in Python
String Standard UTF-8 string
Decimal Double
Int Integer
Bool Boolean
Date String in ISO_DATE format “YYYY-MM-DD” with optional zone offset. For example, “2011-12-03” is a valid date.
DateTime String in ISO_DATE_TIME format “YYYY-MM-DDT:HH:mm:ss” with optional zone offset. For example, “2011-12-03T10:15:30+01:00” is a vslid date.

Note: Date and DateTime must always be returned as a valid string.

If you want to return different fields than what you input, you'll need to include a get_output_schema function in your script that defines the output and data types. Otherwise, the output will use the fields from the input data, which are taken from the step that is prior to the script step in the flow.

Use the following syntax when specifying the data types for your fields in the get_output_schema:

Function in Python Resulting data type
prep_string () String
prep_decimal () Decimal
prep_int () Integer
prep_bool () Boolean
prep_date () Date
prep_datetime () DateTime

The following example shows the get_output_schema function added to the field encoding python script:

def get_output_schema():       
  return pd.DataFrame({
    'Opportunity Number' : prep_int(),
    'Supplies Subgroup Encoded' : prep_int(),
    'Region Encoded' : prep_int(),
    'Route To Market Encoded' : prep_int (),
    'Opportunity Result Encoded' : prep_int (),
    'Competitor Type Encoded' : prep_int()
    'Supplies Group Encoded' : prep_int()
})

Add a script to your flow

Start your TabPy server then complete the following steps:

Note: TabPy requires tornado package version 5.1.1 to run. If you receive the error 'tornado.web' has no attribute 'asynchronous' when trying to start TabPy, from the command line run pip list to check the version of tornado that was installed. If you have a different version installed, download the tornado package version 5.1.1. Then run pip uninstall tornado to uninstall your current version, then run pip install tornado==5.1.1 to install the required version.

  1. Open Tableau Prep Builder and click the Add connection button.

  2. From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.

  3. Click the plus icon, and select Add Scirpt from the context menu.

  4. In the Script pane, in the Connection type section, select Tableau Python (TabPy) Server.

  5. Click Connect to Tableau Python (TabPy) Server and enter your connection details.

    • Port 9004 is the default port for TabPy.
    • If the server requires credentials, enter a username and password.
    • If the server uses SSL encryption, select the Require SSL check box, then click the Custom configuration file... link to specify a certificate for the connection.

    Note: Tableau Prep Builder doesn't provide a way to test the connection. If there is a problem with the connection an error message shows.

  6. In the File Name section, click Browse to select your script file.
  7. Enter the Function Name then press Enter to run your script.

Thanks for your feedback! There was an error submitting your feedback. Try again or send us a message.