Use R (Rserve) scripts in your flow
Disclaimer: This topic includes information about a third-party product. Please note that while we make every effort to keep references to third-party content accurate, the information we provide here might change without notice as R and Rserve change. For the most up-to-date information, please consult the R and Rserve documentation and support.
R is an open source software programming language and a software environment for statistical computing and graphics. To extend the functionality of Tableau Prep Builder, you can create scripts in R to use in your flow that run through an Rserve server to produce output that you can further work with in your flow.
For example, you might want to add statistical modeling data or forecasting data to the data that you already have in your flow using a script in R, then use the power of Tableau Prep Builder to clean the resulting data set for analysis.
To include R scripts in your flow, you need to configure a connection between Tableau Prep Builder and an Rserve server. Then you can use R scripts to apply supported functions to data from your flow using R expressions. After you enter the configuration details and point Tableau Prep Builder to the file and function that you want to use, data is securely passed to the Rserve server, the expressions are applied, and the results are returned as a table (R data.frame) that you can clean or output as needed.
You can run flows that include script steps in Tableau Server as long as you have configured a connection to your Rserve server. Running flows with script steps in Tableau Cloud isn't currently supported. To configure Tableau Server, see Configure Rserve Server for Tableau Server.
Prerequisites
To include R script steps in your flow, install R and configure a connection to an Rserve server.
Resources
- Download and Install R. Download and install the most current version of R for Linux, Mac, or Windows.
- R Implementation notes (community post). Install and configure a connection to R and Rserve for Windows.
- Install and configure Rserve: Instructions for general installation and configuration for all platforms.
- Rserve for Windows (release notes): This topic covers limitations when installing Rserve locally on Windows.
Configure Rserve Server for Tableau Server
Use the following instructions to configure a connection between your Rserve server and Tableau Server.
- Version 2019.3 and later: You can run published flows that include script steps in Tableau Server.
- Version 2020.4.1 and later: You can create, edit, and run flows that include script steps in Tableau Server.
- Tableau Cloud: Creating or running flows with script steps isn't currently supported.
- Open the TSM command line.
- Enter the following commands to set the host address, port values, and connect timeout:
  tsm security maestro-rserve-ssl enable --connection-type {maestro-rserve-secure/maestro-rserve} --rserve-host <Rserve IP address or host name> --rserve-port <Rserve port> --rserve-username <Rserve username> --rserve-password <Rserve password> --rserve-connect-timeout-ms <RServe connect timeout>
  - Select {maestro-rserve-secure} to enable a secure connection or {maestro-rserve} to enable an unsecured connection.
  - If you select {maestro-rserve-secure}, specify the certificate file -cf<certificate file path> in the command line.
  - Specify the --rserve-connect-timeout-ms <RServe connect timeout> in milliseconds. For example, --rserve-connect-timeout-ms 900000.
- To disable the Rserve connection, enter the following command:
  tsm security maestro-rserve-ssl disable
Additional Rserve configuration (optional)
You can create a file named Rserv.cfg to set default configuration values to customize Rserve and place it in the /etc/Rserve.conf installation location. To improve stability with the Rserve server and Tableau Prep Builder, you can add additional values to your Rserve configuration. When you launch Rserve you can refer to this file to apply your configuration options. For example:
- Windows:
Rserve(args="--RS-conf C:\\folder\\Rserv.cfg")
- MacOS and Linux:
Rserve(args=" --no-save --RS-conf ~/Documents/Rserv.cfg")
The following example shows some additional options you can include in your Rserve.conf configuration file:
    # If your data includes characters other than ASCII, make it explicit that data should be UTF8 encoded.
    encoding utf8
    # Disable interactive behavior for Rserve or Tableau Prep Builder will stall when trying to run the script as it waits for an input response.
    interactive no
For information about setting up an Rserve.conf file, see the Advanced Rserve configuration section in the R Implementation notes (community post).
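As a rough illustration of the configuration steps above, the following sketch writes the two options shown earlier to a configuration file and builds the launch arguments. The helper name `write_rserve_config` and the temporary file location are assumptions made for this example, and the launch call is left commented out so that nothing starts until you choose to run it.

```r
# Hypothetical helper: write the two options from this section to a config file.
write_rserve_config <- function(path) {
  writeLines(c("encoding utf8", "interactive no"), path)
  invisible(path)
}

# Assumption: a temporary folder is used here; pick your own location in practice.
# Paths that contain spaces may need extra quoting.
conf_path <- write_rserve_config(file.path(tempdir(), "Rserv.cfg"))

# Build the argument string in the same form as the launch examples above.
launch_args <- paste("--no-save", "--RS-conf", conf_path)
print(launch_args)

# To start the server with these options (requires the Rserve package):
# Rserve::Rserve(args = launch_args)
```

Keeping the configuration in a file rather than in the launch call makes it easier to reuse the same settings each time the server starts.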
Create your R script
When you create your script, include a function that specifies a data frame as an argument of the function. This will call your data from Tableau Prep Builder. You will also need to return the results in a data frame using supported data types.
For example:
    postal_cluster <- function(df) {
      out <- kmeans(cbind(df$Latitude, df$Longitude), 3, iter.max=10)
      return(data.frame(Latitude=df$Latitude, Longitude=df$Longitude, Cluster=out$cluster))
    }
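Before adding a script to a flow, it can help to call the function in a plain R session with a small data frame. The sketch below does this for the example function; the coordinates are made-up sample values, and `sample_df` stands in for the data that Tableau Prep Builder would pass.

```r
# Function from the example above, reproduced so the sketch is self-contained.
postal_cluster <- function(df) {
  out <- kmeans(cbind(df$Latitude, df$Longitude), 3, iter.max = 10)
  return(data.frame(Latitude = df$Latitude, Longitude = df$Longitude, Cluster = out$cluster))
}

# Made-up coordinates standing in for the input from the flow.
sample_df <- data.frame(
  Latitude  = c(47.61, 47.62, 45.52, 45.53, 37.77, 37.78),
  Longitude = c(-122.33, -122.35, -122.68, -122.66, -122.42, -122.41)
)

set.seed(42)  # kmeans picks random starting centers, so fix the seed for repeatability
result <- postal_cluster(sample_df)

# Inspect the column names and types that the script would return.
str(result)
```

Checking the structure of the result locally makes it easier to spot missing fields or unexpected types before running the flow.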
The following data types are supported:
Data type in Tableau Prep Builder | Data type in R |
---|---|
String | Standard UTF-8 string |
Decimal | Double |
Int | Integer |
Bool | Logical |
Date | String in ISO_DATE format “YYYY-MM-DD” with optional zone offset. For example, “2011-12-03+01:00” is a valid date. |
DateTime | String in ISO_DATE_TIME format “YYYY-MM-DDTHH:mm:ss” with optional zone offset. For example, “2011-12-03T10:15:30+01:00” is a valid date and time. |
Note: Date and DateTime must always be returned as a valid string. Native Date (DateTime) types in R aren't supported as returned values but can be used in the script.
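One way to follow this rule is to use native date types for the calculation and convert them to strings just before returning. The sketch below assumes the fields arrive as ISO strings without a zone offset; the function name `extend_due_dates` and the field names `DueDate` and `UpdatedAt` are hypothetical.

```r
# Hypothetical script function; the field names DueDate and UpdatedAt are assumptions.
extend_due_dates <- function(df) {
  # Parse the incoming ISO strings into native R types for date arithmetic.
  due <- as.Date(df$DueDate, format = "%Y-%m-%d")
  updated <- as.POSIXct(df$UpdatedAt, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

  due <- due + 7              # add seven days
  updated <- updated + 3600   # add one hour (POSIXct arithmetic is in seconds)

  # Convert back to ISO strings so that no native date types are returned.
  return(data.frame(
    DueDate = format(due, "%Y-%m-%d"),
    UpdatedAt = format(updated, "%Y-%m-%dT%H:%M:%S"),
    stringsAsFactors = FALSE
  ))
}

# Made-up input row for a local check.
sample_df <- data.frame(
  DueDate = "2011-12-03",
  UpdatedAt = "2011-12-03T10:15:30",
  stringsAsFactors = FALSE
)
out <- extend_due_dates(sample_df)
print(out)
```

The returned fields keep the same names as the input, so this sketch relies on the input fields for its output definition.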
If you want to return different fields than what you input, you'll need to include a getOutputSchema function in your script that defines the output and data types. Otherwise, the output will use the fields from the input data, which are taken from the step that is prior to the script step in the flow.
Use the following syntax when specifying the data types for your fields in the getOutputSchema:
Function in R | Resulting data type |
---|---|
prep_string () | String |
prep_decimal () | Decimal |
prep_int () | Integer |
prep_bool () | Boolean |
prep_date () | Date |
prep_datetime () | DateTime |
The following example shows the getOutputSchema function for the postal_cluster script:
    getOutputSchema <- function() {
      return (data.frame (
        Latitude = prep_decimal (),
        Longitude = prep_decimal (),
        Cluster = prep_int ()));
    }
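A mismatch between the declared fields and the data frame that the script returns is easy to introduce. The sketch below is one possible local check, not part of Tableau Prep Builder. It assumes the `prep_` helpers are only available when the script runs in a flow, so it defines placeholder versions that return a type label; `matches_schema` is a hypothetical helper.

```r
# Placeholder stand-ins for local testing only (assumption: each returns a type label).
# Remove these definitions before using the script in a flow.
prep_decimal <- function() "decimal"
prep_int     <- function() "int"

getOutputSchema <- function() {
  return(data.frame(
    Latitude  = prep_decimal(),
    Longitude = prep_decimal(),
    Cluster   = prep_int(),
    stringsAsFactors = FALSE
  ))
}

# Hypothetical helper: compare field names and R types against the declared schema.
matches_schema <- function(result, schema) {
  checks <- list(decimal = is.double, int = is.integer)
  if (!identical(names(result), names(schema))) return(FALSE)
  all(vapply(names(schema), function(field) {
    checks[[schema[[field]]]](result[[field]])
  }, logical(1)))
}

# Made-up rows standing in for the data frame that the script returns.
result <- data.frame(
  Latitude  = c(47.61, 45.52),
  Longitude = c(-122.33, -122.68),
  Cluster   = c(1L, 2L)
)
print(matches_schema(result, getOutputSchema()))
```

The type checks follow the supported data types table: Decimal maps to Double and Int maps to Integer.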
Connect to your Rserve server
Important: Starting in Tableau Prep Builder version 2020.3.3, you can configure your server connection once from the top Help menu instead of setting up your connection per flow in the Script step by clicking Connect to Rserve Server and entering your connection details. You will need to reconfigure your connection using this new menu for any flows that were created in an older version of Tableau Prep Builder that you open in version 2020.3.3.
- Select Help > Settings and Performance > Manage Analytics Extension Connection.
- In the Select an Analytics Extension drop-down list, select Rserve.
- Enter your credentials:
  - Port 6311 is the default port for plaintext Rserve servers.
  - Port 4912 is the default port for SSL-encrypted Rserve servers.
  - If the server requires credentials, enter a Username and Password.
  - If the server uses SSL encryption, select the Require SSL check box, then click the Custom configuration file link to specify a certificate for the connection.
Note: Tableau Prep Builder doesn't provide a way to test the connection. If there is a problem with the connection, an error message appears when you try to run the flow.
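Because the connection can't be tested from Tableau Prep Builder, one rough option is to check from an R session whether the server port accepts a TCP connection. The sketch below is an assumption-laden illustration: `rserve_reachable` is a hypothetical helper, the host and port are placeholders, and the check only confirms that something is listening. It doesn't validate credentials or SSL settings.

```r
# Hypothetical helper: returns TRUE if a TCP connection to host:port can be opened.
rserve_reachable <- function(host = "localhost", port = 6311, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+b",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  on.exit(close(con))  # release the socket once the check is done
  TRUE
}

# Placeholder host and port; 6311 is the default plaintext port listed above.
print(rserve_reachable("localhost", 6311))
```

A result of FALSE usually means the server isn't running, the port is different, or a firewall is blocking the connection.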
Add a script to your flow
Start your Rserve server, then complete the following steps:
- Open Tableau Prep Builder and click the Add connection button.
  In web authoring, from the Home page, click Create > Flow or from the Explore page, click New > Flow. Then click Connect to Data.
- From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
- Click the plus icon, and select Add Script from the context menu.
- In the Script pane, under Connection type, select Rserve.
- In the File Name section, click Browse to select your script file.
- Enter the Function Name, then press Enter to run your script.