Configure your Data Set
Note: Starting in version 2020.4.1, you can create and edit flows in Tableau Server and Tableau Online. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web.
To determine how much of your data set to include in the flow, you can configure your data set. When you connect to your data or drag tables into the Flow pane, an Input step is automatically added to the flow. This is always the first step in your flow. You can right-click the Input step to rename or remove it. If you're connected to an Excel or text file, you can also refresh the data from the Input step. For more information about how to refresh data from the Input step, see Add More Data in the Input Step.
In the Input step, you can see the details about your data set. Here you can search for fields, see sample values, and perform actions to reduce the size of your data set, such as selecting the fields to include, selecting the data sample to work with or applying filters to selected fields or rows. You can also configure the field properties by changing the field name or configuring the text settings for text files.
You can also change the data type in the Input step for data connections that support it. These include Microsoft Excel, text and PDF files, and data from Box, Dropbox, Google Drive, and OneDrive. For other data sources, you can change the data type in a cleaning step. For more information, see Review the data types assigned to your data.
Note: Field values that include square brackets are automatically converted to parentheses.
Connect to a custom SQL query
If your database supports using custom SQL, you will see Custom SQL displayed near the bottom of the Connections pane. Double-click Custom SQL to open the Custom SQL tab where you can enter queries to preselect data and use source-specific operations. After the query retrieves the data set, you can select the fields to include, apply filters or change the data type before adding the data to your flow.
For more information about using custom SQL, see Use Custom SQL to connect to data.
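As a rough illustration of what preselecting data with a custom query means, the following Python sketch runs a query against an in-memory SQLite database. The table orders, its columns, and the query itself are hypothetical, and the sketch does not use any Tableau API. In Tableau Prep, the query you enter on the Custom SQL tab runs against your own database.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 35.5), (3, "East", 80.0), (4, "South", 10.0)],
)

# A custom query that preselects columns and rows at the source,
# so only the reduced data set would reach the flow.
custom_sql = """
    SELECT order_id, sales
    FROM orders
    WHERE region = 'East'
    ORDER BY order_id
"""
rows = conn.execute(custom_sql).fetchall()
print(rows)
conn.close()
```

Because the query drops the region column and keeps only two of the four rows, the fields and rows you later select or filter in the Input step start from this smaller result.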
Only some cleaning operations are available in an Input step. You can make any of the following changes in the Input field list. Your changes are tracked in the Changes pane and annotations are added to the left of the Input step in the Flow pane and in the Input field list.
Filter: Click Filter Values in the toolbar, then enter your filter criteria in the calculation editor.
Rename Field: In the Field Name column, double-click (or Ctrl-click on macOS) the field name and enter a new field name.
Change Data Type: Click on the data type for the field and select a new data type from the menu.
Remove Field: Untick the box next to the fields that you don't want to include in your flow.
Select the fields to include in the flow
The Input pane shows you a list of fields in your data set. You can use the Search field to find fields in the list, and then use the boxes to tick the fields to include or exclude. To include or exclude all fields from the flow, toggle the tick box at the top left of the grid.
To filter a field, do the following:
In the toolbar click Filter Values.
Enter your filter criteria in the calculation editor.
The calculation filter type is the only filter option available in the Input step. Other filter options are available in other step types. For more information, see Filter Your Data.
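The effect of a calculation filter can be sketched in Python, assuming the filter criteria act as a true/false condition that is evaluated for each row. The field names Region and Sales, the sample rows, and the condition are hypothetical. This is plain Python, not the syntax you would type in the calculation editor.

```python
# Hypothetical rows standing in for the data shown in an Input step.
rows = [
    {"Region": "East", "Sales": 120.0},
    {"Region": "West", "Sales": 35.5},
    {"Region": "South", "Sales": 300.0},
]


def keep_row(row):
    """Assumed filter condition: a row is kept only when this returns True."""
    return row["Sales"] > 100 and row["Region"] != "South"


# Rows for which the condition is False are excluded from the flow.
filtered = [row for row in rows if keep_row(row)]
print(filtered)
```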
Change field names
To change the name of a field, in the Field Name column, select the name and then type the new name in the field. An annotation is added in the field grid and in the Flow pane to the left of the Input step. Your changes are also tracked in the Changes pane.
Change data types
To change the data type for a field, do the following:
Click the data type for the field.
Select the new data type from the menu.
You can also change the data type for fields in other step types in the flow or assign data roles to help validate your field values. For more information about changing your data type or using data roles, see Review the data types assigned to your data and Use Data Roles to Validate your Data.
Configure field properties
When you work with text files, you see a Settings tab where you can edit your connection and configure text properties, such as the field separator for text files. You can also edit the file connection in the Connections pane or configure incremental refresh settings. For more information about setting up incremental refresh for your flow, see Refresh Flow Data Using Incremental Refresh.
When you work with text or Excel files, you can correct data types that have been inferred incorrectly before you even start your flow. Data types can always be changed in subsequent steps in the Profile pane after you start your flow.
Configure text settings in text files
To change the settings used to parse text files, select from the following options:
First line contains header (default): Select this option to use the first row as the field labels.
Generate field names automatically: Select this option if you want Tableau Prep Builder to auto-generate the field headers. The field naming convention follows the same model as Tableau Desktop, for example F1, F2, and so on.
Field Separator: Select a character from the list to use to separate the columns. Select Other to enter a custom character.
Text Qualifier: Select the character that encloses the values in the file.
Character Set: Select the character set that describes the text file encoding.
Locale: Select the locale to use to parse the file. This setting indicates which decimal and thousands separators to use.
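To show how these settings interact when a text file is parsed, here is a minimal Python sketch that uses the standard csv module. The function names parse_text and parse_number, their parameters, and the sample data are hypothetical. The sketch only mirrors the options listed above and is not how Tableau Prep reads files internally.

```python
import csv
import io


def parse_text(raw_bytes, separator=",", qualifier='"', charset="utf-8", has_header=True):
    """Parse delimited text and return (field_names, data_rows)."""
    text = raw_bytes.decode(charset)  # Character Set
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=separator,  # Field Separator
        quotechar=qualifier,  # Text Qualifier
    )
    rows = list(reader)
    if not rows:
        return [], []
    if has_header:
        # First line contains header
        return rows[0], rows[1:]
    # Generate field names automatically: F1, F2, and so on
    names = [f"F{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


def parse_number(text, decimal=".", thousands=","):
    """Locale: interpret a number using the given separators."""
    return float(text.replace(thousands, "").replace(decimal, "."))


# Hypothetical file content: semicolon-separated, Latin-1 encoded.
raw = 'Name;Note\n"Café";"open; daily"\n'.encode("latin-1")

header, data = parse_text(raw, separator=";", charset="latin-1")
names, all_rows = parse_text(raw, separator=";", charset="latin-1", has_header=False)
amount = parse_number("1.234,56", decimal=",", thousands=".")
print(header, data, names, amount)
```

The text qualifier is what keeps the semicolon inside "open; daily" from being treated as a field separator, and choosing the wrong character set or locale would change how Café and 1.234,56 are read.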
To maintain peak performance, Tableau Prep limits the data included in the flow to a representative sample of your data set by default. The data sample is determined by calculating the optimal number of rows based on the total number of fields in the data set and the data types for those fields. Tableau Prep then retrieves the calculated number of rows as quickly as possible.
The resulting data sample may include all the rows you need, or it may not, depending on how the sample was calculated and returned. If you don't see the data that you expect, you can change the data sample settings to run the query again.
When creating or editing flows on the web, limits are applied to the amount of data you can include in a flow, and the options available to change your data sample are slightly different from those in Tableau Prep Builder. For more information, see Sample data and processing limits.
Note: If your data is sampled, a Sampled badge shows in the Profile pane and persists for every step you add. Any changes you make apply to the sample you are working with in the flow. All changes apply to your entire data set when you run the flow.
To change your data sample settings, select an Input step, then on the Data Sample tab select from the following options:
Default sample amount (default): Tableau Prep calculates the total number of rows to return.
Use all data: (Tableau Prep Builder only) Retrieve all rows in your data set regardless of size. This can impact performance or cause Tableau Prep Builder to time out.
Note: To maintain performance, even if you select this setting, a data sample limit of 1 million rows is applied to Aggregate and Union step types and a data sample limit of 3 million rows is applied to Join and Pivot step types.
Fixed number of rows: Select the number of rows to return from the data set. The recommended number of rows is 1 million or fewer. Setting the number of rows to more than 1 million can impact performance.
- In Web authoring: The maximum number of rows that a user can select when using large data sets is configured by the administrator. As a user, you can select the number of rows up to that limit.
Quick select (default): The database returns the number of rows requested as quickly as possible. This might be the first N number of rows or the rows that the database had cached in memory from a previous query.
Random sample: The database returns the number of rows requested but looks at every row in the data set and returns a representative sample from all of the rows. This option may impact performance when the data is first retrieved.
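The difference between the two sampling methods can be sketched in Python. The source does not say how the database selects its random rows, so the sketch substitutes reservoir sampling as one well-known way to draw a uniform sample in a single pass. The function names and the seed parameter are hypothetical.

```python
import random
from itertools import islice


def quick_select(rows, n):
    """Return the first n rows and stop reading immediately."""
    return list(islice(rows, n))


def random_sample(rows, n, seed=None):
    """Reservoir sampling: read every row, keep a uniform random sample of n."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = row
    return reservoir


# Hypothetical data set of 1,000 row numbers.
first_rows = quick_select(range(1000), 5)
sampled_rows = random_sample(range(1000), 5, seed=1)
print(first_rows)
print(sampled_rows)
```

The quick version stops after five rows, while the random version has to visit all 1,000, which is consistent with the note that Random sample may impact performance when the data is first retrieved.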