Examine Your Data
Note: Starting in version 2020.4.1, you can create and edit flows in Tableau Server and Tableau Online. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web.
Use the options in this topic to understand the composition of your data, so that you can identify the changes you need to make and see the effect of the operations you include in the flow.
Like Tableau Desktop, Tableau Prep interprets the data in your fields when you drag a connection to the Flow pane and automatically assigns a data type to it. Because different databases can handle data in different ways, Tableau Prep's interpretation might not always be correct.
To change a data type, click the data type icon and select the correct data type from the context menu. You can change string or integer data types to Date or Date & Time, and Tableau Prep will trigger Auto DateParse to change these data types. As in Tableau Desktop, if the change is not successful, you see Null values in the fields instead, and you can create a calculation to make the change.
For more information about using DateParse, see the Tableau Desktop and Web Authoring Help.
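The behavior described above, where values that cannot be converted become Null, can be illustrated with a minimal Python sketch. This is not Tableau Prep's Auto DateParse implementation. The format list and the function name `parse_date_or_null` are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of formats to try. Tableau Prep's actual parsing rules
# are not described in this topic and may differ.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def parse_date_or_null(text: str) -> Optional[date]:
    """Return a date if any candidate format matches, otherwise None (null)."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # Try the next format.
    return None


raw_values = ["2019-03-14", "03/15/2019", "not a date"]
parsed = [parse_date_or_null(v) for v in raw_values]

# Count the values that would show up as Null after the type change.
null_count = sum(1 for p in parsed if p is None)
print(parsed, null_count)
```

A high null count after a type change is a sign that the field needs a custom calculation rather than an automatic conversion.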
You can change the data type in your Input step after connecting to data from the following data sources:
- Microsoft Excel
- Text files
- PDF files
- Google Drive
For all other data sources, add a cleaning step or other step type to make this change. To see a list of cleaning options available in the different step types, see About cleaning operations.
After you connect to your data, add a table to the flow, and then add a step. You can use the Profile pane to see the current state and structure of your data and spot nulls and outliers.
Number of fields and rows: In the upper-left corner of the Profile pane you can find information that summarizes the number of fields and rows in the data at a particular point in the flow. Tableau Prep rounds the row count to the nearest thousand. In this example, there are 21 fields and approximately 3,000 rows in the data set.
When you hover over the number of fields and rows, you can see the exact number of rows (in this example, 2848).
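The rounding in this example is simple arithmetic: 2,848 is closer to 3,000 than to 2,000. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the sketch only reproduces the arithmetic, not how Tableau Prep formats the label.

```python
def round_to_nearest_thousand(count: int) -> int:
    """Round a non-negative count to the nearest thousand (halves round up)."""
    return ((count + 500) // 1000) * 1000


exact_rows = 2848
displayed_rows = round_to_nearest_thousand(exact_rows)
print(displayed_rows)
```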
Data set size: Work with a subset of your data by specifying the number of rows to include in the Data Sample tab in the Input pane.
Sampled: To enable you to interact directly with your data, Tableau Prep works with a subset of your raw data. The number of rows is determined by the data types and number of fields that are being rendered. String fields take more storage space than integers, so if you have 10 fields of strings in your data set, you might get fewer rows than if you had 10 fields of integers.
A Sampled badge displays next to the size details in the Profile pane to indicate that this is a subset of your data set. You can modify the amount of data that you include in your flow. When creating or editing flows on the web, additional data limits apply. For more information, see Set your data sample size.
Number of unique values: The number next to each field header represents the distinct values that are contained within that field. Tableau Prep rounds to the nearest thousand. In the example below, there are 3,000 distinct values that are represented in the Description field, but if you hover over the number, you can see the exact number of unique values.
By default, Tableau Prep groups numerical, date, and date & time values in a field into buckets. These buckets are also known as bins. The bins ensure that you can see the distribution of values as a whole and quickly identify outliers and null values. The bin size is calculated based on the minimum and maximum values in the field, and null values are always shown at the top of the distribution.
For example, order and ship dates are summarized or "binned" by year. Each bin represents a year, from January of the beginning year to January of the following year, and is labeled accordingly. Because there are sales dates and ship dates that fall in the latter part of 2018 and 2019, a bin is created for the following year for those values.
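As a rough illustration of binning dates by year with nulls listed first, consider the following Python sketch. It is a simplification under stated assumptions: the function name `bin_dates_by_year` is hypothetical, bins are keyed by calendar year only, and the way Tableau Prep actually chooses bin sizes from the minimum and maximum values is not reproduced here.

```python
from collections import Counter
from datetime import date
from typing import List, Optional, Tuple


def bin_dates_by_year(
    values: List[Optional[date]],
) -> List[Tuple[Optional[int], int]]:
    """Count values per calendar year, with the null bin placed first."""
    counts: Counter = Counter()
    for value in values:
        key = None if value is None else value.year
        counts[key] += 1
    # Sort so that None (null) comes first, then years in ascending order.
    return sorted(counts.items(), key=lambda kv: (kv[0] is not None, kv[0] or 0))


order_dates = [date(2018, 11, 3), None, date(2019, 2, 20), date(2018, 12, 30)]
bins = bin_dates_by_year(order_dates)
print(bins)
```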
If a discrete (or categorical) data field contains many rows or has a distribution that is large enough that it can’t be displayed in the field without scrolling, you can see a summarized distribution to the right of the field. You can click and scroll through the distribution to target specific values.
When your data contains numeric or date fields, you can toggle to display the detailed (discrete) version of the values or a summarized (continuous) version of the values. The summarized view shows you the range of values in a field and the frequency with which certain values appear.
This toggle can help you isolate unique values (like the number of “3” records in a field) or the distribution of values (like the sum of all “3” records in a field).
To toggle your view:
In the Profile pane, Results pane, or data grid, click the More options menu for a numeric or date field.
In the context menu, select Detail to see the detailed version of the values, or Summary to see the distributed version of the values.
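The difference between the two views can be sketched in Python. This is an analogy only: `detail_view` counts each distinct value, and `summary_view` groups values into equal-width ranges. Both function names and the `bin_count` parameter are hypothetical and do not correspond to Tableau Prep settings.

```python
from collections import Counter
from typing import List, Tuple


def detail_view(values: List[float]) -> List[Tuple[float, int]]:
    """Detail (discrete): one entry per distinct value, with its count."""
    return sorted(Counter(values).items())


def summary_view(values: List[float], bin_count: int) -> List[Tuple[float, float, int]]:
    """Summary (continuous): equal-width bins between the min and max values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count or 1  # Avoid zero width when all values match.
    counts = [0] * bin_count
    for value in values:
        # The maximum value falls into the last bin.
        index = min(int((value - lo) / width), bin_count - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


quantities = [1, 3, 3, 3, 7, 9]
detail = detail_view(quantities)
summary = summary_view(quantities, bin_count=2)
print(detail)
print(summary)
```

In this sample, the detail view shows that the value 3 appears three times, while the summary view shows only that four of the six values fall in the lower half of the range.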
In the Profile pane or Results pane, you can search for fields or values of particular interest to you and use the search results to filter your data.
To search for fields, enter a full or partial search term in the search box on the toolbar.
To search for a value in a field:
Click the Search icon for a field, and enter a value.
To use advanced search options, click the Search options... button.
To use the search results to filter the data, select Keep Only or Exclude.
In the Flow pane, a filter icon appears above affected steps.
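The effect of Keep Only and Exclude on search results can be sketched as follows. This is a minimal Python illustration, assuming a case-insensitive "contains" match. The function names, the sample rows, and the matching rule are all hypothetical, and the advanced search options are not modeled.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def search_values(rows: List[Row], field: str, term: str) -> Set[str]:
    """Return the distinct field values that contain the search term."""
    needle = term.lower()
    return {row[field] for row in rows if needle in row[field].lower()}


def keep_only(rows: List[Row], field: str, matches: Set[str]) -> List[Row]:
    """Keep only the rows whose field value is in the search results."""
    return [row for row in rows if row[field] in matches]


def exclude(rows: List[Row], field: str, matches: Set[str]) -> List[Row]:
    """Remove the rows whose field value is in the search results."""
    return [row for row in rows if row[field] not in matches]


rows = [
    {"Product": "Office Chair"},
    {"Product": "Desk Lamp"},
    {"Product": "Folding chair"},
]
matches = search_values(rows, "Product", "chair")
kept = keep_only(rows, "Product", matches)
remaining = exclude(rows, "Product", matches)
print(kept, remaining)
```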
Sort options on a profile card let you sort the bins (the count of values represented by the distribution bars) in ascending or descending order or the individual field values in alphabetical order.
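The sort orders described above can be illustrated with a short Python sketch. The sample values are made up, and the sketch only shows the three orderings, not how the profile card renders them.

```python
from collections import Counter

# Hypothetical field values; each distinct value gets one distribution bar.
values = ["Chair", "Table", "Chair", "Lamp", "Chair", "Table"]
counts = Counter(values)

# Sort by the count of values (the length of each bar).
by_count_descending = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
by_count_ascending = sorted(counts.items(), key=lambda kv: kv[1])

# Sort by the field values themselves, in alphabetical order.
alphabetical = sorted(counts.items(), key=lambda kv: kv[0])

print(by_count_descending)
print(by_count_ascending)
print(alphabetical)
```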
To rearrange the order of your fields, in the Profile pane, Results pane, or data grid, select a profile card or a field in the data grid and drag it until the black target line appears. Then drop it into place. The Profile pane and data grid are synced, so the field appears in the same order in both places.
Tableau Prep makes it easy to find fields and values in your flow data. Trace where a field originated and where it is used throughout the flow in the flow pane, or click individual values in a profile card or in the data grid to highlight related or identical values.
In Tableau Prep, you can highlight everywhere a field is used in a flow, including where it originated, to help you track down missing values or troubleshoot a flow when you aren't seeing the results you expect.
Click on a field in the Profile pane in a cleaning step or in the Results pane in any other step type and the flow pane will highlight the path where that field is used.
Note: This option is not available for Input or Output step types.
You can use highlighting to find related values across fields. When you click a value in the Profile card in the Profile pane or Results pane, all the related values in the other fields are highlighted in blue. The blue color shows the relationship distribution between the value you selected and the values in the other fields.
For example, to highlight related values, in the Profile pane, click a value in a field. The related values in other fields turn blue and the proportion of the bar highlighted in blue represents the degree of association.
When you select a value in the data grid, all identical values are highlighted too. These highlights help you identify patterns or irregularities in your data.
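One way to think about the highlighted proportion is as the share of rows for each value in another field that also contain the selected value. The following Python sketch computes that share. It is an interpretation for illustration, not Tableau Prep's documented calculation, and the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def related_proportions(
    rows: List[Row], selected_field: str, selected_value: str, other_field: str
) -> Dict[str, float]:
    """For each value in other_field, return the fraction of its rows
    that also have selected_value in selected_field."""
    totals = Counter(row[other_field] for row in rows)
    related = Counter(
        row[other_field] for row in rows if row[selected_field] == selected_value
    )
    return {value: related[value] / totals[value] for value in totals}


rows = [
    {"Region": "East", "Category": "Furniture"},
    {"Region": "East", "Category": "Technology"},
    {"Region": "West", "Category": "Furniture"},
    {"Region": "West", "Category": "Office Supplies"},
]
proportions = related_proportions(rows, "Region", "East", "Category")
print(proportions)
```

In this sample, selecting East would correspond to half of the Furniture bar, all of the Technology bar, and none of the Office Supplies bar being highlighted.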