Clean and Shape Data

Tableau Prep Builder provides various cleaning operations that you can use out of the box to clean and shape your data. Cleaning up dirty data makes it easier to combine and analyze your data or makes it easier for others to understand your data when sharing your data sets.

You can also clean your data using a pivot step or a script step to apply R or Python scripts to your flow. For more information, see Pivot Your Data or Use R and Python scripts in your flow.

About cleaning operations

You clean data by applying cleaning operations such as filtering, adding, renaming, splitting, grouping, or removing fields. In prior releases, cleaning operations could only be performed in the Clean step type. In Tableau Prep Builder version 2018.2.1 and later, you could also perform cleaning operations in the data grid in a cleaning step.

In Tableau Prep Builder version 2018.3.1 and later, you can perform cleaning operations in most step types in your flow. You can apply limited cleaning operations in the Input step and can't apply cleaning operations in the output step. For more information about applying cleaning operations in the Input step, see Apply cleaning operations in an input step.

Available cleaning operations

The following table shows which cleaning operations are available in each step type:

  Input Clean Aggregate Pivot Join Union Output
Filter X X X X X X  
Group and Replace   X   X X X  
Clean   X   X X X  
Split Values   X   X X X  
Rename Field X X X X X X  
Duplicate Field   X   X X X  
Keep Only Field X X X X X X  
Remove Field X X X X X X  
Create Calculated Field   X   X X X  
Edit Value   X   X X X  
Change Data Type X X X X X X  

As you make changes to your data, annotations are added to the corresponding step in the Flow pane and an entry is added in the Changes pane to track your actions. If you make changes in the Input step, the annotation shows to the left of the step in the Flow pane and is also shown in the Input profile in the field list.

The order that you apply your changes matters. Changes made in Aggregate, Pivot, Join and Union step types are performed either before or after those reshaping actions, depending on where the field is when you make the change. Where the change was made is shown in the Changes pane for the step.

The following example shows a field change from a single table in a Join step. The change is performed prior to the join action to give the corrected results.

Order of operations

The following table shows where the cleaning action is performed in Aggregate, Pivot, Join and Union step types depending on where the field is in the step.

Action Step Type: Aggregate Aggregate Pivot Pivot Join Join Union Union
  Field Location: Grouped fields Aggregated fields Not in pivot Created from pivot Included in one table* Included in both tables* Mismatched fields Combined fields
Filter   Before Aggregation After Aggregation Before Pivot After Pivot Before Join After Join Before Union After Union
Group and Replace   NA NA Before Pivot After Pivot Before Join After Join Before Union After Union
Clean   NA NA Before Pivot After Pivot Before Join After Join Before Union After Union
Split Values   NA NA Before Pivot After Pivot Before Join After Join Before Union After Union
Rename Field   Before Aggregation After Aggregation Before Pivot After Pivot Before Join After Join Before Union After Union
Duplicate Field   NA NA Before Pivot After Pivot Before Join After Join Before Union After Union
Keep Only Field   After Aggregation After Aggregation Before Pivot After Pivot Before Join After Join Before Union After Union
Remove Field   Removes from Aggregation Removes from Aggregation Before Pivot After Pivot Before Join After Join Before Union After Union
Create Calculated Field   NA NA Before Pivot After Pivot After Join After Join Before Union After Union
Edit Value   NA NA Before Pivot After Pivot Before Join After Join Before Union After Union
Change Data Type   Before Aggregation After Aggregation Before Pivot After Pivot Before Join Before Join Before Union After Union

Note: For joins, if the field is a calculated field that was created using a field from one table, the change is applied before the join. If the field is created with fields from both tables, the change is applied after the join.

Apply cleaning operations

To apply cleaning operations to fields, use the toolbar options or click More options on the field profile card, data grid or Results pane to open the menu.

In Aggregate, Pivot, Join and Union step types, the More options menu is available on the profile cards in the Results pane and corresponding data grid. If you perform the same cleaning operations or actions over and over throughout your flow, you can copy and paste your steps, actions or even fields. For more information see Copy steps, actions and fields.

Profile pane toolbar Drop-down menu

Select your view

You can perform cleaning operations outside of the profile or results pane in the data grid or in the list view. Use the view toolbar (version 2019.3.2 and later) to change your view, then click More options on a field to open the cleaning menu.

  • Show profile pane : This is the default view. Select this button to go back to the Profile pane or Results pane view.

  • Show data grid : Collapse the profile or results pane to expand and show only the data grid. This view provide a more detailed view of your data and can be useful when you need to work with specific field values. After you select this option, this view state persists across all steps in your flow but you can change it at any time.

    Note: Not all cleaning operations are available in the data grid. For example if you want to edit a value in-line, you must do this in the Profile pane.

  • Show list view (version 2019.3.2 and later): Convert the profile pane or results pane into a list. After you select this option, this view state persists across all steps in your flow but you can change it at any time.

    You can select multiple rows and click the X to remove them or use the More options menu to apply operations to selected fields. If you assign a data role to the field, or select Filter, Group and Replace, Clean, or Split Values, you will be returned to the Profile or Results view to complete those actions.  All other options can be performed in the list view.

Version 2019.3.1 and earlier

You can also apply cleaning operations in the data grid. Just click the Hide profile pane button to collapse the Profile pane then click More options on a field to open the menu.

Apply cleaning operations

To apply cleaning operations to a field, do the following:

Note: You can perform cleaning operations in a list view beginning in version 2019.3.2.

  1. In the Profile pane, data grid, Results pane or list view, select the field you want to make changes to.

  2. From either the toolbar or More options menu for the field , select from the following options:

    • Filter or Filter Values: Select from one of the filter options, right-click or Ctrl-click (MacOS) a field value to keep or exclude values. You can also use the Selected Values filter to pick and choose the values to filter, included values not in your flow sample. For more information about filter options, see Filter Your Data.

    • Group and Replace: Manually select values or use automatic grouping. You can also multi-select values in the Profile card and right-click Ctrl-click (MacOS) to group or ungroup values or edit the group value. For more information about using Group and Replace, see Automatically map values to a standard value using fuzzy match .

    • Clean: Select from a list of quick cleaning operations to apply to all values in the field.

    • Split Values: Split values automatically based on a common separator or use custom split to specify how you want to split field values.

      Automatic split and custom split work the same as they do in Tableau Desktop. For more information, see Split a Field into Multiple Fields in the Tableau Desktop and Web Authoring Help.

    • Rename Field

    • Duplicate Field (version 2019.2.3 and later) Create a copy of your field and values.

    • Keep Only Field(version 2019.2.2 and later)

    • Create Calculated Field

    • Remove (Remove Field in previous versions)

      Note: Starting in version 2019.3.1, you can use the Publish as Data Role option to create custom data roles that you can then apply to your fields to validate the field values when cleaning data. For more information about this option, see Create custom data roles (version 2019.3.1 and later).

  3. To edit a value, right-click or Ctrl-click (MacOS) one or more values and select Edit Value then enter a new value. You can also select Replace with Null to replace the values with a Null value or double-click in a single field to edit it directly. For more information about editing field values see Edit field values.

  4. Review the results of these operations in the Profile pane, Summary panes or data grid.

View your changes

The different types of cleaning operations are represented by icons over the steps in your flow. If more than four types of operations are applied to a step, an ellipsis displays over the step. Hover over these icons to view annotations showing applied operations and the order in which they are performed.

Starting in Tableau Prep Builder version 2019.1.3 and later, you can click on an annotation on the change icon on a step in the Flow pane or on a profile card in the Profile or Results pane and the change and field it impacts will be highlighted in the Changes pane and the Profile or Results pane.

You can also select a step and then expand the Changes pane to view the details for each change, edit or remove your changes, drag changes up or down to change the order in which they're applied and in Tableau Prep Builder version 2019.1.1, you can add a description to provide context to other users. For more information about adding descriptions to your changes, see Add descriptions to flow steps and cleaning actions

Cleaning annotation Changes pane

When viewing changes in an Aggregation, Pivot, Join, or Union step, the order that the change is applied shows either before or after the reshaping action. The order of these changes is applied by the system and cannot be changed. You can edit and remove the change.

Copy steps, actions and fields

When cleaning your data you often perform the same cleaning operations or actions over and over throughout your flow. To help make cleaning and shaping your data more efficient, you can copy and paste these operations or actions throughout your flow so you can perform a cleaning operation or action once, then reuse it where you need it. You can even duplicate fields to experiment with different cleaning operations.

Copy and paste steps

Copy one or more steps to use them in another area of the same flow. This option is not available for Input steps that include a wildcard union.

  1. In the Flow pane, select one or more steps in the flow .

  2. Right-click or Ctrl-click (MacOS) and select Copy.

  3. Right-click or Ctrl-click (MacOS) in any whitespace in the canvas and click Paste.

  4. Drag and drop the steps to where you want to place them in the flow. Depending on the steps you copied and where you want to add them you may need to remove flow lines or move steps around. For example to connect a step to the copied steps, remove the existing flow line if there is one, then drag the existing step to the new step and drop on Add.

    For more information about organizing your flow, see Reorganize the layout of your flow (version 2019.2.2 and later).

Copy and paste cleaning operations

You can copy and paste cleaning operations in the same flow to reuse your actions using one of the following options:

  • Copy an operation from the Changes pane in one step and paste it in the Changes pane for the same step or another step to apply that same operation in that step.
  • Drag and drop an operation from the Changes pane and drop it to other fields in the Profile pane for that step to apply that operation to multiple fields. This option is not available for operations that impact multiple fields, such as calculated fields.
  1. In the Changes pane select the change you want to copy.
  2. Right-click or Ctrl-click (MacOS) on the change item, then select Copy from the menu.

  3. In the Changes pane where you want to past the change right-click or Ctrl-click (MacOS) and select Paste. Select the change and click on Edit to make any adjustments as needed.

To drag and drop a change to other fields in the step do the following:

  1. In the Changes pane select the change you want to copy.
  2. Drag the change over the field where you want to apply it and drop it. Repeat this action as needed.

Copy fields (version 2019.2.3 and later)

If you wanted to experiment with your cleaning operations on a field but don't want to change the original data, you can copy your fields .

  1. In the profile pane, data grid, results pane, or list view, select the field you want to copy.

  2. From the More options menu, select Duplicate Field.

A new field is created with the same name and a modifier. For example, "Ship Date -1".

Create reusable flow steps (version 2019.3.2 and later)

If you commonly perform the same actions over and over again with your data and you want to apply these same steps in other flows, you can select one or more flow steps and their associated actions or the entire flow and save it locally to a file on your computer or publish it to Tableau Server or Tableau Online to share with others. When the flow steps are published to your server, a Saved Steps tag is automatically added so you can easily search and find them when adding them to your flows.

Create reusable steps

  1. Select one or more steps.

  2. Right-click on a selected step and select Save Steps as Flow.

  3. Select Save to File to save the flow locally or Publish to Server to publish the flow to Tableau Server or Tableau Online.

  4. If you publish the flow to Tableau Server or Tableau Online, sign into your server if needed, then complete the fields in the Publish Flow dialog then click Publish.

Insert reusable steps in a flow

  1. Open a flow.

  2. In the flow pane, in the white area of the canvas, right-click or Ctrl-click (MacOS) and click Insert Flow or click Edit > Insert Flow from the top menu.

  3. In the Add Flow dialog, select from flows saved to either your local file or your server, then click Add. The list of flows is automatically filtered to show flows tagged with Saved Steps. To insert other flows, change the Flow Type to All Flows.

  4. The flow is added to the flow pane. move the steps where you want them in the flow and fix any errors.

Merge fields

If you have fields that contain the same values but are named differently, you can easily merge them into a single field to combine them by dragging one field on top of the other. When you merge the fields, the target field becomes the primary field and the field name of the target field persists. The field that you merge to the target field is removed.

Example:

Wildcard union results in 3 fields with the same values Merge 3 fields into 1

When you merge fields, Tableau Prep Builder keeps all of the fields from the target field and replaces any nulls in that field with values from the source fields that you merge with the target field. The source fields are removed.

Example

Name Contact_Phone Business_Phone Cell_Phone Home_Phone
Bob 123-4567 123-4567 null null
Sally null null 456-7890 789-0123
Fred null null null 567-8901
Emma null 234-5678 345-6789 null

If you merge the Business _Phone, Cell_Phone and Home_Phone fields with the Contact_phone field, the other fields are removed and results in the following:

Name Contact_Phone
Bob 123-4567
Sally 456-7890
Fred 567-8901
Emma 234-5678

To merge fields, do one of the following:

  • Drag and drop one field onto another. A Drop to merge fields indicator displays.

  • Select multiple fields and right-click within the selection to open the context menu, and then click Merge Fields.

  • Select multiple fields, and then click Merge Fields on the toolbar.

For information about how to fix mismatched fields as a result of a union, see Fix fields that don’t match.

Apply cleaning operations using recommendations (version 2018.3.2 and later)

Sometimes it can be hard to identify which cleaning operation you need to use to fix problems in your data. Tableau Prep Builder can analyze your data and recommend cleaning operations that you can apply automatically to quickly fix problems in your data fields or help to identify problems so you can fix them. This feature is available in all step types except Input, Output and Join step types.

Note: If you don't want to use this feature, you can turn it off. From the top menu, go to Help > Settings and Performance. Then click on Enable Recommendations to clear the check mark next to the setting.

Recommendation types include:

  • Data roles

  • Group and Replace (also applies to fields with data roles starting in version 2019.2.3)
  • Filter

  • Replace values with Null values

  • Remove fields

  • Split (version 2019.1.1 and later)

    Note: This option works specifically with data in fixed-width type text files. To use the split recommendation with this file type, after you connect to the data source, in the Input step, in the Text Settings tab, select a Field Separator character that is not used in the data so the data loads as a single field.

  • Trim spaces

Apply recommendations

  1. Do one of the following:

    • Click the light bulb icon in the top right corner of the profile card.
    • From the toolbar, click the Recommendations drop-down arrow to view all recommendations for your data set and select a recommendation from the list.

    This option only appears when recommended changes are identified by Tableau Prep Builder.

  2. To apply the recommendation, hover on the Recommendations card and then click Apply.

    The change is automatically applied and an entry is added to the Changes pane. To remove the change, click Undo in the top menu or hover over the change in the Changes pane and click the X to remove it.

  3. If Tableau Prep Builder identifies further recommendations as a result of the change, the light bulb icon remains on the Profile card until no further recommendations are found.

    Repeat the steps above to apply any additional changes or ignore the suggested change and use the other cleaning tools in Tableau Prep Builder to address the data problems.

Edit field values

Multiple variations of the same value can prevent you from accurately summarizing your data. You can quickly and easily correct these variations using the following options in Tableau Prep Builder.

Note: Any edits that you make to the values must be compatible with the field data type.

Edit a single value

  1. In the Profile card, click the value you want to edit, and enter the new value. A group icon shows next to the value.

    Alternatively, right-click a value and click Edit Value. The change is recorded in the Changes pane on the left side of the screen.

  2. View the results in the Profile pane, and data grid.

Edit multiple values

You have several options to edit multiple values at once. For example, use quick cleaning operations to remove punctuation for all values in a field, manually group values using multi-select, automatically group values together using fuzzy-match algorithms that find similar values or select multiple values and replace them with Null.

Note: When you map multiple values to a single value, the original field shows a group icon next to the value, showing you which values are grouped together.

Edit multiple values using quick cleaning operations

This option applies only to text fields.

  1. In the Profile pane, Results pane or data grid, select the field you want to edit.

  2. Click More options, select Clean, and then select one of the following options:

    • Make Uppercase: Change all values to uppercase text.

    • Make Lowercase: Change all values to lowercase text.

    • Remove Letters: Remove all letters and leave only other characters.

    • Remove Numbers: Remove all numbers and leave letters and other characters.

    • Remove Punctuation: Remove all punctuation.

    • Trim Spaces: Remove leading and trailing spaces.

    You can stack operations to apply multiple cleaning operations to the fields. For example first select Clean > Remove Numbers and then select Clean > Remove Punctuation to remove all numbers and punctuation from the field values.

  3. To undo your changes, click the Undo arrow at the top of the Flow pane, or remove the change from the change list.

Group and edit multiple values inline

  1. In the Profile card, select the field you want to edit.

  2. Press Ctrl or Shift+click or Command or Shift+click (MacOS), and select the values that you want to group.

  3. Right-click, and select Group from the context menu. The value in the selection that you right-click becomes the default name for the new group but you can edit this in-line.

  4. To edit the group name, select the grouped field and edit the value or right-click or Ctrl+click (Mac) on the grouped field and select Edit Value from the context menu.

  5. To ungroup the grouped field values, right-click on the grouped field and select Ungroup from the context menu.

Replace one or more values with Null

If you have data rows that you want to include in your analysis but you want to exclude certain field values you can change them to a Null value.

  1. In the Profile card, press Ctrl or Shift+click or Command or Shift+click (on Mac), and select the values that you want to change

  2. Right-click or Ctrl+click (Mac), and select Repace with Null from the menu. The values are changed to Null and the group icon shows next to the value.

Manually map multiple values to a standard value

Use Group and Replace to map the value of a field from one value to another or manually select multiple values to group them. You can even add new values to set up mapping relationships to organize your data.

For example, let’s say you have three values in a field: My Company, My Company Incorporated, and My Company Inc. All these values represent the same company, My Company. You can use Group and Replace to map the values My Company Incorporated and My Company Inc to My Company, so that all three values appear as My Company in the field.

Map multiple values to a single selected field

  1. In the Profile pane or Results pane, select the field you want to edit.

  2. Click More options and select Group and Replace > Manual Selection from the menu.

  3. In the left pane of the Group and Replace editor, select the field value that you want to use as the grouping value. This value now shows at the top of the right pane.

  4. In the lower section of the right pane in the Group and Replace editor, select the values you want to add to the group.

    To remove values from the group, in the upper section of the right pane in the Group and Replace editor, clear the check box next to the values.

Create a group by selecting multiple values

  1. In the Profile pane or Results pane, select the field you want to edit.

  2. Click More optionsand select Group and Replace > Manual Selection from the menu.

  3. In the left pane of the Group and Replace editor, select multiple values that you want to group.

  4. In the right pane of the Group and Replace editor, click Group Values.

    A new group is created using the last selected value as the group name. To edit the group name, select the grouped field and edit the value or right-click or Ctrl+click (Mac) on the grouped field and select Edit Value from the menu.

    Add and identify values that aren't in the data set

    If you want to map values in your data set to a new value that doesn't exist, you can add it using Group and Replace. To easily identify any values that are not in the data set, these values are marked with a red dot next to the value name in the Group and Replace editor.

    For example in the image below, Wyoming and Nevada aren’t in the data set.

    Some reasons why a value might not be in the data set include the following:

    • You just added the new value manually.

    • The value is no longer in the data.

    • The value is in the data, but isn’t in the sampled data set.

    To add a new value:

    1. In the Profile pane or Results pane, select the field you want to edit.

    2. Click More options and select Group and Replace > Manual Selection from the context menu.

    3. In the left pane of the Group and Replace editor, click the plus to add a new value.

    4. Type a new value in the field and press Enter to add it.

    5. In the right pane, select the values that you want to map to the new value.

    6. (Optional) To add additional new values to your mapped value, click the plus button in the right pane in the Group and Replace editor.

Automatically map values to a standard value using fuzzy match

To search for and automatically group similar values, use one of the fuzzy match algorithms. Field values are grouped under the value that appears most frequently. Review the grouped values and add or remove values in the group as needed.

If you use data roles to validate your field values, you can use the Group and Replace option to match invalid values with valid ones. For more information, see Group similar values by data role

Choose one of the following options to group values:

  • Pronunciation: Find and group values that sound alike. This option uses the Metaphone 3 algorithm that indexes words by their pronunciation and is most suitable for English words. This type of algorithm is used by many popular spell checkers. This option isn't available for data roles.

  • Common Characters: Find and group values that have letters or numbers in common. This option uses the ngram fingerprint algorithm that indexes words by their unique characters after removing punctuation, duplicates, and whitespace. This algorithm works for any supported language. This option isn't available for data roles.

    For example, this algorithm would match names that are represented as "John Smith" and "Smith, John" because they both generate the key "hijmnost". Since this algorithm doesn't consider pronunciation, the value "Tom Jhinois" would have the same key "hijmnost" and would also be included in the group.

  • Spelling: Find and group text values that are spelled alike. This option uses the Levenshtein distance algorithm to compute an edit distance between two text values using a fixed default threshold. It then groups them together when the edit distance is less than the threshold value. This algorithm works for any supported language.

    Starting in Tableau Prep Builder version 2019.2.3 this option is available to use after a data role is applied. In that case, it matches the invalid values to the closest valid value using the edit distance. If the standard value isn't in your data set sample, Tableau Prep Builder adds it automatically and marks the value as not in the original data set.

  • Pronunciation +Spelling: (version 2019.1.4 and later) If you assign a data role to your fields, you can use that data role to match and group values with the standard value defined by your data role. This option then matches invalid values to the most similar valid value based on spelling and pronunciation. If the standard value isn't in your data set sample, Tableau Prep Builder adds it automatically and marks the value as not in the original data set. This option is most suitable for English words.

    For more information see Clean and Shape Data.

    Note: In Tableau Prep Builder version 2019.1.4 and 2019.2.1 this option was labeled Data Role Matches.

Group similar values using fuzzy match

  1. In the Profile pane or Results pane, select the field you want to edit.

  2. Click More optionsand select Group and Replace then select one of these options:

    • Pronunciation

    • Common Characters

    • Spelling

    Tableau Prep Builder finds and groups values that match and replaces them with the value that occurs most frequently in the group.

  3. Review the groupings and manually add or remove values or edit them as needed. Then click Done.

Adjust your results when grouping field values

If you group similar values by Spelling or Pronunciation, you can change your results by using the slider on the field to adjust how strict the grouping parameters are.

Depending on how you set the slider, you can have more control over the number of values included in a group and the number of groups that get created. By default, Tableau Prep Builder detects the optimal grouping setting and shows the slider in that position.

When you change the threshold, Tableau Prep Builder analyzes a sample of the values to determine the new grouping. The groups generated from the setting are saved and recorded in the Changes pane, but the threshold setting isn't saved. The next time the Group and Replace editor is opened, either from editing your existing change or making a new change, the threshold slider is shown in the default position, enabling you to make any adjustments based on your current data set.

  1. In the Profile pane or Results pane, select the field you want to edit.

  2. Click More optionsand select Group and Replace then select one of these options:

    • Pronunciation

    • Spelling

    Tableau Prep Builder finds and groups values that match and replaces them with the value that occurs most frequently in the group.

  3. In the left pane of the Group and Replace editor, drag the slider to one of the 5 threshold levels to change your results.

    To set a stricter threshold, move the slider to the left. This results in fewer matches and creates less groups. To set a looser threshold, move the slider to the right. This results in more matches and creates more groups.

  4. Click Done to save your changes.
Thanks for your feedback! There was an error submitting your feedback. Try again or send us a message.