Tips for Working with Your Data
Data can be organized in various ways. To take advantage of Tableau Desktop, Tableau recommends that you connect to data that is formatted for analysis. Specifically, data that is:
-
as granular as possible rather than aggregated (such as daily weather data rather than monthly averages)
-
organized like a database table (rather than a column-oriented table such as a crosstab)
-
stripped of extraneous information (anything that's not the data and its headers)
When data is structured for analysis, it's much easier to ask and answer questions. Tableau can aggregate raw data to the desired level, rather than being restricted by the aggregations already present in the data. Groups and hierarchies can be created as needed, and calculations can be performed in the flow of analysis.
Tableau Desktop has basic cleaning options and the Data Interpreter. Tableau Prep may be necessary for more complex formatting issues.
The following Tableau Desktop-specific sections highlight and provide suggestions for resolving some common formatting or issues that can make analyzing your data difficult.
Pivot data in crosstab format
When data is formatted as in crosstab format, the table is column oriented. In a column oriented table, the variables are stored as column headers. However, Tableau Desktop is optimized for row oriented data. In a row-oriented table, the variables are stored in the row values.
For example, suppose you have column-oriented table, which shows math, science, and history scores for grade school students.
Tableau Desktop is optimized to connect to row-oriented tables, where math, science, and history values are organized under a column called "Subject" and the scores for each student are organized under a column called "Score." You can pivot the columns in the rows, by manually editing your Excel data. Alternatively, connect to your Excel data from Desktop and then use the pivot option. For more information about the pivot option, see Pivot Data from Columns to Rows.
Remove pre-aggregated data
Data can often come pre-aggregated. That is, data can contain sums, averages, medians, etc. A common of example of pre-aggregated data comes in the form of subtotals and grand totals. Subtotals and grand totals data are computed from the raw data, but is not part of raw data itself.
For example, suppose you have a table that contains a row of subtotals information.
In this case, pre-aggregated data needs to be removed. To use subtotals and grand totals in your analysis, manually remove this type of information from your table. Then, connect to you Excel data from Desktop and calculate subtotals and totals using the totals option. For more information, see Show Totals in a Visualization. Alternatively, connect to your Excel data from Desktop, turn on Data Interpreter, and then use the totals option. For more information, see Clean Data from Excel, CSV, PDF, and Google Sheets with Data Interpreter.
Remove or exclude introductory text
Excel data that is delivered as a report can contain titles or blocks of introductory text. Because Desktop expects either column headers or row values in the first row of a table, this information can cause problems during your analysis.
For example, suppose you have a table that contains a report title and date.
In this case, the title and date information needs to be removed. To use a title and date for a report, do one of the following:
- Manually remove this information from your Excel data. Then connect to your Excel data from Desktop and add a report title using the title option. For more information, see Format Individual Parts of the View.
- Connect to your Excel data from Desktop, turn on Data Interpreter, and then use the title option. For more information, see Clean Data from Excel, CSV, PDF, and Google Sheets with Data Interpreter.
- If you cannot remove this information from your Excel data, create a named range and connect to the named range from Desktop. For more information, see Excel.
Flatten hierarchical headers to a single row
In general, Tableau Desktop expects only the first row in your Excel data to contain column headers. Data that contains multiple layers of column headers can cause problems during your analysis.
For example, suppose you have a table that contains one major header and multiple sub-headers.
In this case, the hierarchy of headers must be flattened or removed. To do this, you can manually create a new column for each header in the hierarchy directly in your Excel data. Alternatively, connect to your Excel data from Tableau Desktop and then turn on Data Interpreter. Verify that your headers are flattened correctly. For more information about Data Interpreter, see Clean Data from Excel, CSV, PDF, and Google Sheets with Data Interpreter.
Make sure there are no blank cells
If you create new columns for your hierarchical headers, make sure that each cell in the new columns contains values.
While you might repeat the same value for each row, it's important that each row contains the data that associates it with the data that was stored in the hierarchical header. You must manually remove blank cells from your Excel data.
Remove blank rows
Make sure that there are no blank rows in your data. To fix blank rows, you must remove the blank rows from your Excel data.
Add missing headers
Make sure that there are no missing column headers. To fix missing headers, you must manually add the missing headers directly to your Excel data.