Best Practices for Published Data Sources
Publishing data sources to Tableau Cloud or Tableau Server is integral to maintaining a single source for your data. Publishing also enables sharing data among colleagues, including those who don’t use Tableau Desktop but have permission to edit workbooks in the web editing environment.
Updates to a published data source flow to all connected workbooks, whether the workbooks themselves are published or not.
A Tableau data source consists of the following:
The data connection information that describes what data you want to bring into Tableau for analysis. When you connect to the data in Tableau Desktop, you can create joins, including joins between tables from different data types. You can rename fields on the Data Source page to be more descriptive for the people who work with your published data source.
An extract, if you decide to create one. Guidelines for when to create an extract are included below, as well as in the additional resources.
Information about how to access or refresh the data. Examples of this type of information include:
The path to an original Excel file.
Embedded credentials or OAuth access tokens for accessing the data directly.
Alternatively, no credentials, so that users are prompted to enter them when they want to access the data (whether it’s to view a workbook that connects to it, or to connect a new workbook to it).
For more information, see Set Credentials for Accessing Your Published Data.
Customization and cleanup that helps you and others use the data source efficiently. When you’re working with your view, you can add calculations, sets, groups, bins, and parameters; define any custom field formatting; hide unused fields; and so on.
All of these refinements become part of the metadata contained in the data source that you publish and maintain.
When you publish a data source, consider these best practices:
Create the connection for the information you want to bring into Tableau and do any customization and cleanup that will help you and others use the data source efficiently.
If appropriate, create an extract of the data you want to publish. For more information, see the following section, When to use an extract.
Develop a data source naming convention.
- After publishing a data source, you can rename it in Tableau Cloud or Tableau Server. To rename a published data source, choose the More actions menu next to the name of your data source. Then, choose Rename and enter the new name. You can also use the Update Data Source REST API to rename a published data source. Be sure to use a well-considered naming convention to help other users of the data deduce which data source to connect to.
- When a published data source is renamed, all workbooks that use that data source will use the new name after the next data source refresh is complete. Like renaming workbooks, renaming a published data source isn’t saved in the revision history of a data source.
- You can add and edit captions for your data source, but changing the caption doesn’t change the name of the underlying published data source. If you edit the underlying published data source name, the caption isn’t updated. But don’t worry—the correct data source is still referenced. You’ll see the updated underlying published data source name in the Data Source tab.
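The REST API rename mentioned in the list above can be sketched as follows. This is a minimal Python example that only builds the request and does not send it. The server URL, API version, site ID, data source ID, and the helper name `build_rename_request` are hypothetical placeholders, and the `name` attribute on the `datasource` element is assumed to be how the Update Data Source method carries the new name, so check the REST API reference for your version before relying on it.

```python
import xml.etree.ElementTree as ET


def build_rename_request(server, api_version, site_id, datasource_id, new_name):
    """Build (method, url, body) for an Update Data Source rename call.

    Nothing is sent over the network; all argument values are placeholders.
    """
    url = f"{server}/api/{api_version}/sites/{site_id}/datasources/{datasource_id}"
    # Assumed request body shape: <tsRequest><datasource name="..." /></tsRequest>
    ts_request = ET.Element("tsRequest")
    ET.SubElement(ts_request, "datasource", {"name": new_name})
    body = ET.tostring(ts_request, encoding="unicode")
    return "PUT", url, body


method, url, body = build_rename_request(
    "https://tableau.example.com",   # hypothetical server address
    "3.19",                          # hypothetical API version
    "site-id-placeholder",
    "datasource-id-placeholder",
    "Sales - Orders (Certified)",    # new name following your naming convention
)
print(method, url)
print(body)
```

To actually send the request, you would also need to sign in first and pass the resulting credentials token in the `X-Tableau-Auth` header.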
Consider designating the following roles among your Tableau users:
A data steward (or team) who creates and publishes data sources that meet your organization’s data requirements for the Tableau community.
A site administrator who manages published content, extract refreshes, and permissions on the server you publish to (Tableau Server or Tableau Cloud).
Central management helps to avoid data source proliferation. Authors who connect to managed data can be confident that the answers they find in it reflect the current state of the business.
When to use an extract
Under the following conditions, you might be required to publish an extract, or might choose to, instead of connecting live.
Publishing data to Tableau Cloud that it cannot reach directly
Tableau Cloud cannot reach data sources that you maintain on your local network. Depending on the connection, you might be required to publish an extract and set up a refresh schedule using Tableau Bridge.
Some cloud-hosted data sources always require extracts. These include Google Analytics, Salesforce.com, Oracle, OData, and some ODBC data sources. You can set up refresh schedules for some of these data sources directly on Tableau Cloud; for others you use Tableau Bridge.
Web data connector data sources always require extracts. If you connect to the data source using standard user name and password authentication, you can refresh it using Tableau Bridge. If you connect to the WDC data source using OAuth authentication, you will need to use an alternative method to refresh it.
For more about how Tableau Bridge supports both extract and live connections to data Tableau Cloud cannot reach directly, see Use Tableau Bridge to Expand Data Freshness Options in the Tableau Cloud Help.
Even if the server supports live connections to your data, an extract might make more sense. For example, if the database is large or the connection slow, you can extract a subset that includes only the pertinent information. The extract can be easier and faster to work with than connecting live.
In cases where you can use a live connection or an extract that you refresh on a schedule, you might want to experiment with both options to see which works best for you.
Enabling functionality the data source does not inherently support
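For example, suppose you want to use the Median function with SQL Server data. If the function is not available over a live connection to that database, creating an extract can make it available.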
To learn more about creating data extracts, see Extract Your Data.
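The conditions in this section can be summarized as a rough decision helper. The following Python sketch is illustrative only: the `Source` fields, the `suggest_connection` function, and the returned labels are hypothetical names, and the set of source types simply repeats the ones named above. It is not a Tableau API, and "some ODBC data sources" is left out because the text does not say which ones.

```python
from dataclasses import dataclass

# Source types the text above lists as always requiring extracts.
ALWAYS_EXTRACT = {"google analytics", "salesforce.com", "oracle", "odata"}


@dataclass
class Source:
    kind: str                                # illustrative label, e.g. "sql server"
    is_web_data_connector: bool = False
    reachable_from_server: bool = True       # False for data on a local network
    large_or_slow: bool = False
    needs_unsupported_function: bool = False


def suggest_connection(src: Source) -> str:
    """Return a rough suggestion based on the conditions described above."""
    if src.is_web_data_connector or src.kind.lower() in ALWAYS_EXTRACT:
        return "extract required"
    if not src.reachable_from_server:
        # Depending on the connection, an extract refreshed through
        # Tableau Bridge might be required.
        return "extract may be required (Tableau Bridge)"
    if src.large_or_slow or src.needs_unsupported_function:
        return "extract recommended"
    # Either option can work; experiment to see which suits you.
    return "try both"


print(suggest_connection(Source("OData")))
print(suggest_connection(Source("sql server", reachable_from_server=False)))
print(suggest_connection(Source("sql server", needs_unsupported_function=True)))
```

The checks run from the strictest condition to the most optional one, so a source that matches several conditions gets the strongest suggestion.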
You can publish data sources as standalone resources that workbooks connect to, or you can publish workbooks with the data sources included within them.
When you publish a workbook, if any connection specifies anything other than a Tableau data source published to the same project, the data is published as part of the workbook (sometimes referred to as embedded in the workbook).
When data is embedded in a workbook:
Access to the data source is limited to the workbook in which you published it. Neither you nor other users can connect to that data from another workbook.
You can set up extract refresh schedules as you do for data sources that you publish separately.
Each way of publishing has its advantages. The table below shows a few common points of comparison. It is not a comprehensive list, and these are generalizations. How these and other factors apply to you depends on your environment.
| Published separately | Embedded in workbook |
| --- | --- |
| Publishing data sources is a step toward centralizing data management. You can create policies geared toward minimizing data source proliferation and helping people find the right data for the work they do. | Each embedded data source has a separate connection to the data. Each has the potential to show something different than the other at any given time (and data source proliferation is common). |
| Meant to be shared; becomes available for other Tableau users to connect to. | Data is available only inside the workbook; it is not available for other Tableau Desktop users to connect to. |
| Without content management and self-service guidelines, seeing a long list of data sources to connect to can be confusing to users who rely on the data to do their work, and is more difficult to manage on the server. | Users create their own connections, and they know exactly what data they’re getting. |
| Someone who changes a shared data source might be uncertain or unaware of the effects that those changes have on connected workbooks. | Changing the data requires opening the workbook, where you can see the result of the change. |
| Even if effects of data source changes on connected workbooks are planned, updating those connected workbooks is cumbersome. | Same as above; however, if multiple workbooks use similar data and need to be updated, it might be worth connecting to a published data source instead. |
| Extracts can be refreshed on a schedule. You set up one refresh schedule for the extract, and all workbooks that connect to it always show the most current data. | Embedded extracts that aren’t refreshed can be useful for showing snapshots in time. If you want to keep the data fresh, each workbook must have its own refresh schedule. |
| Generally helps you to optimize performance on the server or site. | Performance might be affected when the server contains multiple workbooks that connect to the same original data, and each workbook has its own refresh schedule. |
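The refresh-schedule points in the comparison above can be illustrated with a small count. This Python sketch assumes a hypothetical mapping of workbook names to the data they use. The function name and the sample workbooks are made up for illustration and do not come from any Tableau API.

```python
def refresh_schedules_needed(workbooks, published):
    """Count extract refresh schedules for a mapping of workbook -> data name.

    published=True: one schedule per distinct published data source.
    published=False: one schedule per workbook, because each workbook
    embeds its own extract.
    """
    if published:
        return len(set(workbooks.values()))
    return len(workbooks)


# Hypothetical workbooks and the data each one uses.
workbooks = {
    "Regional Sales": "orders",
    "Executive Summary": "orders",
    "Shipping Delays": "orders",
    "Headcount": "hr",
}
print(refresh_schedules_needed(workbooks, published=True))   # 2
print(refresh_schedules_needed(workbooks, published=False))  # 4
```

With three workbooks sharing the same order data, publishing that data separately means one schedule instead of three, which is the server-load difference the comparison describes.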
When you publish a data source with an extract, you can refresh it on a schedule. The way you schedule refreshes depends on the data source type and whether you're publishing to Tableau Server or Tableau Cloud.
For more information, see the following topics:
Keep Data Fresh on Tableau Cloud
Keep Data Fresh on Tableau Server
A version-agnostic, three-part series by Gordon Rose on the Tableau blog. It includes an in-depth look at the extract's file structure, guidelines for when to use extracts, and best practices.
Posts by Tableau Visionary Jonathan Drummey on his blog Drawing with Numbers. Includes tips on extracts, explains the different file types, and describes different publishing scenarios. (Read the comments, too.)
From the blog maintained by The Information Lab, a Tableau Gold Partner.
Disclaimer: Although we make every effort to ensure these links to external websites are accurate, up to date, and relevant, Tableau cannot take responsibility for the accuracy or freshness of pages maintained by external providers. Contact the external site for answers to questions regarding its content.