Enable Tableau Catalog
Tableau Catalog discovers and indexes all of the content on your Tableau Cloud site or Tableau Server, including workbooks, data sources, sheets, metrics and flows. (The legacy Metrics feature was retired in February 2024 for Tableau Cloud and in Tableau Server version 2024.2. For more information, see Create and Troubleshoot Metrics (Retired).) Indexing gathers information about the content (its metadata), including the schema and lineage of the content. From that metadata, Catalog then identifies all of the databases, files and tables used by the content on your Tableau Cloud site or Tableau Server.
Catalog is available with the Data Management licence. For more information, see About Data Management.
In addition to Catalogue, metadata about your content can also be accessed from both the Tableau Metadata API and the Tableau REST API using Metadata Methods.
Before enabling Catalogue
As a Tableau Server admin, you need to consider a few things before and while enabling Catalogue to ensure that it performs well in your Tableau Server environment.
Required versions
Before enabling Catalogue, make sure you're running one of the following versions of Tableau Server:
- Tableau Server 2019.3.4 or later
- Tableau Server 2019.4.2 or later
- Tableau Server 2020.1.0 or later
- Tableau Server 2020.2.15 or later
- Tableau Server 2020.3 or later
For more information about why these versions are required, see the Tableau Knowledge Base.
What to expect when enabling Catalogue
When Catalogue is enabled, the content that already exists on your Tableau Server is immediately indexed.
Initial ingestion
The indexing process is comprised of two primary components, one of which is called initial ingestion. Initial ingestion can be broken down into two additional components:
- Content backfill
- Lineage backfill
The statuses of content backfill and lineage backfill are important to note later, when you monitor progress and validate that Catalogue has been successfully turned on and is running in your Tableau Server environment.
Initial ingestion speed
The time it takes Catalogue to index the content for the first time depends on a couple of factors:
Amount of content on Tableau Server: The amount of content is measured by the total number of workbooks, metrics, published data sources and flows published to Tableau Server. For more information, see Disk space to store metadata.
Number of non-interactive microservice containers: Catalogue uses the non-interactive microservice containers to index all the content on Tableau Server. For more information, see Memory for non-interactive microservice containers.
Understanding the factors that impact initial ingestion can help you gauge how long it might take to enable and run Catalogue in your environment.
Disk space to store metadata
During initial ingestion, metadata is generated and stored in the Tableau Server repository (“relationship” PostgreSQL database). The amount of disk space needed to store the metadata is roughly up to half of the disk space currently used by the repository ("workgroup" PostgreSQL database).
For example, if the repository uses 50 GB of disk space before Catalogue is enabled, it can use up to 75 GB of disk space afterwards.
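The rule of thumb above can be written as a small calculation. This is a minimal sketch, assuming the "up to half" figure is applied as a flat 50% upper bound; the function name and its `metadata_ratio` parameter are illustrative and not part of any Tableau tool.

```python
def estimate_repository_disk_gb(current_gb: float, metadata_ratio: float = 0.5) -> float:
    """Upper-bound estimate of repository disk use after initial ingestion.

    current_gb: disk space the repository uses today.
    metadata_ratio: extra space for metadata as a fraction of current use
    (0.5 mirrors the "roughly up to half" guidance; treat it as an assumption).
    """
    if current_gb < 0 or metadata_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return current_gb * (1 + metadata_ratio)


print(estimate_repository_disk_gb(50))  # 75.0
```

Treat the result as a ceiling for capacity planning rather than a prediction of actual use.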
Memory for non-interactive microservice containers
Initial ingestion runs inside the non-interactive microservice container. The non-interactive microservice container is one of two Tableau Server microservice container processes. By default, one instance of the non-interactive microservice container process is added to every node that has a backgrounder process installed.
By default, initial ingestion on a single instance of the non-interactive microservice container can use up to 4 GB of memory on the backgrounder node. If the amount of content on Tableau Server exceeds 10,000, a non-interactive microservice container process may require up to 16 GB of memory on the backgrounder node. Therefore, when enabling Catalogue, ensure that every backgrounder node has the available capacity to support each non-interactive microservice container during the initial ingestion process. If capacity needs to be increased, you must update the JVM heap size for non-interactive microservice containers to allocate up to 16 GB of memory on the backgrounder nodes. For more information, see noninteractive.vmopts.
If you are planning to add more non-interactive microservice containers to decrease the time of initial ingestion, first determine how many total containers are needed (using Step 2: Estimate how long initial ingestion will take, below) and then verify whether your Tableau Server environment is configured with enough capacity to support all non-interactive microservice containers. Depending on how your Tableau Server environment is already configured, you might not be able to add all the additional non-interactive microservice containers that you need to decrease initial ingestion time.
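To check whether a backgrounder node has enough headroom, the memory figures above can be combined into a rough upper bound. The sketch below assumes every container on a node may reach the stated ceiling (4 GB by default, 16 GB once content exceeds 10,000 items); the function name and parameters are hypothetical.

```python
LARGE_CONTENT_THRESHOLD = 10_000  # content count above which 16 GB may be needed
DEFAULT_CONTAINER_GB = 4          # default ceiling per container
LARGE_CONTAINER_GB = 16           # ceiling per container for large deployments


def ingestion_memory_per_node_gb(content_items: int, containers_per_node: int = 1) -> int:
    """Upper bound on memory used by initial ingestion on one backgrounder node."""
    if content_items < 0 or containers_per_node < 1:
        raise ValueError("content_items must be >= 0 and containers_per_node >= 1")
    if content_items > LARGE_CONTENT_THRESHOLD:
        per_container = LARGE_CONTAINER_GB
    else:
        per_container = DEFAULT_CONTAINER_GB
    return per_container * containers_per_node


print(ingestion_memory_per_node_gb(8_500))      # 4
print(ingestion_memory_per_node_gb(17_000, 2))  # 32
```

Compare the result with the free memory on each backgrounder node before adding containers.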
Best practices for enabling Catalogue
Because the speed of initial ingestion and requirements are unique to each Tableau Server environment, Tableau recommends that when you enable Catalogue you do one or more of the following:
Make sure there is enough disk space that the Tableau Server repository can use to support the additional metadata that initial ingestion will generate and store. As a general rule, the repository will need an additional 50% of disk space currently used by the repository. For more information about Tableau Server disk usage, see Server Disk Space.
Depending on the amount of content on Tableau Server, make sure each backgrounder node has between 4 GB and 16 GB of available memory for each instance of a non-interactive microservice container during initial ingestion.
Perform the process over the weekend to allow initial ingestion to complete before your users start using Catalogue capabilities.
Perform the process in a test environment with production content first, because the type of content that needs to be ingested can play a significant role in ingestion speed.
Summary of steps to enable Catalogue
The following steps summarise the process to turn on and run Catalogue on Tableau Server. The steps must be performed sequentially.
- Determine the amount of content on Tableau Server
- Estimate how long initial ingestion will take
- Decrease the time of initial ingestion
- Activate the Data Management licence
- Turn off Catalogue capabilities
- Run the tsm maintenance metadata-services command
- Monitor initial ingestion progress and validate its status
- Configure SMTP
- Turn on Catalogue capabilities
Note: Because indexing metadata about Tableau content on Tableau Server is powered by the Metadata API, enabling the Metadata API is required to run and use Catalogue.
Enable Catalogue
Step 1: Determine the amount of content on Tableau Server
To determine the amount of content on Tableau Server, do the following:
Sign in to Tableau Server using your admin credentials.
Go to the Explore page.
Click the Top-Level Project drop-down menu and add the numbers next to All Workbooks, All Metrics, All Data Sources, and All Flows together. This is the total amount of content on Tableau Server.
Step 2: Estimate how long initial ingestion will take
To estimate the time it will take Catalogue to ingest content on Tableau Server for the first time (initial ingestion), compare your Tableau Server setup to a baseline Tableau Server setup.
For a Tableau Server with the following setup, initial ingestion could take about 6 hours to complete.
Components | Baseline values |
---|---|
Content | 17,000 workbooks, metrics, published data sources and flows |
Non-interactive microservice containers | 10 |
Ingestion | ~6 hours |
If you have roughly half the content in your Tableau Server environment, initial ingestion might take half the time to complete.
For example: 8,500 (workbooks, metrics, published data sources and flows) + 10 non-interactive microservice containers = ~3 hours (initial ingestion)
If you have roughly double the content in your Tableau Server environment, initial ingestion might take double the time to complete.
For example: 34,000 (workbooks, metrics, published data sources and flows) + 10 non-interactive microservice containers = ~12 hours (initial ingestion)
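The comparison against the baseline can be expressed as a simple proportional estimate. This is a sketch only: scaling linearly with content follows the examples above, while the inverse scaling with container count is an assumption (the guidance only says ingestion time is correlated with the number of containers). The function name is illustrative.

```python
BASELINE_ITEMS = 17_000    # workbooks, metrics, published data sources and flows
BASELINE_CONTAINERS = 10   # non-interactive microservice containers
BASELINE_HOURS = 6.0       # approximate initial ingestion time for the baseline


def estimate_ingestion_hours(content_items: int, containers: int = BASELINE_CONTAINERS) -> float:
    """Rough initial ingestion time, scaled from the baseline setup."""
    if content_items < 0 or containers < 1:
        raise ValueError("content_items must be >= 0 and containers >= 1")
    content_factor = content_items / BASELINE_ITEMS
    # Assumption: time shrinks in proportion to the number of containers.
    container_factor = BASELINE_CONTAINERS / containers
    return BASELINE_HOURS * content_factor * container_factor


print(estimate_ingestion_hours(8_500))   # 3.0
print(estimate_ingestion_hours(34_000))  # 12.0
```

Because content type also affects ingestion speed, a test run with production content remains the more reliable guide.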
Step 3: Decrease the time of initial ingestion
As a general rule, the time it takes for Catalogue to perform initial ingestion is correlated to the number of non-interactive microservice containers. To help decrease the time of initial ingestion, you can increase the number of non-interactive microservice containers.
Increase the number of non-interactive microservice containers
By default, one non-interactive microservice container is added to every node that has a backgrounder. To help decrease the time of initial ingestion, Tableau recommends that you increase the number of non-interactive microservice containers using the tsm topology set-process command.
Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.
Run the command:
tsm topology set-process --count <process_count> --node <node_ID> --process <process_name>
For example, to increase the non-interactive microservice containers on the initial node to 4 containers, run the following command:
tsm topology set-process --count 4 --node node1 --process noninteractive
For more information about running the command and its global options, see tsm topology.
Important: Before increasing the number of non-interactive microservice containers, review the following:
The recommendation for increasing non-interactive microservice containers is for the total number of non-interactive microservice containers, not the number per node. For example, suppose you have 4 nodes but you want to increase the number of non-interactive microservice containers to 8. The --count value you use in the tsm command is 2.
For each non-interactive microservice container added, 4 GB of additional memory will be used on the node, and additional load will be placed on the Tableau Server repository (PostgreSQL database).
Tableau recommends that you incrementally increase non-interactive microservice containers by no more than 2 at a time while closely monitoring your Tableau Server environment to avoid issues with CPU utilisation of the Tableau Server repository (PostgreSQL database).
Be aware that when too many non-interactive microservice containers are added, CPU utilisation of the PostgreSQL database might spike and cause a failover. Symptoms to watch for include SQLException errors in the vizportal logs. For more information, see the Repository Failover topic.
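The per-node arithmetic in the notes above can be sketched as follows. The sketch assumes containers are spread evenly across the nodes that run a backgrounder and that each node currently has the default of one container; both function names are hypothetical.

```python
import math


def per_node_count(total_containers: int, backgrounder_nodes: int) -> int:
    """--count value per node needed to reach a total across all nodes."""
    if total_containers < 1 or backgrounder_nodes < 1:
        raise ValueError("both arguments must be at least 1")
    # Round up so the total is reached even when it does not divide evenly.
    return math.ceil(total_containers / backgrounder_nodes)


def added_memory_per_node_gb(new_count: int, current_count: int = 1,
                             gb_per_container: int = 4) -> int:
    """Extra memory used on a node after raising its container count."""
    return max(new_count - current_count, 0) * gb_per_container


count = per_node_count(total_containers=8, backgrounder_nodes=4)
print(count)                            # 2
print(added_memory_per_node_gb(count))  # 4
```

The 4 GB figure is the per-container value quoted above; if the JVM heap size has been raised, pass the larger value as `gb_per_container`.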
Step 4: Activate the Data Management licence
(Requires Data Management)
If you haven't already done so, activate Data Management. For more information, see Licence Data Management.
Step 5 (optional): Turn off Catalogue capabilities for each site
(Requires Data Management)
As part of Data Management activation, Catalog capabilities are turned on by default. Because of the indexing process and the estimated time it takes to complete, consider temporarily turning off Catalogue capabilities for each site so that Tableau Server users can't access Catalogue capabilities until Catalogue is ready and able to provide complete and accurate results.
Sign in to Tableau Server using your admin credentials.
From the left navigation pane, click Settings.
On the General tab, under Tableau Catalogue, clear the Turn on Tableau Catalogue tick box.
Repeat steps 2-3 for each site on your Tableau Server.
Step 6: Run the tsm maintenance metadata-services command
Run the tsm maintenance metadata-services command to enable the Tableau Metadata API. Running the command begins initial ingestion. If your Tableau Server is licensed with Data Management, running the command also turns on Catalog capabilities (if they weren’t turned off earlier).
Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.
Run the command:
tsm maintenance metadata-services enable
For more information about running the tsm command, see tsm maintenance.
Notes: When running this command, keep the following points in mind:
This command stops and starts some services used by Tableau Server, which causes certain functionality, such as the Recommendations capability, to be temporarily unavailable.
A new index of metadata is created at this time. Running this command any subsequent times will create and replace the previous index.
Step 7: Monitor initial ingestion progress and validate its status
Running the tsm command above starts the initial ingestion process. To ensure that the initial ingestion process is going smoothly, you can monitor its progress using the Backfill API. For more information, see Get Initial Ingestion Status.
Step 8: Configure SMTP Setup
If SMTP is not already set up for Tableau Server, configure SMTP Setup. SMTP supports sending email to owners who need to be contacted about changes to data. For more information about configuring SMTP, see Configure SMTP Setup.
Step 9 (optional): Turn on Catalogue capabilities for each site
(Requires Data Management)
If you turned off Catalogue capabilities before enabling Catalogue in one of the procedures above, you must turn on Catalogue to make its capabilities accessible to your users.
Sign in to Tableau Server using your admin credentials.
From the left navigation pane, click Settings.
On the General tab, under Tableau Catalogue, tick the Turn on Tableau Catalogue box.
Repeat steps 2-3 for each site on your Tableau Server.
Troubleshoot Catalogue
You or your users might encounter one of the following issues when using Catalogue.
Timeout limit and node limit exceeded messages
To ensure that Catalogue tasks that have to return a large number of results don’t take up all system resources, Catalogue enforces both a timeout limit and a node limit.
Timeout limit
When tasks in Catalogue reach the timeout limit, you and your users see the following message:
“Showing partial results, Request time limit exceeded. Try again later.” or TIME_LIMIT_EXCEEDED
To resolve this issue, as a Tableau Server admin, you can increase the timeout limit using the tsm configuration set -k metadata.query.limits.time command. For more information, see the tsm configuration and tsm configuration set Options topics.
Important: Increasing the timeout limit can utilise more CPU for longer, which can affect the performance of other processes on Tableau Server.
Node limit
When tasks in Catalogue reach the node limit, you and your users see the following message:
NODE_LIMIT_EXCEEDED
To resolve this issue, as a Tableau Server admin, you can increase the node limit using the tsm configuration set -k metadata.query.limits.count command. For more information, see the tsm configuration and tsm configuration set Options topics.
Important: Increasing the node limit can affect system memory.
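Both limits are raised the same way, so the invocations can be assembled by one helper. This is a sketch, not an official procedure: it assumes the usual tsm pattern of passing the new value with -v and then applying pending changes, and the values 60 and 5000 are hypothetical placeholders rather than recommendations. Check the tsm configuration set Options topic for the units and defaults of each key.

```python
import shlex

# Keys named in this section; this sketch rejects any other key.
LIMIT_KEYS = {"metadata.query.limits.time", "metadata.query.limits.count"}


def limit_commands(key: str, value: int) -> list[list[str]]:
    """Return the tsm invocations that set one Catalog query limit."""
    if key not in LIMIT_KEYS:
        raise ValueError(f"unexpected key: {key}")
    if value <= 0:
        raise ValueError("limit must be a positive integer")
    return [
        ["tsm", "configuration", "set", "-k", key, "-v", str(value)],
        # Configuration changes stay pending until they are applied.
        ["tsm", "pending-changes", "apply"],
    ]


# Print the commands for review instead of running them.
for command in limit_commands("metadata.query.limits.time", 60):
    print(shlex.join(command))
for command in limit_commands("metadata.query.limits.count", 5000):
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere; applying pending changes may restart Tableau Server services, so schedule the real commands accordingly.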
Missing content
If you suspect that content is missing from Catalogue after initial ingestion, you can use the Eventing API to help troubleshoot. Eventing handles indexing content on Tableau Server after initial ingestion. For more information, see Get Eventing Status.
When the connection between an embedded external asset and its downstream Tableau content is removed, the asset remains in Catalogue (or the Tableau Metadata API) until it’s automatically deleted by a backgrounder process that runs every day at 22:00:00 UTC (coordinated universal time). For example, suppose a workbook, initially published with an embedded text file A, is republished with an embedded text file B. File A remains visible (or queryable) as an external asset until the backgrounder process is able to delete it.
You can disable this backgrounder process from running if you do not want to remove these types of external assets or if you believe that it takes up system resources that you don’t want to dedicate to this process. Alternatively, you can adjust the number of external embedded assets that are deleted. For more information, see features.DeleteOrphanedEmbeddedDatabaseAsset and databaseservice.max_database_deletes_per_run.
You can monitor this process in one of two ways:
Filter on the One-time job re-canonicalise existing database/table assets after a canonicalisation logic change task type in the Background Tasks for Non Extracts admin view.
Refer to the Finished removal of orphaned embedded databases or database_service_canonicalization_change events in the Tableau Server log files.
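To judge how long an orphaned embedded asset might stay visible, it can help to work out when the daily 22:00:00 UTC cleanup will next start. The sketch below assumes the default schedule described above and only computes the start time; it does not query Tableau Server, and the function name is illustrative.

```python
from datetime import datetime, time, timedelta, timezone

CLEANUP_TIME_UTC = time(22, 0)  # daily run time stated in this section


def next_cleanup_run(now: datetime) -> datetime:
    """Return the next 22:00 UTC run strictly after the given moment."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    run = datetime.combine(now_utc.date(), CLEANUP_TIME_UTC, tzinfo=timezone.utc)
    if run <= now_utc:
        # Today's run has already started, so the next one is tomorrow.
        run += timedelta(days=1)
    return run


print(next_cleanup_run(datetime.now(timezone.utc)).isoformat())
```

If the number of deletions per run has been capped, an asset might survive more than one run, so the computed time is the earliest point at which it could be removed.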
Performance after initial ingestion
In some Tableau Server environments where specific content is updated very frequently (for example, through high-frequency schedules or command line or API requests), the indexing process might become oversaturated. In these cases, as the server admin, you might consider enabling event throttling to better preserve Catalogue performance. For more information, see metadata.ingestor.pipeline.throttleEventsEnable.
Note: When event throttling is enabled, users might notice an intended delay in content changes in Catalogue.
Out of memory errors
In some cases, Tableau Server out of memory errors can occur as a result of problems with ingesting complex content. If you suspect ingestion is the cause of the out of memory errors on your Tableau Server, contact Tableau Support and work with them to use metadata.ingestor.blocklist to block the problematic content from being ingested and help resolve the issue.
Disable Catalogue
You can disable Catalog in one of two ways.
Turn off Catalogue capabilities for each site
(Requires Data Management)
You can turn off Catalogue capabilities at any time. When Catalog capabilities are turned off, the features of Catalog, such as adding data quality warnings or the ability to explicitly manage permissions to database and table assets, are not accessible. However, Catalogue continues to index published content and the metadata is accessible from the Tableau Metadata API and metadata methods in the Tableau REST API.
- Sign in to Tableau Server using your admin credentials.
- From the left navigation pane, click Settings.
- On the General tab, under Tableau Catalogue, clear the Turn on Tableau Catalogue tick box.
Stop indexing metadata
To stop indexing the published content on Tableau Server, you can disable the Tableau Metadata API. To disable the Metadata API, run the tsm maintenance metadata-services disable command. For more information, see tsm maintenance.