Enable Tableau Catalog
Tableau Catalog discovers and indexes all of the content on your Tableau Cloud site or Tableau Server, including workbooks, data sources, sheets, metrics, and flows. (The legacy Metrics feature was retired in February 2024 for Tableau Cloud and in Tableau Server version 2024.2. For more information, see Create and Troubleshoot Metrics (Retired).) Indexing gathers information about the content, or metadata, such as its schema and lineage. From that metadata, Catalog then identifies all of the databases, files, and tables used by the content on your Tableau Cloud site or Tableau Server.
Catalog is available with the Data Management license. For more information, see About Data Management.
In addition to Catalog, metadata about your content can also be accessed from both the Tableau Metadata API and the Tableau REST API using Metadata Methods.
Before enabling Catalog
As a Tableau Server admin, you need to consider a few things before and while enabling Catalog to ensure that Catalog performs well in your Tableau Server environment.
Required versions
Before enabling Catalog, make sure you're running one of the following versions of Tableau Server:
- Tableau Server 2019.3.4 or a later 2019.3 release
- Tableau Server 2019.4.2 or a later 2019.4 release
- Tableau Server 2020.1.0 or a later 2020.1 release
- Tableau Server 2020.2.15 or a later 2020.2 release
- Tableau Server 2020.3 and later
For more information about why these versions are required, see the Tableau Knowledge Base.
What to expect when enabling Catalog
When Catalog is enabled, the content that already exists on your Tableau Server is immediately indexed.
Initial ingestion
The indexing process has two primary components: initial ingestion, which indexes the content that already exists, and eventing, which indexes content on Tableau Server after initial ingestion. Initial ingestion itself consists of two parts:
- Content backfill
- Lineage backfill
Note the status of content backfill and lineage backfill. You will use both later, when you monitor progress and validate that Catalog has been turned on successfully and is running in your Tableau Server environment.
Initial ingestion speed
The time it takes Catalog to index the content for the first time depends on two factors:
- Amount of content on Tableau Server: The amount of content is measured by the total number of workbooks, metrics, published data sources, and flows published to Tableau Server. For more information, see Disk space to store metadata.
- Number of non-interactive microservice containers: Catalog uses the non-interactive microservice containers to index all the content on Tableau Server. For more information, see Memory for non-interactive microservice containers.
Understanding the factors that impact initial ingestion can help you gauge how long it might take to enable and run Catalog in your environment.
Disk space to store metadata
During initial ingestion, metadata is generated and stored in the Tableau Server repository ("relationship" PostgreSQL database). Storing this metadata can require up to roughly half of the disk space currently used by the repository ("workgroup" PostgreSQL database).
For example, if the repository uses 50 GB of disk space before you enable Catalog, it can use up to 75 GB of disk space after you enable Catalog.
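The rule of thumb above can be sketched as a small shell function. This is only a rough upper-bound estimate, assuming metadata growth of about 50% of the repository's current size; the function name is hypothetical, and actual growth depends on your content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough upper bound on repository disk usage (GB)
# after enabling Catalog, assuming metadata adds up to ~50% of current usage.
estimate_repo_gb_after_catalog() {
  local current_gb="$1"
  # Integer arithmetic: current + current/2 (rounds down for odd values)
  echo $(( current_gb + current_gb / 2 ))
}

estimate_repo_gb_after_catalog 50   # prints 75
```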
Memory for non-interactive microservice containers
Initial ingestion runs inside the non-interactive microservice container, which is one of two Tableau Server microservice container processes. By default, one instance of the non-interactive microservice container process is added to every node that has a backgrounder process installed.
By default, initial ingestion on a single instance of the non-interactive microservice container can use up to 4 GB of memory on the backgrounder node. If the amount of content on Tableau Server exceeds 10,000 items, a non-interactive microservice container process may require up to 16 GB of memory on the backgrounder node. Therefore, when enabling Catalog, ensure that every backgrounder node has the available capacity to support each of its non-interactive microservice containers during initial ingestion. If capacity needs to be increased, you must update the JVM heap size for non-interactive microservice containers to allocate up to 16 GB of memory on the backgrounder nodes. For more information, see noninteractive.vmopts.
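To check whether a backgrounder node has enough headroom, you can multiply the per-container memory by the number of non-interactive containers on that node. The sketch below assumes the worst case from the guidance above: 4 GB per container for up to 10,000 items, and 16 GB per container beyond that. The function name is hypothetical, and actual usage varies with your content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case memory (GB) that initial ingestion may need
# on one backgrounder node. Assumes 4 GB per non-interactive container for
# up to 10,000 content items and 16 GB per container above that.
node_memory_gb() {
  local content_items="$1" containers_on_node="$2"
  local per_container_gb=4
  if [ "$content_items" -gt 10000 ]; then
    per_container_gb=16
  fi
  echo $(( per_container_gb * containers_on_node ))
}

node_memory_gb 8000 1    # prints 4
node_memory_gb 25000 2   # prints 32
```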
If you are planning to add more non-interactive microservice containers to decrease the time of initial ingestion, first determine how many total containers are needed (using Step 2: Estimate how long initial ingestion will take, below) and then verify if your Tableau Server environment is configured with enough capacity to support all non-interactive microservice containers. Depending on how your Tableau Server environment is already configured, you might not be able to add all the additional non-interactive microservice containers that you need to decrease initial ingestion time.
Best practices for enabling Catalog
Because the speed of initial ingestion and requirements are unique to each Tableau Server environment, Tableau recommends that when you enable Catalog you do one or more of the following:
- Make sure the Tableau Server repository has enough disk space to store the additional metadata that initial ingestion generates. As a general rule, the repository needs an additional 50% of the disk space it currently uses. For more information about Tableau Server disk usage, see Server Disk Space.
- Depending on the amount of content on Tableau Server, make sure each backgrounder node has 4 GB to 16 GB of available memory for each instance of a non-interactive microservice container during initial ingestion.
- Perform the process over the weekend to allow initial ingestion to complete before your users start using Catalog capabilities.
- Perform the process in a test environment with production content first, because the type of content being ingested can play a significant role in ingestion speed.
Summary of steps to enable Catalog
The following steps summarize the process to turn on and run Catalog on Tableau Server. The steps must be performed sequentially.
- Determine the amount of content on Tableau Server
- Estimate how long initial ingestion will take
- Decrease the time of initial ingestion
- Activate the Data Management license
- Turn off Catalog capabilities (optional)
- Run the tsm maintenance metadata-services command
- Monitor initial ingestion progress and validate its status
- Configure SMTP
- Turn on Catalog capabilities (optional)
Note: Because indexing metadata about Tableau content on Tableau Server is powered by the Metadata API, the Metadata API must be enabled to run and use Catalog.
Enable Catalog
Step 1: Determine the amount of content on Tableau Server
To determine the amount of content on Tableau Server, do the following:
- Sign in to Tableau Server using your admin credentials.
- Go to the Explore page.
- Click the Top-Level Project drop-down menu and add together the numbers next to All Workbooks, All Metrics, All Data Sources, and All Flows. The sum is the total amount of content on Tableau Server.
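If you want to script the addition, a minimal sketch follows. The function name is hypothetical, and the counts in the example call are illustrative, not real data. If your server has no metrics, pass 0 for that count.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total content = workbooks + metrics + data sources + flows,
# using the counts shown next to each item on the Explore page.
total_content() {
  local workbooks="$1" metrics="$2" data_sources="$3" flows="$4"
  echo $(( workbooks + metrics + data_sources + flows ))
}

# Illustrative counts only
total_content 12000 0 4500 500   # prints 17000
```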
Step 2: Estimate how long initial ingestion will take
To estimate the time it will take Catalog to ingest content on Tableau Server for the first time (initial ingestion), compare your Tableau Server setup to a baseline Tableau Server setup.
For a Tableau Server with the following setup, initial ingestion could take about 6 hours to complete.
Components | Baseline values |
---|---|
Content | 17,000 workbooks, metrics, published data sources, and flows |
Non-interactive microservice containers | 10 |
Ingestion | ~6 hours |
If you have roughly half the content in your Tableau Server environment, initial ingestion might take about half the time to complete.
For example, with 8,500 workbooks, metrics, published data sources, and flows and 10 non-interactive microservice containers, initial ingestion might take about 3 hours.
If you have roughly double the content in your Tableau Server environment, initial ingestion might take about double the time to complete.
For example, with 34,000 workbooks, metrics, published data sources, and flows and 10 non-interactive microservice containers, initial ingestion might take about 12 hours.
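The scaling in these examples can be expressed as a quick estimator. This sketch rests on two assumptions. First, ingestion time grows linearly with content, as the examples show. Second, it shrinks in proportion to the number of non-interactive microservice containers. The baseline does not state this second assumption, so treat results for container counts other than 10 as rough guesses. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical estimator based on the baseline: 17,000 items with
# 10 non-interactive containers took ~6 hours.
# Assumes time scales linearly with content and inversely with containers.
estimate_ingestion_hours() {
  local content_items="$1" containers="$2"
  awk -v c="$content_items" -v n="$containers" \
    'BEGIN { printf "%.1f\n", 6 * (c / 17000) * (10 / n) }'
}

estimate_ingestion_hours 8500 10    # prints 3.0
estimate_ingestion_hours 34000 10   # prints 12.0
```

Passing the total from Step 1 gives a first estimate. A trial run in a test environment with production content remains the more reliable guide.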
Step 3: Decrease the time of initial ingestion
As a general rule, the time it takes for Catalog to perform initial ingestion is correlated with the number of non-interactive microservice containers. To help decrease the time of initial ingestion, you can increase the number of non-interactive microservice containers.
Increase the number of non-interactive microservice containers
By default, one non-interactive microservice container is added to every node that has a backgrounder. To help decrease the time of initial ingestion, Tableau recommends that you increase the number of non-interactive microservice containers using the tsm topology set-process command.
- Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.
- Run the command:
tsm topology set-process --count <process_count> --node <node_ID> --process <process_name>
For example, to increase the non-interactive microservice containers on the initial node to 4 containers, run the following command:
tsm topology set-process --count 4 --node node1 --process noninteractive
For more information about running the command and its global options, see tsm topology.
Important: Before increasing the number of non-interactive microservice containers, review the following:
- The recommendation for increasing non-interactive microservice containers refers to the total number of non-interactive microservice containers, not the number per node. For example, suppose you have 4 nodes and you want to increase the number of non-interactive microservice containers to 8. The --count value you use in the tsm command is 2.
- For each non-interactive microservice container added, 4 GB of additional memory will be used on the node and load will be added to the Tableau Server repository (PostgreSQL database).
- Tableau recommends that you incrementally increase non-interactive microservice containers by no more than 2 at a time while closely monitoring your Tableau Server environment to avoid issues with CPU utilization of the Tableau Server repository (PostgreSQL database).
- Be aware that when too many non-interactive microservice containers are added, CPU utilization of the PostgreSQL database might spike and cause a failover. Symptoms to watch for include SQLException errors in the vizportal logs. For more information, see the Repository Failover topic.
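Because --count is set per node, you divide the total number of containers you want by the number of nodes that will run them. The sketch below does that division, rounding up, so the total can exceed the target when it doesn't divide evenly. It then prints, rather than runs, the resulting tsm commands, followed by tsm pending-changes apply to apply the topology change. The helper names and node IDs are hypothetical. Pass the IDs of the backgrounder nodes in your own cluster, and keep the guidance above about increasing by no more than 2 at a time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-node --count needed to reach a TOTAL number of
# non-interactive containers, rounded up.
per_node_count() {
  local total="$1" nodes="$2"
  echo $(( (total + nodes - 1) / nodes ))
}

# Hypothetical dry run: prints (does not run) one tsm topology set-process
# command per node, then the command that applies the pending changes.
print_set_process_commands() {
  local total="$1"
  shift
  local count node
  count=$(per_node_count "$total" "$#")
  for node in "$@"; do
    echo "tsm topology set-process --count $count --node $node --process noninteractive"
  done
  echo "tsm pending-changes apply"
}

# 8 containers in total across 4 nodes -> --count 2 on each node
print_set_process_commands 8 node1 node2 node3 node4
```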
Step 4: Activate the Data Management license
(Requires Data Management)
If not already done, you can activate Data Management. For more information, see License Data Management.
Step 5 (optional): Turn off Catalog capabilities for each site
(Requires Data Management)
As part of Data Management activation, Catalog capabilities are turned on by default. Because of the indexing process and the estimated time it takes to complete, consider temporarily turning off Catalog capabilities for each site so that Tableau Server users can't access Catalog capabilities until Catalog is ready and able to provide complete and accurate results.
1. Sign in to Tableau Server using your admin credentials.
2. From the left navigation pane, click Settings.
3. On the General tab, under Tableau Catalog, clear the Turn on Tableau Catalog check box.
4. Repeat steps 2-3 for each site on your Tableau Server.
Step 6: Run the tsm maintenance metadata-services command
Run the tsm maintenance metadata-services command to enable the Tableau Metadata API. Running the command begins initial ingestion. If your Tableau Server is licensed with Data Management, running the command also turns on Catalog capabilities (unless you turned them off in Step 5).
- Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.
- Run the command:
tsm maintenance metadata-services enable
For more information about running the tsm command, see tsm maintenance.
Notes: When running this command, keep the following points in mind:
- This command stops and starts some services used by Tableau Server, which makes certain functionality, such as the Recommendations capability, temporarily unavailable.
- A new index of metadata is created at this time. Each subsequent run of this command creates a new index that replaces the previous one.
Step 7: Monitor initial ingestion progress and validate its status
Running the tsm command above starts the initial ingestion process. To ensure that the initial ingestion process is going smoothly, you can monitor its progress using the Backfill API. For more information, see Get Initial Ingestion Status.
Step 8: Configure SMTP Setup
If SMTP isn't already set up for Tableau Server, configure it. SMTP supports sending email to owners who need to be contacted about changes to data. For more information about configuring SMTP, see Configure SMTP Setup.
Step 9 (optional): Turn on Catalog capabilities for each site
(Requires Data Management)
If you turned off Catalog capabilities in Step 5, you must turn Catalog back on to make its capabilities accessible to your users.
1. Sign in to Tableau Server using your admin credentials.
2. From the left navigation pane, click Settings.
3. On the General tab, under Tableau Catalog, select the Turn on Tableau Catalog check box.
4. Repeat steps 2-3 for each site on your Tableau Server.
Troubleshoot Catalog
You or your users might encounter one of the following issues when using Catalog.
Timeout limit and node limit exceeded messages
To ensure that Catalog tasks that have to return a large number of results don’t take up all of the system resources, Catalog enforces a timeout limit and a node limit.
- Timeout limit
When tasks in Catalog reach the timeout limit, you and your users see the following message:
“Showing partial results, Request time limit exceeded. Try again later.” or TIME_LIMIT_EXCEEDED
To resolve this issue, as a Tableau Server admin, you can increase the timeout limit using the tsm configuration set -k metadata.query.limits.time command. For more information, see the tsm configuration and tsm configuration set Options topics.
Important: Increasing the timeout limit can utilize more CPU for longer, which can affect the performance of other processes on Tableau Server.
- Node limit
When tasks in Catalog reach the node limit, you and your users see the following message:
NODE_LIMIT_EXCEEDED
To resolve this issue, as a Tableau Server admin, you can increase the node limit using the tsm configuration set -k metadata.query.limits.count command. For more information, see the tsm configuration and tsm configuration set Options topics.
Important: Increasing the node limit can affect system memory.
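Both limits follow the same tsm pattern: read the current value, set a new one, and apply the pending change. The sketch below prints, rather than runs, those commands for either limit. The helper name is hypothetical, and any value you pass is a placeholder. Before you raise either limit, check the tsm configuration set Options topic for each key's units and default, and weigh the CPU and memory cautions above.

```bash
#!/usr/bin/env bash
# Hypothetical dry run: prints the tsm commands to change one of the two
# Metadata API query limits. The value is a placeholder, not a recommendation.
print_limit_change() {
  local limit="$1" value="$2" key
  case "$limit" in
    time)  key="metadata.query.limits.time" ;;   # timeout limit
    count) key="metadata.query.limits.count" ;;  # node limit
    *) echo "unknown limit: $limit (use time or count)" >&2; return 1 ;;
  esac
  echo "tsm configuration get -k $key"
  echo "tsm configuration set -k $key -v $value"
  echo "tsm pending-changes apply"
}

print_limit_change time "<new_value>"
```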
Missing content
- If, after initial ingestion, you suspect that content is missing from Catalog, you can use the Eventing API to help troubleshoot. Eventing handles indexing content on Tableau Server after initial ingestion. For more information, see Get Eventing Status.
- When the connection between an embedded external asset and its downstream Tableau content is removed, the asset remains in Catalog (or the Tableau Metadata API) until it’s automatically deleted by a backgrounder process that runs every day at 22:00:00 UTC (coordinated universal time). For example, suppose a workbook initially published with an embedded text file A is republished with an embedded text file B. File A remains visible (or queryable) as an external asset until the backgrounder process deletes it.
You can disable this backgrounder process if you do not want to remove these types of external assets or if you don’t want to dedicate system resources to it. Alternatively, you can adjust the number of external embedded assets that are deleted. For more information, see features.DeleteOrphanedEmbeddedDatabaseAsset and databaseservice.max_database_deletes_per_run.
You can monitor this process in one of two ways:
- Filter on the One-time job re-canonicalize existing database/table assets after a canonicalization logic change task type in the Background Tasks for Non Extracts admin view.
- Refer to the Finished removal of orphaned embedded databases or database_service_canonicalization_change events in the Tableau Server log files.
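To change how the daily cleanup behaves, you set the two configuration keys named above with tsm and then apply the change. The sketch below prints, rather than runs, those commands. It assumes two things: setting features.DeleteOrphanedEmbeddedDatabaseAsset to false turns the cleanup off, and databaseservice.max_database_deletes_per_run takes a number of assets. Confirm both in the tsm configuration set Options topic. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical dry run: prints the tsm commands to turn off the daily
# orphaned-embedded-asset cleanup, or to cap how many assets it deletes per run.
# Assumes the feature flag is a boolean and the cap is a count of assets.
print_orphan_cleanup_change() {
  case "$1" in
    disable)
      echo "tsm configuration set -k features.DeleteOrphanedEmbeddedDatabaseAsset -v false" ;;
    cap)
      echo "tsm configuration set -k databaseservice.max_database_deletes_per_run -v $2" ;;
    *)
      echo "usage: print_orphan_cleanup_change disable | cap <max_deletes>" >&2
      return 1 ;;
  esac
  echo "tsm pending-changes apply"
}

print_orphan_cleanup_change disable
```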
Performance after initial ingestion
In some Tableau Server environments where specific content is updated very frequently (for example, through high-frequency schedules or command line or API requests), the indexing process might become oversaturated. In these cases, as the server admin, you might consider enabling event throttling to better preserve Catalog performance. For more information, see metadata.ingestor.pipeline.throttleEventsEnable.
Note: When event throttling is enabled, users might notice an intended delay in content changes in Catalog.
Out of memory errors
In some cases, Tableau Server out of memory errors can occur as a result of problems with ingesting complex content. If you suspect ingestion is the cause of the out of memory errors on your Tableau Server, contact Tableau Support and work with them to block the problematic content from being ingested (using metadata.ingestor.blocklist) to help resolve the issue.
Disable Catalog
You can disable Catalog in one of two ways.
Turn off Catalog capabilities for each site
(Requires Data Management)
You can turn off Catalog capabilities at any time. When Catalog capabilities are turned off, the features of Catalog, such as adding data quality warnings or the ability to explicitly manage permissions to database and table assets, are not accessible. However, Catalog continues to index published content and the metadata is accessible from the Tableau Metadata API and metadata methods in the Tableau REST API.
- Sign in to Tableau Server using your admin credentials.
- From the left navigation pane, click Settings.
- On the General tab, under Tableau Catalog, clear the Turn on Tableau Catalog check box.
Stop indexing metadata
To stop indexing the published content on Tableau Server, you can disable the Tableau Metadata API. To disable the Metadata API, run the tsm maintenance metadata-services disable command. For more information, see tsm maintenance.