Enable Tableau Catalog

Tableau Catalog discovers and indexes all of the content on your Tableau Online site or Tableau Server, including workbooks, data sources, sheets, and flows. Indexing gathers information about the content (metadata), including its schema and lineage. From that metadata, Catalog identifies all of the databases, files, and tables used by the content on your Tableau Online site or Tableau Server.

Catalog is available with the Data Management Add-on. For more information, see About Data Management Add-on.

In addition to Catalog, metadata about your content can also be accessed from both the Tableau Metadata API and the Tableau Server REST API using Metadata Methods.

Before enabling Catalog

As a Tableau Server admin, you need to consider a few things before and while enabling Catalog to ensure that Catalog performs well in your Tableau Server environment.

Required versions

Before enabling Catalog, make sure you're running one of the following versions of Tableau Server:

  • Tableau Server 2019.3.4 or later
  • Tableau Server 2019.4.2 or later
  • Tableau Server 2020.1.0 or later
  • Tableau Server 2020.2.4 or later

For more information about why these versions are required, see the Tableau Knowledge Base.

What to expect when enabling Catalog

When Catalog is enabled, the content that already exists on your Tableau Server is immediately indexed.

Initial ingestion

The indexing process is comprised of two primary components, one of which is called initial ingestion. Initial ingestion can be broken down into two additional components:

  • Content backfill
  • Lineage backfill

The statuses of content backfill and lineage backfill are important to note later, when you monitor progress and validate that Catalog has been successfully turned on and is running in your Tableau Server environment.

Initial ingestion speed

The time it takes Catalog to index the content for the first time depends on a couple of factors:

  • Amount of content on Tableau Server: The amount of content is measured by the total number of workbooks, published data sources, and flows published to Tableau Server. For more information, see Disk space to store metadata.

  • Number of non-interactive microservices containers: Catalog uses the non-interactive microservices container to index all the content on Tableau Server. For more information, see Memory for non-interactive microservices containers.

Understanding the factors that impact initial ingestion can help you gauge how long it might take to enable and run Catalog in your environment.

Disk space to store metadata

During initial ingestion, metadata is generated and stored in the Tableau Server repository ("relationship" PostgreSQL database). The amount of disk space needed to store the metadata is up to roughly half of the disk space currently used by the repository ("workgroup" PostgreSQL database).

For example, if the repository uses 50 GB of disk space before you enable Catalog, it can use up to 75 GB of disk space after you enable Catalog.
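The rule of thumb above can be written as a small calculation. This is a minimal sketch, not a Tableau tool: the function name and the `overhead_ratio` parameter are made up for illustration, and the result is only the upper bound described in this section.

```python
def estimate_repository_disk_gb(current_gb: float, overhead_ratio: float = 0.5) -> float:
    """Upper-bound repository disk use (GB) after enabling Catalog.

    overhead_ratio=0.5 encodes the "up to roughly half" rule of thumb;
    it is an approximation, not a measured value.
    """
    if current_gb < 0:
        raise ValueError("current_gb must be non-negative")
    return current_gb * (1 + overhead_ratio)


# The example from the text: a 50 GB repository can grow to about 75 GB.
print(estimate_repository_disk_gb(50))
```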

Memory for non-interactive microservices containers

Initial ingestion runs inside the non-interactive microservices container. The non-interactive microservices container is one of two Tableau Server microservices container processes. By default, one instance of the non-interactive microservices container is added to every node that has a backgrounder process installed.

By default, initial ingestion on a single instance of the non-interactive microservices container can use up to 4 GB of memory on the backgrounder node. If the amount of content on Tableau Server exceeds 10,000 items, a non-interactive microservices container may require up to 16 GB of memory on the backgrounder node. Therefore, when enabling Catalog, ensure that every backgrounder node has the available capacity to support each non-interactive microservices container during the initial ingestion process. If capacity needs to be increased, you must update the JVM heap size for non-interactive containers to allocate up to 16 GB of memory on the backgrounder nodes. For more information, see noninteractivecontainer.vmopts.
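As a rough planning aid, the memory guidance above can be sketched as follows. The function and constant names are hypothetical, and the figures are the upper bounds quoted in this section (4 GB per container by default, up to 16 GB when content exceeds 10,000), not measurements from a real environment.

```python
DEFAULT_CONTAINER_GB = 4        # default upper bound per container
LARGE_CONTENT_CONTAINER_GB = 16  # upper bound when content exceeds the threshold
LARGE_CONTENT_THRESHOLD = 10_000


def memory_needed_per_node_gb(content_items: int, containers_per_node: int = 1) -> int:
    """Upper-bound memory (GB) to reserve on one backgrounder node."""
    if content_items > LARGE_CONTENT_THRESHOLD:
        per_container = LARGE_CONTENT_CONTAINER_GB
    else:
        per_container = DEFAULT_CONTAINER_GB
    return per_container * containers_per_node


# One container on a node, 8,500 items of content.
print(memory_needed_per_node_gb(8_500))
# Two containers on a node, 17,000 items of content.
print(memory_needed_per_node_gb(17_000, containers_per_node=2))
```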

If you are planning to add more non-interactive microservices containers to decrease the time of initial ingestion, first determine how many total containers are needed (using Step 2: Estimate how long initial ingestion will take, below) and then verify that your Tableau Server environment is configured with enough capacity to support all non-interactive microservices containers. Depending on how your Tableau Server environment is already configured, you might not be able to add all the additional non-interactive microservices containers that you need to decrease initial ingestion time.

Best practices for enabling Catalog

Because the speed of initial ingestion and requirements are unique to each Tableau Server environment, Tableau recommends that when you enable Catalog you do one or more of the following:

  • Make sure there is enough disk space for the Tableau Server repository to store the additional metadata that initial ingestion generates. As a general rule, the repository needs an additional 50% of the disk space it currently uses. For more information about Tableau Server disk usage, see Server Disk Space.

  • Depending on the amount of content on Tableau Server, make sure each backgrounder node has at least 4-16 GB of available memory for each instance of a non-interactive microservices container during initial ingestion.

  • Perform the process over the weekend to allow initial ingestion to complete before your users start using Catalog capabilities.

  • Perform the process in a test environment with production content first, because the type of content that needs to be ingested can play a significant role in ingestion speed.

Summary of steps to enable Catalog

The following steps summarize the process to turn on and run Catalog on Tableau Server. The steps must be performed sequentially.

  1. Determine the amount of content on Tableau Server
  2. Estimate how long initial ingestion will take
  3. Decrease the time of initial ingestion
  4. Activate the Data Management Add-on
  5. Turn off Catalog capabilities
  6. Run the tsm maintenance metadata-services command
  7. Monitor initial ingestion progress and validate its status
  8. Configure SMTP
  9. Turn on Catalog capabilities

Note: These steps can also be used to enable the Tableau Metadata API when your Tableau Server is not licensed with the Data Management Add-on.

Enable Catalog

Step 1: Determine the amount of content on Tableau Server

To determine the amount of content on Tableau Server, do the following:

  1. Sign in to Tableau Server using your admin credentials.

  2. Go to the Explore page.

  3. Click the Top-Level Project drop-down menu and add the numbers next to All Workbooks, All Data Sources, and All Flows together. This is the total amount of content on Tableau Server.

Step 2: Estimate how long initial ingestion will take

To estimate the time it will take Catalog to ingest content on Tableau Server for the first time (initial ingestion), compare your Tableau Server setup to a baseline Tableau Server setup.

For a Tableau Server with the following setup, initial ingestion could take about 6 hours to complete.

Components                                  Baseline values
Content                                     17,000 workbooks, published data sources, and flows
Non-interactive microservices containers    10
Ingestion                                   ~6 hours

If you have roughly half the content in your Tableau Server environment, initial ingestion might take half the time to complete.

For example: 8,500 (workbooks, published data sources, and flows) + 10 non-interactive microservices containers = ~3 hours (initial ingestion)

If you have roughly double the content in your Tableau Server environment, initial ingestion might take double the time to complete.

For example: 34,000 (workbooks, published data sources, and flows) + 10 non-interactive microservices containers = ~12 hours (initial ingestion)
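The proportional estimate in Steps 1 and 2 can be sketched as a short calculation. The function names are hypothetical. The estimate assumes that ingestion time scales linearly with the amount of content and that you run the same 10 containers as the baseline, so treat the result as a rough guide only.

```python
BASELINE_ITEMS = 17_000  # workbooks + published data sources + flows
BASELINE_HOURS = 6.0     # approximate ingestion time with 10 containers


def total_content(workbooks: int, data_sources: int, flows: int) -> int:
    """Step 1: add up the three counts shown on the Explore page."""
    return workbooks + data_sources + flows


def estimate_ingestion_hours(content_items: int) -> float:
    """Step 2: scale the baseline time by the ratio of content (10 containers assumed)."""
    return BASELINE_HOURS * content_items / BASELINE_ITEMS


print(estimate_ingestion_hours(8_500))   # half the baseline content
print(estimate_ingestion_hours(34_000))  # double the baseline content
```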

Step 3: Decrease the time of initial ingestion

As a general rule, the time it takes for Catalog to perform initial ingestion is correlated to the number of non-interactive microservices containers. To help decrease the time of initial ingestion, you can increase the number of non-interactive microservices containers.

Increase the number of non-interactive microservices containers

By default, one non-interactive microservices container is added to every node that has a backgrounder. To help decrease the time of initial ingestion, Tableau recommends that you increase the number of non-interactive microservices containers using the tsm topology set-process command.

  1. Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.

  2. Run the command: tsm topology set-process --count <process_count> --node <node_ID> --process <process_name>

    For example, to increase the number of non-interactive microservices containers on the initial node to 4, run the following command:

    tsm topology set-process --count 4 -n node1 --process noninteractive

    For more information about running the command and its global options, see tsm topology.

Important: Before increasing the number of non-interactive microservices containers, review the following: 

  • The recommendation for increasing non-interactive microservices containers applies to the total number of non-interactive microservices containers, not the number per node. For example, suppose you have 4 nodes and you want to increase the total number of non-interactive containers to 8. The --count value you use in the tsm command is 2.

  • For each non-interactive microservices container added, 4 GB of additional memory will be used on the node and load will be added to the Tableau Server repository (PostgreSQL database).

    • Tableau recommends that you incrementally increase non-interactive microservices containers by no more than 2 at a time while closely monitoring your Tableau Server environment to avoid issues with CPU utilization of the Tableau Server repository (PostgreSQL database).

    • Be aware that when too many non-interactive microservices containers are added, CPU utilization of the PostgreSQL database might spike and trigger a failover. Symptoms to watch for include SQLException errors in the vizportal logs. For more information, see the Repository Failover topic.
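The per-node arithmetic in the first point can be sketched as below. The helper names are hypothetical, and the sketch makes two assumptions: every node in the count runs a backgrounder, and an uneven split is rounded up, which can give slightly more containers in total than requested.

```python
import math


def per_node_count(total_containers: int, backgrounder_nodes: int) -> int:
    """Convert a desired total into the --count value to use on each node."""
    if backgrounder_nodes < 1:
        raise ValueError("at least one backgrounder node is required")
    # Assumption: round up when the total does not divide evenly.
    return math.ceil(total_containers / backgrounder_nodes)


def set_process_command(count: int, node_id: str) -> str:
    """Build the tsm command string for one node (it is only printed, not run)."""
    return f"tsm topology set-process --count {count} --node {node_id} --process noninteractive"


# The example from the text: 8 containers in total across 4 nodes.
count = per_node_count(8, 4)
for node in ["node1", "node2", "node3", "node4"]:
    print(set_process_command(count, node))
```

Because the guidance above is to add no more than 2 containers at a time, you would likely step toward the target count over several rounds rather than set the final value at once.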

Step 4: Activate the Data Management Add-on

(Requires the Data Management Add-on)

If you have not already done so, activate the Data Management Add-on. For more information, see License the Data Management Add-on.

Step 5 (optional): Turn off Catalog capabilities for each site

(Requires the Data Management Add-on)

As part of the Data Management Add-on activation, Catalog capabilities are turned on by default. Because of the indexing process and the estimated time it takes to complete, consider temporarily turning off Catalog capabilities for each site so that Tableau Server users can't access Catalog capabilities until Catalog is ready and able to provide complete and accurate results.

  1. Sign in to Tableau Server using your admin credentials.

  2. From the left navigation pane, click Settings.

  3. On the General tab, under Tableau Catalog, clear the Turn on Tableau Catalog check box.

  4. Repeat steps 2-3 for each site on your Tableau Server.

Step 6: Run the tsm maintenance metadata-services command

Run the tsm maintenance metadata-services command to enable the Tableau Metadata API. Running the command begins initial ingestion. If your Tableau Server is licensed with the Data Management Add-on, running the command also turns on Catalog capabilities (if they weren't turned off earlier).

  1. Open a command prompt as an admin on the initial node (where TSM is installed) in the cluster.

  2. Run the command: tsm maintenance metadata-services enable

    For more information about running the tsm command, see tsm maintenance.

Notes: When running this command, keep the following points in mind:

  • This command stops and starts some services used by Tableau Server, which causes certain functionality, such as the Recommendations capability, to be temporarily unavailable.

  • A new index of metadata is created at this time. Running this command again creates a new index that replaces the previous one.

Step 7: Monitor initial ingestion progress and validate its status

Running the tsm command above starts the initial ingestion process. To ensure that the initial ingestion process is going smoothly, you can monitor its progress using the Backfill API. For more information, see Get Initial Ingestion Status.

Step 8: Configure SMTP Setup

If not already set up for Tableau Server, configure SMTP Setup. SMTP supports sending email to owners who need to be contacted about changes to data. For more information about configuring SMTP, see Configure SMTP Setup.

Step 9 (optional): Turn on Catalog capabilities

(Requires the Data Management Add-on)

If you turned off Catalog capabilities before enabling Catalog in one of the procedures above, you must turn on Catalog to make its capabilities accessible to your users.

  1. Sign in to Tableau Server using your admin credentials.

  2. From the left navigation pane, click Settings.

  3. On the General tab, under Tableau Catalog, select the Turn on Tableau Catalog check box.

Troubleshoot Catalog

You or your users might encounter one of the following issues when using Catalog (or the Tableau Metadata API).

Timeout limit and node limit exceeded messages

To ensure that Catalog tasks or Metadata API queries that have to return a large number of results don't take up all Tableau Server system resources, Catalog implements both timeout and node limits.

  • Timeout limit

    When tasks in Catalog or queries in the Metadata API reach the timeout limit, you and your users see the following message:

    “Showing partial results, Request time limit exceeded. Try again later.” or TIME_LIMIT_EXCEEDED

    To resolve this issue, as a Tableau Server admin, you can increase the timeout limit using the tsm configuration set -k metadata.query.limits.time command. For more information, see the tsm configuration and tsm configuration set Options topics.

    Important: Increasing the timeout limit can utilize more CPU for longer, which can affect the performance of other processes on Tableau Server.

  • Node limit

    When tasks in Catalog or queries in the Metadata API reach the node limit, you and your users see the following message:

    NODE_LIMIT_EXCEEDED

    To resolve this issue, as a Tableau Server admin, you can increase the node limit using the tsm configuration set -k metadata.query.limits.count command. For more information, see the tsm configuration and tsm configuration set Options topics.

    Important: Increasing the node limit can affect system memory.
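The two limit changes above use the same command shape. This is a minimal sketch that only assembles the command strings: the helper name and the value 60 are placeholders, not recommended settings. It assumes the usual tsm pattern of passing the new value with -v and then applying pending changes.

```python
from typing import List

LIMIT_KEYS = {"metadata.query.limits.time", "metadata.query.limits.count"}


def limit_change_commands(key: str, value: int) -> List[str]:
    """Return the tsm commands that set one Catalog limit and apply the change."""
    if key not in LIMIT_KEYS:
        raise ValueError(f"unexpected configuration key: {key}")
    return [
        f"tsm configuration set -k {key} -v {value}",
        "tsm pending-changes apply",  # configuration changes take effect after this
    ]


# Placeholder value; choose one suited to your environment.
for command in limit_change_commands("metadata.query.limits.time", 60):
    print(command)
```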

Missing content

If you suspect that content is missing from Catalog or the Metadata API after initial ingestion, you can use the Eventing API to help troubleshoot. Eventing handles indexing content on Tableau Server after initial ingestion. For more information, see Get Eventing Status.

Disable Catalog

As a Tableau Server admin, you can disable Catalog in one of two ways.

Turn off Catalog capabilities for each site

(Requires the Data Management Add-on)

You can turn off Catalog capabilities at any time. When Catalog capabilities are turned off, the features of Catalog, such as adding data quality warnings or the ability to explicitly manage permissions to database and table assets, are not accessible through Tableau Server (or the Tableau Metadata API). However, Catalog continues to index published content and the metadata is accessible from the Tableau Metadata API and metadata methods in the Tableau Server REST API.

  1. Sign in to Tableau Server using your admin credentials.

  2. From the left navigation pane, click Settings.

  3. On the General tab, under Tableau Catalog, clear the Turn on Tableau Catalog check box.

Stop indexing metadata on Tableau Server

To stop indexing the published content on Tableau Server, you can disable the Tableau Metadata API. To disable the Metadata API, run the tsm maintenance metadata-services disable command. For more information, see tsm maintenance.
