Plan Your Data Connect Deployment
Data Connect is a paid subscription service available to Tableau Cloud Enterprise and Tableau+ customers. After you have purchased Data Connect, the site admin will be able to configure the service in the settings page of Tableau Cloud.
Infrastructure specifications
Compute: A location for hosting Data Connect. This can be a bare-metal server or a VM, and it can be located in a private network or in the cloud.
Operating System (OS): An up-to-date and patched installation of a supported Linux distribution.
Storage: Allocated storage space to host the OS, Data Connect, and the extracts it creates when performing refreshes.
Network: The compute must be able to connect to your data source and two locations on the public internet.
Node specifications
Number of nodes | Production workload minimum: three nodes per network; Development/test workload minimum: one node per network |
vCPU | Minimum: 8 vCPU; Recommended: 16 vCPU or more |
Memory | Minimum: 16 GB; Recommended: 64 GB or more |
Storage (two disks). Important: The secondary disk must be raw and unformatted. | Root disk; Secondary disk |
Permission | Root access to host |
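As a rough planning aid, the vCPU and memory thresholds above can be encoded in a small check. This is a minimal sketch and not a Tableau tool: the names `rate` and `check_node_sizing` and the result labels are hypothetical, and only the numeric thresholds come from the table.

```python
# Thresholds copied from the node specifications table above.
MIN_VCPU, RECOMMENDED_VCPU = 8, 16
MIN_MEMORY_GB, RECOMMENDED_MEMORY_GB = 16, 64


def rate(value, minimum, recommended):
    """Classify a value against the minimum and recommended thresholds."""
    if value < minimum:
        return "below minimum"
    if value < recommended:
        return "meets minimum"
    return "meets recommendation"


def check_node_sizing(vcpus, memory_gb):
    """Hypothetical helper: rate one node's vCPU count and memory (in GB)."""
    return {
        "vcpu": rate(vcpus, MIN_VCPU, RECOMMENDED_VCPU),
        "memory": rate(memory_gb, MIN_MEMORY_GB, RECOMMENDED_MEMORY_GB),
    }


print(check_node_sizing(vcpus=8, memory_gb=64))
# {'vcpu': 'meets minimum', 'memory': 'meets recommendation'}
```

The check covers a single node only; the node-count minimums per network still need to be verified separately.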
Linux Operating System
Supported distributions | RHEL-8, RHEL-9, Ubuntu-20.04, Ubuntu-22.04 |
Proxy filtering | See Optional forward proxy filtering. |
Outbound TLS client authentication on port 443 with mutual TLS authentication (Orchestration layer) | 52.42.211.235, 52.10.6.79, 35.167.70.143 |
Outbound listing of Fully Qualified Domain Names (FQDN) (Orchestration layer) | tunnel.rafay-edge.net, api.rafay.dev, control.rafay.dev, fluentd-aggr.rafay-edge.net, influxdb01.core.rafay-edge.net, debug.core.rafay-edge.net, edge.core.rafay-edge.net, registry.rafay-edge.net, app.rafay.dev, console.rafay.dev, *.connector.kubeapi-proxy.rafay.dev, *.user.kubeapi-proxy.rafay.dev, event.core.rafay-edge.net, repo.rafay-edge.net, *.connector.cdrelay.rafay.dev, *.user.cdrelay.rafay.dev, *.connector.infrarelay.rafay.dev, *.user.infrarelay.rafay.dev |
Internal network | The cluster nodes will need the same network access to the data source as is required by Tableau Desktop. |
Tableau Cloud permissions | Site Admin role and the credentials to access the data source. |
Data source | An authentication method for the data source that is currently supported by Data Connect and that is network accessible from the cluster. |
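When reviewing egress or forward-proxy rules, it can help to test whether a destination is covered by the orchestration-layer entries listed above. The following is a minimal sketch under stated assumptions: `is_allowed_destination` is a hypothetical helper, the pattern list is only a subset of the FQDN list chosen for illustration, and shell-style wildcard matching (where `*` also matches dots) may differ from how your proxy interprets wildcards.

```python
from fnmatch import fnmatchcase

# IP addresses from the table above (outbound port 443, mutual TLS).
ALLOWED_IPS = {"52.42.211.235", "52.10.6.79", "35.167.70.143"}

# Illustrative subset of the FQDN list above; extend it with the full list.
ALLOWED_FQDN_PATTERNS = [
    "tunnel.rafay-edge.net",
    "api.rafay.dev",
    "control.rafay.dev",
    "registry.rafay-edge.net",
    "*.connector.kubeapi-proxy.rafay.dev",
    "*.user.kubeapi-proxy.rafay.dev",
]


def is_allowed_destination(host):
    """Return True if host is a listed IP or matches a listed FQDN pattern."""
    # Normalize case and drop a trailing root dot before comparing.
    host = host.strip().lower().rstrip(".")
    if host in ALLOWED_IPS:
        return True
    return any(fnmatchcase(host, pattern) for pattern in ALLOWED_FQDN_PATTERNS)


print(is_allowed_destination("abc.connector.kubeapi-proxy.rafay.dev"))  # True
print(is_allowed_destination("example.com"))  # False
```

This only checks names against a list. It does not open connections or validate the mutual TLS handshake.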
Database access
Data Connect uses Tableau connectors to connect to different databases to maintain data freshness. Some of those connectors require drivers to communicate with the databases. To get drivers for connectors that Data Connect supports, go to Tableau Driver Download and filter to Linux. Data Connect supports only Linux drivers. Make sure to follow the instructions listed for each database.
Database authentication
The underlying data that a data source, Prep Conductor flow, or virtual connection connects to often requires authentication. If authentication is required, the publisher or owner can configure how the database credentials are obtained.
Data sources
The authentication configuration options for data sources are Prompt user or Embedded password.
If the data source is set to prompt users, database credentials are not stored with the connection. This means that a user who opens the data source (or a workbook that uses the data source) must enter their own database credentials to access the data.
If a data source is set up with the password embedded, database credentials are saved with the connection and used by anyone who accesses the data source (or refreshes the data source).
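The difference between the two options can be modeled in a few lines. This is a conceptual sketch only: `PublishedDataSource`, `resolve_credentials`, and the mode labels `"prompt"` and `"embedded"` are hypothetical names for illustration and do not reflect how Tableau stores or retrieves credentials.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Credentials = Tuple[str, str]  # (username, password)


@dataclass
class PublishedDataSource:
    auth_mode: str  # "prompt" or "embedded" (hypothetical labels)
    embedded: Optional[Credentials] = None  # set only for embedded mode


def resolve_credentials(
    source: PublishedDataSource, prompt_user: Callable[[], Credentials]
) -> Credentials:
    """Pick the credentials used for one access to the data source."""
    if source.auth_mode == "embedded":
        # Saved with the connection and shared by everyone who accesses it.
        if source.embedded is None:
            raise ValueError("embedded mode requires stored credentials")
        return source.embedded
    if source.auth_mode == "prompt":
        # Nothing is stored; each user supplies their own credentials.
        return prompt_user()
    raise ValueError(f"unknown auth mode: {source.auth_mode}")
```

In this model, a refresh with no interactive user has nothing to call in prompt mode, which is why the embedded option is the one described above as also being used for refreshes.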
For more information, see Set Credentials for Accessing Your Published Data.
Tableau Prep Conductor and virtual connections
Database credentials are stored in a Prep flow or a virtual connection. For Prep flows, the credentials are used when a Prep Conductor flow runs. For virtual connections, these credentials are used by anyone who accesses the virtual connection.
For more information on data source credential management, please refer to Data Connect Security.
Capacity planning
There are several tools available to administrators to manage capacity of their workloads on Data Connect.
Cluster management
A Data Connect cluster is a group of nodes that access data within a specified private network. Those nodes can access any databases within their network that they have been configured to access. All Bridge clients on all nodes of a cluster can handle workloads for any site that has access to the cluster. To increase a cluster's capacity, you can add nodes to increase throughput or increase the size of the compute on each node (CPU, memory, and so on).
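To compare the two scaling options, it can help to total the resources in a cluster. The sketch below is simple arithmetic with a hypothetical helper, `cluster_capacity`; the node sizes are example values (8 vCPU/16 GB and 16 vCPU/64 GB), and aggregate vCPU and memory are only a rough proxy for throughput, which this document does not quantify.

```python
def cluster_capacity(nodes, vcpu_per_node, memory_gb_per_node):
    """Hypothetical helper: total vCPU and memory across identical nodes."""
    return {
        "vcpu": nodes * vcpu_per_node,
        "memory_gb": nodes * memory_gb_per_node,
    }


baseline = cluster_capacity(3, 8, 16)     # three small nodes
scaled_out = cluster_capacity(4, 8, 16)   # add a node
scaled_up = cluster_capacity(3, 16, 64)   # enlarge each node

print(baseline)    # {'vcpu': 24, 'memory_gb': 48}
print(scaled_out)  # {'vcpu': 32, 'memory_gb': 64}
print(scaled_up)   # {'vcpu': 48, 'memory_gb': 192}
```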
Load balancing with pools
Each cluster uses a pool to load balance traffic across all Bridge clients in the cluster. Having multiple nodes within each cluster ensures that you have multiple Bridge clients available to handle traffic associated with the pool. For every domain added to a pool, all traffic to that domain is load balanced across the cluster. Domains cannot be added to more than one pool on a site. This design ensures traffic is routed appropriately.
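The pool rules above can be illustrated with a toy model. This is a sketch under assumptions, not the service's implementation: `PoolRegistry` and its methods are hypothetical, and round-robin selection is assumed purely for illustration because the load-balancing algorithm is not specified here.

```python
from itertools import cycle


class PoolRegistry:
    """Toy model of one site's pools: a domain belongs to at most one pool."""

    def __init__(self):
        self._pool_of_domain = {}  # domain -> pool name
        self._clients = {}         # pool name -> endless iterator of clients

    def add_pool(self, pool, bridge_clients):
        if not bridge_clients:
            raise ValueError("a pool needs at least one Bridge client")
        self._clients[pool] = cycle(list(bridge_clients))

    def add_domain(self, pool, domain):
        if pool not in self._clients:
            raise KeyError(pool)
        owner = self._pool_of_domain.get(domain)
        if owner is not None and owner != pool:
            # Enforces the rule that a domain cannot be in two pools.
            raise ValueError(f"{domain} already belongs to pool {owner}")
        self._pool_of_domain[domain] = pool

    def route(self, domain):
        """Return the next Bridge client for a domain (assumed round-robin)."""
        return next(self._clients[self._pool_of_domain[domain]])


registry = PoolRegistry()
registry.add_pool("pool-a", ["node1", "node2", "node3"])
registry.add_domain("pool-a", "db.example.internal")
print([registry.route("db.example.internal") for _ in range(4)])
# ['node1', 'node2', 'node3', 'node1']
```

Because each domain maps to exactly one pool, the routing decision for a domain is never ambiguous, which is the point of the one-pool-per-domain restriction.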
Site limits
Site limits are enforced on jobs delivered by Data Connect to ensure a good experience for all customers on Tableau Cloud. These limits are the same as those enforced for Tableau Bridge, because Data Connect deploys Tableau Bridge on your behalf to service database queries. For more information on those limits, see Bridge Site Capacity.
For more information about capacity planning, download the whitepaper, Accessing Your Private Network Data with Tableau Cloud.