About Data Connect

Data Connect allows Tableau Cloud users to access data sources in your private network or cloud service. Data Connect operates on a shared responsibility model: customers supply the physical or virtual compute resources, and Tableau hosts and manages the Data Connect Kubernetes cluster on those resources.

In your environment, the Data Connect Kubernetes cluster manages a set of containers. These containers provide the runtime environment for one or more Bridge clients. The Bridge client is the program that runs tasks and enables secure communication across the firewall between your organization and Tableau Cloud.

Data Connect services include:

  • Cluster monitoring and troubleshooting: Tableau monitors the health and usage of the Bridge client. Telemetry data are collected to ensure resources are used in the most effective and efficient manner.

  • Cluster maintenance: Tableau owns and fully performs cluster operation and maintenance, and upgrades are deployed automatically. Data Connect automatically optimizes the deployment for your workload based on your needs and the available compute pool.

  • Alert monitoring: Incident management is provided continuously so that issues are resolved quickly and business impact is limited.

Connector support

Data Connect supports the same connectors that Tableau Bridge for Linux supports. For a full list of connectivity options, see Connectivity with Bridge.

Environment support

Data Connect currently supports on-premises environments and virtual private cloud (VPC) environments on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Each Data Connect node can serve only a single Tableau Cloud site. Nodes must be installed in the same network as the data, so customers should plan for at least three nodes per private network per site to maintain availability of the service. Data Connect nodes must be dedicated to Data Connect: you cannot deploy any other containers to the Tableau-owned cluster, and you cannot use an existing cluster for Data Connect.
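The placement rules above (one site per node, dedicated nodes, at least three nodes per private network per site) lend themselves to a quick pre-deployment check. The following is a minimal sketch, assuming you keep your node plan as a simple list; the `Node` record and `check_plan` function are hypothetical planning helpers, not part of any Tableau tool or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Minimum nodes per private network per site, from the guidance above.
MIN_NODES_PER_NETWORK = 3


@dataclass(frozen=True)
class Node:
    """Hypothetical planning record for one Data Connect node."""
    name: str
    site: str        # the single Tableau Cloud site this node serves
    network: str     # private network (on-premises or VPC) the node runs in
    dedicated: bool  # True if nothing but Data Connect runs on the node


def check_plan(nodes: List[Node]) -> List[str]:
    """Return human-readable problems with a node plan (empty list if none)."""
    problems = []
    for node in nodes:
        if not node.dedicated:
            problems.append(f"{node.name}: node must be dedicated to Data Connect")
    # Count nodes per (site, private network) pair.
    counts = Counter((n.site, n.network) for n in nodes)
    for (site, network), count in sorted(counts.items()):
        if count < MIN_NODES_PER_NETWORK:
            problems.append(
                f"site {site!r}, network {network!r}: {count} node(s), "
                f"plan for at least {MIN_NODES_PER_NETWORK}"
            )
    return problems


if __name__ == "__main__":
    plan = [
        Node("dc-1", "sales", "aws-vpc-a", True),
        Node("dc-2", "sales", "aws-vpc-a", True),
        Node("dc-3", "sales", "onprem-dc", False),
    ]
    for problem in check_plan(plan):
        print(problem)
```

Because each `Node` carries exactly one `site` field, the one-site-per-node rule is enforced by the data shape itself; the function only has to check dedication and per-network node counts.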

Architecture

The Data Connect architecture consists of three main components and responsibility boundaries. While there's some overlap, Tableau is primarily responsible for the application and orchestration layers, and customers are responsible for the infrastructure (compute, OS, networking, and storage) and its location.

  1. Tableau Cloud communicates with the Kubernetes orchestration service to deploy, monitor, and manage the Kubernetes cluster.

  2. When you initialize Data Connect, a secure connection is established with the orchestration provider service over port 443.

  3. After the service is configured, the Kubernetes cluster deploys one or more containers, each running a Bridge client. These Bridge clients execute Tableau workloads.

  4. Tableau Cloud users sign in to Tableau Cloud to interact with the Data Connect service.

  5. On setup, Bridge clients initialize a connection with Tableau Cloud using HTTPS. After a successful connection, the Bridge clients open a secure, bidirectional WebSocket (wss://) connection to your Tableau Cloud environment.

  6. Queries initiated from Tableau Cloud are run against your database to support end user analysis.
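The ordering in steps 5 and 6 (HTTPS first, then a persistent wss:// channel, and only then queries) can be modeled as a small state machine. This is a sketch under stated assumptions: the class, method, and state names are hypothetical, no network traffic is sent, and `cloud.example.com` is a placeholder host rather than a real Tableau Cloud address.

```python
from enum import Enum, auto


class State(Enum):
    STARTING = auto()
    REGISTERED = auto()  # initial HTTPS exchange with Tableau Cloud succeeded
    CONNECTED = auto()   # persistent, bidirectional wss:// channel is open


class BridgeClientLifecycle:
    """Hypothetical model of the Bridge client setup order (no real I/O)."""

    def __init__(self, cloud_host: str):
        self.cloud_host = cloud_host  # placeholder host name
        self.state = State.STARTING

    def register(self) -> str:
        # Step 5, first half: the client initiates contact over HTTPS.
        if self.state is not State.STARTING:
            raise RuntimeError("client is already registered")
        self.state = State.REGISTERED
        return f"https://{self.cloud_host}/"

    def open_channel(self) -> str:
        # Step 5, second half: the WebSocket channel opens only after HTTPS succeeds.
        if self.state is not State.REGISTERED:
            raise RuntimeError("must register over HTTPS first")
        self.state = State.CONNECTED
        return f"wss://{self.cloud_host}/"

    def run_query(self, sql: str) -> str:
        # Step 6: queries from Tableau Cloud arrive only over the open channel.
        if self.state is not State.CONNECTED:
            raise RuntimeError("no channel to Tableau Cloud")
        return f"ran: {sql}"


if __name__ == "__main__":
    client = BridgeClientLifecycle("cloud.example.com")
    print(client.register())
    print(client.open_channel())
    print(client.run_query("SELECT 1"))
```

Note that every connection is opened by the client, outbound from your network; the model never has Tableau Cloud connect inward, which matches the client-initiated flow described in step 5.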

Security

See Data Connect Security.

Data Connect components

The primary component of the Data Connect solution is a Kubernetes cluster made up of one or more nodes. Each node hosts at least one container, which, in turn, hosts the Bridge client. The Bridge client performs live and extract queries.

A pool is a logical grouping of networking rules that specify which clusters should complete specific queries. In the context of deployment planning, a pool hosts a collection of endpoints (domains or IP addresses) for the purposes of load balancing. These endpoints can point to private cloud data, relational data, file data, and so on.

To allow a cluster to access and refresh data sources, each pool is assigned to a cluster. To distribute load, you can add multiple pools to a cluster.
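The relationship between clusters, pools, and endpoints can be pictured as a simple nested data structure: each cluster owns a list of pools, and each pool lists the endpoints it serves. The sketch below is a hypothetical model for planning purposes only; `Pool`, `Cluster`, and `find_cluster` are illustrative names and do not reflect how Tableau stores or evaluates pool rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Pool:
    """Hypothetical pool: a named set of endpoints (domains or IP addresses)."""
    name: str
    endpoints: Set[str]


@dataclass
class Cluster:
    """Hypothetical cluster; assigning several pools spreads load across them."""
    name: str
    pools: List[Pool] = field(default_factory=list)


def find_cluster(clusters: List[Cluster], endpoint: str) -> Optional[str]:
    """Return the name of the first cluster whose pools list the endpoint."""
    for cluster in clusters:
        for pool in cluster.pools:
            if endpoint in pool.endpoints:
                return cluster.name
    return None  # no pool covers this endpoint


if __name__ == "__main__":
    east = Cluster("east", [
        Pool("sql", {"db.internal.example", "10.0.0.5"}),
        Pool("files", {"files.internal.example"}),
    ])
    west = Cluster("west", [Pool("warehouse", {"dw.internal.example"})])
    print(find_cluster([east, west], "10.0.0.5"))
    print(find_cluster([east, west], "dw.internal.example"))
```

Nesting pools inside clusters mirrors the rule that each pool is assigned to a cluster: a pool listed under one cluster is not shared with another in this model.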

Deployment overview

To get started, run a script on each of your Linux servers. This script configures a Tableau-managed Kubernetes cluster in your environment.

After Kubernetes is configured, you deploy a Docker container to the cluster. Tableau then deploys and remotely manages the Bridge client within the container. Once this configuration with Tableau is established, you map connections to your private network data sources.

For more information about deploying Data Connect, download the whitepaper Accessing Your Private Network Data with Tableau Cloud - Best Practices for Data Connect and Tableau Bridge.

Database connectivity

Queries are managed from the Bridge client in the cluster. Your data is transmitted directly from the Bridge client to Tableau Cloud. Data Connect doesn’t require external network access, firewall holes, or remote machine access.

  1. The Bridge client establishes a persistent connection to the Tableau Cloud Data Connect service using secure WebSockets (wss://). The client then waits for a request from Tableau Cloud.

    • For data sources with live connections or virtual connections, Tableau Cloud initiates a query to the Bridge client.
    • For extract data sources that use refresh schedules, the client receives the scheduled refresh request and contacts Tableau Cloud over a secure connection (https://) to retrieve the data source (.tds) files.
  2. The Bridge client connects to the private network data using the credentials included in the job request.

  3. The database returns the results of the query.

  4. The Bridge client receives the payload and returns it to the Data Connect service.
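The request handling in the steps above can be sketched as a single dispatch function. This is a minimal sketch, assuming a job arrives as a simple record with a `kind` of "live" or "extract"; `JobRequest`, `handle_job`, and the three injected callables (`fetch_tds`, `run_query`, `send_to_service`) are hypothetical stand-ins for the HTTPS download, the database call, and the return path to the Data Connect service, not real Bridge client interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rows = List[Tuple]


@dataclass
class JobRequest:
    """Hypothetical job received over the persistent wss:// channel."""
    kind: str             # "live" (live or virtual connection) or "extract"
    query: str
    credentials: str      # credentials included in the job request (step 2)
    datasource: str = ""  # .tds file name, used for scheduled extract refreshes


def handle_job(
    job: JobRequest,
    fetch_tds: Callable[[str], str],          # hypothetical https:// download
    run_query: Callable[[str, str], Rows],    # hypothetical database call
    send_to_service: Callable[[Rows], None],  # hypothetical return path
) -> Rows:
    """Process one request and return the rows sent back to the service."""
    if job.kind == "extract":
        # Scheduled refresh: fetch the data source file over https:// first.
        # A real client would read connection details from it; this sketch only fetches it.
        fetch_tds(job.datasource)
    elif job.kind != "live":
        raise ValueError(f"unknown job kind: {job.kind!r}")
    rows = run_query(job.query, job.credentials)  # steps 2 and 3
    send_to_service(rows)                         # step 4
    return rows
```

Injecting the network and database operations as callables keeps the sketch runnable without any real endpoints and makes the order of operations easy to observe in a test.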