About Data Connect
Data Connect lets Tableau Cloud users access data sources on your private network or cloud service. It operates under a shared responsibility model: customers supply the physical or virtual compute resources, and Tableau hosts and manages the Data Connect Kubernetes cluster on those resources.
In your environment, the Data Connect Kubernetes cluster oversees a set of containers. The containers provide the runtime environment for one or more agents. An agent is the program that runs tasks and enables secure communication across the firewall within your organisation.
Data Connect services include:
Cluster monitoring and troubleshooting: Tableau monitors the health and usage of the Data Connect agent. Telemetry data are collected to ensure resources are used in the most effective and efficient manner.
Cluster maintenance: Tableau owns and fully performs cluster operation and maintenance, and upgrades are deployed automatically. Data Connect automatically optimises the deployment for your workload based on its needs and the available compute pool.
Alert monitoring: Continuous incident management resolves issues quickly to limit business impact.
Connector support
Data Connect supports the same connectors as Tableau Bridge for Linux. For a full review of connectivity options, see Connectivity with Bridge.
Environment support
Data Connect currently supports on-premises environments and virtual private cloud (VPC) environments on Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP). Each Data Connect node works with a single Tableau Cloud Site, and nodes must be installed in the same network as the data. Customers should therefore plan for at least three nodes per private network per Site to maintain availability of the service. Data Connect nodes must be dedicated to Data Connect: you cannot deploy any other containers to the Tableau-owned cluster, and you cannot use an existing cluster for Data Connect.
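The sizing rule above can be turned into a quick planning calculation. The following is a minimal sketch, not a Tableau tool: the function names and the Site and network labels are hypothetical, and the only figure taken from the text is the minimum of three nodes per private network per Site.

```python
# Minimum stated in the text above: three nodes per private network per Site.
MIN_NODES = 3


def plan_nodes(deployments, nodes_per_pair=MIN_NODES):
    """Map each unique (site, private_network) pair to its node count.

    `deployments` is an iterable of (site, private_network) tuples.
    Duplicate pairs are counted once, because one set of nodes serves
    a given network for a given Site.
    """
    if nodes_per_pair < MIN_NODES:
        raise ValueError(f"plan for at least {MIN_NODES} nodes per pair")
    unique_pairs = set(deployments)
    return {pair: nodes_per_pair for pair in sorted(unique_pairs)}


def total_nodes(deployments, nodes_per_pair=MIN_NODES):
    """Total number of dedicated nodes needed across all pairs."""
    return sum(plan_nodes(deployments, nodes_per_pair).values())
```

For example, two private networks used by one Site plus one network used by a second Site gives three unique pairs, so the plan calls for at least nine dedicated nodes.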
Architecture
The Data Connect architecture consists of three main components and responsibility boundaries. While there's some overlap, Tableau is primarily responsible for the application and orchestration layers, and customers are responsible for the infrastructure (compute, OS, networking and storage) and where it’s located.
Tableau Cloud → orchestration service
Kubernetes cluster → orchestration service
Kubernetes cluster → container
Tableau user → Tableau Cloud
Data Connect agent (container) → Tableau Cloud
Data Connect agent (container) → customer database
Security
Data Connect components
The primary component of the Data Connect solution is a cluster. The cluster is a Kubernetes cluster that is made up of one or more nodes. Each Kubernetes node hosts at least one container, which in turn hosts the Data Connect agent. Agents perform live and extract queries.
A pool is a logical grouping of networking rules that specify which clusters should complete specific queries. In the context of deployment planning, a pool hosts a collection of endpoints (domains or IP addresses) for the purposes of load balancing. Domains include private cloud data, relational data, file data, etc.
To allow a cluster to access and refresh data sources, each pool is assigned to a cluster. To distribute load, you can add multiple pools to a cluster.
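The relationship between clusters, pools and endpoints can be pictured as a small data model. This is an illustrative sketch only: the class names, field names and the exact-match lookup are assumptions made for the example, and they do not describe how Data Connect stores or evaluates pool rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Cluster:
    """A Kubernetes cluster made up of one or more dedicated nodes."""
    name: str
    nodes: List[str]  # hypothetical node host names


@dataclass
class Pool:
    """A logical grouping of endpoints, assigned to exactly one cluster."""
    name: str
    endpoints: Set[str]  # domains or IP addresses
    cluster: Cluster


def find_cluster(endpoint: str, pools: List[Pool]) -> Optional[Cluster]:
    """Return the cluster whose pool lists the endpoint, or None.

    Assumption: endpoints are matched by exact string comparison.
    """
    for pool in pools:
        if endpoint in pool.endpoints:
            return pool.cluster
    return None
```

In this model several pools can point at the same cluster, which mirrors the text: each pool is assigned to one cluster, and a cluster can carry multiple pools.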
Deployment overview
To get started, run a script on each of your Linux servers. This script configures a Kubernetes cluster in your environment that Tableau manages.
After Kubernetes is configured, you deploy a Docker container to the cluster. Tableau then deploys and remotely manages the Data Connect agent within the container. Once this configuration is established, you map connections to your private network data sources.
Database connectivity
Queries are managed from the Data Connect agent in the cluster. Your data is transmitted directly from the Data Connect agent to Tableau Cloud. Data Connect doesn’t require external network access, firewall holes or remote machine access.
The agent establishes a persistent connection to the Tableau Cloud Data Connect service using secure WebSockets (wss://). The agent then waits for a request from Tableau Cloud.
- For data sources with live connections or virtual connections, Tableau Cloud initiates a query to the Data Connect agent.
- For data sources with extract connections that use refresh schedules, the agent receives the refresh schedule request and contacts Tableau Cloud using a secure connection (https://) for the data source (.tds) files.
The agent connects to the private network data using the credentials included in the job request.
The database returns the results of the query.
The Data Connect agent receives the payload and returns it to the Data Connect service.
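The request flow above can be sketched as a small simulation. It is not the agent's implementation: `JobRequest`, `handle_job` and the `"live"` and `"extract"` labels are hypothetical, and the network and database calls are replaced by callables that the caller passes in, so nothing here opens a real connection.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """A request that arrives over the persistent wss:// connection."""
    kind: str          # "live" or "extract" (hypothetical labels)
    datasource: str
    query: str
    credentials: dict  # included in the job request, per the text above


def handle_job(job, fetch_tds, run_query):
    """Walk through the steps for one job and return (steps, payload).

    fetch_tds(datasource) stands in for the https:// request for the
    .tds file; run_query(query, credentials) stands in for the database
    driver. Both are supplied by the caller.
    """
    steps = ["request received over wss"]
    if job.kind == "extract":
        # Extract refreshes first fetch the data source file from Tableau Cloud.
        fetch_tds(job.datasource)
        steps.append(".tds fetched over https")
    elif job.kind != "live":
        raise ValueError(f"unknown job kind: {job.kind}")
    # The agent queries the private network data with the job's credentials.
    payload = run_query(job.query, job.credentials)
    steps.append("database queried with job credentials")
    # The payload goes back to the Data Connect service.
    steps.append("payload returned to Data Connect service")
    return steps, payload
```

Passing the two callables in keeps the sketch runnable without a network and makes the difference between the two paths easy to see: an extract job has one extra step, the .tds fetch, before the database query.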