About Data Connect
Data Connect allows Tableau Cloud users to access data sources on your private network or cloud service. Data Connect operates under a shared responsibility model: customers supply the physical or virtual compute resources, and Tableau hosts and manages the Data Connect Kubernetes cluster on those resources.
In your environment, the Data Connect Kubernetes cluster oversees a set of containers. The containers support a runtime environment that consists of one or more agents. The agent is the program that runs tasks and enables secure communication across the firewall between your organization and Tableau Cloud.
Data Connect services include:
- Cluster monitoring and troubleshooting: Tableau monitors the health and usage of the Data Connect agent. Telemetry data is collected to ensure resources are used effectively and efficiently.
- Cluster maintenance: Upgrades are deployed automatically, and Tableau owns and fully performs cluster operation and maintenance. Data Connect automatically optimizes the deployment for your workload based on its needs and the available compute pool.
- Alert monitoring: Incident management is provided continuously to resolve issues quickly and limit business impact.
Connector support
Data Connect supports the same connectors that Tableau Bridge for Linux supports. For a full review of connectivity options, see Connectivity with Bridge.
Environment support
Data Connect currently supports on-premises environments and virtual private cloud (VPC) environments: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Each Data Connect node works with a single Tableau Cloud Site. Nodes must be installed in the same network as the data. Customers should therefore plan for at least three nodes per private network per Site to maintain availability of the service. Data Connect nodes must be dedicated to Data Connect: you cannot deploy any other containers to the Tableau-owned cluster, and you cannot use an existing cluster for Data Connect.
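As a rough planning aid, the node guidance above can be turned into simple arithmetic. The following is a minimal sketch, not a Tableau tool: the function names and the example Site and network names are hypothetical, and the only rule it encodes is the minimum of three nodes per private network per Site stated above.

```python
from collections import Counter

# Minimum stated in the guidance above: three nodes per private network per Site.
MIN_NODES_PER_NETWORK_PER_SITE = 3


def minimum_nodes(deployments):
    """Return the total minimum node count.

    `deployments` is an iterable of (site, private_network) pairs.
    Duplicate pairs are counted once, because nodes are dedicated
    to one Site and must sit in the same network as the data.
    """
    unique_pairs = set(deployments)
    return MIN_NODES_PER_NETWORK_PER_SITE * len(unique_pairs)


def minimum_nodes_by_site(deployments):
    """Return a dict mapping each Site to its minimum node count."""
    networks_per_site = Counter(site for site, _network in set(deployments))
    return {
        site: count * MIN_NODES_PER_NETWORK_PER_SITE
        for site, count in networks_per_site.items()
    }


# Hypothetical example: two Sites spread across three private networks.
example = [
    ("sales-site", "aws-vpc-1"),
    ("sales-site", "onprem-dc"),
    ("finance-site", "azure-vnet"),
]
total = minimum_nodes(example)
per_site = minimum_nodes_by_site(example)
```

In this hypothetical layout, the Site that reaches data in two private networks needs twice the minimum of the Site that reaches data in one, because nodes cannot be shared across Sites or networks.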
Architecture
The Data Connect architecture consists of three main components and responsibility boundaries. While there's some overlap, Tableau is primarily responsible for the application and orchestration layers and customers are responsible for the infrastructure (compute, OS, networking, and storage) and where it’s located.
- Tableau Cloud → orchestration service
- Kubernetes cluster → orchestration service
- Kubernetes cluster → container
- Tableau user → Tableau Cloud
- Data Connect agent (container) → Tableau Cloud
- Data Connect agent (container) → customer database
Security
Data Connect components
The primary component of the Data Connect solution is a cluster. The cluster is a Kubernetes cluster made up of one or more nodes. Each Kubernetes node hosts at least one container, which, in turn, hosts the Data Connect agent. Agents perform live and extract queries.
A pool is a logical grouping of networking rules that specify which clusters should complete specific queries. In the context of deployment planning, a pool hosts a collection of endpoints (domains or IP addresses) for the purposes of load balancing. Domains include private cloud data, relational data, file data, etc.
To allow a cluster to access and refresh data sources, each pool is assigned to a cluster. To distribute load, you can add multiple pools to a cluster.
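The relationship between pools, endpoints, and clusters can be pictured with a small data model. This is an illustrative sketch only, not the Data Connect API: the `Pool` class, the `cluster_for` and `pools_by_cluster` helpers, and all names and addresses are hypothetical, and endpoint matching is simplified to exact string comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """A logical grouping of endpoints, assigned to exactly one cluster."""
    name: str
    cluster: str
    endpoints: set = field(default_factory=set)  # domains or IP addresses


def cluster_for(endpoint, pools):
    """Return the cluster whose pool lists this endpoint, or None."""
    for pool in pools:
        if endpoint in pool.endpoints:
            return pool.cluster
    return None


def pools_by_cluster(pools):
    """Group pool names by cluster; a cluster can hold several pools."""
    grouped = {}
    for pool in pools:
        grouped.setdefault(pool.cluster, []).append(pool.name)
    return grouped


# Hypothetical layout: two pools on one cluster, one pool on another.
pools = [
    Pool("warehouse", "cluster-a", {"dw.example.internal", "10.0.4.12"}),
    Pool("files", "cluster-a", {"fileshare.example.internal"}),
    Pool("crm", "cluster-b", {"crm-db.example.internal"}),
]

target = cluster_for("10.0.4.12", pools)
unknown = cluster_for("unlisted.example.internal", pools)
grouping = pools_by_cluster(pools)
```

The sketch reflects the two rules in the text: every pool points to one cluster, and a cluster can carry several pools. An endpoint that no pool lists resolves to no cluster.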
Deployment overview
To get started, run a script on each of your Linux servers. This script configures a Kubernetes cluster in your environment that Tableau manages.
After Kubernetes is configured, you deploy a Docker container to the cluster. Tableau then deploys and remotely manages the Data Connect agent within the container. After this configuration with Tableau is established, you map connections to your private network data sources.
Data query
Queries are managed from the Data Connect agent in the cluster. Your data is transmitted directly from the Data Connect agent to Tableau Cloud. Data Connect doesn’t require external network access, firewall holes, or remote machine access.