Cloudera Hadoop
This article describes how to connect Tableau to a Cloudera Hadoop database and set up the data source.
Note: For new connections to Impala databases, use the Impala connector rather than this one. (You can continue using this connector for existing connections.)
Before you begin
Before you begin, gather this connection information:
- Name of the server that hosts the database you want to connect to, and the port number
- Type of database: Hive Server 2 or Impala
- Authentication method:
  - No Authentication
  - Kerberos
    Note: Due to Kerberos Domain Controller (KDC) restrictions, connection with MIT Kerberos is not supported.
  - User Name
  - User Name and Password
  - Microsoft Azure HDInsight Service (starting in version 10.2.1)
- Transport options depend on the authentication method you choose and can include the following:
  - Binary
  - SASL
  - HTTP
- Sign-in credentials depend on the authentication method you choose and can include the following:
  - User name
  - Password
  - Realm
  - Host FQDN
  - Service name
  - HTTP path
- Are you connecting to an SSL server?
- (Optional) Initial SQL statement to run every time Tableau connects
Driver required
This connector requires a driver to talk to the database. If the driver is not installed on your computer, Tableau displays a message in the connection dialog box with a link to the Driver Download page, where you can find driver links and installation instructions.
Note: Make sure you use the latest available drivers. To get the latest drivers, see Cloudera Hadoop on the Tableau Driver Download page.
Make the connection and set up the data source
- Start Tableau and under Connect, select Cloudera Hadoop. For a complete list of data connections, select More under To a Server. Then do the following:
  - Enter the name of the server that hosts the database and the port number to use. If you are connecting using Cloudera Impala, you must use port 21050; this is the default port if you are using the 2.5.x driver (recommended).
  - In the Type drop-down list, select the type of database to connect to. Depending on the version of Hadoop and the drivers you have installed, you can connect to one of the following:
    - Hive Server 2
    - Impala
  - In the Authentication drop-down list, select the authentication method to use.
  - Enter the information that you are prompted to provide. The information you are prompted for depends on the authentication method you choose.
  - (Optional) Select Initial SQL to specify a SQL command to run at the beginning of every connection, such as when you open the workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server. For more information, see Run Initial SQL.
  - Select Sign In.
    Select the Require SSL option when connecting to an SSL server.
    If Tableau can't make the connection, verify that your credentials are correct. If you still can't connect, your computer might be having trouble locating the server. Contact your network administrator or database administrator.
- On the data source page, do the following:
  - (Optional) Select the default data source name at the top of the page, and then enter a unique data source name for use in Tableau. For example, use a data source naming convention that helps other users of the data source figure out which data source to connect to.
  - From the Schema drop-down list, select the search icon or enter the schema name in the text box and select the search icon, and then select the schema.
  - In the Table text box, select the search icon or enter the table name and select the search icon, and then select the table.
  - Drag the table to the canvas, and then select the sheet tab to start your analysis.
    Use custom SQL to connect to a specific query rather than the entire data source. For more information, see Connect to a Custom SQL Query.
    Note: This database type supports only equal (=) join operations.
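When sign-in fails even though the credentials are correct, it can help to check from the same computer whether the server name resolves and whether the port accepts connections. The following is a minimal sketch that uses only the Python standard library. The function name check_server and its return strings are hypothetical and are not part of Tableau or the Cloudera driver. The default port 21050 is the Impala port mentioned in the steps above; pass the port your administrator gave you for other setups.

```python
import socket


def check_server(host: str, port: int = 21050, timeout: float = 5.0) -> str:
    """Return 'unresolved', 'unreachable', or 'reachable' for host:port.

    Illustrative helper only (hypothetical name, not a Tableau API).
    """
    try:
        # Step 1: can this computer locate the server by name?
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolved"
    try:
        # Step 2: does the server accept TCP connections on that port?
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except OSError:
        # Covers refused connections, timeouts, and routing failures.
        return "unreachable"
```

For example, check_server("mydb.test.ourdomain.lan") tests the default Impala port on that host. A result of "unresolved" points to a name lookup problem, while "unreachable" suggests a wrong port, a firewall, or a stopped service. A "reachable" result only shows that the port is open; it does not validate credentials or SSL settings.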
Sign in on a Mac
If you use Tableau Desktop on a Mac, when you enter the server name to connect, use a fully qualified domain name, such as mydb.test.ourdomain.lan, instead of a relative domain name, such as mydb or mydb.test.
Alternatively, you can add the domain to the list of Search Domains for the Mac computer so that when you connect, you need to provide only the server name. To update the list of Search Domains, go to System Preferences > Network > Advanced, and then open the DNS tab.
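To see why adding a search domain lets you type only the server name, the following simplified sketch lists the names a resolver could try for a short name. It is an illustration under stated assumptions, not a description of how macOS actually resolves names: the function candidate_names is hypothetical, and real resolvers apply additional rules (for example, about names that already contain dots) and may try names in a different order.

```python
def candidate_names(server: str, search_domains: list[str]) -> list[str]:
    """List fully qualified names a resolver might try for a short name.

    Simplified model (assumption): each search domain is appended in order,
    and the name as typed is tried last.
    """
    if server.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [server]
    expanded = [f"{server}.{domain.strip('.')}" for domain in search_domains]
    return expanded + [server]


# With test.ourdomain.lan in the Search Domains list, the short name mydb
# can expand to the fully qualified name used in the example above.
print(candidate_names("mydb", ["test.ourdomain.lan"]))
```

Without a matching search domain, only the short name is tried, which is why the fully qualified name is the safer choice.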
Work with Hadoop Hive data
Work with date/time data
Tableau supports TIMESTAMP and DATE types natively. However, if you store date/time data as a string in Hive, be sure to store it in ISO format (YYYY-MM-DD). You can create a calculated field that uses the DATEPARSE or DATE function to convert a string to a date/time format. Use DATEPARSE() when working with an extract; otherwise, use DATE(). For more information, see Date Functions.
For more information about Hive data types, see Dates on the Apache Hive website.
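If date strings are prepared by a script before they are loaded into Hive, a small check can confirm that they are in ISO format (YYYY-MM-DD) and convert those that are not. The sketch below uses only the Python standard library. The function names is_iso_date and to_iso_date are hypothetical, and the source format in the usage line is an assumed example, not something Hive or Tableau prescribes.

```python
import re
from datetime import date, datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text: str) -> bool:
    """Return True if text is a valid calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    year, month, day = map(int, text.split("-"))
    try:
        date(year, month, day)  # rejects impossible dates such as 2015-02-30
    except ValueError:
        return False
    return True


def to_iso_date(text: str, source_format: str) -> str:
    """Convert a date string in a known strptime format to YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


# Assumed example: a US-style string converted before loading.
print(to_iso_date("03/07/2015", "%m/%d/%Y"))
```

Strings that fail the check can be converted with to_iso_date when their source format is known, or flagged for review when it is not.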
NULL value returned
A NULL value is returned when you open a workbook in Tableau 9.0.1 or later (or in 8.3.5 or a later 8.3.x release) if the workbook was created in an earlier version and has date/time data stored as a string in a format that Hive doesn't support. To resolve this issue, change the field type back to String and create a calculated field using DATEPARSE() or DATE() to convert the date. Use DATEPARSE() when working with an extract; otherwise, use the DATE() function.
High latency limitation
Hive is a batch-oriented system and is not yet capable of answering simple queries with very quick turnaround. This limitation can make it difficult to explore a new data set or experiment with calculated fields. Some of the newer SQL-on-Hadoop technologies (for example, Cloudera's Impala and Hortonworks' Stinger project) are designed to address this limitation.
See also
- Set Up Data Sources – Add more data to this data source or prepare your data before you analyze it.
- Build Charts and Analyze Data – Begin your data analysis.