Spark SQL

This article describes how to connect Tableau to a Spark SQL database and set up the data source. Tableau can connect to Spark version 1.2.1 and later.

You can use the Spark SQL connector to connect to a Spark cluster on Azure HDInsight, Azure Data Lake, Databricks or Apache Spark.

Before you begin

Gather this connection information:

  • Name of the server that hosts the database you want to connect to, and the port number
  • Authentication method:
    • No Authentication
    • Kerberos
    • Username
    • Username and Password
    • Microsoft Azure HDInsight Service
  • Transport. Your choices depend on the authentication method you choose, and include the following:
    • Binary
    • SASL
    • HTTP
  • Sign-in credentials. Your choices depend on the authentication method you choose, and can include the following:
    • Username
    • Password
    • Realm
    • Host FQDN
    • Service Name
    • HTTP Path
  • Whether you are connecting to an SSL server
  • (Optional) Initial SQL statement to run every time Tableau connects

Driver required

This connector requires a driver to talk to the database. If the driver is not installed on your computer, Tableau displays a message in the connection dialog box with a link to the Driver Download page, where you can find driver links and installation instructions.

Make the connection and set up the data source

  1. Start Tableau and under Connect, select Spark SQL. For a complete list of data connections, select More under To a Server. Then do the following:

    1. Enter the name of the server that hosts the database and the port number to use.
    2. Connect to the database using SparkThriftServer. Note that the legacy SharkServer and SharkServer2 connections are provided for your use, but are not supported by Tableau.
    3. Select the Authentication method. Then, based on your selection, enter the information you are prompted for.
    4. Select Sign In. Before you sign in, check the following:
    • If the server is password protected, and you are not in a Kerberos environment, you must enter the username and password.
    • Select the Require SSL check box when connecting to an SSL server.
    • (Optional) Select Initial SQL to specify a SQL command to run at the beginning of every connection, such as when you open the workbook, refresh an extract, sign in to Tableau Server, or publish to Tableau Server. For more information, see Run Initial SQL.
  2. If Tableau can't make the connection, verify that your credentials are correct. If you still can't connect, your computer might be having trouble locating the server. Contact your network administrator or database administrator for more assistance.

  3. On the data source page, do the following: 

    1. (Optional) Select the default data source name at the top of the page, and then enter a unique data source name for use in Tableau. For example, use a data source naming convention that helps other users of the data source figure out which data source to connect to.

    2. From the Schema drop-down list, select the search icon, or enter the schema name in the text box and then select the search icon. Then select the schema.
    3. In the Table text box, select the search icon, or enter the table name and then select the search icon. Drag the table to the canvas, and then select the sheet tab to start your analysis.

      Use custom SQL to connect to a specific query rather than the entire data source. For more information, see Connect to a Custom SQL Query.

Note: Starting in Tableau 2018.3, Kerberos authentication for Spark SQL supports delegation. In 2018.2 and earlier, delegation isn't supported. In those versions, you can't use Viewer credentials as the Authentication method when you publish a workbook or data source to Tableau Server; you can only use the Server Run As account.
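
The troubleshooting step above points to the computer being unable to locate the server. Before contacting an administrator, it can help to separate two cases: the server name does not resolve, or the name resolves but nothing answers on the port. The following is a minimal Python sketch that runs outside Tableau. The function name `check_server` and its three return labels are illustrative assumptions, not part of Tableau or Spark, and the check says nothing about authentication, SSL, or the driver.

```python
import socket


def check_server(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'ok', 'dns_failure', or 'unreachable' for host:port.

    Hypothetical helper for illustration only.
    """
    try:
        # Can this computer locate the server by name?
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    try:
        # Is something accepting TCP connections on that port?
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        # Covers refused connections, timeouts, and routing errors.
        return "unreachable"
```

For example, `check_server("mydb.test.ourdomain.lan", 10000)` would test the sample server name used in this article. The port 10000 is only an assumed placeholder; use the port number you gathered for your own server. A result of `dns_failure` suggests a name resolution problem, while `unreachable` suggests a firewall, a wrong port, or a server that is not running.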

Sign in on a Mac

If you use Tableau Desktop on a Mac, when you enter the server name to connect, use a fully qualified domain name, such as mydb.test.ourdomain.lan, instead of a relative domain name, such as mydb or mydb.test.

Alternatively, you can add the domain to the list of Search Domains for the Mac computer so that when you connect, you need to provide only the server name. To update the list of Search Domains, go to System Preferences > Network > Advanced, and then open the DNS tab.
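
To illustrate why adding a search domain lets you enter only the server name, here is a simplified Python sketch of how a search list turns a short name into candidate fully qualified names. The function `candidate_names` is hypothetical, and real resolvers apply additional rules (and may try the names in a different order), so treat it as a model of the idea rather than a description of macOS behavior.

```python
def candidate_names(server: str, search_domains: list[str]) -> list[str]:
    """List the names a resolver might try for `server`, in order.

    Simplified model for illustration; not the actual macOS algorithm.
    """
    if server.endswith("."):
        # A trailing dot marks the name as already absolute.
        return [server.rstrip(".")]
    # Append each search domain to the name as typed.
    candidates = [f"{server}.{domain.strip('.')}" for domain in search_domains]
    # Finally, the name exactly as typed.
    candidates.append(server)
    return candidates


print(candidate_names("mydb", ["test.ourdomain.lan"]))
# ['mydb.test.ourdomain.lan', 'mydb']
```

With `test.ourdomain.lan` in the search list, the short name `mydb` expands to the fully qualified name from the example above, which is why the short name alone is then enough in the connection dialog box.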
