Creating and Updating Extracts

The Extract API 2.0 lets you create and populate Tableau extracts (.hyper files). Using the API, you can do the following:

  1. Open an existing .hyper file or create a new one.
  2. In the extract, if you are using a single table, you can open the existing table named Extract or create a new table named Extract. Starting with Tableau 2018.3, you can create extracts that use multiple table storage. Using the API, you can create and open tables using descriptive names.
  3. Add rows to a table.
  4. Close (save) the extract with the new data.

Flow for creating an extract

The general flow for creating a new extract is the following:

  1. Initialize the Extract API. This step is optional. The primary benefit is that calling the initialize method creates an entry in the TableauExtract APIExtract.log log file that lists the process ID.

  2. Initialize a new extract. Call the Extract class constructor and pass the name of the .hyper file to create.

    Note: The folder where the extract will be written must already exist, and the user who is running the code must have write permissions for the folder.

  3. Create a table definition. Call the TableDefinition constructor.

  4. Add column definitions to the table definition. Call the table definition object's addColumn method. For each new column, you specify a name and a type. An important point is that the order in which you define the columns determines an ordinal number that you must know later in order to add rows to the table.

  5. Initialize a new table in the extract. Call the extract object's addTable method. If you are using single table storage, you pass the table name (use the name Extract) in the method call. Starting with Tableau 2018.3, you can create extracts that can contain multiple tables. If you are using multiple table storage, use a unique table name for each table you plan to add, and repeat steps 3, 4, and 5.

    For information about choosing when to use a single table or to use multiple tables, see Decide how the extract data should be stored.

  6. Create a new row, for a single table (or the first table, if you are creating a multiple table extract), specifying the table definition to use. Call the Row class constructor.

  7. Populate each new row. Call a set* method (setInteger, setString, setDate, etc.) for each column value that you want to add. Repeat steps 6 and 7 if you are creating a multiple table extract.

  8. Save the table and extract. When you've finished adding rows and tables, call the extract object's close method. This writes out the extract and performs cleanup.

  9. Release the Extract API. If you explicitly initialized the API in step 1, you can release the API by calling the cleanup method.

 

Example: creating single and multiple table storage extracts (Python)

The following example illustrates the flow for creating extracts. This example shows how to create an extract using either single table storage (named Extract), or multiple table storage (with tables called Products and Suppliers).

#------------------------------------------------------------------------------
#
#   This file is the copyrighted property of Tableau Software and is protected
#   by registered patents and other applicable U.S. and international laws and
#   regulations.
#
#   Unlicensed use of the contents of this file is prohibited. Please refer to
#   the NOTICES.txt file for further details.
#
#------------------------------------------------------------------------------

from __future__ import print_function

import argparse
import sys
import textwrap

from tableausdk import *
from tableausdk.HyperExtract import *

#------------------------------------------------------------------------------
#   Parse Arguments
#------------------------------------------------------------------------------
def parseArguments():
parser = argparse.ArgumentParser( description='A simple demonstration of the Tableau SDK.', formatter_class=argparse.RawTextHelpFormatter )
# (NOTE: '-h' and '--help' are defined by default in ArgumentParser
	parser.add_argument( '-b', '--build', action='store_true', # default=False,
				help=textwrap.dedent('''\
				   If an extract named FILENAME exists in the current directory, extend it with sample data.
				   If no Tableau extract named FILENAME exists in the current directory, create one and populate it with sample data.
				   (default=%(default)s)
				   ''' ) )
	parser.add_argument( '-s', '--spatial', action='store_true', # default=False,
				help=textwrap.dedent('''\
				   Include spatial data when creating a new extract."
				   If an extract is being extended, this argument is ignored."
				   (default=%(default)s)
				    ''' ) )
	parser.add_argument( '-m', '--multitable', action='store_true', # default=False,
				help=textwrap.dedent('''\
				   Create multi-table extract."
				   (default=%(default)s)
				   ''' ) )
	parser.add_argument( '-f', '--filename', action='store', metavar='FILENAME', default='order-py.hyper',
				help=textwrap.dedent('''\
				   FILENAME of the extract to be created or extended.
				   (default='%(default)s')
				   ''' ) )
	return vars( parser.parse_args() )

#------------------------------------------------------------------------------
#   Create Extract
#------------------------------------------------------------------------------
#   (NOTE: This function assumes that the Tableau Extract API is initialized)
def createOrOpenExtract(
	filename,
	useSpatial,
	createMultipleTables
):
	try:
		# Create Extract Object
		# (NOTE: The Extract constructor opens an existing extract with the
		#  given filename if one exists or creates a new extract with the given
		#  filename if one does not)
		extract = Extract( filename )

		# Define `Extract`/`Products` table (If we are creating a new extract)
		tableName = 'Products' if createMultipleTables else 'Extract'
		if ( not extract.hasTable( tableName ) ):
			schema = TableDefinition()
			# NOTE: Collations make string operations very expensive and this may
			# slow down the overall workbook performance, so use them with care.
			schema.setDefaultCollation( Collation.BINARY )
			schema.addColumn( 'Purchased',              Type.DATETIME )
			schema.addColumn( 'Product',                Type.CHAR_STRING )
			schema.addColumn( 'uProduct',               Type.UNICODE_STRING )
			schema.addColumn( 'Price',                  Type.DOUBLE )
			schema.addColumn( 'Quantity',               Type.INTEGER )
			schema.addColumn( 'Taxed',                  Type.BOOLEAN )
			schema.addColumn( 'Expiration Date',        Type.DATE )
			schema.addColumnWithCollation( 'Produkt',   Type.CHAR_STRING, Collation.DE )
			if ( useSpatial ):
				schema.addColumn( 'Destination',        Type.SPATIAL )
			if ( createMultipleTables ):
				schema.addColumn( 'SupplierKey',            Type.INTEGER )
			table = extract.addTable( tableName, schema )
			if ( table == None ):
				print('A fatal error occurred while creating the table:\nExiting now\n.')
				exit( -1 )

		# Define `Suppliers` table (If we are creating a new multi-table extract)
		if ( createMultipleTables and not extract.hasTable( 'Suppliers' ) ):
			schema = TableDefinition()
			schema.addColumn( 'SupplierKey',            Type.INTEGER )
			schema.addColumn( 'Supplier',               Type.CHAR_STRING )
			schema.addColumn( 'Address',                Type.CHAR_STRING )
			table = extract.addTable( 'Suppliers', schema )
			if ( table == None ):
				print('A fatal error occurred while creating the `Suppliers` table:\nExiting now\n.')
				exit( -1 )

	except TableauException as e:
		print('A fatal error occurred while creating the new extract:\n', e, '\nExiting now.')
		exit( -1 )

	return extract

#------------------------------------------------------------------------------
#   Populate Extract
#------------------------------------------------------------------------------
#   (NOTE: This function assumes that the Tableau SDK Extract API is initialized)
def populateExtract(
	extract,
	useSpatial,
	createMultipleTables
):
    try:
		# Populate `Extract`/`Products` table

		# Get Schema
		tableName = 'Products' if createMultipleTables else 'Extract'
		table = extract.openTable( tableName )
		schema = table.getTableDefinition()

		# Insert Data
		row = Row( schema )
		row.setDateTime( 0, 2012, 7, 3, 11, 40, 12, 4550 )  # Purchased
		row.setCharString( 1, 'Beans' )                     # Product
		row.setString( 2, u'uniBeans'    )                  # Unicode Product
		row.setDouble( 3, 1.08 )                            # Price
		row.setDate( 6, 2029, 1, 1 )                        # Expiration Date
		row.setCharString( 7, 'Bohnen' )                    # Produkt
		if ( useSpatial ):
			row.setSpatial( 8, "POINT (30 10)" )   # Destination
		# in python2: use `xrange` instead of `range` here to reduce memory consumption
		for i in range( 10 ):
			row.setInteger( 4, i * 10 )                     # Quantity
			row.setBoolean( 5, i % 2 == 1 )                 # Taxed
			if ( createMultipleTables ):
				row.setInteger( 9 if useSpatial else 8, i % 3 )         # SupplierKey
			table.insert( row )

		# Populate `Suppliers` table
		if createMultipleTables:
			# Get Schema
			table = extract.openTable( 'Suppliers' )
			schema = table.getTableDefinition()

			# Insert Data
			row = Row( schema )
			row.setCharString( 1, 'Bean Supplier' )             # Supplier
			row.setCharString( 2, '42 Bean Street, Beantown' )  # Address
			# in python2: use `xrange` instead of `range` here to reduce memory consumption
			for i in range( 3 ):
			row.setInteger( 0, i  )                         # SupplierKey
			table.insert( row )

			except TableauException as e:
			print('A fatal error occurred while populating the extract:\n', e, '\nExiting now.')
			exit( -1 )

#------------------------------------------------------------------------------
#   Main
#------------------------------------------------------------------------------
def main():
	# Parse Arguments
	options = parseArguments()

	# Extract API Demo
	if ( options[ 'build' ] ):
		# Initialize the Tableau Extract API
		ExtractAPI.initialize()

		# Create or Expand the Extract
		extract = createOrOpenExtract( options[ 'filename' ], options[ 'spatial' ], options[ 'multitable' ] )
		populateExtract( extract, options[ 'spatial' ], options[ 'multitable' ] )

		# Flush the Extract to Disk
		extract.close()

		# Close the Tableau Extract API
		ExtractAPI.cleanup()

	return 0

if __name__ == "__main__":
	retval = main()
	sys.exit( retval )


		
		

Flow for updating an existing extract

The flow for updating an existing table in an extract is similar to the flow for creating a new extract. These are the steps:

  1. Initialize the API. This is optional.

  2. Open the extract file. Call the Extract class's constructor.

    Note: The user who is running the code must have write permissions for the folder where the .hyper file is located.

  3. Open an existing table. Call the extract object's openTable method. If you are working with an extract that uses single table storage, you can open the table named Extract. If you are updating an extract for Tableau 2018.3 or later that is using multiple table storage, open the name of the table you want to update. In all cases, it is a good practice to check first whether the table exists, by calling the extract object's hasTable method.

  4. Get the table definition. Call the table object's getTableDefinition method.

  5. Create new rows and add them to the existing table. You need the table definition from the preceding step in order to create a new row.

  6. Save changes. Call the extract object's close method.

  7. Release the Extract API. You need to do this only if you initialized the API in step 1.

More examples

For more examples of how to work with the Extract API, see Extract API 2.0 Samples.


Thanks for your feedback! There was an error submitting your feedback. Try again or send us a message.