Update Data in Published Live-to-Hyper Data Sources

You can modify (insert, update, upsert, replace, and delete) data in a published data source that has a live-to-Hyper connection. The origin of data for a live-to-Hyper connection is a Hyper or Parquet file or database. Incremental update is not currently supported for other forms of extracts. To update a published data source, use the Update Data in Hyper Data Source method, or, for data sources with multiple connections, the Update Data in Hyper Connection method.

Working with Parquet files
- When you publish a Parquet file to a data source through a live-to-Hyper connection, the file is converted to a Hyper file. You may want to publish a copy of your original Parquet file to retain that data in its native format.
- A data source with a live-to-Hyper connection to data in a Hyper file converted from Parquet can only have a single connection.
- When you incrementally update data from that connection using the Update Data in Hyper Data Source method, the Parquet file is not converted to Hyper.

In brief, to make an incremental update to your published live-to-Hyper data, you upload fresh data from a specified source as a payload, and then describe the actions you want to perform to update your published Hyper data source based on that payload. Actions can affect fields, rows, columns, and tables, based on conditions you describe, such as date range, field value, or row, column, or table name.

Incremental updates can help you manage data published to Tableau Server or Tableau Cloud more efficiently. They are particularly helpful in scenarios where data changes frequently. Incremental updates enable you to:

  • Bring in bigger data over time
  • Analyze fresher data thanks to more frequent updates
  • Reduce the time spent on backgrounder jobs
  • Connect to new data sources that require update efficiency, for example, streaming systems, time-series databases, and change-data-capture databases
  • Use rolling windows by deleting the data that falls outside of the window
  • Update data and permissions efficiently and separately to support row-level security scenarios

Upload Fresh Data to a Hyper Payload File

The payload file is a Hyper file containing fresh data from your source database. Upload the payload file first using the file upload API. We recommend using the Hyper API to create the payload file; a minimal sketch follows the upload steps below. For ad-hoc experiments, you might also use files created through other means, such as Tableau Prep. To upload the payload to your server:

  • Create an "upload session" by using the Initiate File Upload endpoint.
  • Make repeated calls to Append to File Upload until the complete file is uploaded.
  • The returned upload-id is then used with the Update Data in Hyper Data Source (or Update Data in Hyper Connection) endpoint, which consumes the file.

Each uploaded payload can be consumed by a single Update Data request. Trying to reuse the same upload-id with multiple Update Data jobs will result in an error.
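
As noted above, the Hyper API is the recommended way to create the payload file. The following is a minimal sketch in Python, not an authoritative workflow: the file name and the "Extract"."added_users" table layout are hypothetical and should be adapted to match the extract you are updating.

# Minimal payload-creation sketch using the Hyper API (pip install tableauhyperapi).
# The table layout below is hypothetical; use column names that exactly match
# the columns of the target table in your published extract.
from tableauhyperapi import (
    HyperProcess, Telemetry, Connection, CreateMode,
    TableDefinition, TableName, SchemaName, SqlType, Inserter,
)

PAYLOAD_PATH = "payload.hyper"   # the file you will upload in chunks later

added_users = TableDefinition(
    table_name=TableName("Extract", "added_users"),
    columns=[
        TableDefinition.Column("row_id", SqlType.int()),
        TableDefinition.Column("name", SqlType.text()),
    ],
)

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database=PAYLOAD_PATH,
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema(schema=SchemaName("Extract"))
        connection.catalog.create_table(added_users)
        with Inserter(connection, added_users) as inserter:
            inserter.add_rows([(1, "new user A"), (2, "new user B")])
            inserter.execute()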

Note: While Hyper Update REST API methods make incremental changes much more efficient, multiple simultaneous update requests and large changes to large datasets can overwhelm Tableau Server background processes.

To mitigate this issue, Hyper Update requests are limited by the server configuration setting api.server.update_uploaded_file.max_size_in_mb. Starting in Tableau Server 2022.1 the default for this setting is 100MB. In previous versions, the Tableau Server default limit was 10MB. In Tableau Cloud March 2022 and later, the default limit is fixed at 100MB.

Where heavy incremental update traffic is expected, we recommend that you use a chunking strategy to limit the size of each update, and fine-tune the max_size_in_mb value to optimize performance for your exact conditions.
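
For example, the chunked upload could look like the following Python sketch. It is a hedged illustration, not production code: the Initiate File Upload and Append to File Upload URIs, the multipart/mixed body layout, the JSON response shape, and the 64 MB chunk size are assumptions based on the general file upload pattern; check the File Upload methods for your server version.

import requests
from urllib3.fields import RequestField
from urllib3.filepost import encode_multipart_formdata

SERVER = "https://my-server"      # hypothetical server URL
VERSION = "3.15"                  # hypothetical REST API version
SITE_ID = "site-id"               # from the sign-in response
TOKEN = "auth-token"              # from the sign-in response
CHUNK_SIZE = 64 * 1024 * 1024     # stay below max_size_in_mb

def make_multipart(parts):
    # Build a multipart body and relabel it multipart/mixed, which is the
    # format the Append to File Upload method expects for file chunks.
    fields = []
    for name, (filename, blob, content_type) in parts.items():
        field = RequestField(name=name, data=blob, filename=filename)
        field.make_multipart(content_type=content_type)
        fields.append(field)
    body, content_type = encode_multipart_formdata(fields)
    return body, content_type.replace("multipart/form-data", "multipart/mixed")

headers = {"x-tableau-auth": TOKEN, "accept": "application/json"}

# 1. Initiate the upload session.
resp = requests.post(f"{SERVER}/api/{VERSION}/sites/{SITE_ID}/fileUploads", headers=headers)
resp.raise_for_status()
upload_id = resp.json()["fileUpload"]["uploadSessionId"]   # assumed response shape

# 2. Append the payload Hyper file chunk by chunk.
with open("payload.hyper", "rb") as payload:
    while True:
        chunk = payload.read(CHUNK_SIZE)
        if not chunk:
            break
        body, content_type = make_multipart({
            "request_payload": ("", "", "text/xml"),
            "tableau_file": ("payload.hyper", chunk, "application/octet-stream"),
        })
        requests.put(
            f"{SERVER}/api/{VERSION}/sites/{SITE_ID}/fileUploads/{upload_id}",
            data=body,
            headers={**headers, "content-type": content_type},
        ).raise_for_status()

# upload_id is then passed as the uploadSessionId of the Update Data request.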

Update the Data from Your Payload to Your Published Hyper File

There are two endpoints to choose from, depending on how your data source connections are structured. All other aspects of the endpoints are the same, including request and response bodies, and permissions.

URI

For a data source with a single connection, use the PATCH method of the Update Data in Hyper Data Source endpoint.

PATCH /api/api-version/sites/site-id/datasources/datasource-id/data?uploadSessionId=upload-session-id

A data source can contain multiple connections and federate them. So, in cases where there is more than one connection, the PATCH method of the Update Data in Hyper Connection endpoint adds an additional connection-id path parameter, used to specify which Hyper file to modify:

PATCH /api/api-version/sites/site-id/datasources/datasource-id/connections/connection-id/data?uploadSessionId=upload-session-id

You also can use Update Data in Hyper Connection for single connection data sources where the connection-id is already known to you.

Request Header

For this PATCH request, the following fields must be specified in the request header:

  • x-tableau-auth: An authorization token, as described in Signing In and Signing Out (Authentication).
  • content-type: This must be application/json, because the REST API endpoint only accepts JSON.
  • RequestID: A user-generated identifier that uniquely identifies a request. If the server receives more than one request with the same ID within 24 hours, all subsequent requests will be treated as duplicates and ignored by the server. This can be used to guarantee idempotency of requests. In other words, it prevents the same job from being run more than once in case of events like network anomalies or client crashes.

Request body

The request body contains an action batch description of the operations to be performed on the target data. See the following sections for details and examples.
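
Putting the URI, headers, and request body together, a single-connection update request might look like the following Python sketch. The server URL, API version, IDs, and the simple insert action are hypothetical placeholders; the action format is described in the sections that follow.

import json
import uuid
import requests

SERVER = "https://my-server"          # hypothetical server URL
VERSION = "3.15"                      # hypothetical REST API version
SITE_ID = "site-id"
DATASOURCE_ID = "datasource-id"
TOKEN = "auth-token"
UPLOAD_ID = "upload-session-id"       # returned by the file upload calls

headers = {
    "x-tableau-auth": TOKEN,
    "content-type": "application/json",   # the endpoint only accepts JSON
    "accept": "application/json",
    "RequestID": str(uuid.uuid4()),       # unique ID makes the request idempotent
}

action_batch = {
    "actions": [
        {"action": "insert",
         "source-table": "added_users",
         "target-table": "current_users"},
    ]
}

resp = requests.patch(
    f"{SERVER}/api/{VERSION}/sites/{SITE_ID}/datasources/{DATASOURCE_ID}/data",
    params={"uploadSessionId": UPLOAD_ID},
    headers=headers,
    data=json.dumps(action_batch),
)
resp.raise_for_status()
job_id = resp.json()["job"]["id"]   # assumed JSON response shape; see Response body below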

Response body

The response body returns a job-id for the async job that will carry out the data operations described in the action batch. Use the job-id to query the status of a job for a data update, or to cancel the scheduled job for an update.

The changes are executed asynchronously by an update job on a backgrounder.
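
For example, you can poll the job until it finishes. This sketch assumes the Query Job endpoint (GET /api/api-version/sites/site-id/jobs/job-id) and that the returned job element exposes completedAt and finishCode fields; all IDs are hypothetical placeholders.

import time
import requests

SERVER = "https://my-server"    # hypothetical server URL
VERSION = "3.15"                # hypothetical REST API version
SITE_ID = "site-id"
TOKEN = "auth-token"
JOB_ID = "job-id"               # returned by the Update Data request

headers = {"x-tableau-auth": TOKEN, "accept": "application/json"}

while True:
    resp = requests.get(f"{SERVER}/api/{VERSION}/sites/{SITE_ID}/jobs/{JOB_ID}", headers=headers)
    resp.raise_for_status()
    job = resp.json()["job"]
    if job.get("completedAt"):          # assumed completion marker
        print("Update job finished with code", job.get("finishCode"))
        break
    time.sleep(10)                      # wait before checking again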

Action batch descriptions

An action batch description is the portion of the request body that gives a JSON-encoded description of the operations to be performed on the Hyper file currently published on the server. Those actions cover INSERTs, UPDATEs, and DELETEs, but not READ/SELECT. The action batch can reference the uploaded Hyper file as a source for the actions.

The actions inside an action batch are executed sequentially from first to last. Each action batch is executed atomically: if any operation inside an action batch fails, all of the actions in that batch are discarded.

The top-level structure of this JSON request is illustrated by this example:

{
  "actions": [
    <action1>,
    <action2>,
    ...
    ]
}

At the top level, there is only one parameter.

  • actions: (array, required): a non-empty list of actions to be performed. The format of the individual actions is described below.

At the next level, there are several parameters.

  • action (string; required): The type of action: insert, update, delete, replace, or upsert.
  • target-table (string; required): The table name inside the target database in which we will modify data.
  • target-schema (string; required): The name of a schema inside the target Hyper file published on the server. Where there is a single schema, no value is needed and the request will default to the name of that single schema. Where the target Hyper has multiple schemas, the name of the schema you want to use is required.
  • source-table (string; required): The name of the table inside the uploaded source Hyper file from which the data will be taken.
  • source-schema (string; required): The name of a schema inside the freshly uploaded source Hyper payload. Where there is a single schema, no value is needed, and the request will default to the name of that single schema. Where the source Hyper has multiple schemas, the name of the schema you want to use is required.
  • condition (condition-specification; required): Specifies the condition used to select the rows to be modified. This is only used with the update, delete, and upsert actions.

By default, Hyper files created by Tableau use "Extract" for both the schema and the table name. For simplicity, the samples in this document omit the schema parameters. The source and target columns are mapped to each other based on their names. Column names need to match exactly, including whitespace and case.
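
Because columns are matched by exact name, it can help to inspect the payload file before uploading it. The following is a small sketch using the Hyper API catalog; the payload path is a hypothetical placeholder.

# List every schema, table, and column in a payload Hyper file so the names can
# be compared against the target tables in the published extract.
from tableauhyperapi import HyperProcess, Telemetry, Connection

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint, database="payload.hyper") as connection:
        for schema in connection.catalog.get_schema_names():
            for table in connection.catalog.get_table_names(schema):
                definition = connection.catalog.get_table_definition(table)
                columns = ", ".join(str(column.name) for column in definition.columns)
                print(f"{table}: {columns}")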

Insert action

The "insert" action appends one or more rows from a table inside the uploaded Hyper file into the updated Hyper file on the server. For example:

{"action": "insert",
	"source-table": "added_users",
	"target-table": "current_users"
}

Update action

The "update" action updates existing tuples inside the target table. It uses a condition to decide which rows to update. For example:

{"action": "update",
	"target-table": "my_data",
	"source-table": "uploaded_table",
	"condition": {"op": "eq", "target-col": "row_id", "source-col": "update_row_id"}
}

Delete action

The "delete" action deletes tuples from the target table. It uses its condition to determine which rows to delete. It is an error if the source table contains any additional columns not referenced by the condition.

Example 1

{"action": "delete",
	"target-table": "my_extract_table",
	"condition": {
	"op": "lt",
	"target-col": "col1",
	"const": {"type": "datetime", "v": "2020-06-00T00:00:00.00Z"}}
}

Example 2

{"action": "delete",
	"target-table": "my_extract_table",
	"source-table": "deleted_row_id_table",
	"condition": {"op": "gt", "target-col": "id", "source-col": "deleted_id"}
}

Replace action

The "replace" action is syntactic sugar for a delete action removing all tuples from an existing table followed by an insert which copies all data from an uploaded table into an existing table.

The schema of the existing and the replacement table must match exactly. Columns are matched by name, column order is irrelevant. All matched column names and their types (ignoring NOT NULL modifiers) must be identical.
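
For example, a replace action that swaps the full contents of a published "customers" table for a "customers_full" table supplied in the payload (both names hypothetical) has the same shape as the other actions and takes no condition:

{"action": "replace",
	"source-table": "customers_full",
	"target-table": "customers"
}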

Upsert action

The "upsert" action updates a tuple inside the target table, if such a tuple exists. If no such tuple exists, a new tuple will be inserted. It uses a condition to decide which tuples to update. For example:

{"action": "upsert",
	"target-table": "my_data",
	"source-table": "uploaded_table",
	"condition": {"op": "gt", "target-col": "row_id", "source-col": "update_row_id"}
}

Conditions

Some actions, such as update, delete, and upsert, require you to specify a condition that determines which rows are affected. A condition can be:

  • a top-level constant true
  • a condition consisting of one or more "predicate" statements combined with the logical operators OR and AND; OR and AND can be nested
  • a "predicate" statement using one of the following operators, comparing one column from the source table against one column from the target table, or comparing a column from the source or target table against a constant. The comparison operators are:
    • eq: equal to
    • neq: not equal to
    • gt: greater than
    • lt: less than
    • gte: greater than or equal to
    • lte: less than or equal to
    • is: a comparison for NULL handling
    • has: a comparison operator that checks whether a specified string is contained in a column - for example, the following action would delete all rows from a table in Hyper where the name field contains the string “ad”:

      {
        "action": "delete",
        "target-table": "users",
        "condition": {"op": "has", "source-col": "name", "const": {"type": "string", "v": "ad"}}
      }

Constants

Conditions either apply to two columns or a single column and a constant. A constant is a combination of two fields: type and v. The following types are supported: boolean, integer, double, string, and datetime.

Boolean specifies either “true” or “false”. It is not case sensitive.

Datetime specifies a date-time with a time zone in the ISO-8601 calendar system, such as 2007-12-03T10:15:30Z (for UTC time) or 2007-12-03T10:15:30+0100 (for a time zone of UTC+1:00).

Conditions Examples:

# Conditions for: target_table.date1 is less than 2018-12-06T19:21:12.12Z (the datetime is in ISO 8601 format)
{"op": "lt", "target-col": "date1", "const": {"type": "datetime", "v": "2018-12-06T19:21:12.12Z"}}

# Conditions for: source_table.id = target_table.id
{"op": "eq", "source-col": "id", "target-col": "id"}

# Conditions for: source_table.key_part_1 = target_table.key_part_1 and
# source_table.key_part_2 = target_table.key_part_2
{"op": "and",
"args":
	[{"op": "eq", "source-col": "key_part_1", "target-col": "key_part_1"},
	{"op": "eq", "source-col": "key_part_2", "target-col": "key_part_2"}]
}

# Conditions for: target_table.my_column IS NOT DISTINCT FROM NULL
# (the value is null; the type can be any of the supported types)
{"op": "is", "target-col": "my_column", "const": {"type": "string", "v": null}}

Action Examples

Single-table append

A single-table append can be expressed as:

[
{"action": "insert",
	"source-schema": "extract",
	"source-table": "new_data",
	"target-schema": "Extract",
	"target-table": "Extract"}
]

The user supplies a Hyper file containing a table named "new_data", which has a compatible schema for insertion into the "Extract" table inside the existing "Extract".

Multi-table append

A multi-table append can be expressed as:

[
{"action": "insert",
	"source-schema": "extract",
	"source-table": "new_customers",
	"target-schema": "extract",
	"target-table": "customers"}
{"action": "insert",
	"source-schema": "extract",
	"source-table": "new_items",
	"target-schema": "extract",
	"target-table": "items"}
]

The user supplies a Hyper file containing two tables, the "new_customers" and the "new_items" table, both having schemas compatible with the "customers" and "items" tables inside the existing data source.

Incremental refresh on star-schema databases

In a star schema, there is usually one large fact table and multiple smaller dimension tables. The fact table is related to the dimension tables through key-foreign key relationships. A common example is a schema with an "ad_views" table (fact table), a "marketing_campaigns" table (dimension table), and an "ad_customers" table (dimension table).

In such cases, the fact table usually has an incremental key available. This key is usually a date (for example, "ad_view_date" or "creation_date") or an incremental id. This allows newly added rows to be fetched for the fact table. The dimension tables, however, usually don't provide such an incremental key, and hence it is not possible to determine which rows were added since the last refresh. As such, dimension tables usually require a full refresh. You can express this scenario using

[
{"action": "replace", "source-table": "campaigns", "target-table": "marketing_campaigns"},
{"action": "replace", "source-table": "ad_customers", "target-table": "ad_customers"},
{"action": "insert", "source-table": "new_ad_views", "target-table": "ad_views"},
]

and providing a Hyper file inside which

  • the "campaigns" table in the uploaded Hyper file contains a complete copy of the new list of campaigns. This list will completely replace the list inside the existing Hyper file.
  • same for the "ad_customer" table
  • the "new_ad_views" table contains only the new ad_views since the last refresh. Those ad views will be appended to the existing entries inside the existing Hyper file.

Incremental refresh on snowflake-schemas and multi-fact table schemas

In snowflake schemas and multi-fact table schemas, there are usually multiple tables with incremental keys, which can be used to determine the set of new rows created since the last extract refresh. For example, assume that the "ad_customers" table also contains a "creation_date" column. In this case, your client-side logic can create incremental change sets not only for one, but for multiple tables. In such cases, you can use the following to add tuples to both the "ad_customers" and the "ad_views" tables, while still doing a full refresh for the "marketing_campaigns" table.

[
{"action": "replace", "source-table": "campaigns", "target-table": "marketing_campaigns"},
{"action": "insert", "source-table": "new_ad_customers", "target-table": "ad_customers"},
{"action": "insert", "source-table": "new_ad_views", "target-table": "ad_views"},
]

Windowed extracts

Windowed extracts and rolling-window extracts usually contain records for recent events, but exclude historical data beyond a certain cutoff. For example: "extract all sales data for the last 3 months". A single-table windowed extract can be expressed as follows, where you choose the deletion condition according to the cutoff criterion (in this example: remove all rows where the column "creation_date" contains a date before June 2020).

[
{"action": "delete",
	"target-table": "extract",
	"condition": {
	"op": "lt",
		"target-col": "creation_date",
		"const": {"type": "datetime", "v": "2018-12-06T19:21:12.12Z"}}},
{"action": "insert", "source-table": "extract_increment", "target-table": "extract"}
]

You upload a payload Hyper file that contains all the newly added rows.

While the example above only shows how to achieve a rolling-window extract on a single file, this mechanism can be extended to star-schema, snowflake-schema, and multi-fact table Hyper files by specifying additional actions updating additional tables contained within the Hyper file.

Applying a change-data-capture retrieved from another database system

Change-Data-Capture (CDC) solutions provide technology to track changes to data contained in common database systems. CDC solutions usually capture their data in the form of three sets of tuples:

  1. one set of newly created tuples.
  2. one set of updated tuples, identified by some tuple key, together with the set of changed attributes and the new attribute values.
  3. one set of deleted tuples, identified by the tuple key.

This functionality can be achieved with the following actions.

[
{"action": "insert",
	"source-table": "new_tuples",
	"target-table": "extract"},
{"action": "update",
	"source-table": "modified_tuples",
	"target-table": "extract",
	"condition": {"op": "eq", "target-col": "user_id", "source-col": "user_id"}},
{"action": "delete",
	"source-table": "deleted_tuples",
	"target-table": "extract",
	"condition": {"op": "eq", "target-col": "user_id", "source-col": "user_id"}}
]

In this case, the client

  • specifies the necessary condition to associate existing tuples with the corresponding tuples within the existing Hyper file as part of the "action description". In this case, "user_id" is the primary key and hence tuples are identified based on the equality of the "user_id" column.
  • uploads a Hyper file containing these tables:
    • a table "new_tuples" which contains all newly created tuples. Those will just be appended to the tuples already contained in the existing extract.
    • a table "updated_tuples". This table should contain the tuple key (in this example: "user_id") to determine the modified row, and should contain the new values for the columns of each modified tuple.
    • a table "deleted_tuples": This table contains the "user_id" of all deleted tuples.

The example above assumes that the key consists of a single column "user_id". In some cases, keys consist of multiple columns. To cover this use case, conditions consisting of multiple ANDed predicates are supported. While the example above only covers a single-table scenario, the same mechanism can be used to also update tuples in multiple tables.

