WDC Incremental Refresh


Tableau uses web data connectors to fetch data and store that data in an extract. You can always refresh the entire extract. However, if you implement incremental refresh, you can also fetch only the new data for the extract, which can greatly reduce the time required to download the data.

You can enable incremental refresh for any table that the web data connector returns.

To enable incremental refresh on a table, set the incrementColumnId property on the tableInfo object that you define for that table in your getSchema function. Set incrementColumnId to the ID of the column that serves as the key for the incremental refresh.

For example, suppose you have a table with an ID field. For every new record in the table, the ID is incremented by 1, and no previous data is ever deleted or overwritten. In that scenario, you would set incrementColumnId to the ID of that column. That way, when gathering data, you can fetch only the records whose ID is larger than the largest ID fetched during the previous gather data phase.
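The idea can be sketched outside of Tableau with plain JavaScript. This is a minimal illustration, not part of the WDC API: the allRecords array, the fetchNewRecords function, and the largestId function are hypothetical names that stand in for an append-only remote source and the bookkeeping around it.

```javascript
// Hypothetical in-memory stand-in for a remote, append-only data source.
var allRecords = [
    { id: 1, color: "red" },
    { id: 2, color: "blue" },
    { id: 3, color: "red" }
];

// Return only the records whose ID is greater than lastId.
function fetchNewRecords(records, lastId) {
    return records.filter(function (record) {
        return record.id > lastId;
    });
}

// Largest ID among the given records, or -1 when there are none.
function largestId(records) {
    return records.reduce(function (max, record) {
        return Math.max(max, record.id);
    }, -1);
}

// First gather: nothing fetched yet, so -1 selects every record.
var firstBatch = fetchNewRecords(allRecords, -1);
var lastId = largestId(firstBatch);

// The source grows by one record before the next gather.
allRecords.push({ id: 4, color: "green" });

// Second gather: only the record added since the first gather is returned.
var secondBatch = fetchNewRecords(allRecords, lastId);
console.log(secondBatch);
```

Because IDs only grow and old rows never change, comparing against a single remembered value is enough to find the new rows. If rows could be edited or deleted, this approach would miss those changes.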

For example, here’s the getSchema method of the IncrementalRefreshConnector dev sample:

myConnector.getSchema = function(schemaCallback) {
    var cols = [
        { id: "id", dataType: tableau.dataTypeEnum.int },
        { id: "x", dataType: tableau.dataTypeEnum.string },
        { id: "day", dataType: tableau.dataTypeEnum.datetime },
        { id: "day_and_time", dataType: tableau.dataTypeEnum.datetime },
        { id: "true_or_false", dataType: tableau.dataTypeEnum.bool },
        { id: "color", dataType: tableau.dataTypeEnum.string }
    ];

    var tableInfo = {
        alias: "Incremental Refresh Connector",
        id: "mainTable",
        columns: cols,
        incrementColumnId: "id"
    };

    schemaCallback([tableInfo]);
};

When Tableau calls the getData method of the connector, it passes in a table object. If the end user requests an incremental refresh in Tableau, and incrementColumnId was set on the tableInfo object for that table in getSchema, then the table.incrementValue property contains a value. That value is the current largest value in the increment column.

For example, this is how this property is used in the IncrementalRefreshConnector dev sample:

myConnector.getData = function(table, doneCallback) {
    // No incrementValue means no incremental refresh was requested; -1 selects every ID.
    var lastId = parseInt(table.incrementValue || -1, 10);

    // Gather only the most recent data with an ID greater than 'lastId'
    // ......

    table.appendRows(dataArray);
    doneCallback();
};
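To see how the elided gathering step might look, here is a minimal sketch that runs outside of Tableau. It rests on several assumptions: sourceRows is a hypothetical in-memory data source, makeFakeTable is a hypothetical stand-in for the table object that Tableau passes in (it only mimics incrementValue and appendRows), and myConnector is a plain object rather than the result of tableau.makeConnector(). The sample's actual gathering logic is not shown in the text above and may differ.

```javascript
// Hypothetical source rows; a real connector would request these from its API.
var sourceRows = [
    { id: 0, x: "a" },
    { id: 1, x: "b" },
    { id: 2, x: "c" },
    { id: 3, x: "d" }
];

// Plain object used here in place of tableau.makeConnector().
var myConnector = {};

myConnector.getData = function (table, doneCallback) {
    // Without an incrementValue, fall back to -1 so that every row qualifies.
    var lastId = parseInt(table.incrementValue || -1, 10);

    // Keep only the rows that are newer than the last fetched ID.
    var dataArray = sourceRows.filter(function (row) {
        return row.id > lastId;
    });

    table.appendRows(dataArray);
    doneCallback();
};

// Minimal fake of the table object, for running the sketch outside Tableau.
function makeFakeTable(incrementValue) {
    return {
        incrementValue: incrementValue,
        rows: [],
        appendRows: function (rows) {
            this.rows = this.rows.concat(rows);
        }
    };
}

// Full refresh: no incrementValue, so all four rows are appended.
var fullRefresh = makeFakeTable(undefined);
myConnector.getData(fullRefresh, function () {});

// Incremental refresh: largest existing ID is 1, so only IDs 2 and 3 are appended.
var incremental = makeFakeTable("1");
myConnector.getData(incremental, function () {});

console.log(fullRefresh.rows.length, incremental.rows.length);
```

In a real connector, it is usually better to pass lastId to the remote API as a query parameter, if the API supports one, so that old rows are never downloaded at all. Filtering after download, as in this sketch, only reduces what is appended to the extract.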

The WDC API supports three data types for the incremental refresh column: DateTime, Date, and integer. For incremental refresh, you typically use a field that represents a date, a timestamp, or a row number.
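For a date or timestamp column, the same pattern applies with a date comparison instead of an integer comparison. The following is a sketch under stated assumptions: it assumes that the increment value arrives in a form that the JavaScript Date constructor can parse, which you should verify against the values your connector actually receives. The events array and the rowsAfter function are hypothetical names used only for illustration.

```javascript
// Hypothetical rows keyed by a timestamp column (ISO 8601 strings in UTC).
var events = [
    { x: "first", day_and_time: "2024-01-01T00:00:00Z" },
    { x: "second", day_and_time: "2024-01-02T00:00:00Z" },
    { x: "third", day_and_time: "2024-01-03T00:00:00Z" }
];

// Return the rows whose timestamp is strictly later than incrementValue.
// Assumption: incrementValue, when present, is parseable by Date.
function rowsAfter(rows, incrementValue) {
    // No increment value: treat as a full refresh and return a copy of all rows.
    if (!incrementValue) {
        return rows.slice();
    }

    var cutoff = new Date(incrementValue).getTime();
    if (isNaN(cutoff)) {
        // Refuse to guess; appending everything could duplicate rows.
        throw new Error("Unparseable increment value: " + incrementValue);
    }

    return rows.filter(function (row) {
        return new Date(row.day_and_time).getTime() > cutoff;
    });
}

var everything = rowsAfter(events, undefined);
var newer = rowsAfter(events, "2024-01-02T00:00:00Z");
console.log(everything.length, newer.length);
```

The strict comparison means rows that share the cutoff timestamp are skipped. That is safe only if no further rows with that exact timestamp can arrive after the previous refresh.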