Predictive Modeling with Generated Marks
If you’ve been using Tableau for a while, you may have heard the phrase “data densification.” This refers to a process in which marks are generated by Tableau and added to the view, even though those marks aren’t supported by records in the underlying data source. This might be done to extend a date axis or, if you’re working with predictive modeling functions, to show predictions.
Learn more: See this blog post on data densification from Data Plus Science.
Calculate predictions on missing values
For example, you might want to add predictions for future dates. By default, missing values in Tableau aren’t shown, but you can generate these marks as follows:
- Right-click (control-click on Mac) the date or bin header.
- Select Show Missing Values.
But this isn’t enough to let you make predictions on those generated marks. If you tried to perform a calculation on them (whether that’s a prediction calculation or not), Tableau would return null values. This is expected, since those marks have no underlying records in the data source.
To make predictions on those missing values, open the Analysis menu at the top, and then select Infer Properties from Missing Values.
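The following Python sketch mimics the two settings outside of Tableau, to make the difference concrete. It is not how Tableau implements densification. The `densify` function, its `infer_properties` flag, and the sample records are hypothetical names and values chosen for illustration.

```python
from datetime import date, timedelta

def densify(records, start, end, infer_properties=False):
    """Return one mark per day from start to end (inclusive).

    records maps a date to a profit value. Days without a record become
    generated marks, loosely analogous to Show Missing Values.
    """
    marks = []
    current = start
    while current <= end:
        has_record = current in records
        if has_record or infer_properties:
            # The day of month can be derived from the date itself.
            day_attr = current.day
        else:
            # Without inference, a generated mark has no known properties.
            day_attr = None
        marks.append({
            "date": current,
            "profit": records.get(current),  # None for generated marks
            "day": day_attr,
            "generated": not has_record,
        })
        current += timedelta(days=1)
    return marks

# Hypothetical sample data: two real records with a gap between them.
records = {date(2024, 1, 1): 100.0, date(2024, 1, 3): 50.0}
plain = densify(records, date(2024, 1, 1), date(2024, 1, 5))
inferred = densify(records, date(2024, 1, 1), date(2024, 1, 5), infer_properties=True)
```

In `plain`, the generated marks carry `None` for both `profit` and `day`, so any calculation that depends on `day` has nothing to work with. In `inferred`, `day` is filled in from the date while `profit` stays `None`.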
Example of predictions on generated marks
Now let’s explore this behavior further. We’ll compare three different illustrations showing how the Show Missing Values and Infer Properties from Missing Values settings can affect your visualization, depending on whether one or both are turned on or off. To follow along, download the following workbook from Tableau Public: Predictions on Missing Values.
We’ve included predictions using ATTR(DAY([Order Date])) as a predictor. This isn’t the best predictor for the data (and yields inadequate predictions), but for the purposes of this article, it’s a good illustration of Infer Properties from Missing Values.
Each viz includes the same four measures on the Rows shelf, as outlined below:
- Row 1: SUM([Profit])
- Row 2: RUNNING_SUM(SUM([Profit]))
- Row 3: ATTR(DAY([Order Date]))
- Row 4: MODEL_QUANTILE(0.5, SUM([Profit]), ATTR(DAY([Order Date])))
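To get a feel for what the four rows compute, here is a rough Python analogue on a tiny made-up table. It assumes a plain least-squares line as a stand-in for the median prediction in Row 4; it does not reproduce Tableau's actual model. The `fit_line` helper and the sample values are hypothetical.

```python
from itertools import accumulate

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: one value per day of the month.
days = [1, 2, 3, 4]                   # Row 3: the day of each mark
profit = [10.0, 20.0, 30.0, 40.0]     # Row 1: profit per mark

running = list(accumulate(profit))    # Row 2: running sum of profit

# Row 4 stand-in: predict profit from the day with a fitted line.
slope, intercept = fit_line(days, profit)
predicted = [intercept + slope * d for d in days]
```

The point to notice is that the Row 4 stand-in needs a value for `days` at every mark. That dependency is what the illustrations that follow turn on.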
Illustration 1
In the above image, both Show Missing Values and Infer Properties from Missing Values are turned off, which are the default settings in Tableau.
You would see the same viz if Infer Properties from Missing Values was turned on and Show Missing Values was turned off. This is because Infer Properties from Missing Values depends on Show Missing Values being turned on.
Illustration 2
In the above image, Show Missing Values is turned on and Infer Properties from Missing Values is turned off. The default setting is that Infer Properties from Missing Values is turned off, even when Show Missing Values is turned on.
Note that in this situation, we do not compute a value for ATTR(DAY([Order Date])) for the missing values (Row 3). We do generate predictions for the densified dates, but they are identical for all missing dates, because we can’t infer the actual ATTR(DAY([Order Date])) value, as shown in Row 3. Effectively, those marks are being computed as if DAY([Order Date]) is null.
Illustration 3
In this image, both Show Missing Values and Infer Properties from Missing Values are turned on, illustrating the Infer Properties from Missing Values setting in action.
As you can see, since we’re able to infer the ATTR(DAY([Order Date])) (Row 3), we’re able to use it in predictions in Row 4, returning a nice smooth curve of predictions.
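The contrast between Illustrations 2 and 3 can be sketched in a few lines of Python. This is a simplified picture, not Tableau's implementation: it assumes a linear predictor, and it assumes that a null predictor produces a single fallback value. The `predict` function and the numbers `slope`, `intercept`, and `fallback` are hypothetical.

```python
def predict(day, slope, intercept, fallback):
    """Hypothetical linear predictor; a missing day yields the fallback."""
    if day is None:
        return fallback
    return intercept + slope * day

# Hypothetical fitted values, chosen only for illustration.
slope, intercept, fallback = 2.0, 5.0, 20.0

# Days of the month for three generated (densified) marks.
generated_days = [29, 30, 31]

# Illustration 2: the day is not inferred, so every mark sees a null predictor.
without_inference = [predict(None, slope, intercept, fallback) for _ in generated_days]

# Illustration 3: the day is inferred, so each mark gets its own prediction.
with_inference = [predict(d, slope, intercept, fallback) for d in generated_days]
```

Every entry of `without_inference` is the same number, which corresponds to the flat run of predictions in Illustration 2. The entries of `with_inference` change with the day, as in Illustration 3.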