Analysis with the Second Date in Tableau Desktop

This is the second stage of the tutorial and assumes the first stage, Finding the Second Date with Tableau Prep, has been completed.

In the first stage, we took our original data set and shaped it to answer the following questions:

  1. What was the length of time in days between the first and second infraction for each driver?

  2. Compare the fine amounts for the first and second infractions. Are they correlated?

  3. Which driver paid the most overall? Who paid the least?

  4. How many drivers had multiple infraction types?

  5. What was the average fine amount for drivers who never attended traffic school?

As we now explore these questions, it becomes clear that there are some pros and cons to the first data structure we created. We'll go back into Tableau Prep Builder and do some additional reshaping, then see how that impacts the same analysis in Tableau Desktop. Finally, we'll look at a Tableau Desktop-only approach to the analysis using Level of Detail (LOD) expressions with the original data.

The goal of this tutorial is to present various concepts in the context of a real-life scenario and work through options—not prescriptively establishing which is best. At the end, you should have a better sense of how data structure impacts calculations and analysis, as well as greater familiarity with various aspects of Tableau Prep and calculations in Tableau Desktop.

Note: To complete the tasks in this tutorial, you need Tableau Prep Builder and optionally Tableau Desktop installed and the data downloaded.

To install Tableau Prep and Tableau Desktop before continuing with this tutorial, see the Tableau Desktop and Tableau Prep Deployment guide(Link opens in a new window). Otherwise you can download the Tableau Prep(Link opens in a new window) and Tableau Desktop(Link opens in a new window) free trials.

The data set is the output from Driver Infractions.tflx, as built in the first stage.

Analysis in Tableau Desktop

Now that we have our data configured, we'll bring it into Tableau Desktop. We can easily answer some questions, but others involve a few (or a lot of) calculations. Try your hand at the questions below; you can expand each one for basic information about how to proceed if you get stuck.

Note: You can download the workbook Driver Infractions.twbx(Link opens in a new window) to look at the solutions in context. Remember that there may be alternative ways to interpret the analysis or pursue answers.

1. What was the length of time in days between the first and second infraction for each driver?
  1. To answer this question in Tableau Desktop, we'll use the DATEDIFF function. This function takes three arguments—the date part, the start date, and the end date. Since we want to know the days between these events, we'll use the date part 'day'. Our start and end dates are in the data set as 1st Infraction Date and 2nd Infraction Date.

  2. The calculation is:

    Time Between Infractions = DATEDIFF('day', [1st Infraction Date], [2nd Infraction Date])

  3. We can plot that against Driver ID as a bar chart. Note that seven drivers had no second infraction, so there are seven nulls.

    Bar chart of drivers by time between infractions

2. Compare the fine amounts for the first and second infractions. Are they correlated?
  1. To answer this question in Tableau Desktop, we'll create a scatter plot of 1st Fine Amount and 2nd Fine Amount. By bringing Driver ID to the Detail shelf on the Marks card, we can create a mark for each driver.

  2. To add a trend line, use the Analytics tab in the left-hand pane and bring out a linear trend line. Hovering over the trend line, we can see the R-squared value is practically zero, and the p-value is far above any cutoff for significance. We can determine that there is no correlation between first and second fine amount.

    If we were to use this scatter plot in a dashboard, the trend line should be removed.

Scatterplot of 1st and 2nd fine amounts by Driver ID, with the trend line tooltip showing

3. Which driver paid the most overall? Who paid the least?

When we want to go deeper in our analysis, we may need to create some calculations.

  1. To answer this in Tableau Desktop, we need to add the fines for both infractions into a single field. Because some drivers may not have had a second infraction, we need to use the zero null ZN function to turn any nulls for 2nd Fine Amount into zeros. Failing to do this will result in nulls if there isn't a second fine.

  2. The calculation is:

    Total Amount Paid = [1st Fine Amount] + ZN([2nd Fine Amount])

  3. We can plot Total Amount Paid against Driver ID and sort the bar chart.

Bar chart of Amount Paid by Driver ID, with the calculation editor showing for Amount Paid

4. How many drivers had multiple infraction types?
  1. To answer this in Tableau Desktop, we need to do a fancier IF calculation, comparing if the first and second infraction types are the same. If they are, we want to assign the value "1". If they are not the same, we'll assign "2". Since we only care about multiple infraction types, any other result, such as a null second infraction type, will be assigned "1".

  2. The calculation is:

    Number of Infraction Types =

    IF [1st Infraction Type]=[2nd Infraction Type] THEN 1 
    ELSEIF [1st Infraction Type]!= [2nd Infraction Type] THEN 2 
    ELSE 1 END
  3. We can then plot Number of Infraction Types against Driver ID and sort the bar chart.

Bar chart of Number of Infraction Types by Driver ID, with the calculation editor showing for Number of Infraction Types

5. What was the average fine amount for drivers who never attended traffic school?
  1. To answer this in Tableau Desktop, we cannot simply divide the total fine amount by two, since some drivers only had one infraction. We also can't calculate the average fine per driver and take the average of those values, because averaging averages can lead to inconsistencies. Instead, we need to calculate the total amount paid by drivers who never attended traffic school, then divide by the total number of infractions associated with those fines.

    1. First, we need to determine if each driver had a second infraction. We can leverage the fact the information in all the "2nd" fields will be null if there was no second infraction and start building the calculation:

      IFNULL([2nd Infraction Type], 'no')

      This will return an infraction type if it exists, or "no" if there was no second infraction.

    2. Next, we need to turn this information into the number of infractions, 1 or 2. If the result of our IFNULL calculation is "no", then the driver should be marked as having one fine. Any other result should be marked as having two fines. The calculation is:

      Number of Infractions =

      IF IFNULL([2nd Infraction Type], 'no') = 'no' THEN 1 
      ELSE 2 
      END
    3. Now we need to consider the total fine amount. Similarly to question 3 above, we'll add the first and second fine amounts, with a ZN function around the second. However, because we want this to be computed at the level of the entire data set, it's a best practice to specify the aggregations, SUM, in the calculation itself. The calculation is: 

      SUM([1st Fine Amount]) + SUM( ZN([2nd Fine Amount]) )

    4. To bring it all together, we'll take this total fine amount and divide it by our new Number of Infractions calculated field to determine the average fine amount:

      Average Fine = ( SUM([1st Fine Amount]) + SUM( ZN([2nd Fine Amount]) ) ) / SUM([Number of Infractions])

  2. We also need to filter out drivers who ever attended traffic school—but that information is also stored across two fields.

    1. Tableau is very efficient at numerical calculations. We'll phrase this with numbers to help performance as much as we can. To combine these two fields, we'll create a calculation for each one that says "Yes = 1" and "No = 0" (null should also = 0, for drivers with no second infraction). By summing the outcome of these calculations, any driver with an overall value of 0 never went to traffic school (and a value of 1 or 2 represents how many times they went). We can then filter to keep only drivers with a value of 0.

    2. This time, we'll use a CASE statement instead of IF. They function very similarly but have different syntax. The start of the calculation should look like this:

      CASE [1st Traffic School]
      WHEN 'Yes' THEN 1
      WHEN 'No' THEN 
      ELSE 0
      END
    3. And then we'll do the same thing for 2nd Traffic School. We can add both pieces in the same calculation by wrapping each case statement in parentheses and adding a plus between them. Removing some of the line breaks, it looks like this:

      Number of Traffic School Attendances =

      (CASE [1st Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)
      +
      (CASE [2nd Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)
      
    4. If we drag Number of Traffic School Attendances to the Dimensions area of the Data pane (above the line), the values 0–2 will become discrete.

    5. Now if we filter on Number of Traffic School Attendances, we can select just the 0 and know we're getting drivers who have never attended traffic school.

  3. To answer the original question, we'll simply bring Average Fine to the Textshelf on the Marks card.

    Because we built the aggregations into the calculation, the aggregation on the pill will be AGG and we cannot change it. This is as expected.

Single number view of the average amount paid with the calculation editor open to show Number of Traffic School Attendances

Go Further—Pivoted Data

While the data we've been working with is well structured to address questions specifically around first and second infractions, it isn't the standard structure recommended for use with Tableau Desktop. The more our analysis diverges from basic questions around the infraction dates, the more complicated our calculations become to combine the relevant information into useable form.

Usually, when data is stored with multiple columns for the same type of data (such as two columns for date, two columns for fine amount, etc.) and unique information is stored in the field name (such as whether it's the first or second infraction), this is an indication the data should be pivoted.

Performing a multiple pivot in Tableau Prep Builder can handle this nicely. We can work from the end of the Driver Infraction Tableau Prep flow created in the previous tutorial Finding the Second Date with Tableau Prep.

Tip: Make sure you're back in Tableau Prep for these next steps.

  1. From the final clean step, add a Pivot step that pivots by every duplicated field. Use the plus icon in the upper right corner of the Pivoted Fields area to add more Pivot Values. Each set of fields (such as 1st and 2nd Fine Amounts) should be pivoted together.

    For more information about pivoting, see Clean and Shape Data.

  2. In the Pivoted Fields area, under the Pivot1 Names column, double click each value and rename them to 1st and 2nd.

    Pivot 1 configuration pane in Tableau Prep showing the five pivots

The results can be tidied by removing null dates as well as renaming and reordering fields.

  1. Add a cleaning step after the pivot. In the Infraction Date column, right-click on the null bar and choose Exclude.

  2. Double-click the field name Pivot1 Names and rename it Infraction Number.

  3. Drag fields as appropriate to reorder them as below:

    Preview of the pivoted and cleaned up data

  1. From the new, pivoted data, create an output named Pivoted Driver Infractions and bring it into Tableau Desktop. (Don't forget to run the flow after adding the Output step.)

Now we can look at our five questions again with this pivoted data structure; you can expand each one for basic information about how to proceed if you get stuck.

Note: You can download the completed flow file Pivoted Driver Infractions.tflx to check your work, or download the workbook Pivoted Driver Infractions.twbx to look at the solutions in context. Remember that there may be alternative ways to interpret the analysis or pursue answers.

1. What was the length of time in days between the first and second infraction for each driver?
  1. To answer this in Tableau Desktop, as we did with the first data set, we'll use the DATEDIFF function. This function requires a start date and an end date. This information is present in our data, but all in one field. We need to pull it out into two fields.

    1. Create two preliminary calculated fields:

      1st Infraction Date = IF [Infraction Number] = "1st" THEN [Infraction Date] END

      2nd Infraction Date = IF [Infraction Number] = "2nd" THEN [Infraction Date] END

    2. Because we want to make sure both of these values are available to be compared for each driver, we need to fix them to the level of Driver ID.

      Note: Don't believe me? Try to do a DATEDIFF calculation with these two fields as they are: Time Between Infractions = DATEDIFF('day', [1st Infraction Date], [2nd Infraction Date])
      You'll get null results everywhere, because Tableau is trying to compare across a data structure that looks like this:
      text table showing null results
      Here, the row that knows what the first date is doesn't know what the second date is, and vice versa. To get around this, we'll use a FIXED Level of Detail expression to force these first and second dates to be related by Driver ID.

    3. Edit each calculation as follows:

      1st Infraction Date = { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "1st" THEN [Infraction Date] END ) }

      2nd Infraction Date = { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "2nd" THEN [Infraction Date] END ) }

      Note: The original IF calculation must be aggregated when embedded in an LOD expression. We can use any basic aggregation that will preserve the date value (so aggregations like SUM, AVG, or MIN work, but not CNT or CNTD).

      Note: These calculations can also be created in Tableau Prep Builder. For more information on LOD expressions in Prep, see Create Level of Detail, Rank, and Tile Calculations .

    4. Now we can create the DATEDIFF calculation as follows:

      Time Between Infractions = DATEDIFF('day', [1st Infraction Date], [2nd Infraction Date])

      • If we want to look at weeks or months, simply modify the date part (currently 'day').

      • It would also be possible to create a single calculation for the entire thing by placing the FIXED calcs inside the DATEDIFF directly: 

        DATEDIFF ( 'day',

        { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "1st" THEN [Infraction Date] END ) },

        { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "2nd" THEN [Infraction Date] END ) }

        )

    5. Plot Time Between Infractions on Columns and Driver ID on Rows.

The results will be identical to the outcome with the unpivoted data structure.

2. Compare the fine amounts for the first and second infractions. Are they correlated?
  1. To answer this in Tableau Desktop, we'll use very similar logic to the previous question. We'll use Infraction Number to identify if a given row is the first or second infraction, then pull out the fine amount accordingly.

    1. If all we want to do is make a scatter plot, we can skip the LOD portion and just use the IF calculation:

      1st Fine Amount = IF [Infraction Number] = "1st" THEN [Fine Amount] END

      2nd Fine Amount = IF [Infraction Number] = "2nd" THEN [Fine Amount] END

    2. However, if we wanted to compare and see the difference in amount between the first and second fines for a single driver, we'd run into the same null issue as with the dates. It can't hurt to wrap these calculations in a FIXED LOD, so it might be good just to do so from the start:

      1st Fine Amount = { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "1st" THEN [Fine Amount] END ) }

      2nd Fine Amount = { FIXED [Driver ID] : MIN ( IF [Infraction Number] = "2nd" THEN [Fine Amount] END ) }

      These calculations can also be created in Tableau Prep Builder. For more information on LOD expressions in Prep, see Create Level of Detail, Rank, and Tile Calculations .

    3. Create a scatterplot with 1st Fine Amount on Columns and 2nd Fine Amount on Rows and bring out a linear trend line as before.

The results will be identical to the outcome with the unpivoted data structure.

3. Which driver paid the most overall? Who paid the least?
  1. To answer this question in Tableau Desktop, the pivoted data structure is ideal. All we need to do is bring out Driver ID and Fine Amount into a bar chart. The default aggregation is already SUM, so the total amount paid by the driver will automatically be plotted.

The results will be identical to the outcome with the unpivoted data structure.

4. How many drivers had multiple infraction types?
  1. To answer this question in Tableau Desktop, the pivoted data structure is ideal. All we need to do is bring out Driver ID and a Count Distinct of Infraction Type as a bar chart, and we'll have our answer.

The results will be identical to the outcome with the unpivoted data structure.

5. What was the average fine amount for drivers who never attended traffic school?
  1. To answer this in Tableau Desktop, we cannot simply divide the total fine amount by two, since some drivers only had one infraction. We also can't calculate the average fine per driver and take the average of those values, because averaging averages can lead to inconsistencies. Instead, we need to calculate the total amount paid by drivers who never attended traffic school, then divide by the total number of infractions associated with those fines.

    1. First, we need to determine if each driver had a second infraction. We can leverage the fact 2nd Infraction Date will be null if there was no second infraction and start building the calculation:

      IFNULL(STR([2nd Infraction Date]), 'no')

      This will return the date of the second infraction if it exists, or "no" if there was no second infraction.

      Note: The STR portion of this calculation is necessary because IFNULL needs consistency of data type in its arguments. Because we want to return the string "no" for null values, we need to convert the date to a string as well.

    2. Next, we need to turn this information into the number of infractions, 1 or 2. If the result of our IFNULL calculation is "no", then the driver should be marked as having one fine. Any other result should be marked as having two fines. The calculation is:

      Number of Infractions =

      IF IFNULL(STR([2nd Infraction Date]), 'no')= 'no' THEN 1 
      ELSE 2 
      END
      
    3. Now we need to consider the average fine amount. We already have a single field for Fine Amount. All we need to do is divide that by our new Number of Infractions field, wrapping both in SUM

      Average Fine = (SUM([Fine Amount]) / SUM([Number of Infractions])

  2. We also need to filter out drivers who attended traffic school. It looks like we could use the Traffic School field and filter on Traffic School = no. However, this would filter on infractions not associated with traffic school, not drivers who never went to traffic school. If a driver went to traffic school for one infraction but not the other, we don't want either infraction to be considered here—that driver has been to traffic school and therefore doesn't fit the parameters of the question.

    What we want to do is filter out any driver who attended traffic school. In terms of the data, we want to filter out any driver who has a "Yes" for Traffic School on any row, regardless of which infraction it's associated with. Let's build our calculation in stages, using a simple view to help keep track of what's happening:

    1. First, we want to know if a driver has a "Yes" for Traffic School. Drag Driver ID to Rows and Traffic School to Columns. We'll get a text table with placeholder "Abc" text indicating the relevant values for each driver.

    2. Next, we want to build a calculation that will identify if the value of Traffic School is "Yes". The first stage of the calculation is:

      Attended Traffic School = CONTAINS([Traffic School), 'Yes')

      If we bring Attended Traffic School to the Color shelf on the Marks card, we see it accurately labels "False" for every mark in the "No" column, and "True" for every mark in the "Yes" column.

    3. However, what we really want is this information at the level of the driver, not the infraction. An LOD expression is a natural fit when trying to compute a result at a different level of detail than the basic structure of the data. We'll make this a FIXED LOD expression. But, as we know, the aggregate expression portion of an LOD must be aggregated. Previously, we've used MIN. Will that work here? We'll modify the calculation to be: 

      Attended Traffic School = { FIXED [Driver ID] : MIN( CONTAINS([Traffic School], 'Yes'))}

      A view of Driver ID by Traffic School, with Attended Traffic School on Color

      With that change applied in the view, we see the opposite of what we want. Any driver that has a "No" is marked as "False" across the board. Instead, we want to carry the "Yes" as a "True" for every record for that driver. What is MIN doing here? It's picking the first response alphabetically, that is, "No".

    4. What if we changed it to MAX? Would that take the last response alphabetically? We'll modify the calculation to be: 

      Attended Traffic School = { FIXED [Driver ID] : MAX ( CONTAINS( [Traffic School], 'Yes') ) }

      A view of Driver ID by Traffic School, with Attended Traffic School on Color

      And here we have it: if a driver has "Yes" anywhere in the data, they are marked as "True" for having attended traffic school, even on the infraction that didn't involve traffic school.

    5. If we bring Attended Traffic School to the Filter shelf and select only "False", we'll be left with only drivers who never attended traffic school.

  3. To answer the original question, with our filter in place we'll simply bring Average Fine to the Textshelf on the Marks card. Because we built the aggregations into the calculation, the aggregation on the field will be AGG and we cannot change it. This is as expected.

The results will be identical to the outcome with the unpivoted data structure.

The benefits of pivoted data

We could stick with the original data structure from the tutorial if we know we'd only need to answer questions that are easy to answer with that structure. However, the pivoted data format is more flexible. Even though it requires some calculations, once they're in place the resulting data set is well suited to answer broader questions.

Go Further Still—Calculations Only

What if you don't have access to Tableau Prep Builder? Are you out of luck entirely if you're stuck with the original data? Not at all!

Tableau Desktop and LOD expressions can answer all of our analytical questions. If we connect to the original Traffic Violations.xlsx(Link opens in a new window), it looks very similar to the pivoted data set—just without the crucial Infraction Number field. We'll need to mimic the outcome of the aggregation steps via LOD expressions.

Note: You can download the workbook LOD Driver Infractions.twbx(Link opens in a new window) to look at the solutions in context. Remember that there may be alternative ways to interpret the analysis or pursue answers.

1. What was the length of time in days between the first and second infraction for each driver?
  1. To answer this in Tableau Desktop, we'll again use the DATEDIFF function. This function requires a start date and an end date. This information is present in our data, but all in one field. We need to pull it out into two fields. Because we want to make sure both of these values are available to be compared for each driver, we need to fix them to the level of Driver ID.

    1. To find the first infraction date, we use the calculation:

      1st Infraction = { FIXED [Driver ID] : MIN ( [Infraction Date] ) } 

    2. We'll do the second infraction date in stages.

      1. To start, we need to look at just the dates that are larger than the first date:

        IF [Infraction Date] > [1st Infraction] THEN [Infraction Date] END

      2. But this will give us every infraction after the first, and we only want the second. So we want the smallest of these dates. Wrap the whole thing in MIN:

        MIN( IF [Infraction] : [1st Infraction] THEN [Infraction Date] END )

      3. We also want to recalculate the second infraction date for each driver. That's where LOD expressions come in. We'll fix this to the level of Driver ID:

        2nd Infraction = { FIXED [Driver ID] : MIN ( IF [Infraction Date] > [1st Infraction] THEN [Infraction Date] END ) }

    3. And we can now create the DATEDIFF calculation:

      Time Between Infractions = DATEDIFF('day', [1st Infraction], [2nd Infraction])

The results will be identical to the outcomes with the other two data structures.

2. Compare the fine amounts for the first and second infractions. Are they correlated?
  1. To answer this in Tableau Desktop, we'll use similar logic to the pivoted data version of this question. We'll use the 1st Infraction and 2nd Infraction fields we created for question I to identify if a given row is the first or second infraction, then pull out the fine amount accordingly.

    1. If all we want to do is make a scatter plot, we can skip the LOD portion and just use an IF calculation:

      1st Fine Amount = IF [1st Infraction] = [Infraction Date] THEN [Fine Amount] END

      2nd Fine Amount = IF [2nd Infraction] = [Infraction Date] THEN [Fine Amount] END

    2. However, if we want to compare and see the difference in amount between the first and second fines for a single driver, we'd run into issues with nulls, as in the first data structure. It can't hurt to wrap these calculations in a FIXED LOD, so it might be good just to do so from the start:

      1st Fine Amount = { FIXED [Driver ID] : MIN ( IF [1st Infraction] = [Infraction Date] THEN [Fine Amount] END ) }

      2nd Fine Amount = {FIXED [Driver ID] : MIN( IF [2ndInfraction] = [Infraction Date] THEN [Fine Amount] END ) }

The results will be identical to the outcomes with the other two data structures.

3. Which driver paid the most overall? Who paid the least?
  1. To answer this in Tableau Desktop, we need to first realize something about the LOD-only method. Both methods using Tableau Prep filter out records that are not the first or second infraction for a driver. The LOD method in Tableau Desktop keeps all records. This means that if we were to create a viz of SUM(Amount Paid) by Driver ID, the Tableau Desktop-only version will show higher amounts for drivers with more than two infractions. To get a Total Amount Paid value from the complete data that matches the other methods, instead of using the original Fine Amount field, we instead need to sum the first and second fines like we did with the first data structure.

  2. Using the fields we created for question 2, we'll add the two fine amounts. ZN is necessary to prevent a null result for any drivers who only had one infraction. The calculation is: 

    Total Amount Paid = [1st Fine Amount] + ZN([2nd Fine Amount])

The results will be identical to the outcomes with the other two data structures.

4. How many drivers had multiple infraction types?
  1. To answer this question in Tableau Desktop, we can't simply bring out Driver ID and a Count Distinct of Infraction Type. Because this data set has infractions beyond the second, some drivers may have more than two infraction types. To match the results with the other methods, we need to limit the scope to just the first two infractions.

  1. We can to pull out the 1st and 2nd infraction types, wrap them in LOD expressions to make them FIXED to the driver, then use an IF calculation to count the types:

    1. 1st Infraction Type = { FIXED [Driver ID] : MIN ( IF [1st Infraction] = [Infraction Date] THEN [Infraction Type] END ) }

    2. 2nd Infraction Type = { FIXED [Driver ID] : MIN ( IF [2nd Infraction] = [Infraction Date] THEN [Infraction Type] END ) }

    3. Number of Infraction Types =

      IF [1st Infraction Type] = [2nd Infraction Type] THEN 1 
      ELSEIF [1st Infraction Type] != [2nd Infraction Type] THEN 2
      ELSE 1 END 

      Note: It's also possible to create many of these calculations as a single field by nesting the initial calculations directly in the larger calculation. Here, the combined calculation would look like this:
      IF
      {FIXED [Driver ID] : MIN(IF [1st Infraction]=[Infraction Date] THEN [Infraction Type] END)}
      =
      {FIXED [Driver ID] : MIN(IF [2nd Infraction]=[Infraction Date] THEN [Infraction Type] END)}
      THEN 1

      ELSEIF
      {FIXED [Driver ID] : MIN(IF [1st Infraction]=[Infraction Date] THEN [Infraction Type] END)}
      !=
      {FIXED [Driver ID] : MIN(IF [2nd Infraction]=[Infraction Date] THEN [Infraction Type] END)}
      THEN 2

      ELSE 1
      END


      Which is a bit harder to make sense of, but works if preferred. (Note that line breaks and some spaces do not impact how a calculation is interpreted by Tableau.)

  2. We can then plot Number of Infraction Types against Driver ID and sort the bar chart.

The results will be identical to the outcomes with the other two data structures.

5. What was the average fine amount for drivers who never attended traffic school?
  1. To answer this in Tableau Desktop, we cannot simply divide the total fine amount by two, since some drivers only had one infraction. We also can't calculate the average fine per driver and take the average of those values, because averaging averages can lead to inconsistencies. Instead, we need to calculate the total amount paid by drivers who never attended traffic school, then divide by the total number of infractions associated with those fines.

    1. First, we need to determine if each driver had a second infraction. We can leverage the fact the information in all the "2nd" fields will be null if there was no second infraction and start building the calculation:

      IFNULL([2nd Infraction Type], 'no')

      This will return an infraction type if it exists, or "no" if there was no second infraction.

    2. Next, we need to turn this information into the number of infractions, 1 or 2. If the result of our IFNULL calculation is "no", then the driver should be marked as having one fine. Any other result should be marked as having two fines. The calculation is:

      Number of Infractions =

      IF IFNULL([2nd Infraction Type], 'no') = 'no' THEN 1
      ELSE 2
      END
    3. For the Total Amount Paid, we can use the calculation from question 3. To bring it all together, we'll take this total fine amount and divide it by our new Number of Infractions calculated field to determine the average fine amount:

      Average Fine = SUM([Total Amount Paid]) / SUM([Number of Infractions])

  2. We also need to filter out drivers who attended traffic school. Because this data set contains some drivers with a third or fourth infraction, we can't use the same method as the pivoted data structure. Instead, we'll follow the same method as the unpivoted data, summarized here:

    1. First, we need to built two calculations identifying if the first and second infractions involved traffic school or not: 

      1st Traffic School = { FIXED [Driver ID] : MIN (IF [1st Infraction] = [Infraction Date] THEN [Traffic School] END ) }

      2nd Traffic School = { FIXED [Driver ID] : MIN (IF [2nd Infraction] = [Infraction Date] THEN [Traffic School] END ) }

    2. Then we'll add those values to get the overall number of traffic school attendances: 

      Number of Traffic School Attendances =

      (CASE [1st Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)
      +
      (CASE [2nd Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)
    3. If we drag Number of Traffic School Attendances to the Dimensions area of the Data pane, the values 0–2 become discrete.

    4. Now if we filter on Number of Traffic School Attendances, we can select just the 0 and know we're getting drivers who have never attended traffic school.

  3. To answer the original question, we'll simply bring Average Fine to the Textshelf on the Marks card. Because we built the aggregations into the calculation, the aggregation on the field will be AGG and we cannot change it. This is as expected.

The results will be identical to the outcomes with the other two data structures.

It's important to remember that this solution has a lot of nested calculations and LOD expressions. Depending on the size of the data set and the complexity of the data, performance could be an issue.

Reflection on Methods

So which route should you go? That's entirely up to you and the tools at your disposal.

  • If you want to steer clear of LODs, there's a data-shaping solution, though calculations might be necessary for some analysis (Analysis in Tableau Desktop).

  • If you can shape the data and are comfortable with calculations—including LODs—the middle-of-the-road option provides the best flexibility (Go Further—Pivoted Data).

  • If you're comfortable with LODs, there's minimal impact on performance, and/or you don't have access to Tableau Prep, solving this with LODs alone is a viable option (Go Further Still—Calculations Only).

At the very least, it's valuable to understand how aggregation in Tableau Prep and Level of Detail expressions in Tableau Desktop are interrelated and impact data analysis. As with most things in Tableau, there's more than one way to do anything. Exploring all the various options can help bring concepts together and let you pick the best solution for you.

Calculations used:

Driver Infractions

  • Time Between Infractions = DATEDIFF('day', [1st Infraction Date], [2nd Infraction Date])

  • Total Amount Paid =[1st Fine Amount] + ZN([2nd Fine Amount])

  • Number of Infraction Types = IF [1st Infraction Type]=[2nd Infraction Type] THEN 1 ELSEIF [1st Infraction Type]!= [2nd Infraction Type] THEN 2 ELSE 1 END

  • Number of Infractions = IF IFNULL([2nd Infraction Type], 'no') = 'no' THEN 1 ELSE 2 END

  • Average Fine = ( SUM([1st Fine Amount]) + SUM( ZN([2nd Fine Amount]) ) ) / SUM([Number of Infractions])

  • Number of Traffic School Attendances = (CASE [1st Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END) + (CASE [2nd Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)

Pivoted Driver Infractions

  • 1st Infraction = {FIXED [Driver ID] : MIN(IF [Infraction Number] = "1st" THEN [Infraction Date] END)}

  • 2nd Infraction = {FIXED [Driver ID] : MIN(IF [Infraction Number] = "2nd" THEN [Infraction Date] END)}

  • Time Between Infractions = DATEDIFF('day', [1st Infraction], [2nd Infraction])

  • 1st Fine Amount = {FIXED [Driver ID] : MIN( IF [Infraction Number] = "1st" THEN [Fine Amount] END ) }

  • Number of Infractions = IF IFNULL(STR([2nd Infraction]), 'no')= 'no' THEN 1 ELSE 2 END

  • Average Fine = SUM([Fine Amount])/SUM([Number of Infractions])

  • Attended Traffic School = { FIXED [Driver ID] : MAX( CONTAINS([Traffic School], 'Yes'))}

LOD Driver Infractions

  • 1st Infraction = {FIXED [Driver ID] : MIN([Infraction Date])}

  • 2nd Infraction = { FIXED [Driver ID] : MIN( IF [Infraction Date] > [1st Infraction] THEN [Infraction Date] END ) }

  • Time Between Infractions = DATEDIFF('day', [1st Infraction], [2nd Infraction])

  • 1st Fine Amount = {FIXED [Driver ID] : MIN( IF [1st Infraction] = [Infraction Date] THEN [Fine Amount] END ) }

  • 2nd Fine Amount = {FIXED [Driver ID] : MIN( IF [2nd Infraction] = [Infraction Date] THEN [Fine Amount] END ) }

  • Total Amount Paid = [1st Fine Amount] + ZN([2nd Fine Amount])

  • 1st Infraction Type = {FIXED [Driver ID] : MIN( IF [1st Infraction] = [Infraction Date] THEN [Infraction Type] END ) }

  • 2nd Infraction Type = {FIXED [Driver ID] : MIN( IF [2nd Infraction] = [Infraction Date] THEN [Infraction Type] END ) }

  • Number of Infraction Types = IF [1st Infraction Type]=[2nd Infraction Type] THEN 1 ELSEIF [1st Infraction Type]!= [2nd Infraction Type] THEN 2 ELSE 1 END

  • Number of Infractions = IF IFNULL([2nd Infraction Type], 'no') = 'no' THEN 1 ELSE 2 END

  • Average Fine = SUM ([Total Amount Paid]) / SUM([Number of Infractions])

  • 1st Traffic School = {FIXED [Driver ID] : MIN (IF [1st Infraction] = [Infraction Date] THEN [Traffic School] END ) }

  • 2nd Traffic School = {FIXED [Driver ID] : MIN (IF [2nd Infraction] = [Infraction Date] THEN [Traffic School] END ) }

  • Number of Traffic School Attendances = (CASE [1st Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END) + (CASE [2nd Traffic School] WHEN 'Yes' THEN 1 WHEN 'No' THEN 0 ELSE 0 END)

Note: Special Thanks to Ann Jackson's Workout Wednesday topic Do Customers Spend More on Their First or Second Purchase?(Link opens in a new window) and Andy Kriebel's Tableau Prep Tip Returning the First and Second Purchase Dates(Link opens in a new window) that provided the initial inspiration for this tutorial. Clicking these links will take you away from the Tableau website. Tableau cannot take responsibility for the accuracy or freshness of pages maintained by external providers. Contact the owners if you have questions regarding their content.

Thanks for your feedback!Your feedback has been successfully submitted. Thank you!