RLS Best Practices for Data Sources and Workbooks
Row-level security (RLS) in Tableau restricts the rows of data a certain user can see in a workbook. This differs from Tableau permissions, which control access to content and feature functionality. For example, permissions control whether a user can comment on or edit a workbook, while row-level security enables two users viewing the same dashboard to see only the data each user is allowed to see.
There are several ways to implement RLS in Tableau. For example, you can set RLS at the data source or workbook level, or you can set RLS at the connection level using a virtual connection with a data policy (requires Data Management). See the Overview of Row-Level Security Options in Tableau for details about alternatives.
Note: This topic focuses on RLS best practices for data sources and workbooks. For more in-depth examples of the concepts outlined in this topic, refer to the whitepaper Best Practices for Row Level Security with Entitlement Tables(Link opens in a new window) or How to Set Up Your Database for Row Level Security in Tableau(Link opens in a new window) on the blog Tableau and Behold.
For live connections and multi-table extracts, the basic RLS workflow is:
- The user is identified by logging into Tableau Server or Tableau Cloud
- This requires a distinct username per user and secure single sign-on (SSO)
- Active Directory, LDAP, or the Tableau REST API can be used to synchronize user names and establish permissions
- The set of data entitlements for the user is retrieved from all possible data entitlements
- This requires a data structure that can link entitlements to the Tableau username
- The data is filtered by the entitlements for that user
- This often requires using user functions in a calculated field
- The published, filtered data is used to build content
- Using a published (rather than embedded) data source with a data source filter ensures the RLS cannot be modified by downloading or web editing the workbook
How the joins, calculated fields, and filters are set up depends on the structure of the data and how users are managed.
Any unique combination of attributes that the data can be filtered on is an entitlement. Most commonly, there are separate tables for specifying the entitlements themselves and mapping those entitlements to users or user roles. Denormalizing is recommended from a performance standpoint because joins are expensive operations.
The entitlements view, consisting of the entitlements mapped to users or roles, is joined with the data. A user-based data source filter is then applied, acting as a WHERE clause that brings in only the entitlements—and therefore the appropriate data rows—for the relevant user. (Query optimization should ensure the filtering occurs before joining when the query is processed to minimize data duplication. For more information, see Performance and processing order of operations.)
Entitlement table models
Generally, there are two models for representing entitlements:
Full mapping to the deepest level of granularity
- Entitlements are defined fully for every column.
- There is one row in the mapping table for every possible entitlement the user has.
- This model requires fewer join clauses.
- Entitlements are defined for every level of hierarchy, with NULL used to represent an “all” state.
- There is a single row in the mapping table for a particular level in the entitlement hierarchy, which vastly reduces the number of entitlement rows for users at high levels in a hierarchy.
- This model requires more complex joins and filters.
Users and roles
Combinations of entitlements are commonly represented as roles, which are then linked to users in a many-to-many mapping table. This allows for easily changing or removing a user from the role, while still maintaining a record of the role and its entitlements.
Alternatively, a many-to-many mapping table can be created that instead assigns users directly to entitlements as opposed to going through joining a role table. It will require managing the values more directly in the table but does eliminate a join.
Note: The user values associated with a role or entitlement need to match the username or full name on the Tableau site in order to take advantage of the user functions in Tableau Desktop.
Regardless of the model used to represent the entitlements, it is advisable to join all entitlements and mapping tables together into a single denormalized entitlements view. While at first this will cause a “blowup” (highly duplicative) version of the entitlements, the data source filter on the user will reduce it back down. You will also want this view if you plan on using an extract.
The deepest granularity method can have a performance benefit when everything is hierarchical—you only need to do a single join on the deepest level of the hierarchy. This only works if all of the attributes at the lowest level are distinct. If there is a chance for duplication (for example, a Central sub-region in more than one region), then you’ll need to join on all the columns to achieve the effect of a distinct key value.
The actual details and their performance characteristics depend on the data system and require testing. For example, using a single key could potentially improve the performance because the join is then only executing on one column, but correctly indexing all of the columns may give equal performance when other factors are taken into consideration.
Implement row-level security
After the denormalized view of mapped entitlements is created, an inner join is set up between the view and the data in the Tableau data connection dialog. The data can remain in a traditional star schema. Alternatively, the dimension and fact tables can be materialized together into two views. Multi-table extracts will build extract tables to match the joins, so creating the two views will simplify the resulting extract. The SQL will follow this basic pattern:
SELECT * FROM data d INNER JOIN entitlements e ON d.attribute_a = e.attribute_a AND d.attribute_b = e.attribute_b AND ... WHERE e.username = USERNAME()
If your entitlements more closely resemble the sparse entitlements model, then the custom SQL to join the data to the entitlements would be a little more complex because of the NULL values. Conceptually, it would look like the following:
SELECT * FROM data d INNER JOIN entitlements e ON (e.region_id = d.region_id OR ISNULL(e.region_id) AND (e.sub_region_id = d.sub_region_id OR ISNULL(e.sub_region_id) AND (e.country_id = d.country_id OR ISNULL(e.country_id)
Without using custom SQL, this can be done with a cross join and additional filters in Tableau Desktop. Create a join calculation on both sides of the join dialog that simply consists of the integer 1 and set them equal. This joins every row from the data table with every row in the entitlements table.
Then you need a calculation (or individual calculations) to account for the levels in the hierarchy. For example, you could have several calculations that follow this format:
[region_id] = [region_id (Entitlements View)] OR ISNULL([region_id (Entitlements View)]
Or you could have a combined calculation for all levels in one:
([region_id] = [region_id (Entitlements View)] OR ISNULL([region_id (Entitlements View)]) AND ([sub_region_id] = [sub_region_id (Entitlements View)] OR ISNULL([sub_region_id (Entitlements View)]) AND ([country_id] = [country_id (Entitlements View)] OR ISNULL([country_id (Entitlements View)])
The ISNULL function matches any entitlement column to all items in the other column. As always with RLS, these calculations should be added as data source filters.
Data source filter
For both approaches, once the entitlements are correctly joined with the data, a filter needs to be set up to limit the data for a specific user. A calculated field should be created with a user function. For example, a simple Boolean comparison of whether the user listed in the Username field is the same as the username of the person logged into the Tableau site:
[Username] = USERNAME()
This calculation should be used as a data source filter (with TRUE selected).
If the data source is embedded and a user has permissions to web edit or download the workbook, then the RLS is nonexistent since the filters enforcing it can be easily removed. The Tableau data source should be published separately as opposed to being left embedded in the workbook.
All access with deepest granularity
There is also a common scenario in which there are two access levels within the organization: people who can see everything (“all access”) or people with some reasonably definable subset of entitlements. This is most commonly seen for embedded applications—the organization hosting the data can see everything, but each client can only see their own data. In this case, you need a way to return the full data for the “all access” users, while maintaining the deepest granularity joins for all other users.
For this technique, you will use Tableau groups to create an override using a calculation in the join condition.
- Create a group for users who should see all the data (here called All Access)
- From the fact view, create a left join with two join conditions
- The first join condition should be on the column that represents the deepest level of granularity
- The second join condition should be two calculations:
- On the left side (the fact view), for the calculation, enter
- On the right side (the entitlements view), the calculation should be:
IF ISMEMBEROF('All Access') THEN False ELSE True END
- On the left side (the fact view), for the calculation, enter
- On a sheet, create a calculation structured as:
[Username] = USERNAME() OR ISMEMBEROF(['All Access'] ([Entitlements View)])
- Create a data source filter on the username calculation
If a user is a member of the All Access group, then the join becomes a left join on
True = False. This means there are no matches at all in the entitlements view, so the entire fact view is returned with NULLs for the columns from the entitlements view (zero duplication). In the case where the user is not part of the All Access group, the
True = True join condition doesn’t change anything and the join will function as expected.
The user calculation used as a data source filter is true for all rows when the group override is working, or it will filter down to only the user’s deepest granularity in the hierarchy.
When a visualization is viewed in Tableau Desktop, Tableau Server, or Tableau Cloud, Tableau sends an optimized query to the RDBMS which then processes the query and sends results back to Tableau to render the visualization with the resulting data. The order of operations for when joins, calculations, and filters are carried out depends on the query optimizer and how the query is executed.
When using a live connection to a data source in Tableau, the performance of the query execution is dependent on the query optimizer which translates the incoming SQL into an efficient plan for retrieving the data.
There are two ways the query can be processed:
- Filter the entitlement rows to the user then join to the fact table
- Join the entitlements to the fact table then filter to the user’s rows
In an ideal situation, the query optimizer will ensure the database processes the query by filtering then joining. If a user is entitled to everything, this means the maximum number of rows processed will be the number of rows in the data table.
If the database processes the query by joining then filtering, there may be duplication of data. The maximum number of rows processed will be the number of users entitled to see that particular row times each row in the data table.
It will be clear if this second scenario happens: your queries take a long time to finish, you get errors, or there is an indication of performance issues in the database. Your total data volume will expand exponentially, which could cause inordinate system strain on the backend.
When the data source in Tableau is a live connection, Tableau sends every query that is necessary to render a particular viz or dashboard to the RDBMS. When the data source is an extract, the process of querying data from the underlying data source only happens at extract creation and refresh. All of the individual queries for visualizations are answered by the extract engine from the extract file.
The same order of operations issue is present when building single table extracts. However, the “blowup” will happen both on the underlying data source and within the resulting extract itself.
Considerations with extracts
Starting in Tableau 2018.3, the data engine can create a multi-table extract and RLS can be implemented as described above. Using multiple table extracts reduces the time it takes to generate an extract with many-to-many relationships by not materializing the join.
The extract should be built with a data object and an entitlements object. This is the simplest storage in the extract and results in the best performance.
- The data object is the table, view or custom SQL query that represents the denormalized combination of the fact and necessary dimension tables
- The entitlements object is a denormalized table, view or custom SQL query of whatever entitlements are necessary to filter the data at the most granular level, which requires:
- A column for username matching the exact usernames in Tableau Server or Tableau Cloud
- A row for each of the most granular entitlements to the data object
This format is laid out in the deepest granularity method above. Multi-table extracts use the same method, with the caveat that only two data objects are being joined and any field-specific filtering is already applied within the object.
Because multiple table extracts have extract filters disabled, you can filter either in the views or tables you connect to in the data source, or define the filters in custom SQL objects in the Tableau data connection dialog.
Note: As with live connections, if the data source is embedded and a user has permissions to web edit or download the workbook, then the RLS is nonexistent since the filters enforcing it can be easily removed. The extract should be published separately as opposed to being left embedded in the workbook.
Single table extracts
The following method is only recommended when using a version of Tableau prior to 2018.3—multiple table extracts are preferable if available.
Single table extracts materialize any joins you build when constructing the Tableau data source and stores everything as a single table through one query, the results of which are transformed in a single table in the extract file. This denormalization carries the risk of causing massive data duplication, as every row that was allocated to more than one entitlement or user would be duplicated as a result of the many-to-many relationship.
To prevent this duplication:
- Create a Security Users Field that contains the usernames for that entitlement
- for example, a value may be “bhowell|mosterheld|rdugger”
- Use the CONTAINS() function within Tableau to correctly identify individual users
- For example,
CONTAINS([Security Users Field], USERNAME())
- For example,
This method obviously has some caveats. It requires that you go from your entitlements in rows to a single column separated correctly using SQL, and that column can only contain so many characters. Partial matches can be trouble, and you need to use separators that will never be valid in the IDs themselves. Although it is performant within the Tableau Data Engine, as a string calculation it will be very slow for most databases. This limits your ability to switch back to a live connection.
Alternatively, you can take different extracts per “role” or entitlement level, so that only the data appropriate to that person or level is contained within the extract, but this will require processes to appropriately permission and leverage template publication within Tableau Server, generally via the APIs.
Use built-in row-level security in a database
Many databases have mechanisms for RLS built in. If your organization has already put effort into building row-level security in a database, you might be able to take advantage of your existing RLS. It's not necessarily easier or better to implement a built-in RLS model vs. building it with Tableau in mind; these techniques are generally leveraged when an organization has already invested in these technologies and they want to take advantage of the investment. The main benefit of using built-in RLS is that administrators can implement and control their data security policy in one place: their databases. For more information, see Row-Level Security in the Database.