Tableau Server Data Engine
Hyper is Tableau's in-memory Data Engine technology optimized for fast data ingests and analytical query processing on large or complex data sets. Hyper powers the Data Engine in Tableau Server, Tableau Desktop, Tableau Cloud, and Tableau Public. The Data Engine is used when creating, refreshing or querying extracts. It is also used for cross-database joins to support federated data sources with multiple connections.
For information about the logs generated by the Data Engine, see Tableau Server Logs and Log File Locations.
The Data Engine is designed to leverage all available CPU and memory on the machine to provide the fastest response times.
Hyper technology takes advantage of newer CPU instruction sets and can parallelize work across all available cores. It is designed both to scale efficiently to many cores and to get as much as possible out of each individual core. As a result, you can expect an hourly average of up to 75% of total CPU usage during query processing. Adding more CPU cores should improve performance.
Note: The 75% hourly average usage is the default, and should be left unchanged unless you are running Data Engine on a dedicated server node. For more information about running Data Engine on a dedicated node, see Optimize for Extract Query-Heavy Environments.
Modern operating systems such as Microsoft Windows, Apple macOS, and Linux have mechanisms to make sure that even if a CPU is fully used, incoming and other active processes can run simultaneously. In addition, to manage overall resource consumption and to avoid overloading the machine and starving other processes running on it, the Data Engine monitors itself to stay within the limits set by Tableau Server Resource Manager (SRM). Tableau Server Resource Manager monitors resource consumption and notifies the Data Engine to reduce its usage when it exceeds the predefined limit.
Because the Data Engine is designed to use the available CPU, occasional spikes in CPU usage are normal. If, however, you see high CPU usage (for example, 95%) for extended periods (an hour or more), it can mean one of two things:
There is a high load of queries. This can happen when the server is overloaded with client requests and queries are queuing up. If this happens often, more hardware is required to serve the clients; adding more CPU in this case should improve performance.
There is one long-running query. In this case, Tableau Server Resource Manager stops long-running queries based on the timeout settings. This was also true for Tableau Server versions earlier than 10.5.
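To illustrate the difference between normal spikes and sustained high usage, here is a minimal Python sketch that scans evenly spaced CPU utilization samples and reports whether usage stayed at or above a threshold for a full, unbroken window. The function name, the sample format, and the defaults (95% for one hour, taken from the example figures above) are illustrative assumptions; this is not a Tableau API, and collecting the samples is left to whatever monitoring tool you already use.

```python
from typing import Sequence


def sustained_high_cpu(
    samples: Sequence[float],
    interval_seconds: int,
    threshold_pct: float = 95.0,
    window_seconds: int = 3600,
) -> bool:
    """Return True if CPU usage stayed at or above threshold_pct for at least
    window_seconds of consecutive samples.

    Hypothetical helper (not part of Tableau): samples are CPU utilization
    percentages taken every interval_seconds. Short spikes are expected from
    the Data Engine and are ignored; only an unbroken run covering the whole
    window counts as sustained.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Number of consecutive samples needed to cover the window (ceiling division).
    needed = -(-window_seconds // interval_seconds)
    run = 0
    for value in samples:
        # Extend the current run, or reset it when usage drops below the threshold.
        run = run + 1 if value >= threshold_pct else 0
        if run >= needed:
            return True
    return False
```

With one sample per minute, 60 consecutive readings at or above 95% are flagged, while a ten-minute spike followed by a dip is not. A flagged result is the point at which to investigate the two causes listed above.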
For more information on Tableau Server Resource Manager, see General Performance Guidelines.
Memory usage of the Data Engine depends on the amount of data required to answer the query. The Data Engine first tries to run the query in memory. Working set memory is allocated to store intermediate data structures during query processing. In most cases, systems have enough memory for this kind of processing, but if there isn't enough available memory, or if more than 80% of RAM is in use, the Data Engine switches to spooling, temporarily writing data to disk. The temporary files are deleted after the query has been answered. Spooling is therefore an indication that more memory may be needed. Memory usage should be monitored and memory upgraded as needed to avoid performance issues caused by spooling.
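The in-memory-versus-spool decision described above can be modeled roughly as follows. This Python sketch is a simplified conceptual model, not Hyper's actual implementation: the class and function names, the byte-based inputs, and the idea of comparing a single working-set estimate against available memory are assumptions. Only the two conditions (not enough available memory, or more than 80% of RAM in use) come from the text.

```python
from dataclasses import dataclass

# From the text: spooling starts when more than 80% of RAM is in use.
SPOOL_UTILIZATION_THRESHOLD = 0.80


@dataclass
class MemoryState:
    """Hypothetical snapshot of machine memory, in bytes."""
    total_ram_bytes: int
    used_ram_bytes: int

    @property
    def available_bytes(self) -> int:
        return self.total_ram_bytes - self.used_ram_bytes

    @property
    def utilization(self) -> float:
        return self.used_ram_bytes / self.total_ram_bytes


def should_spool(working_set_bytes: int, mem: MemoryState) -> bool:
    """Conceptual model only: run in memory by default, but spool intermediate
    results to temporary files on disk if RAM utilization is already above the
    threshold or the query's working set does not fit in available memory."""
    if mem.utilization > SPOOL_UTILIZATION_THRESHOLD:
        return True
    return working_set_bytes > mem.available_bytes
```

Checking utilization first means that a machine already under memory pressure spools even small queries, which matches the "or" in the description above.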
To manage memory resources on the machine, the maximum memory limit for Data Engine is set by Tableau Server Resource Manager (SRM).
A single instance of Data Engine is automatically installed per node where an instance of File Store, Application Server (VizPortal), VizQLServer, Data Server, or Backgrounder is installed on Tableau Server. The Data Engine can scale by itself and uses as much CPU and memory as needed, thus removing the need for multiple instances of the Data Engine. For more information on the server processes, see Tableau Server Processes.
The Data Engine instance on the node where File Store is installed is used to query data for view requests. The instance on the node where Backgrounder is installed is used for extract creation and refreshes. Keep this in mind when you do performance tuning. For more information, see Performance Tuning.
Data Server, VizQL Server, and the Application Server (VizPortal) all use the local instance of Data Engine for cross-database joins and to create shadow extracts. Shadow extract files are created only when you work with workbooks based on non-legacy Excel, text, or statistical files. Tableau creates a shadow extract file to load the data more quickly.
In Tableau Server 10.5, one instance of Data Engine is installed automatically when you install Backgrounder. The Backgrounder process uses the single instance of Data Engine (hyperd.exe) installed on the same node.
Important! There are exceptions to the rule that Data Engine is installed on the same node as File Store. When File Store is configured external to Tableau Server, Data Engine is no longer installed with File Store. In this configuration, Data Engine continues to be installed with the other processes listed above. You can also configure Data Engine on a node without any other processes, but only when File Store is configured externally. For more information on External File Store, see Tableau Server External File Store.
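The placement rules in this section can be summarized as a small decision function. The following Python sketch encodes them under stated assumptions: process names are plain strings chosen for illustration, and `external_file_store` and `standalone_requested` are hypothetical flags standing in for the External File Store configuration and for explicitly placing Data Engine on an otherwise empty node. It is not a Tableau Services Manager (TSM) command or API.

```python
from typing import AbstractSet

# Processes whose presence on a node causes one Data Engine instance to be
# installed there. File Store is handled separately because it only counts
# when it runs locally rather than as an External File Store.
HOST_PROCESSES = frozenset({
    "Application Server",
    "VizQL Server",
    "Data Server",
    "Backgrounder",
})


def data_engine_installed(
    node_processes: AbstractSet[str],
    external_file_store: bool,
    standalone_requested: bool = False,
) -> bool:
    """Hypothetical model of whether a node gets a (single) Data Engine instance."""
    if node_processes & HOST_PROCESSES:
        return True
    if "File Store" in node_processes and not external_file_store:
        return True
    # Data Engine can run on a node with no other processes,
    # but only when File Store is configured externally.
    if not node_processes and standalone_requested:
        return external_file_store
    return False
```

Because one instance scales to use the node's CPU and memory, the function returns a yes/no answer rather than an instance count.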
You can scale up with the new Data Engine: because it fully utilizes all cores, adding more cores makes individual queries execute faster, which in turn allows more queries to execute in less time.
Memory usage should be monitored and upgraded appropriately to avoid the performance issues caused by spooling.
For more information on Scalability, see Tableau Server Scalability.
Starting in version 10.5, Hyper technology is integrated with the Tableau Data Engine, giving you the following key benefits:
Faster extract creation: With Hyper technology, extracts are generated almost as fast as the source system can deliver data, with no sorting required.
Support for larger extracts: Before this release, you might not have been able to fit all your data into a single extract. With Hyper technology, much larger amounts of data can be included in a single extract.
Faster analysis of extracts: In many cases you will see faster querying of data for larger extracts, or workbooks with complex calculations.
Here are some reasons why the Data Engine powered by Hyper performs better on larger or complex extracts and is optimized for faster querying:
Hyper technology is designed to consume data faster. Unlike previous versions, the Data Engine does not perform post-processing steps such as sorting, which lets it perform better with larger extracts.
Hyper technology is memory-optimized. This means that when needed, all data lives in memory. This results in fast data access times.
Hyper technology is CPU optimized. This means that Data Engine now fully parallelizes the query execution and utilizes available CPU in such a way that query execution time scales almost linearly with the number of cores in the machine.
Hyper is a compiling query engine. Queries are either interpreted or compiled to machine code for maximum performance, allowing the Data Engine to get the most out of modern hardware (CPUs and large main-memory capacities).
Hyper technology uses advanced query optimizations to make queries faster. These include materializing minimum and maximum values for each column, mini-indices to optimize search ranges, more granular dictionaries at the data-block level, and advanced logic for join and subquery optimization. Together, these give the new Data Engine many improvements in performance and scalability over the previous Tableau Data Engine.
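To make the first of these techniques concrete, the following Python sketch shows block skipping with per-block minimum and maximum values (a general technique often called zone maps or small materialized aggregates). A range predicate only scans blocks whose min/max range overlaps the requested range. The block size and the class and function names are illustrative assumptions; this shows the general idea, not Hyper's internal data layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Block:
    """Illustrative fixed-size block of one column with materialized min/max."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: Sequence[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and materialize min/max per block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = list(column[start:start + block_size])
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually scanned.

    A block is skipped without reading its values when its materialized
    min/max range cannot overlap [low, high].
    """
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # pruned using min/max alone
        scanned += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, scanned
```

For a column holding 0 to 999 in blocks of 100, a query for values between 250 and 260 reads only one of the ten blocks. Pruning is most effective when values are clustered by block; on randomly ordered data, most blocks overlap most ranges and little is skipped.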