Hardware Incidents
Hardware incidents monitor the server itself. These can be used to help identify server issues that may affect Tableau Server’s performance.
You can set thresholds for the following issues:
- CPU Usage
- Available Memory
- Memory Usage
- Free Disk Space
- Disk Queue Length
The following incidents are configured by default when you install a new Tableau Resource Monitoring Tool:
- When the available disk space falls below 10 GB for 10 minutes or more, a warning incident is logged. When it falls below 5 GB for 10 minutes or more, a critical incident is logged.
- When available memory falls below 8 GB for over 10 minutes, a warning incident is logged.
- When the CPU usage for the entire server is 80% or more for 5 minutes, a warning incident is logged.
Note: Memory-related incidents are configured in binary multiples of bytes.
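Because these values use binary multiples, converting gigabytes to kilobytes means multiplying by 1024 twice. The following is a minimal Python sketch of that arithmetic, assuming that "GB" here means 1024 × 1024 KB. The helper name `gb_to_kb` is hypothetical and is not part of the Resource Monitoring Tool.

```python
# Assumption: sizes use binary multiples, so 1 GB = 1024 * 1024 KB.
KB_PER_GB = 1024 * 1024


def gb_to_kb(gigabytes: float) -> int:
    """Convert a size in gigabytes to kilobytes using binary multiples."""
    return int(gigabytes * KB_PER_GB)


if __name__ == "__main__":
    # Print the kilobyte equivalents of a few sizes.
    for gb in (1, 2, 5, 10):
        print(f"{gb} GB = {gb_to_kb(gb)} KB")
```

Values computed this way can be used wherever a threshold is expressed in kilobytes.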
You can configure thresholds using the RMT Server web interface or by updating the configuration file config.json.
To set the thresholds for hardware incidents, under the Admin menu, select Configuration, and go to the Incidents tab.
For CPU Usage, set the following:
Key | Required? | Description |
---|---|---|
Severity | Required | |
Process | Required | The threshold applies to the entire Tableau Server or for a single process as specified. |
Start Threshold | Required | The CPU usage must surpass the value specified before an incident is created and monitored. Set the percent and the duration for this threshold. |
End Threshold | Optional | The CPU usage must fall below the value specified before an incident is considered resolved. |
For Available Memory, set the following:
Key | Required? | Description |
---|---|---|
Severity | Required | |
Start Threshold | Required | The available memory must fall below the value specified before an incident is created and monitored. Set the percent and the duration for this threshold. |
End Threshold | Optional | The available memory must be above the value specified before an incident is considered resolved. |
For Memory Usage, set the following:
Key | Required? | Description |
---|---|---|
Severity | Required | |
Process | Required | The threshold applies to the entire Tableau Server or for a single process as specified. |
Start Threshold | Required | The memory usage must be equal to the value specified before an incident is created and monitored. Set the percent and the duration for this threshold. |
End Threshold | Optional | The memory usage must be below the value specified before an incident is considered resolved. |
For Free Disk Space, set the following:
Key | Required? | Description |
---|---|---|
Severity | Required | |
Start Threshold | Required | The free disk space must fall below the value specified before an incident is created and monitored. Set the percent and the duration for this threshold. |
End Threshold | Optional | The free disk space must be above the value specified before an incident is considered resolved. |
For Disk Queue Length, set the following:
Key | Required? | Description |
---|---|---|
Severity | Required | |
Start Threshold | Required | The disk queue length must be equal to the value specified before an incident is created and monitored. Set the percent and the duration for this threshold. |
End Threshold | Optional | The disk queue length must be below the value specified before an incident is considered resolved. |
An example config.json snippet defining two hardware incidents:
{
  "monitoring": {
    "incidents": {
      "triggers": [
        {
          "counter": "DiskSpaceAvailableKB",
          "severity": "warning",
          "threshold": 1048576
        },
        {
          "counter": "ProcessorTimePercent",
          "severity": "warning",
          "threshold": 0.95,
          "thresholdDuration": 300000,
          "endThreshold": 0.90,
          "endThresholdDuration": 5000
        }
      ]
    }
  }
}
- The DiskSpaceAvailableKB incident will trigger a warning once the available disk space falls below 1 GB (1,048,576 KB).
- The ProcessorTimePercent incident will trigger a warning once the CPU has had at least 95% utilization for over 5 minutes. The incident will be considered resolved once the CPU is below 90% utilization for 5 seconds.
The default settings may not meet your requirements, and you can change them to suit your environment. For example, to trigger a warning when the available disk space falls below 2 GB in an environment whose identifier is "staging-environment", the configuration would look like this:
{
  "environments": {
    "staging-environment": {
      "monitoring": {
        "incidents": {
          "triggers": [
            {
              "counter": "DiskSpaceAvailableKB",
              "severity": "warning",
              "threshold": 2097152
            }
          ]
        }
      }
    }
  }
}
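If you maintain overrides for several environments, you could generate this structure rather than writing it by hand. The following Python sketch builds the same nested layout as the example above and converts durations from minutes to milliseconds. The functions `make_trigger` and `environment_config` are hypothetical helpers for illustration only. They are not part of the Resource Monitoring Tool, and the output should be checked against your own config.json before use.

```python
import json


def make_trigger(counter, threshold, severity="warning", duration_minutes=None):
    """Build one trigger entry; thresholdDuration is written in milliseconds."""
    trigger = {"counter": counter, "severity": severity, "threshold": threshold}
    if duration_minutes is not None:
        trigger["thresholdDuration"] = int(duration_minutes * 60 * 1000)
    return trigger


def environment_config(environment_id, triggers):
    """Nest triggers under environments.<id>.monitoring.incidents.triggers."""
    return {
        "environments": {
            environment_id: {
                "monitoring": {"incidents": {"triggers": list(triggers)}}
            }
        }
    }


if __name__ == "__main__":
    # 2 GB expressed in KB (binary multiples), as in the example above.
    config = environment_config(
        "staging-environment",
        [make_trigger("DiskSpaceAvailableKB", 2 * 1024 * 1024)],
    )
    print(json.dumps(config, indent=2))
```

Run as a script, this prints a JSON document with the same keys and values as the staging-environment example.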
Key | Data Type | Required? | Description |
---|---|---|---|
counter | String | Required | The identifier for the hardware incident to monitor. Available options include DiskSpaceAvailableKB and ProcessorTimePercent, both used in the examples above. |
severity | String | Optional | See Incident Severity Level. Default value: |
threshold | Number | Required | The threshold that must be surpassed before an incident is monitored. |
thresholdDuration | Number | Optional | The amount of time in milliseconds to monitor the situation before triggering an incident. If not specified, an incident will be triggered as soon as the threshold is reached. |
endThreshold | Number | Optional | The threshold that must be surpassed before an incident is considered resolved. |
endThresholdDuration | Number | Optional | The amount of time in milliseconds to monitor the situation before completing the incident. If not specified, an incident will be resolved as soon as the endThreshold is reached. If endThreshold is not defined, then threshold is used. |
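One way to picture how the four threshold keys interact is as a small state machine over timestamped samples. The Python sketch below is an illustration only, not the Resource Monitoring Tool's actual implementation. It assumes a counter where higher values are worse (such as ProcessorTimePercent), that a condition must hold continuously for the full duration, and that a missing endThreshold falls back to threshold, as the table's last row suggests. The function `incident_events` and its parameters are hypothetical.

```python
def incident_events(samples, threshold, threshold_duration=0,
                    end_threshold=None, end_threshold_duration=0):
    """Return (timestamp_ms, "opened" | "resolved") events.

    samples: (timestamp_ms, value) pairs in time order.
    Durations are in milliseconds, matching config.json.
    """
    if end_threshold is None:
        end_threshold = threshold  # assumed fallback when endThreshold is unset
    events = []
    active = False  # True while an incident is open
    since = None    # start of the current breach (or recovery) streak
    for ts, value in samples:
        # While open, look for recovery; otherwise look for a breach.
        in_streak = value < end_threshold if active else value >= threshold
        if not in_streak:
            since = None  # streak broken, start over
            continue
        if since is None:
            since = ts
        wait = end_threshold_duration if active else threshold_duration
        if ts - since >= wait:
            active = not active
            events.append((ts, "opened" if active else "resolved"))
            since = None
    return events


if __name__ == "__main__":
    # One CPU sample per minute, using the settings from the example above.
    cpu = [(0, 0.96), (60000, 0.97), (120000, 0.96), (180000, 0.98),
           (240000, 0.97), (300000, 0.96), (360000, 0.92),
           (420000, 0.85), (480000, 0.80)]
    print(incident_events(cpu, 0.95, 300000, 0.90, 5000))
```

For counters where lower values are worse, such as available memory or free disk space, the two comparisons would be inverted.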
Who can do this
Resource Monitoring Tool Administrator or a Resource Monitoring Tool user with the Server/Environment Management role.