KeySafe 5 v1.6.1 Metrics
Introduction
The following tables list the metrics that KeySafe 5 exposes, the source of each metric, and whether it was available in nShield Monitor or through SNMP.
In the tables, the term "resource" refers to whether a statistic belongs to an HSM or a host.
The HSM metrics can apply to the actual HSM ("module") or the HSM "chassis".
OpenMetrics
nshield_error_conditions
| Type | Details | Labels | stattree node | stattree ID |
|---|---|---|---|---|
| stateset | | source="psu_failed" | | |
nshield_uptime_seconds
The time elapsed since the HSM was last reset. This does not include the HSM chassis.
| Type | Unit | Details | stattree node | stattree ID | SNMP MIB |
|---|---|---|---|---|---|
| counter | seconds | | | | |
nshield_commands
The total number of commands sent for processing from a client to the server or from the server to an HSM.
The number of commands currently being processed is the CmdCount minus the ReplyCount.
For an HSM, this is the number of commands received from any client.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
nshield_replies
The total number of replies sent from a server to a client or from an HSM to a server.
For an HSM, this is the number of replies sent to any client.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
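The relationship given for nshield_commands (commands currently being processed equal CmdCount minus ReplyCount) can be sketched as below. This is a minimal illustration, not KeySafe 5 code: the function name and the sample values are hypothetical, and it assumes both counters were sampled for the same HSM or host.

```python
def commands_in_flight(commands_total: int, replies_total: int) -> int:
    """Return the number of commands sent but not yet answered.

    commands_total: sample of nshield_commands (CmdCount).
    replies_total:  sample of nshield_replies (ReplyCount).
    """
    if commands_total < 0 or replies_total < 0:
        raise ValueError("counters cannot be negative")
    # Assumption: the two samples may be taken at slightly different
    # instants, so replies can briefly appear to exceed commands.
    # Clamp to zero rather than report a negative count.
    return max(commands_total - replies_total, 0)


# Hypothetical sample values, for illustration only.
print(commands_in_flight(1500, 1497))  # 3
```

Clamping is a design choice for display purposes; a monitoring rule might prefer to surface the negative difference as a sign of skewed sampling.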
nshield_objects_stored
The number of times the object store has had a new object put into it.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
nshield_objects_destroyed
The number of items that have been deleted from the HSM’s object store and had their corresponding memory released.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | ModuleObjStats | |
nshield_current_clients
The number of client connections currently made to the Connect hardserver.
| Type | stattree node | stattree ID |
|---|---|---|
| gauge | | |
nshield_current_clients_limit
The number of licensed client connections available.
| Type | stattree node | stattree ID |
|---|---|---|
| counter | | |
nshield_current_crypto_clients
The number of licensable clients that are currently connected, including both active and parked sessions.
This is only relevant when reported from a hardserver with remote clients that have an image version of 13.5 or later.
| Type | stattree node | stattree ID |
|---|---|---|
| gauge | | |
nshield_audit_db_free_bytes
| Type | Unit | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
Host statistics
nshield_current_clients
The number of client connections currently made to the server.
| Type | stattree node | stattree ID |
|---|---|---|
| gauge | | |
nshield_current_crypto_clients
The number of licensable clients that are currently connected, including both active and parked sessions.
This is only relevant when reported from a hardserver with remote clients that have an image version of 13.5 or later.
| Type | Unit | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_audit_db_free_bytes
| Type | Unit | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_audit_db_used_bytes
| Type | Unit | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_connection_commands
The total number of commands sent for processing from a client to the server.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | Has label "connection". | | |
System load
nshield_cpu_load_per_hsm
The current processing load on the HSM. This is represented as a number from 0 to 100.
HSMs typically contain several different types of processing resources, such as the main CPU and RSA acceleration. This means that reporting on HSM load can be imprecise.
Normally, HSMs report 100% CPU load when all RSA processing capacity is occupied. When performing non-RSA tasks, the main CPU and other resources, such as the random number generator, can become saturated without this metric reaching 100%.
| Type | Details | Labels | stattree node | stattree ID |
|---|---|---|---|---|
| gauge | A value between 0 and 1. Has label "source". | source="module" | | |
| | | source="chassis" | | |
Temperatures
nshield_temperature_celsius
The current temperature of different parts of the HSM.
| Type | Units | Details |
|---|---|---|
| gauge | celsius | Has label "sensor". |
See the following sections for more details about the labels.
nshield_temperature_celsius: Module
The current temperature of the main circuit board. First-generation HSMs do not have a temperature sensor and so do not return temperature statistics.
| Labels | stattree node | stattree ID |
|---|---|---|
| sensor="module_cpu_temp" | | |
| sensor="module_msp_temp" | | |
| sensor="module_crypto_co_proc_temp" | | |
nshield_temperature_limit_celsius
The minimum and maximum acceptable temperature values for each sensor.
| Type | Units | Details | Labels |
|---|---|---|---|
| gauge | celsius | Has labels "sensor" and "limit". | For sensor, see nshield_temperature_celsius. limit="maximum" |
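As an illustration of how the limit metric might be used alongside nshield_temperature_celsius, the following sketch flags sensors whose reading falls outside the configured limits. The function name, the dictionary layout, and all temperature values are hypothetical. It also assumes that a limit="minimum" series may exist in addition to limit="maximum"; where a limit is absent, that bound is not checked.

```python
from typing import Dict, List, Tuple


def sensors_out_of_range(
    readings: Dict[str, float],
    limits: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the sensors whose reading violates a known limit.

    readings: sensor label -> nshield_temperature_celsius value.
    limits:   (sensor label, limit label) -> nshield_temperature_limit_celsius value.
    """
    flagged = []
    for sensor, value in sorted(readings.items()):
        low = limits.get((sensor, "minimum"))   # may be missing
        high = limits.get((sensor, "maximum"))  # may be missing
        too_cold = low is not None and value < low
        too_hot = high is not None and value > high
        if too_cold or too_hot:
            flagged.append(sensor)
    return flagged


# Hypothetical values, for illustration only.
readings = {"module_cpu_temp": 91.0, "module_msp_temp": 40.0}
limits = {
    ("module_cpu_temp", "maximum"): 85.0,
    ("module_msp_temp", "maximum"): 85.0,
}
print(sensors_out_of_range(readings, limits))  # ['module_cpu_temp']
```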
nshield_max_temperature_celsius
The maximum temperature recorded by the HSM’s temperature sensor. This is stored in non-volatile memory and is cleared when the unit is initialized.
| Type | Units | Details |
|---|---|---|
| gauge | celsius | Has label "sensor". |
See the following table for more details about the labels:
| Labels | stattree node | stattree ID |
|---|---|---|
| sensor="module_cpu_temp" | | |
| sensor="chassis_left" | | |
| sensor="chassis_right" | | |
nshield_min_temperature_celsius
The minimum temperature recorded by the HSM’s temperature sensor. This is stored in non-volatile memory and is cleared when the unit is initialized.
| Type | Units | Details |
|---|---|---|
| gauge | celsius | Has label "sensor". |
See the following table for more details about the labels:
| Labels | stattree node | stattree ID |
|---|---|---|
| sensor="module_cpu_temp" | | |
| sensor="chassis_left" | | |
| sensor="chassis_right" | | |
Electrical
nshield_platform_voltage_volts
stattree mapping
| Type | Units | Details |
|---|---|---|
| gauge | volts | Has label "voltage_sensor". |
See the following table for more details about the labels:
| Labels | stattree node | stattree ID |
|---|---|---|
| voltage_sensor="cpu_core" | | |
| voltage_sensor="t1022_ifc_io" | | |
| voltage_sensor="t1022_serdes" | | |
| voltage_sensor="t1022_serdes_io" | | |
| voltage_sensor="fpga_serdes_core" | | |
| voltage_sensor="fpga_serdes_io" | | |
| voltage_sensor="msp_avcc" | | |
| voltage_sensor="ddr4_access" | | |
| voltage_sensor="ddr4_io" | | |
| voltage_sensor="pci_bus" | | |
| voltage_sensor="module_battery" | | |
Fans
nshield_fan_speed_rpm
The fan speed for each fan in the HSM.
KeySafe 5 treats fan speeds greater than 120,000 rpm as errors and reports a speed of zero instead.
| Type | Units | Details |
|---|---|---|
| gauge | rpm | Has label "fan_id". |
| Labels | stattree node | stattree ID | EnvMon |
|---|---|---|---|
| fan_id="chassis1" | | | |
| fan_id="chassis2" | | | |
| fan_id="chassis3" | | | |
| fan_id="chassis4" | | | |
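The error handling described for nshield_fan_speed_rpm (readings above 120,000 rpm are reported as zero) can be sketched as follows. This shows the stated rule only and is not KeySafe 5's actual implementation; the constant and function names are hypothetical.

```python
# Threshold taken from the description of nshield_fan_speed_rpm.
MAX_PLAUSIBLE_RPM = 120_000


def reported_fan_speed(raw_rpm: float) -> float:
    """Return the speed to report for a raw fan reading.

    Readings strictly greater than the threshold are treated as
    sensor errors and reported as zero; all others pass through.
    """
    if raw_rpm > MAX_PLAUSIBLE_RPM:
        return 0.0
    return raw_rpm


# Hypothetical readings, for illustration only.
print(reported_fan_speed(8000.0))    # 8000.0
print(reported_fan_speed(250000.0))  # 0.0
```

One consequence of this rule is that a reported speed of zero can mean either a stopped fan or an implausible reading, so an alert on zero covers both cases.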
nshield_fan_speed_limit_rpm
The fan speed limits for each fan in the HSM.
These are hardcoded for each HSM type.
| Type | Units | Details | Labels |
|---|---|---|---|
| gauge | rpm | Has labels "fan_id" and "limit". | For fan_id, see nshield_fan_speed_rpm. limit="maximum" and limit="minimum". |
Memory
nshield_module_mem_bytes
The total amount of RAM, allocated and free, available to the HSM. This is equal to the installed RAM size, minus various fixed overheads.
It is a static value that is calculated by KeySafe 5.
| Type | Units | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_module_mem_alloc_kernel_bytes
The total amount of RAM allocated for kernel use, or non-SEE use, in a module. This is mainly used for the object store, for example, for keys and logical tokens, and for big-number buffers.
| Type | Units | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_chassis_mem_alloc_kernel_bytes
The total amount of RAM allocated for kernel use, or non-SEE use, in a module. This is mainly used for the object store, for example, for keys and logical tokens, and for big-number buffers.
| Type | Units | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
nshield_module_mem_alloc_user_bytes
The total amount of RAM allocated for user-mode processes in the module. This will be zero for non-SEE use.
This value includes the size of the SEE Machine image and the total heap space available to it. The module’s kernel does not know, and therefore cannot report, how much of the user-mode’s heap is currently free and how much is in use.
| Type | Units | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
Storage
nshield_module_nvram_free_bytes
The total amount of free space in the NVRAM of the HSM.
This is only available on XC and nShield 5 HSM variants.
| Type | Units | stattree node | stattree ID |
|---|---|---|---|
| gauge | bytes | | |
Internal software statistics
nshield_pci_irqs
The number of interrupts from the host.
This is approximately equal to the total of HostReadCount and HostWriteCount.
This is only applicable to PCI HSMs.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
nshield_pci_unhandled_irqs
The number of unidentified interrupts from the host. If this reports a nonzero value, it is likely that there is a problem with a driver or the PCI bus.
This is only applicable to PCI HSMs.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
nshield_pci_read_reconnect
The number of deferred reads that have now completed.
This should be the same as HostReadDeferred, or one less than it if there is a currently deferred read.
This is only applicable to PCI HSMs.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
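The expected relationship between nshield_pci_read_reconnect and HostReadDeferred (equal, or one less while a read is still deferred) can be expressed as a simple consistency check. The function name and sample values below are hypothetical, and the sketch assumes both counters were sampled at the same moment.

```python
def read_reconnect_consistent(read_deferred: int, read_reconnect: int) -> bool:
    """Check the completed-deferred-read count against the deferred count.

    read_deferred:  sample of the HostReadDeferred statistic.
    read_reconnect: sample of nshield_pci_read_reconnect.

    The difference should be 0, or 1 if a read is currently deferred.
    """
    return (read_deferred - read_reconnect) in (0, 1)


# Hypothetical sample values, for illustration only.
print(read_reconnect_consistent(42, 42))  # True
print(read_reconnect_consistent(42, 41))  # True
print(read_reconnect_consistent(42, 30))  # False
```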
nshield_AIS31_preliminary_alarms
The number of times the AIS31 random number test has failed. Because this is a statistical test, a small number of failures is expected. If it fails too often, it triggers an SOS-HRAO alarm and the module fails.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
nshield_correctable_memory_errors
The number of correctable memory errors that have been corrected by the error checking and correction (ECC) mechanisms. Typically, this count should be 0, although a small number of errors are to be expected occasionally. If this count increases rapidly, by multiple thousands per second, there has been a malfunction.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |
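Because nshield_correctable_memory_errors is a counter, the rate of increase has to be derived from two samples. The sketch below shows one way to do that. The function names, the sample values, and the alert threshold of 2,000 errors per second (one reading of "multiple thousands per second") are assumptions made for illustration.

```python
# Assumption: "multiple thousands per second" is read as 2,000 or more.
ALERT_THRESHOLD_PER_SECOND = 2000.0


def errors_per_second(
    prev_count: int, prev_time: float, curr_count: int, curr_time: float
) -> float:
    """Return the average error rate between two counter samples.

    Times are in seconds. If the counter went backwards, it is assumed
    to have been reset, and the current value is used as the increase.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_count - prev_count
    if delta < 0:
        delta = curr_count  # counter reset between samples
    return delta / elapsed


def looks_like_malfunction(rate: float) -> bool:
    """Return True if the rate meets the assumed alert threshold."""
    return rate >= ALERT_THRESHOLD_PER_SECOND


# Hypothetical samples taken 10 seconds apart.
rate = errors_per_second(100, 0.0, 30100, 10.0)
print(rate, looks_like_malfunction(rate))  # 3000.0 True
```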
nshield_spi_communication_attempts
The number of times the main processor on an XC module has had to repeat an attempt to communicate with the security processor due to a communication failure. Loss of communication between the main processor and the security processor causes the module to enter an alarm state and fail. This sometimes triggers an SOS-HV alarm.
| Type | Details | stattree node | stattree ID |
|---|---|---|---|
| counter | | | |