KeySafe 5 v1.6.1 Metrics

Introduction

The following tables list the metrics that KeySafe 5 exposes, the source of each metric, and whether it was available in nShield Monitor or SNMP.

In the tables, the term "resource" refers to whether a statistic belongs to an HSM or a host.

The HSM metrics can apply to the actual HSM ("module") or the HSM "chassis".

OpenMetrics

nshield_hsm

The labels on the HSM.

Type | Details
info | Has the label "label".

nshield_error_conditions

Type | Details | Labels | stattree node | stattree ID
stateset | States: failed, okay | source="psu_failed" | HostEnvStats | PSUFailure
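In OpenMetrics, a stateset is exposed as one sample per state. Each sample carries a label named after the metric family that holds the state name, and its value is 1 for the active state and 0 otherwise. The minimal sketch below assumes that standard encoding and picks out the current PSU state from scraped lines. The sample text is illustrative rather than captured from a real HSM, and `active_state` is a hypothetical helper. The parser is deliberately simple and does not handle escaped quotes in label values.

```python
import re

# Illustrative exposition lines (hypothetical values) following the OpenMetrics
# stateset convention: one sample per state, state name in a label that is
# named after the metric family.
SAMPLE = '''\
# TYPE nshield_error_conditions stateset
nshield_error_conditions{source="psu_failed",nshield_error_conditions="failed"} 0
nshield_error_conditions{source="psu_failed",nshield_error_conditions="okay"} 1
'''

LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def active_state(text, metric="nshield_error_conditions", source="psu_failed"):
    """Return the state whose sample value is 1, or None if no state is set."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip TYPE/HELP metadata lines
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if labels.get("source") == source and float(match.group(3)) == 1.0:
            return labels.get(metric)
    return None


print(active_state(SAMPLE))  # okay
```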

nshield_uptime_seconds

The time elapsed since the HSM was last reset. This metric does not apply to the HSM chassis.

Type | Unit | Details | stattree node | stattree ID | SNMP MIB
counter | seconds | | ModuleEnvStats | Uptime | moduleStatsTable > uptime
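nshield_uptime_seconds restarts from zero after a reset. A value that drops between two successive scrapes therefore indicates that the HSM was reset in between. The sketch below assumes you keep the previous sample yourself. `reset_detected` and `format_uptime` are hypothetical helpers. A reset can still be missed if, by the next scrape, the new uptime has already grown past the previous sample.

```python
def reset_detected(previous_seconds: float, current_seconds: float) -> bool:
    """Return True if uptime went backwards between two scrapes,
    which indicates that the HSM was reset in between."""
    return current_seconds < previous_seconds


def format_uptime(seconds: float) -> str:
    """Render an uptime value as days, hours, minutes, and seconds."""
    total = int(seconds)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


print(reset_detected(86400, 120))  # True
print(format_uptime(90061))        # 1d 1h 1m 1s
```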

nshield_commands

The total number of commands sent for processing from a client to the server or from the server to an HSM. The number of commands currently being processed is CmdCount minus ReplyCount, that is, nshield_commands minus nshield_replies.

For an HSM, this is the number of commands received from any client.

Type | Details | stattree node | stattree ID
counter | | ModuleJobStats | CmdCount

nshield_replies

The total number of replies sent from a server to a client or from an HSM to a server.

For an HSM, this is the number of replies sent to any client.

Type | Details | stattree node | stattree ID
counter | | ModuleJobStats | ReplyCount
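The number of commands currently being processed is CmdCount minus ReplyCount. With the exposed metrics, that is nshield_commands minus nshield_replies for the same resource. The following is a minimal sketch with illustrative values. `commands_in_flight` is a hypothetical name. Clamping the result at zero is an assumption that guards against the two counters being read at slightly different moments.

```python
def commands_in_flight(commands_total: int, replies_total: int) -> int:
    """Commands sent but not yet answered: nshield_commands - nshield_replies.

    Clamped at zero in case the two counters were read at slightly
    different times (an assumption, not documented behaviour)."""
    return max(commands_total - replies_total, 0)


print(commands_in_flight(10_250, 10_247))  # 3
```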

nshield_objects_stored

The number of times the object store has had a new object put into it.

Type | Details | stattree node | stattree ID
counter | | ModuleObjStats | ObjectsCreated

nshield_objects_destroyed

The number of items that have been deleted from the HSM’s object store and had their corresponding memory released.

Type | Details | stattree node | stattree ID
counter | | ModuleObjStats | ObjectsDestroyed

nshield_current_clients

The number of client connections currently made to the Connect hardserver.

Type | stattree node | stattree ID
gauge | ServerGlobals | ClientCount

nshield_current_clients_limit

The number of licensed client connections available.

Type | stattree node | stattree ID
counter | ServerGlobals | MaxClients

nshield_current_crypto_clients

The number of licensable clients that are currently connected, including both active and parked sessions.

This is only relevant when reported from a hardserver with remote clients that have an image version of 13.5 or later.

Type | stattree node | stattree ID
gauge | ServerGlobals | CryptoClientCount

nshield_audit_db_free_bytes

Type | Unit | stattree node | stattree ID
gauge | bytes | ServerGlobals | AuditDBFreeSpaceMB

nshield_audit_db_used_bytes

Type | Unit | stattree node | stattree ID
gauge | bytes | ServerGlobals | AuditDBUsedSpaceMB
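The share of the audit database that is in use can be derived from the two gauges. This assumes that the used and free figures together make up the whole audit database allocation, which the tables do not state. `audit_db_used_fraction` is a hypothetical helper.

```python
def audit_db_used_fraction(used_bytes: float, free_bytes: float) -> float:
    """Share of audit database space in use, computed from
    nshield_audit_db_used_bytes and nshield_audit_db_free_bytes.
    Assumes used + free equals the total allocation."""
    total = used_bytes + free_bytes
    if total == 0:
        return 0.0  # nothing allocated yet; report as empty
    return used_bytes / total


print(audit_db_used_fraction(768 * 2**20, 256 * 2**20))  # 0.75
```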

nshield_queue_in_progress

The number of jobs currently in progress on the HSM, including jobs from the SEE machine.

Type | stattree node | stattree ID
gauge | ModuleServerStats | JobsOutstanding

Host statistics

nshield_current_clients

The number of client connections currently made to the server.

Type | stattree node | stattree ID
gauge | ServerGlobals | ClientCount

nshield_current_crypto_clients

The number of licensable clients that are currently connected, including both active and parked sessions.

This is only relevant when reported from a hardserver with remote clients that have an image version of 13.5 or later.

Type | stattree node | stattree ID
gauge | ServerGlobals | CryptoClientCount

nshield_audit_db_free_bytes

Type | Unit | stattree node | stattree ID
gauge | bytes | ServerGlobals | AuditDBFreeSpaceMB

nshield_audit_db_used_bytes

Type | Unit | stattree node | stattree ID
gauge | bytes | ServerGlobals | AuditDBUsedSpaceMB

nshield_connection_commands

The total number of commands sent for processing from a client to the server.

Type | Details | stattree node | stattree ID
counter | Has label "connection". | Connections | CmdCount

nshield_connection_replies

The total number of replies sent from the server to the client.

Type | Details | stattree node | stattree ID
counter | Has label "connection". | Connections | ReplyCount

nshield_host

The labels on the host.

Type | Details
info | Has label "label".

System load

nshield_cpu_load_per_hsm

The current processing load on the HSM. The underlying stattree value, CPULoadPercent, is a percentage from 0 to 100. KeySafe 5 exposes it as a value between 0 and 1.

HSMs typically contain several different types of processing resources, such as the main CPU and RSA acceleration. This means that reporting on HSM load can be imprecise.

Normally, HSMs report 100% CPU load when all RSA processing capacity is occupied. When performing non-RSA tasks, the main CPU and other resources, such as the random number generator, can become saturated without this metric reaching 100%.

Type | Details | Labels | stattree node | stattree ID
gauge | A value between 0 and 1. Has label "source". | source="module" | ModuleJobStats | CPULoadPercent
 | | source="chassis" | HostEnvStats | CPULoadPercent
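Dashboards that display percentages need to scale the exposed ratio back to the 0 to 100 range. The following is a minimal sketch with a hypothetical helper, `load_percent`. As explained above, a value below 1 (100%) does not prove that the HSM has spare capacity for non-RSA work.

```python
def load_percent(ratio: float) -> float:
    """Convert nshield_cpu_load_per_hsm (0 to 1) to the 0-100 scale
    used by the stattree CPULoadPercent value."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"expected a ratio between 0 and 1, got {ratio}")
    return ratio * 100.0


print(load_percent(0.5))  # 50.0
```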

Temperatures

nshield_temperature_celsius

The current temperature of different parts of the HSM.

Type | Units | Details
gauge | celsius | Has label "sensor".

See the following sections for more details about the labels.

nshield_temperature_celsius: Module

The current temperature of the main circuit board. First-generation HSMs do not have a temperature sensor and so do not return temperature statistics.

Labels | stattree node | stattree ID
sensor="module_cpu_temp" | ModuleEnvStats | CurrentCPUTemp1
sensor="module_msp_temp" | ModuleEnvStats | TempSP
sensor="module_crypto_co_proc_temp" | ModuleEnvStats | CurrentCPUTemp2

nshield_temperature_celsius: Connect (chassis)

The ambient sensors for the chassis.

Labels | stattree node | stattree ID
sensor="chassis_left" | HostEnvStats | CurrentTempC
sensor="chassis_right" | HostEnvStats | CurrentTemp2C

nshield_temperature_limit_celsius

The minimum and maximum acceptable temperature values for each sensor.

Type | Units | Details | Labels
gauge | celsius | Has labels "sensor" and "limit". | For sensor, see nshield_temperature_celsius. limit="maximum"
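One way to use the limit gauge is to compare each reading with its maximum. The sketch below assumes both gauges have already been collected into dictionaries keyed by the "sensor" label. The 5 °C margin, the helper name `sensors_near_limit`, and the example values are all hypothetical.

```python
def sensors_near_limit(readings, limits, margin_c=5.0):
    """Return the sensors whose current reading is within margin_c degrees
    of (or above) their maximum limit, sorted by name.

    readings: sensor label -> nshield_temperature_celsius value
    limits:   sensor label -> nshield_temperature_limit_celsius{limit="maximum"}
    Sensors without a known limit are skipped."""
    return sorted(
        sensor
        for sensor, value in readings.items()
        if sensor in limits and limits[sensor] - value <= margin_c
    )


# Hypothetical values for illustration only; real limits are reported per sensor.
readings = {"module_cpu_temp": 91.0, "chassis_left": 34.5, "chassis_right": 36.0}
limits = {"module_cpu_temp": 95.0, "chassis_left": 55.0}
print(sensors_near_limit(readings, limits))  # ['module_cpu_temp']
```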

nshield_max_temperature_celsius

The maximum temperature recorded by the HSM’s temperature sensor. This is stored in non-volatile memory and is cleared when the unit is initialized.

Type | Units | Details
gauge | celsius | Has label "sensor".

See the following table for more details about the labels:

Labels | stattree node | stattree ID
sensor="module_cpu_temp" | ModuleEnvStats | MaxTempC
sensor="chassis_left" | HostEnvStats | MaxTempC
sensor="chassis_right" | HostEnvStats | MaxTemp2C

nshield_min_temperature_celsius

The minimum temperature recorded by the HSM’s temperature sensor. This is stored in non-volatile memory and is cleared when the unit is initialized.

Type | Units | Details
gauge | celsius | Has label "sensor".

See the following table for more details about the labels:

Labels | stattree node | stattree ID
sensor="module_cpu_temp" | ModuleEnvStats | MinTempC
sensor="chassis_left" | HostEnvStats | MinTempC
sensor="chassis_right" | HostEnvStats | MinTemp2C

Electrical

nshield_platform_voltage_volts


Type | Units | Details
gauge | volts | Has label "voltage_sensor".

See the following table for more details about the labels:

Labels | stattree node | stattree ID
voltage_sensor="cpu_core" | ModuleEnvStats | CPUVoltage1
voltage_sensor="t1022_ifc_io" | ModuleEnvStats | CPUVoltage2
voltage_sensor="t1022_serdes" | ModuleEnvStats | CPUVoltage3
voltage_sensor="t1022_serdes_io" | ModuleEnvStats | CPUVoltage4
voltage_sensor="fpga_serdes_core" | ModuleEnvStats | CPUVoltage5
voltage_sensor="fpga_serdes_io" | ModuleEnvStats | CPUVoltage6
voltage_sensor="msp_avcc" | ModuleEnvStats | CPUVoltage7
voltage_sensor="ddr4_access" | ModuleEnvStats | CPUVoltage8
voltage_sensor="ddr4_io" | ModuleEnvStats | CPUVoltage9
voltage_sensor="pci_bus" | ModuleEnvStats | CPUVoltage10
voltage_sensor="module_battery" | ModuleEnvStats | CPUVoltage11

Fans

nshield_fan_speed_rpm

The fan speed for each fan in the HSM.

KeySafe 5 treats fan speeds greater than 120,000 rpm as errors and reports a speed of zero instead.

Type | Units | Details
gauge | rpm | Has label "fan_id".

Labels | stattree node | stattree ID | EnvMon
fan_id="chassis1" | HostEnvStats | CurrentFanRPM | fan1_rpm
fan_id="chassis2" | HostEnvStats | CurrentFan2RPM | fan2_rpm
fan_id="chassis3" | HostEnvStats | CurrentFan3RPM | fan3_rpm
fan_id="chassis4" | HostEnvStats | CurrentFan4RPM | fan4_rpm
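The following is a minimal sketch of the fan-speed rule described above. It assumes the comparison is a strict "greater than", as worded. `reported_fan_speed` and `MAX_PLAUSIBLE_RPM` are hypothetical names, not part of KeySafe 5.

```python
MAX_PLAUSIBLE_RPM = 120_000  # readings above this are treated as errors


def reported_fan_speed(raw_rpm: int) -> int:
    """Return the fan speed as KeySafe 5 would report it: raw readings
    greater than 120,000 rpm are replaced with zero."""
    return 0 if raw_rpm > MAX_PLAUSIBLE_RPM else raw_rpm


print(reported_fan_speed(5400))     # 5400
print(reported_fan_speed(131_071))  # 0
```

A reported zero can mean either a stopped fan or a discarded reading, so an alert on zero rpm catches both cases.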

nshield_fan_speed_limit_rpm

The fan speed limits for each fan in the HSM.

These are hardcoded for each HSM type.

Type | Units | Details | Labels
gauge | rpm | Has labels "fan_id" and "limit". | For fan_id, see nshield_fan_speed_rpm. limit="maximum" or limit="minimum".

Memory

nshield_module_mem_bytes

The total amount of RAM, allocated and free, available to the HSM. This is equal to the installed RAM size, minus various fixed overheads.

It is a static value that is calculated by KeySafe 5.

Type | Units | stattree node | stattree ID
gauge | bytes | ModuleEnvStats | MemTotal

nshield_module_mem_alloc_kernel_bytes

The total amount of RAM allocated for kernel use, or non-SEE use, in a module. This is mainly used for the object store, for example, for keys and logical tokens, and for big-number buffers.

Type | Units | stattree node | stattree ID
gauge | bytes | ModuleEnvStats | MemAllocKernel
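To see how much of the HSM's RAM the object store and big-number buffers are using, you can compare the kernel allocation with the total. This sketch assumes that nshield_module_mem_bytes is the appropriate denominator. `kernel_memory_fraction` is a hypothetical name.

```python
def kernel_memory_fraction(alloc_kernel_bytes: int, total_bytes: int) -> float:
    """Share of the HSM's usable RAM allocated for kernel (non-SEE) use:
    nshield_module_mem_alloc_kernel_bytes / nshield_module_mem_bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return alloc_kernel_bytes / total_bytes


print(kernel_memory_fraction(256 * 2**20, 1024 * 2**20))  # 0.25
```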

nshield_chassis_mem_alloc_kernel_bytes

The total amount of RAM allocated for kernel use, or non-SEE use, in a module. This is mainly used for the object store, for example, for keys and logical tokens, and for big-number buffers.

Type | Units | stattree node | stattree ID
gauge | bytes | HostEnvStats | MemAllocKernel

nshield_module_mem_alloc_user_bytes

The total amount of RAM allocated for user-mode processes in the module. This will be zero for non-SEE use.

This value includes the size of the SEE Machine image and the total heap space available to it. The module’s kernel does not know, and therefore cannot report, how much of the user-mode heap is currently free and how much is in use.

Type | Units | stattree node | stattree ID
gauge | bytes | ModuleEnvStats | MemAllocUser

nshield_chassis_mem_alloc_user_bytes

The total amount of RAM allocated for user-mode processes in the module. This will be zero for non-SEE use.

This value includes the size of the SEE Machine image and the total heap space available to it. The module’s kernel does not know, and therefore cannot report, how much of the user-mode heap is currently free and how much is in use.

Type | Units | stattree node | stattree ID
gauge | bytes | HostEnvStats | MemAllocUser

Storage

nshield_module_nvram_free_bytes

The total amount of free space in the NVRAM of the HSM.

This is only available on XC and nShield 5 HSM variants.

Type | Units | stattree node | stattree ID
gauge | bytes | ModuleEnvStats | NVMFreeSpace

nshield_module_nvram_erase_per_endurance

The wear level of the HSM’s NVRAM, expressed as the "erase count:endurance" ratio, a value between 0 and 1.

This is only available on XC and nShield 5 HSM variants.

Type | Details | stattree node | stattree ID
gauge | A value between 0 and 1 | ModuleEnvStats | NVMWWearLevel
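The wear ratio described above can be illustrated as follows. The erase count and endurance inputs are hypothetical, because the tables expose only the resulting ratio and not its components. `nvram_wear_ratio` is a hypothetical name.

```python
def nvram_wear_ratio(erase_count: int, endurance: int) -> float:
    """Wear level in the form reported by nshield_module_nvram_erase_per_endurance:
    erase count divided by rated endurance (0 = unused, 1 = rated life used)."""
    if endurance <= 0:
        raise ValueError("endurance must be positive")
    return erase_count / endurance


print(nvram_wear_ratio(25_000, 100_000))  # 0.25
```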

nshield_module_worn_blocks_per_nvram

The proportion of worn blocks in the NVRAM of the HSM, expressed as a value between 0 and 1.

This is only available on XC and nShield 5 HSM variants.

Type | Details | stattree node | stattree ID
gauge | A value between 0 and 1 | ModuleEnvStats | NVMWornBlocks

Internal software statistics

nshield_pci_irqs

The number of interrupts from the host. This is approximately equal to the total of HostReadCount and HostWriteCount.

This is only applicable to PCI HSMs.

Type | Details | stattree node | stattree ID
counter | | ModulePCIStats | HostIRQs

nshield_pci_unhandled_irqs

The number of unidentified interrupts from the host. If this reports a nonzero value, it is likely that there is a problem with a driver or the PCI bus.

This is only applicable to PCI HSMs.

Type | Details | stattree node | stattree ID
counter | | ModulePCIStats | HostUnhandledIRQs

nshield_pci_read_reconnect

The number of deferred reads that have now completed. This should be the same as HostReadDeferred, or one less than it if there is a currently deferred read.

This is only applicable to PCI HSMs.

Type | Details | stattree node | stattree ID
counter | | ModulePCIStats | HostReadReconnect
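The relationship described above can be checked mechanically: HostReadReconnect should equal HostReadDeferred, or be one less while a read is deferred. HostReadDeferred is not listed as a KeySafe 5 metric in these tables, so this sketch assumes both raw stattree values are available from another source. `deferred_reads_consistent` is a hypothetical helper.

```python
def deferred_reads_consistent(read_reconnect: int, read_deferred: int) -> bool:
    """True if HostReadReconnect equals HostReadDeferred, or is one less
    because a deferred read is still outstanding."""
    return read_deferred - read_reconnect in (0, 1)


print(deferred_reads_consistent(40, 41))  # True
print(deferred_reads_consistent(40, 43))  # False
```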

nshield_AIS31_preliminary_alarms

The number of times the AIS31 random number test has failed. Because this is a statistical test, a small number of failures is expected. If it fails too often, it triggers an SOS-HRAO alarm and the module fails.

Type | Details | stattree node | stattree ID
counter | | ModuleEnvStats | AIS31PrelimAlarms

nshield_correctable_memory_errors

The number of correctable memory errors that have been corrected by the error checking and correction (ECC) mechanisms. Typically, this count should be 0, although a small number of errors can be expected occasionally. If this count increases rapidly, by several thousand per second, this indicates a malfunction.

Type | Details | stattree node | stattree ID
counter | | ModuleEnvStats | MceCount
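The concern is the rate of increase rather than the absolute count, so a check compares two timestamped samples of nshield_correctable_memory_errors. The sketch uses 1,000 errors per second as a hypothetical threshold in place of "several thousand per second". Choose a value that suits your deployment. `error_rate_per_second` and `RATE_ALERT_THRESHOLD` are hypothetical names.

```python
def error_rate_per_second(prev_count, prev_time_s, curr_count, curr_time_s):
    """Per-second increase of a counter between two samples.
    Returns None if the counter went backwards (for example after a reset)."""
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = curr_count - prev_count
    if delta < 0:
        return None  # counter reset; the rate cannot be computed from these samples
    return delta / elapsed


RATE_ALERT_THRESHOLD = 1000.0  # hypothetical threshold, errors per second

rate = error_rate_per_second(12, 0.0, 150_012, 30.0)
print(rate, rate is not None and rate >= RATE_ALERT_THRESHOLD)  # 5000.0 True
```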

nshield_spi_communication_attempts

The number of times the main processor on an XC module has had to retry communication with the security processor after a communication failure. Loss of communication between the main processor and the security processor causes the module to enter an alarm state and fail. This sometimes triggers an SOS-HV alarm.

Type | Details | stattree node | stattree ID
counter | | ModuleEnvStats | SpiRetries