Debugging SEE machines

This chapter provides some guidance on debugging an SEE machine.

Debugging settings and output

To debug an SEE application effectively, you must have:

  • Enabled SEE debugging when creating the Security World in which the application is to run.

    SEE debugging is enabled when creating a Security World by specifying the new-world command-line utility’s dsee or dseeall features; for more information, see the User Guide. We do not recommend specifying the dseeall feature for Security Worlds in a production system.
  • Set Cmd_CreateSEEWorld_Args_flags_EnableDebug when creating the SEE World.

    If you try to set the Cmd_CreateSEEWorld_Args_flags_EnableDebug flag in a Security World that does not allow SEE debugging, the CreateSEEWorld command returns AccessDenied. This also occurs if you call CreateSEEWorld in a Security World where SEE debugging is restricted and an appropriate certifier is not present.

Debugging authorization

Access to the SEE trace buffer is controlled by the Security World in which the SEE machine runs. Every Security World has exactly one of the following properties:

  • Restricted SEE debugging

    This is the default setting. When SEE debugging is restricted, there is no delegation key from KNSO for accessing the SEE trace buffer. All Security Worlds created by software released before the introduction of SEE have restricted SEE debugging. A full quorum of Administrator Cards is required to access the SEE trace buffer in such Security Worlds.

  • Authorized SEE debugging

    In this case, a delegation key from KNSO exists to allow access to the SEE trace buffer. A subset of a full quorum of the Administrator Cards is required to access the SEE trace buffer in such Security Worlds. This delegation key must have been created and the number of cards required to authorize access to the SEE trace buffer must have been specified when the Security World was created.

  • No access-control SEE debugging

    In this case, no authorization of any kind is required for accessing the SEE trace buffer. No cards are required to access the SEE trace buffer in such Security Worlds. This property must have been specified when the Security World was created.

Obtaining debugging output

For SEE machines that require support from a host-side see-*-serv utility, you can run the see-*-serv utilities with the --trace or --plain-trace option to perform tracing automatically.

For SEE machines using the SEElib architecture, the TraceSEEWorld() command can be used to return debugging information. An example of this is provided in the a3a8 host-side example code. See A3A8 example.

Data written to standard output and standard error on the HSM is written to the SEE World’s Trace Buffer. The Trace Buffer is a 3000 character circular buffer: if more than 3000 characters are written to it without being retrieved, information is lost on a first-in/first-out basis. The TraceSEEWorld command retrieves the contents of the buffer so that the host can analyze or display them.

If the SEE machine crashes, a SEE register dump is printed to the SEE Trace Buffer for the nShield Solo, but not for the nShield Solo XC.

For example, assume that the HSM code calls the following command:

printf("Hello World!\n");

The string Hello World!\n is pushed into the Trace Buffer. A host-side call to TraceSEEWorld would then return this string and empty the buffer.
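
The overwrite-oldest behavior described above can be modeled on the host side to reason about how much output survives between TraceSEEWorld calls. The following is only an illustrative sketch under stated assumptions, not the HSM's actual implementation: the names trace_buf, trace_init, trace_write, and trace_drain are hypothetical, and only the 3000-character capacity comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAPACITY 3000  /* buffer size stated in the text above */

/* Hypothetical model of the Trace Buffer; not the HSM's real data structure. */
typedef struct {
    char   data[TRACE_CAPACITY];
    size_t start;  /* index of the oldest retained character */
    size_t len;    /* number of characters currently held */
} trace_buf;

void trace_init(trace_buf *tb)
{
    assert(tb != NULL);
    tb->start = 0;
    tb->len = 0;
}

/* Append n characters; once the buffer is full, the oldest are overwritten. */
void trace_write(trace_buf *tb, const char *s, size_t n)
{
    assert(tb != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t pos = (tb->start + tb->len) % TRACE_CAPACITY;
        tb->data[pos] = s[i];
        if (tb->len < TRACE_CAPACITY)
            tb->len++;
        else
            tb->start = (tb->start + 1) % TRACE_CAPACITY;  /* drop oldest */
    }
}

/* Copy out the held characters, oldest first, and remove them from the
 * buffer, mirroring the retrieve-and-empty behavior described for
 * TraceSEEWorld. Returns the number of characters copied. */
size_t trace_drain(trace_buf *tb, char *out, size_t out_size)
{
    assert(tb != NULL && out != NULL);
    size_t n = tb->len < out_size ? tb->len : out_size;
    for (size_t i = 0; i < n; i++)
        out[i] = tb->data[(tb->start + i) % TRACE_CAPACITY];
    tb->start = (tb->start + n) % TRACE_CAPACITY;
    tb->len -= n;
    return n;
}
```

In this model, writing 3001 characters before draining loses the first character, which is why long-running jobs that print heavily should be traced frequently.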

If a SEE World is terminated by the HSM (for instance, if its last remaining thread exits or it causes a fatal signal to be raised), a diagnostic message is usually sent to the Trace Buffer to help debug the problem.

Example Debug

If an illegal access violation (segmentation fault) occurs, the tail of the Trace Buffer looks similar to this:

   *** World exits: thread 28 caused CPU exception
DSI exception:
Exception vector 00300h
   r0 =001D9E40h r1 =001D9F38h r2 =00C4E090h r3 =00000008h
   r4 =00000000h r5 =00C00444h r6 =00000000h r7 =001C21B1h
   r8 =00C39CB8h r9 =00000019h r10=40000000h r11=00002000h
   r12=00000000h r13=00D08048h r14=00000000h r15=00000000h
   r16=00000000h r17=00000000h r18=00000000h r19=00000000h
   r20=00000000h r21=00000000h r22=00000000h r23=00C40000h
   r24=FFFC5CD0h r25=00C3A750h r26=00C40000h r27=00C40000h
   r28=00000000h r29=00000000h r30=00000000h r31=00D00000h
   XER=20000000h CR =20000000h LR =00C00444h CTR=00C39B9Ch
   PC =00C00448h MSR=0000F030h
   f0 =0000000000000000h f1 =0000000000000000h
   f2 =0000000000000000h f3 =0000000000000000h
   f4 =0000000000000000h f5 =0000000000000000h
   f6 =0000000000000000h f7 =0000000000000000h
   f8 =0000000000000000h f9 =0000000000000000h
   f10 =0000000000000000h f11 =0000000000000000h
   f12 =0000000000000000h f13 =0000000000000000h
   f14 =0000000000000000h f15 =0000000000000000h
   f16 =0000000000000000h f17 =0000000000000000h
   f18 =0000000000000000h f19 =0000000000000000h
   f20 =0000000000000000h f21 =0000000000000000h
   f22 =0000000000000000h f23 =0000000000000000h
   f24 =0000000000000000h f25 =0000000000000000h
   f26 =0000000000000000h f27 =0000000000000000h
   f28 =0000000000000000h f29 =0000000000000000h
   f30 =0000000000000000h f31 =0000000000000000h
   FPSCR=00000000h

The program counter (PC), which is at 00C00448h in this PowerPC-based compilation, shows where this access occurs.

The following excerpt from the PowerPC-based map file, created at application link time by specifying the -map option to the linker, indicates that the problem address is in usermain.o:

.text 0x00c00000                 0x3a0ac
  *(.text.stub.text.*.gnu.linkonce.t.*)
    .text 0x00c00000             0xa5c usermain.o
          0x00c00160             main
    .text 0x00c00a5c             0x544 .\lib-ppc-gcc\seelib.a(nfstrerr.o)
          0x00c00a5c             NFast_StrError

To find out which instruction is causing the segmentation fault, calculate the offset into usermain.o. The formula is:

program_counter - object_base_address

The calculation is as follows:

  00C00448h
- 00C00000h
-----------
  00000448h
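
The map lookup and the subtraction can also be done programmatically when several crash dumps need to be triaged. The following is a minimal sketch: the map_entry type and find_object function are hypothetical helpers, and the table values are copied by hand from the map excerpt above rather than parsed from a real map file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One object-file entry from the linker map: load address, size, name. */
typedef struct {
    uint32_t    base;
    uint32_t    size;
    const char *name;
} map_entry;

/* Entries copied by hand from the map excerpt above. */
const map_entry example_map[] = {
    { 0x00c00000u, 0xa5cu, "usermain.o" },
    { 0x00c00a5cu, 0x544u, "seelib.a(nfstrerr.o)" },
};

/* Return the entry whose [base, base + size) range contains pc and store
 * program_counter - object_base_address in *offset. Returns NULL (and
 * leaves *offset untouched) if no entry contains pc. */
const map_entry *find_object(const map_entry *map, size_t count,
                             uint32_t pc, uint32_t *offset)
{
    assert(map != NULL && offset != NULL);
    for (size_t i = 0; i < count; i++) {
        if (pc >= map[i].base && pc - map[i].base < map[i].size) {
            *offset = pc - map[i].base;
            return &map[i];
        }
    }
    return NULL;
}
```

Calling find_object(example_map, 2, 0x00C00448u, &offset) selects the usermain.o entry and sets offset to 0x448, matching the manual calculation.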

Once you have located the problem in this way, investigate it as follows:

  1. Recompile the source with the -g option and no optimization (if you did not originally compile it with these options).

  2. Run an object dump utility, such as powerpc-codesafe-linux-gnu-objdump, on the object files.

The relevant part of the generated disassembly is similar to the following for PowerPC-based objects:

434:    38 7a 03 34   addi    r3,r26,820
438:    38 80 00 08   li      r4,8
43c:    4c c6 31 82   crclr   4*cr1+eq
440:    48 00 00 01   bl      440 <main+0x2e0>
444:    38 60 00 08   li      r3,8
448:    80 03 00 00   lwz     r0,0(r3)
44c:    4b ff fe 74   b       2c0 <main+0x160>
450:    3c 80 00 00   lis     r4,0

From this output it is possible to see that the segmentation fault is caused by an illegal access through the pointer held in r3 (which the register dump shows to be 00000008h, an obviously invalid user-mode memory address). The source shows plainly that the instruction at offset 0448h in usermain.o is trying to access *i, but i has not been allocated. The bug can now be fixed and the program rebuilt.

Finding memory leaks with stattree

You can use the stattree command-line utility to find memory leaks. Run the command as follows:

> stattree | find "Mem"

For each HSM in the Security World, this command produces output that reports values for the total memory (MemTotal), the memory currently allocated to the kernel (MemAllocKernel), and the memory currently allocated to the loaded SEE machine (MemAllocUser).

If no SEE machine is loaded, the output from this stattree command (if there is only one HSM) looks similar to the following:

-MemTotal             128921600
-MemAllocKernel       1355776
-MemAllocUser         0

If an SEE machine is loaded, the output from this stattree command (if there is only one HSM) looks similar to the following:

-MemTotal             128921600
-MemAllocKernel       1355776
-MemAllocUser         1032192

You can monitor a loaded SEE machine’s memory usage either by repeatedly running stattree and checking its output or by writing code that calls the nCore statistics APIs directly. In either case, if any reported memory value appears to be growing continuously over time, this probably indicates a memory leak.
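
If you capture stattree output periodically, the comparison can be automated. The following is a minimal sketch that works on captured text only. It does not call the nCore statistics APIs, the function names parse_mem_alloc_user and looks_like_leak are hypothetical, and the strictly-increasing test is an assumed heuristic rather than a documented criterion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the first MemAllocUser value from stattree-style text such as
 * "-MemAllocUser         1032192". Returns 1 on success, 0 if the field
 * is absent or has no number after it. */
int parse_mem_alloc_user(const char *text, unsigned long *value)
{
    assert(text != NULL && value != NULL);
    const char *p = strstr(text, "MemAllocUser");
    if (p == NULL)
        return 0;
    return sscanf(p + strlen("MemAllocUser"), "%lu", value) == 1;
}

/* Assumed heuristic: suspect a leak when there are at least three samples
 * and every sample is strictly greater than the one before it. */
int looks_like_leak(const unsigned long *samples, size_t count)
{
    assert(samples != NULL);
    if (count < 3)
        return 0;
    for (size_t i = 1; i < count; i++) {
        if (samples[i] <= samples[i - 1])
            return 0;
    }
    return 1;
}
```

With more than one HSM in the Security World, this sketch reads only the first MemAllocUser entry; a fuller tool would track each HSM separately. Legitimate caches can also grow for a while, so treat a positive result as a prompt to investigate rather than proof of a leak.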

Segment addresses for Solo

SEE executables are non-relocatable; that is, they are loaded in memory at the addresses specified in the image. Ensure that you choose these addresses carefully so that they map onto usable RAM and do not overlap with memory being used by the kernel. Typically, this means you must choose an address at the high end of RAM.

Different HSM types have different mappable memory ranges.

  • The CodeSafe compiler sets all values for Solo XC and later HSM models.

  • You have to set the ranges in the CodeSafe application code if you are developing for Solo +.

    The rest of this section describes guidelines for Solo +.

To determine your HSM type, run the enquiry command-line utility and check the SEE Machine Type output. You can then determine where the mappable memory range starts from this table:

SEE Machine Type    Start of mappable range
PowerPCSXF          0x00400000

These ranges follow the approximately 4MB of RAM reserved for use by the kernel.

You can use the stattree command-line utility to find the total length of the mappable range. Run the command:

> stattree | find "MemTotal"

This command produces output that reports values for the total memory (MemTotal) for each HSM in the Security World.

For Solo +, we recommend the following segment addresses as starting points:

SEE Machine Type           PowerPCSXF
text segment start         0xa00000
data segment start         0x00d00000
Arguments to the linker    -Ttext 0xa00000 -Tdata 0xd00000

For large SEE machines more space may be needed in the text segment, causing a linker error of the following form:

powerpc-codesafe-linux-gnu-ld: section .data [00d00000 -> 00d0327f] overlaps section .text [00c00000 -> 00d7bd8b]
powerpc-codesafe-linux-gnu-ld: section .sdata [00d03280 -> 00d035ef] overlaps section .text [00c00000 -> 00d7bd8b]
powerpc-codesafe-linux-gnu-ld: section .sbss [00d035f0 -> 00d036ab] overlaps section .text [00c00000 -> 00d7bd8b]
powerpc-codesafe-linux-gnu-ld: section .bss [00d036b0 -> 00d0854f] overlaps section .text [00c00000 -> 00d7bd8b]

To resolve this example error, you could move the data segment start point upward (for example, to 0x00e00000) as necessary to prevent the overlap. Alternatively (or additionally), you could move the text segment start point downward.
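
The overlap reported by the linker can be checked by hand or with a few lines of code before relinking. The following is a minimal sketch using the ranges from the example error. The addr_range type and both functions are hypothetical helpers, and the 1 MB alignment used in the usage note is an assumption chosen only to reproduce the 0x00e00000 suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, as printed by the linker: [first -> last]. */
typedef struct {
    uint32_t first;
    uint32_t last;
} addr_range;

/* Two inclusive ranges overlap when each starts no later than the other ends. */
int ranges_overlap(addr_range a, addr_range b)
{
    assert(a.first <= a.last && b.first <= b.last);
    return a.first <= b.last && b.first <= a.last;
}

/* Lowest address above the text segment that is a multiple of align.
 * align must be a power of two; the result is a candidate for -Tdata.
 * Assumes the text segment does not end near the top of the 32-bit
 * address space, so the arithmetic cannot wrap. */
uint32_t next_data_start(addr_range text, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (text.last + 1u + (align - 1u)) & ~(align - 1u);
}
```

For the example error, the text range [00c00000 -> 00d7bd8b] overlaps the data range [00d00000 -> 00d0327f], and next_data_start with an alignment of 0x100000 returns 0x00e00000. The .sdata, .sbss, and .bss sections follow .data, so moving the data segment start moves them as well.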

Vulnerability test harness

We supply a test harness called vulnerability.o that can be used for debugging SEE machines. It supplies a standard set of command-line arguments and environment variables to the SEE environment, as well as providing the standard stdioe and socket support.

Because the vulnerability.o test harness is insecure, we recommend that you not link vulnerability.o into a production SEE machine.

Troubleshooting guide

Symptom: SEEJob takes a long time then fails with HardwareFailed

  • Possible problem: The SEE machine has deadlocked or entered an infinite loop, which prevents the job from returning and causes the SEEJob to trigger the command time-out.

    Solution: Check the code for possible deadlocks or infinite loops. Non-obvious problems can be debugged by writing progress reports to the Trace Buffer and calling TraceSEEWorld after the job returns HardwareFailed.

Symptom: CreateSEEWorld fails with BadMachineImage

  • Possible problem: No SEE machine is loaded.

    Solution: Load an SEE machine.

Symptom: SEE machine loading fails with BadMachineImage

  • Possible problem: The file being loaded is not a correctly formatted SAR file.

    Solution: Ensure that the correct SEE machine file is being loaded. Ensure that the SEE machine has been properly processed by the Trusted Code Tool into a SAR file.

  • Possible problem: The SEE machine file is corrupted.

    Solution: Rebuild the SEE machine, or revert to a known good back-up.

  • Possible problem: The SEE machine has been compiled or linked with the wrong options.

    Solution: SEE machines must be nonexecutable, uncompressed, non-relocatable AIFs or SXFs, packaged as SAR files.

Symptom: CreateSEEWorld fails with InvalidCertificate

  • Possible problem: The machine signing hash on userdata signatures does not match any signature hash on the currently loaded machine.

    Solution: Ensure the correct SEE machine with the correct signatures is loaded. Ensure the correct user data is being passed to CreateSEEWorld. Ensure the user data signatures are correct.

Symptom: SEE machine loading fails with InvalidCertificate

  • Possible problem: The SEE machine signatures were created incorrectly.

    Solution: SEE machine signatures must be created with the machine key specification --is-machine. Recreate the SEE machine SAR with correct signatures.

Symptom: The SEE machine crashes, and Trace Buffer output shows a raised signal

  • Possible problem: Dependent on signal number.

    Solution: Check stdh.h and signal.h for signal descriptions, then check the code to see how that signal could be raised.

Symptom: AccessDenied from CreateSEEWorld

  • Possible problem: SEE World debugging is not available in the Security World.

    Solution: Check the Security World’s SEE debugging policy.

  • Possible problem: The SEE machine is returning AccessDenied in SEElib_initComplete.

    Solution: Check the SEE machine set-up code to see where it might be passing AccessDenied to SEElib_initComplete, and fix the cause of that, if necessary.

Symptom: All SEEJobs return with Status_Cancelled

  • Possible problem: The SEElib transaction listener is not running.

    Solution: If you are using SEElib_Transact, you must call SEElib_StartTransactListener before making use of SEElib_Transact.

Symptom: NoModuleMemory is returned from the CreateSEEWorld command

  • Possible problem: Segment addresses clash with kernel pages.

    Solution: Adjust segment positions away from kernel RAM; see Segment addresses for Solo.

  • Possible problem: Segment addresses overlap.

    Solution: Adjust segments away from each other; see Segment addresses for Solo.

  • Possible problem: Segment addresses are not usable RAM.

    Solution: Adjust segment positions to usable RAM; see Segment addresses for Solo.

Symptom: NoModuleMemory is returned when loading a SEE machine

  • Possible problem: Userdata has been specified but is not expected.

    Solution: Exclude the userdata.

  • Possible problem: The previous SEE machine has not been cleared.

Symptom: Error from link: section .data [hhhhhhhh → hhhhhhhh] overlaps section .text [hhhhhhhh → hhhhhhhh]

  • Possible problem: Segment addresses overlap.

    Solution: Adjust segments away from each other; see Segment addresses for Solo.