How to use truss, pstack, and DTrace to Debug High CPU Type Problems on Solaris

This post describes ways to begin diagnosing why a process is using more CPU than normal.

Why is this abnormal?

First, it is important to determine why you think the CPU usage is unusual. How much is it normally? Why do you think what it is doing now is worth investigating? That may seem obvious, but understanding it will help the support engineer help you with your problem.

About collecting the following outputs

There are several options for collecting the various outputs requested in the following sections. You could pipe the output of each command to the “tee” command so as to both see it and record it to a file. You could redirect the output of each command to a separate file.
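For example, either of the following would let you watch the prstat output described below while also recording it to a file (the output file name here is just an example):

# prstat -cL -n30 5 12 | tee prstat.out
# prstat -cL -n30 5 12 > prstat.out 2>&1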

Cutting and pasting the terminal output into a word-processing program is probably the least desirable option. The easiest is probably to start a “script” session in each window, each specifying a separate output file name.

$ script terminal-1
Script started, file is terminal-1
$ pwd
[DIR]
  .
  . various commands and output omitted
  .
$ exit
Script done, file is terminal-1

$ ls -l terminal-1
-rwxrwxrwx+ 1 USER GROUP 225 Mar 3 18:11 terminal-1
$

Also note that you can then issue the “date” command in the window to record the time when one command finished or the next is about to be entered. Then you can compress and upload the individual output files. Do not forget to exit the script session at the end. Try to avoid using editors during the script sessions.
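For example, bracketing a command with “date” makes it easy to see afterwards when it ran (the pid here is just a placeholder):

$ date
$ pstack 12345
$ date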

Does it stay this way?

If the process continues consuming an unusual amount of CPU for long enough that you can observe it and take some action:

In one window, start prstat:

# prstat -cL -n30 5 12

If specific processes are of interest, also use -p [pid-list]

Here, the -c means to print new reports below the previous ones rather than redrawing the screen, which makes the output easier to collect to a file and cut-and-paste. The -L means to report the top LWPs (threads) rather than the top processes. The -n30 means to report the top 30. The “5 12” means to report at 5-second intervals, 12 times - i.e., for 1 minute.
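For example, if the processes of interest were pids 1234 and 5678 (placeholders for your actual pids), the command would be:

# prstat -cL -n30 -p 1234,5678 5 12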

In another window do pstack, truss, and pstack again:

pstack [pid]
pstack [pid]
pstack [pid]
truss -o truss.out -vall -alefDd -p [pid]
    *** wait 30 seconds and then Ctrl/C
pstack [pid]
pstack [pid]
pstack [pid]

While the prstat is running for 1 minute in the first window, in the second window do the above sequence. This will show 3 snapshots of the state of all the threads in the process, capture truss of everything the process is doing for 30 seconds, and then examine the state of the threads again.

With that, we can see which LWPs are using the most CPU and then look at the pstack and truss outputs to identify what those threads are doing. This will probably not lead immediately to a resolution of the problem (if it is a problem at all). But it should help determine how to proceed with the investigation.
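If you prefer to script the second window's sequence, a minimal sketch along these lines collects the same data (the script name and output file names are just examples; the pid is passed as an argument):

#!/bin/sh
# Sketch only: 3 pstack snapshots, 30 seconds of truss, then 3 more pstack snapshots.
# Usage: ./collect-stacks.sh <pid>
PID=$1
for i in 1 2 3; do pstack $PID; done > pstack-before.out
truss -o truss.out -vall -alefDd -p $PID &
TRUSS=$!
sleep 30
kill $TRUSS
for i in 1 2 3; do pstack $PID; done > pstack-after.out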

Or is it intermittent, periodic, or otherwise hard to catch? The hot-smtp.d script

If the problem comes and goes - you only notice it via reporting after the fact or it stops before you can catch it using the above - you could use a DTrace script like the attached hot-smtp.d.

The attached script has two lines which define what processes it will monitor:

/execname=="tcp_smtp_server" && arg0==0/
/execname=="tcp_smtp_server" && arg1==0/

To use this script to monitor a different process, change the process name on those two lines.
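For example, to monitor ims_master processes instead, those two lines would become:

/execname=="ims_master" && arg0==0/
/execname=="ims_master" && arg1==0/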

This script uses the DTrace profile provider to monitor when processes with the specified name are on the CPU and to record the user-mode or system-mode thread stacks. Once per minute, it displays the top 20 (or fewer) thread stacks and a count of how many times each occurred. For example:

# ./hot-smtp.d
2014 Jan 21 10:16:28 monitoring... will report once per minute. Ctrl/C to end.
^C
2014 Jan 21 10:16:45
top 20 on-CPU user-mode stacks:

libc.so.1`__posix_asctime_r+0x120
libc.so.1`ctime_r+0x2c
libimta.so`os_time_t_to_timestr+0x38
libimta.so`os_deliver_start_timestr+0x1c
libimta.so`os_get_deliver_start_timestr+0x24
libimta.so`mmc_winit+0x140
libimta.so`mail+0x1430
libimta.so`smtpc_enqueue+0x17e0
tcp_smtp_server`tcp_smtp_slave+0x22c
tcp_smtp_server`tcp_smtp_slave_pre+0x40
libimta.so`dispatcher_newtcp+0x564
libc.so.1`_lwp_start
1

libimta.so`ap_parse+0x224
libimta.so`ap_fixup_parse+0xe8
libimta.so`mmc_determine_url+0x30ac
libimta.so`mmc_alias_lookup_url+0x6c4
libimta.so`mm_wadr2+0x7c14
libimta.so`mm_wadr1+0x1580
libimta.so`mmc_wadr+0x928
libimta.so`receipt+0xa38
libimta.so`smtpc_enqueue+0x1818
tcp_smtp_server`tcp_smtp_slave+0x22c
tcp_smtp_server`tcp_smtp_slave_pre+0x40
libimta.so`dispatcher_newtcp+0x564
libc.so.1`_lwp_start
1

libimta.so`pattern_match_wild+0x1c4
libimta.so`mmc_bigapply_mapping+0x63c
libimta.so`mmc_apply_mapping+0x40
libimta.so`mm_rewrite_domain+0x6984
libimta.so`inner_tree_to_mailbox+0x754
libimta.so`mmc_tree_to_mailbox+0x170
libimta.so`mmc_parse_address+0x170
libimta.so`switchchannel+0x1c8
libimta.so`hello+0x730
libimta.so`smtpc_enqueue+0x17a4
tcp_smtp_server`tcp_smtp_slave+0x22c
tcp_smtp_server`tcp_smtp_slave_pre+0x40
libimta.so`dispatcher_newtcp+0x564
libc.so.1`_lwp_start
1
   .
   .
   .

top 20 on-CPU system-mode stacks:

unix`sfmmu_modifytte_try
unix`current_thread+0x164
unix`sfmmu_tteload_addentry+0x3f4
unix`sfmmu_tteload_array+0x6c
unix`hat_do_memload+0xe0
genunix`segvn_faultpage+0x578
genunix`segvn_fault+0xbf0
genunix`as_fault+0x4c8
unix`pagefault+0xac
unix`trap+0xd50
unix`utl0+0x4c
1
   .
   .
   .

It must be run as root, and if zones are involved, it must be run in the global zone.

The above is a poor example: the system was not particularly busy, only a single SMTP session started while the script was running, and each of the thread stacks occurred only a single time. On a large production system where tcp_smtp_server is using more CPU than you think it should, the output would be more interesting, showing only the top 20 thread stacks. You would then look into what those functions are doing and determine what protocol exchange might have that result.
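Since the script itself is attached rather than reproduced here, the following is only a rough sketch of how a profile-provider script of this kind can be put together (the sampling rate, report interval, and output formatting are assumptions; the attached hot-smtp.d may differ in its details):

#!/usr/sbin/dtrace -s
/* Sketch of a hot-smtp.d-style script; rate and formatting are assumptions. */
#pragma D option quiet

dtrace:::BEGIN
{
        printf("%Y monitoring... will report once per minute. Ctrl/C to end.\n",
            walltimestamp);
}

/* sample each CPU ~997 times per second; arg0==0 means on CPU in user mode */
profile-997
/execname == "tcp_smtp_server" && arg0 == 0/
{
        @user[ustack()] = count();
}

/* arg1==0 means on CPU in the kernel */
profile-997
/execname == "tcp_smtp_server" && arg1 == 0/
{
        @kern[stack()] = count();
}

/* once per minute, report the 20 most common stacks and start over */
tick-60sec
{
        printf("%Y\ntop 20 on-CPU user-mode stacks:\n", walltimestamp);
        trunc(@user, 20);
        printa(@user);
        printf("top 20 on-CPU system-mode stacks:\n");
        trunc(@kern, 20);
        printa(@kern);
        trunc(@user);
        trunc(@kern);
}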

In either case, the next step depends on what you find here.

Messages with many recipient headers

In the original issue, the vast majority of the thread stacks were related to locking within the libc memory allocation functions called during recipient address processing, like this:

libc.so.1`mutex_trylock_adaptive+0x1b4
libc.so.1`mutex_lock_impl+0x1e8
libc.so.1`free+0x28
libimta.so`ap_pop+0x84
libimta.so`ap_parse+0x514
libimta.so`ap_fixup_parse+0xd8
libimta.so`mmc_address_to_tree+0x2f4
libimta.so`mmc_sender_rewrite+0x188
libimta.so`mmc_rewrite_header+0x570
libimta.so`mm_build_header_instance+0x3994
libimta.so`mmc_wtend+0x121b0
libimta.so`data+0x170c
libimta.so`smtpc_enqueue+0x1714
tcp_smtp_server`tcp_smtp_slave+0x224
tcp_smtp_server`tcp_smtp_slave_pre+0x34
libimta.so`dispatcher_newtcp+0x470
libc.so.1`_lwp_start

This is an indication that messages are being sent with header fields containing at least a thousand, and more likely tens of thousands, of addresses. Possibly those addresses are not actually on a recipient header line, but merely appear in some other, non-recipient header line, such as message-ids on a References: header line. The more common case, though, is that the user really is sending a message to/cc/bcc a very large number of recipients, whether or not the submission has been broken up into multiple, separate SMTP transactions.

Overview of syscalls

Another option for getting an overview of syscall usage by a type of process is the syscalls.d script. In the attached version, you will need to change a line near the top. It currently looks like:

syscall::*:entry
/ execname == "mmap_munmap_test" /
{
  self->syscallstart = timestamp;
}

Change “mmap_munmap_test” to the name of the type of process you want to monitor - “ims_master”, for example.
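For example, to monitor ims_master processes, that clause would become:

syscall::*:entry
/ execname == "ims_master" /
{
  self->syscallstart = timestamp;
}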

Run it as root, in the global zone. Let it run for a minute or two, then press Ctrl/C.

# ./syscalls.d

^C
    munmap called  557 times, total 10690 ms
     fstat called  558 times, total     7 ms
      mmap called  558 times, total   217 ms
      open called  558 times, total   284 ms
     close called  558 times, total   890 ms
     lseek called 1116 times, total     8 ms
     write called 1116 times, total   233 ms
    fdsync called 1116 times, total  4683 ms
Total time in syscalls 17015 ms

This type of DTrace script works by watching the entry and return of every syscall matching the criteria in the predicate (the execname comparison, in this case). So you need to let it run long enough to catch both the entry and the return of syscalls that are taking a long time: if some syscalls are taking 10 or 20 seconds and you only let this run for a few seconds, you won’t see those long ones. A minute or two, or more, is recommended.
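For reference, the return-side probe that pairs with the entry probe shown above generally looks something like the following (a sketch only; the attached syscalls.d may differ in its formatting and details):

/* Sketch of the return-side probe and final report. */
syscall::*:return
/ self->syscallstart /
{
        /* accumulate per-syscall counts and total time spent */
        @counts[probefunc] = count();
        @times[probefunc] = sum(timestamp - self->syscallstart);
        @total = sum(timestamp - self->syscallstart);
        self->syscallstart = 0;
}

dtrace:::END
{
        /* convert nanoseconds to milliseconds for the report */
        normalize(@times, 1000000);
        normalize(@total, 1000000);
        printa("%10s called %@u times, total %@u ms\n", @counts, @times);
        printa("Total time in syscalls %@u ms\n", @total);
}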