prstat(1M) output
Two weeks ago, I wrote about the /proc
tools, butdidn't talk about one of the most useful tools that use /proc
on SmartOS. This is prstat(1M).This tool is analagous in many ways to top(1) on Linux. Note thattop(1)
does work on SmartOS/illumos/Solaris basedsystems. Brendan Gregg wrote an article comparing the performanceimpact of prstat(1M)
versus top(1)
a fewyears back, available here.Rather than concentrate on performance, I want to spend some timegoing into details of prstat(1M)
output, and how it canbe useful. This blog will also show some DTrace, because it is such auseful tool, and I like it so much.
First, if you want to use top(1)
on a SmartOSSmartMachine, do the following:
# pkgin install top-3.6.1nb1calculating dependencies... done.nothing to upgrade.1 packages to be installed: top-3.6.1nb1 (77K to download, 0B to install)proceed ? [Y/n] Ydownloading packages...top-3.6.1nb1.tgz 100% 37KB 37.5KB/s 37.5KB/s 00:00 installing packages...installing top-3.6.1nb1...pkg_install warnings: 0, errors: 0reading local summary...processing local summary...updating database: 100%marking top-3.6.1nb1 as non auto-removable#
Now, on to prstat(1M)
.
In its simplest form, prstat
reports statistics aboutprocesses currently on the system, sorted by cpu usage and updatedevery 5 seconds. By default, it will show as many processes as fit inthe window or terminal where it is being run.
# prstat PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 7160 root 3864K 2772K cpu0 1 0 0:00:00 0.2% prstat/1 6919 root 3176K 2180K sleep 1 0 0:00:00 0.0% bash/1 3708 root 4020K 1520K sleep 59 0 0:00:00 0.0% sshd/1 3680 root 2376K 1424K sleep 56 0 0:00:00 0.0% ttymon/1 3571 root 5500K 3500K sleep 56 0 0:00:18 0.0% nscd/30 3675 root 2120K 1324K sleep 1 0 0:00:00 0.0% sac/1 3683 root 4076K 2624K sleep 56 0 0:00:00 0.0% inetd/3 3676 root 3276K 2352K sleep 59 0 0:00:00 0.0% rsyslogd/5 3681 root 1632K 920K sleep 1 0 0:00:01 0.0% utmpd/1 3094 root 6648K 5072K sleep 59 0 0:00:09 0.0% svc.startd/12 3107 root 8136K 6976K sleep 56 0 0:00:18 0.0% svc.configd/14 3674 root 1860K 1212K sleep 1 0 0:00:00 0.0% cron/1 3056 root 2136K 1536K sleep 1 0 0:00:02 0.0% init/1 3697 root 1932K 1372K sleep 59 0 0:00:00 0.0% ttymon/1 3446 root 2528K 1644K sleep 59 0 0:00:00 0.0% pfexecd/3Total: 18 processes, 81 lwps, load averages: 0.00, 0.00, 0.00
The above output shows the process id, user, size of the virtualaddress space, the resident set size (size currently in memory), state(on which processor the process is running, or sleep if the process iswaiting), scheduling priority (higher is run before lower values),nice value, time spent by the process, percentage of cpu used by theprocess, the process "name", and the number of threads in the processFor instance, nscd
, process id 3571 has 30 threads.
Unlike top
, prstat
has few interactivecommands. Typing 'q' causes prstat to quit. Other input causes it torefresh the display. Also, you can specify an interval (called"delay" in top) of 0 seconds, which causes prstat
to continuously update the output(Try prstat 0
to see this). The default interval is 5 seconds.
prstat can be useful in determining not only which processes/threadsare using memory and cpu resources on your system. You can also use it to focuson specific users, processes, zones, and tasks. (Tasks on SmartOS arenot the same as processes. See newtask(1) fordetails.)
Like top
, prstat
redraws the screen everyinterval. To get "continuous" output, use the -c
option.Unlike top
, prstat
can be used to reportmicrostate process accounting information. This option reportspercentage of time spent in user mode, kernel mode, and sleeping. Italso reports the number of voluntary and involuntary context switches,the number of system calls, and the number of signals received.In addition, this reports time spent processing traps, text pagefaults, data page faults, waiting for user level lcoks, and waitingfor the CPU (shown as LAT
).
Let's look at an example.
# prstat -cPlease wait... PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 76798 root 10M 9380K sleep 59 0 0:05:05 3.2% tserver/1001 16145 103 56M 46M sleep 59 0 0:43:11 0.0% beam.smp/51 95 root 0K 0K sleep 99 -20 0:47:33 0.0% zpool-zones/162 5879 root 9680K 4976K sleep 57 0 0:23:24 0.0% zoneadmd/5 16790 nobody 58M 36M sleep 59 0 0:11:53 0.0% node/4 16610 nobody 61M 42M sleep 59 0 0:22:39 0.0% node/3 17119 907 45M 24M sleep 59 0 0:14:41 0.0% node/4 15050 root 58M 39M sleep 59 0 0:11:56 0.0% node/4 15119 nobody 141M 104M sleep 1 0 0:25:05 0.0% node/4 3570 root 26M 18M sleep 100 - 0:05:40 0.0% node/6 14710 root 22M 17M sleep 1 0 0:01:26 0.0% node/3 14489 root 43M 20M sleep 1 0 0:03:30 0.0% node/5 14661 root 45M 23M sleep 59 0 0:03:25 0.0% node/5 14107 root 18M 14M sleep 58 0 0:06:32 0.0% node/6 14022 root 22M 17M sleep 1 0 0:01:35 0.0% node/3Total: 462 processes, 3132 lwps, load averages: 0.91, 0.75, 0.50 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 76798 root 10M 9380K sleep 59 0 0:05:08 3.4% tserver/1001 95 root 0K 0K sleep 99 -20 0:47:33 0.0% zpool-zones/162 16145 103 56M 46M sleep 59 0 0:43:11 0.0% beam.smp/51 16610 nobody 62M 43M sleep 59 0 0:22:39 0.0% node/3 16790 nobody 58M 36M sleep 59 0 0:11:53 0.0% node/4 17119 907 45M 24M sleep 59 0 0:14:41 0.0% node/4 15050 root 58M 39M sleep 59 0 0:11:56 0.0% node/4 5879 root 9680K 4976K sleep 57 0 0:23:24 0.0% zoneadmd/5 15119 nobody 141M 104M sleep 1 0 0:25:05 0.0% node/4 3180 root 53M 30M sleep 59 0 0:11:54 0.0% node/7 14489 root 43M 20M sleep 1 0 0:03:30 0.0% node/5 3570 root 26M 18M sleep 100 - 0:05:40 0.0% node/6 16055 root 45M 24M sleep 1 0 0:03:26 0.0% node/5 14107 root 18M 14M sleep 58 0 0:06:32 0.0% node/6 14022 root 22M 17M sleep 1 0 0:01:35 0.0% node/3Total: 462 processes, 3132 lwps, load averages: 0.91, 0.75, 0.50...
From the above output, the tserver
process is spending3.2% of its time on the cpus. tserver
is a multithreadedprocess (1001 threads) that accepts connection requests andimmediately closes the connection. It was written specifically forthis blog, along with a client running on other machine(s).Let's take a look at microstate information. (There is a detaileddescription of microstate accounting in msacct.c.)
# prstat -cmLp 76798Please wait... PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 76798 root 7.8 0.7 0.0 0.0 0.0 3.8 88 0.1 1K 124 8K 0 tserver/1 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 131 0 133 0 tserver/661 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 103 0 104 0 tserver/271 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 108 0 109 0 tserver/460 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 107 0 108 0 tserver/374 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 182 0 185 0 tserver/487 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 132 0 133 0 tserver/526 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 84 0 84 0 tserver/663 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 102 0 103 0 tserver/604 76798 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 160 0 163 0 tserver/710 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 95 0 96 0 tserver/229 76798 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 99 0 100 0 tserver/202 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 91 0 91 0 tserver/389 76798 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 89 0 89 0 tserver/288 76798 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 129 0 131 0 tserver/507Total: 1 processes, 1001 lwps, load averages: 1.08, 1.07, 0.96 PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 76798 root 12 0.7 0.0 0.0 0.0 6.9 80 0.1 2K 212 8K 0 tserver/1 76798 root 0.2 0.0 0.0 0.0 0.0 100 0.0 0.0 121 0 121 0 tserver/487 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 121 0 121 0 tserver/712 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 121 0 121 0 tserver/460 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 118 0 118 0 tserver/661 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 122 1 122 0 tserver/374 76798 root 0.0 0.1 0.0 0.0 0.0 100 0.0 0.0 318 3 356 0 tserver/788 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 120 0 120 0 tserver/670 76798 root 0.0 0.1 0.0 0.0 0.0 100 0.0 0.0 383 0 408 0 tserver/682 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 120 0 120 0 tserver/710 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 118 1 119 0 tserver/648 76798 root 0.0 0.1 0.0 0.0 0.0 100 0.0 0.0 337 6 353 0 tserver/542 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 117 1 118 0 tserver/271 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 113 1 114 0 tserver/202 76798 root 0.1 0.0 0.0 0.0 0.0 100 0.0 0.0 107 0 108 0 tserver/518Total: 1 processes, 1001 lwps, load averages: 1.09, 1.07, 0.96...
From the above output, lwpid 1 is spending 12% of its time in usermode (USR
), 0.7% in kernel mode (SYS
). Itis not processing traps, text page faults, or data page faults(TRP
, TFL
, and DFL
respectively. Lwpid 1 is waiting on user level locks 6.9% of itstime, the other lwps (i.e., threads) in the process are spending allof their time waiting for user level locks. Lwpid 1 is sleeping 80%of its time (SLP
). There appears to be almost 0contention for the cpus on the system (LAT
is 0 exceptfor 0.1% for lwpid 1. Lwpid 1 has make ~2000 voluntary contextswitches, the others between ~100 and ~400 (VCX
).Involuntary context switches are close to 0 for everyone except lwpid1 (ICX
). Voluntary context switches occur when a threadgives up the cpu because it can not do further work (for instance,waiting for a lock or waiting for I/O). Involuntary context switchesoccur when a thread is preempted or has used up its time slice. Thethreads are making a few hundred system calls (except for lwpid 1,with 8K of system calls (SCL
). No thread is handlingany signals.
Perhaps the most interesting thing to note in the output is thatexcept for lwpid 1, the threads (lwpids) shown are different for eachinterval.
Let's use DTrace to see scheduling activity for the process's threads.
# dtrace -n 'sched:::on-cpu/execname == "tserver"/{@[tid]=count();} tick-1sec{printa(@); clear(@);}'dtrace: description 'sched:::on-cpu' matched 4 probesCPU ID FUNCTION:NAME 8 67387 :tick-1sec... <-- almost 1000 lines omitted 498 126 271 128 799 128 458 132 78 133 482 136 644 143 206 158 931 162 1 2236
From this, every thread in the process is getting on cpu in a 1 secondinterval (the total number of lines of output every second is 1001,the total number of threads), yet prstat shows 100% of time for almostall of these threads to be waiting for user level locks.
As mentioned above, tserver
accepts connection requestsand immediately closes the connection. Let's see how many accept(3SOCKET)calls it handles per second.
# dtrace -qn 'syscall::accept:return/pid == 76798 && arg0 >= 0/{@ = count();} tick-1sec{printa(@); clear(@);}' 351 765 729 747 587 1164 737 839 1074 554...
When tserver
accepts a connection, it wakes up a threadto handle the connection (again, handling the connection means simplyclosing it). Incoming connections are placed in a job queue, and aworker thread removes jobs from the queue, closing each connection asit does so. Let's see how many close(2)
calls are madeper worker thread. The "worker" threads are the other 1000 threads.The main thread (lwpid 1) handles the accept calls and wakes up aworker thread to handle it.
# dtrace -qn 'syscall::close:return/pid == 76798 && arg0 >= 0/{@[tid] = count();} tick-1sec{printa(@); trunc(@);}' 3 1 6 1 17 1 24 1 25 1... 957 12 270 19 778 23 493 41 411 52
Again, almost all of the threads call close at least once.Let's use the pid
provider to take a closer look at howthreads are being woken up.
# dtrace -qn 'pid76798:::entry/tid == 1/{@[probefunc] = count();} tick-1sec{printa(@); trunc(@);}' __lwp_park 5... setsockopt 560 __so_accept 561 _so_accept 561 accept 561 cond_broadcast 561... enqueue 19494 queue_unlink 19765
Every time an accept call is made, the thread usescond_broadcast
to do the wakeup. This will cause allof the worker threads to wake up. It is a classic case of thundering herd. We'll change the code to wake up only 1 worker, (cond_signal
) to see how that effects the behavior.
# prstat -cPlease wait... PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 98575 root 10M 9380K sleep 59 0 0:00:02 0.2% tserver_sig/1001 95 root 0K 0K sleep 99 -20 0:48:09 0.1% zpool-zones/162 16145 103 56M 46M sleep 59 0 0:43:44 0.0% beam.smp/51 5879 root 9680K 4976K sleep 57 0 0:24:08 0.0% zoneadmd/5 15119 nobody 128M 100M sleep 59 0 0:25:46 0.0% node/4 16610 nobody 68M 48M sleep 59 0 0:23:00 0.0% node/3 17119 907 45M 25M sleep 59 0 0:14:56 0.0% node/4 15050 root 54M 33M sleep 1 0 0:12:13 0.0% node/4 16790 nobody 59M 38M sleep 59 0 0:12:10 0.0% node/4 3180 root 51M 29M sleep 1 0 0:12:03 0.0% node/7 99141 root 4540K 3432K cpu1 59 0 0:00:00 0.0% prstat/1 14837 root 43M 20M sleep 1 0 0:03:30 0.0% node/5 14320 root 42M 21M sleep 56 0 0:03:26 0.0% node/5 13422 root 46M 23M sleep 59 0 0:03:29 0.0% node/5 14107 root 18M 14M sleep 58 0 0:06:36 0.0% node/6Total: 462 processes, 3132 lwps, load averages: 0.16, 0.41, 0.80 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 98575 root 10M 9380K sleep 59 0 0:00:02 0.1% tserver_sig/1001 95 root 0K 0K sleep 99 -20 0:48:09 0.1% zpool-zones/162 16145 103 56M 46M sleep 59 0 0:43:44 0.0% beam.smp/51 16610 nobody 69M 49M sleep 59 0 0:23:00 0.0% node/3 15119 nobody 128M 100M sleep 59 0 0:25:46 0.0% node/4 5879 root 9680K 4976K sleep 57 0 0:24:08 0.0% zoneadmd/5 16790 nobody 59M 38M sleep 59 0 0:12:10 0.0% node/4 17119 907 45M 25M sleep 59 0 0:14:56 0.0% node/4 15050 root 54M 33M sleep 1 0 0:12:13 0.0% node/4 3180 root 51M 30M sleep 1 0 0:12:03 0.0% node/7 99141 root 4684K 3568K cpu1 59 0 0:00:00 0.0% prstat/1 14837 root 43M 20M sleep 1 0 0:03:30 0.0% node/5 14320 root 42M 21M sleep 56 0 0:03:26 0.0% node/5 14107 root 18M 14M sleep 58 0 0:06:36 0.0% node/6 14537 root 43M 22M sleep 1 0 0:03:32 0.0% node/5Total: 462 processes, 3132 lwps, load averages: 0.15, 0.41, 0.79
The main thing to note in the above output is that now the server isusing 0.1% of the cpu, as opposed to ~3.0% in the broadcast version,more than an order of magnitude in reduced resource usage.
The source code for the client and servers are here. Of course,use at your own risk.
Post written by Mr. Max Bruning