Debugging with DTrace

Recently I spoke at LSPE about using DTrace to do debugging. Videos of my talk, as well as video and/or slides of talks by Brendan Gregg, Adam Leventhal,and Dan Kimmel are available on that web site. I thought I would share my debugging talk in this blog post. I should mention that most of this blog covers debugging code written in C. I talk briefly at the end about debugging higher level (i.e., interpreted languages) code.

What is the most useful debugging tool? Some would say it depends on the programming language, some would say a specific debugger (gdb, for instance), and some might say code reviews. But generally, I think the debugging method that everyone uses most is to insert print statements into the code to be debugged. The most common thing that is done is to print information when a routine is entered and when it returns. For instance, you might print out the values of the arguments to the routine on entry, and print out the return value (and possibly where in the code the return is occuring) on return. While this is very useful, it can cause problems:

  • If the routine is called very frequently (tens or hundreds or more times per second), this leads to a huge amount of output.
  • The print statements can effect the timing of the program in such a way that the bug no longer occurs. Problem solved! Justleave in the print statements. Of course, this is not a solution.
  • Many times, print statements are added by enclosing them in an #ifdef DEBUG...#endif block. Once the code is debugged, the DEBUG flag is turned off, and the code is recompiled. A problem with this is that if/when a bug surfaces once the code is in production, you need to turn the flag back on, stop the running (buggy) instance, and restart with the new debug instance. This can be problematic for a couple of reasons: you need to make sure the code is otherwise the same as is being run at the customer's site where the bug has surfaced. And you need to stop the running (buggy) instance, which may or may not be something that the customer wants. Using a flag (perhaps via a "-d" option) means that a check is always made to see if the flag is set. For time critical applications, this may not be desireable.
  • Finally, the list of information you print is, in effect, hard coded into your print statements. During debugging, this list may change depending on the error you are encountering.

Using DTrace, we can address all of the issues mentioned above, without adding a line of code. DTrace allows you to trace entry into functions, and print out the arguments to the function. It also lets you trace return and return value from a function. You can use it safely in a production environment. No need to add ifdefs, debug flags, or anything to the existing code. Here's a simple example.

Let's say you are encountering an error that causes a segmentation violation in a library function. For instance, strcmp(3c) (note that this is a made up example. If you are encountering a segmentation violation in strcmp(3c), the bug itself is almost certainly not within strcmp.

First, we'll run the program.

# ./foo worldfound worldSegmentation Fault (core dumped)#

We can see it dies in strcmp(3c) by runningpstack(1) on the core(4) file.

# pstack corepstack: cannot examine core: no such process or core file# lsfoo        foo.c#

Where's the core file? The coreadm(1M) command should show us.

# coreadm     global core file pattern: /%Z/cores/core.%f.%p     global core file content: default       init core file pattern: /%Z/cores/core.%f.%p       init core file content: default            global core dumps: enabled       per-process core dumps: disabled      global setid core dumps: disabled per-process setid core dumps: disabled     global core dump logging: disabled

This says that core files are stored in /zonename/cores/core.program_name process_id. We're running in the "global" zone (from zonename(1)), so the core file should be in /zones/global/cores/core.foo.pid.

Let's take a look:

# ls /zones/global/corescore.foo.5711#

And try pstack(1) with that file:

# pstack /zones/global/cores/core.foo.5711core '/zones/global/cores/core.foo.5711' of 5711:       ./foo world fffffd7fff2264da strcmp () + 1a 00000000004011cd sub2 () + 18 00000000004011fc main () + 2d 0000000000400f9c _start () + 6c#

According to pstack(1), the program is dying at strcmp () + 1a, which is called fromsub2(). Without looking at the source code for strcmp(), the most likely reason for it to get a segmentation violation is for one of its arguments to be an invalid address. We can use DTrace to print when a call is made to strcmp(), and to print the arguments passed to it.

First, let's see what probes exist for entry to strcmp().

# dtrace -l -n 'pid$target::strcmp:entry{}' -c "./foo hello"   ID   PROVIDER            MODULE                          FUNCTION NAME64804    pid8490       LM1`ld.so.1                            strcmp entry64805    pid8490         libc.so.1   strcmp entry

The strcmp() function exists within the runtime linker (see ld.so.1(1)), and in libc.so.1. Why there are 2 functions will have to wait for a future blog.

There are 2 arguments to strcmp(3c) (see the manual page). We'll trace the arguments.

# dtrace -n 'pid$target::strcmp:entry{printf("strcmp: called in %s module, arg0 = %lx, arg1 = %lx\n", probemod, arg0, arg1);}' -c "./foo world"dtrace: description 'pid$target::strcmp:entry' matched 2 probesfound worlddtrace: pid 8506 terminated by SIGSEGVCPU     ID      FUNCTION:NAME  0  64804      strcmp:entry strcmp: called in LM1`ld.so.1 module, arg0 = 4008a5, arg1 = fffffd7fff3e5b0e...  0  64805      strcmp:entry strcmp: called in libc.so.1 module, arg0 = fffffd7fffdffdee, arg1 = 40126c  0  64805      strcmp:entry strcmp: called in libc.so.1 module, arg0 = fffffd7fffdffdee, arg1 = 361616161#

The last call is the one that is having the segmentation violation. The second argument does not look like a valid address. Since strcmp(3c) takes null-terminated strings as arguments, let's try to print them out as strings. Here, we need to copy the strings to the kernel (they exist in the user address space of the application).

# dtrace -n 'pid$target::strcmp:entry{printf("strcmp: called in %s module, arg0 = %s, arg1 = %s\n", probemod, copyinstr(arg0), copyinstr(arg1));}' -c "./foo world"dtrace: description 'pid$target::strcmp:entry' matched 2 probesfound worlddtrace: pid 8495 terminated by SIGSEGVCPU     ID      FUNCTION:NAME  0  64804      strcmp:entry strcmp: called in LM1`ld.so.1 module, arg0 = environ, arg1 = environ...  0  64805      strcmp:entry strcmp: called in libc.so.1 module, arg0 = world, arg1 = how are you?dtrace: error on enabled probe ID 1 (ID 64805: pid8495:libc.so.1:strcmp:entry): invalid address (0x361616260) in action #3 at DIF offset 12#

Here's the code for the buggy application.

    1    2  #include     3  #include     4  #include     5    6  int    7  sub1(void)    8  {    9          int i;   10          char buf[128];   11   12          for (i = 0; i < sizeof(buf); i++)   13                  buf[i] = 'a';   14          return(time(0));   15  }   16   17  void   18  sub3(char *p)   19  {   20        char *strings[] = {   21                  "hello",   22                  "world",   23                  "how are you?",   24        };   25        int i;   26   27        for (i = 0; strings[i] != NULL; i++) {   28                 if (strcmp(p, strings[i]) == 0)   29                        printf("found %s\n", p);   30        }   31  }   32   33  void   34  sub2(char *p)   35  {   36          sub3(p);   37  }   38   39  int   40  main(int argc, char *argv[])   41  {   42          time_t t;   43   44          t = sub1();   45   46          sub2(argv[1]);   47   48  }

A quick look at the code shows that the strings[] array is not null-terminated. Between line 23 and 24 there should be a 0 to terminate the array of pointers.

As mentioned earlier, DTrace can also be used to trace return values. Let's say you have a program that you suspect has a memory leak (the size of the address space, as returned from, for instance, pmap(1), is steadily growing over time. Generally, programs use some form of malloc(3c) and free(3c) to dynamically allocate and free memory.

For C++, this would be new and delete (which would be mangled names, see Using DTrace to Profile and Debug a C++ Program). We'll trace calls to and returns from malloc(3c), and calls to free(3c) for the application, and try to "see" the leak by finding calls to malloc() that do not have matching free() calls.

First, we'll just count calls to malloc and free. Generally, when a program starts it may make several malloc() calls without corresponding free() calls (in order to set up space that the program needs). Once the initial allocations are made, further allocations are typically at some point freed up. Note that this really depends on what the program is doing. Regardless, if a program allocates enough memory over time that it does not free up, there is a memory leak. Memory leaks generally effect the program that has the leak, but may effect other processes in the system (everyone if the program is running in the global zone, otherwise other processes running in the same non-global zone).

Here's the D script:

#!/usr/sbin/dtrace -s/* leak.d */#pragma D option quietBEGIN{    printf("Tracing, type control-c to end...\n");}pid$target:libc:malloc:entry{    @["malloc calls"] = count();}pid$target:libc:free:entry{    @["free calls"] = count();}

And here's output for a simple program that has a memory leak (the program is called "leak").

# ./leak.d -c ./leakTracing, type control-c to end...  free calls                                                       14  malloc calls                                                     24#

Let's trace returns from malloc(3c) and calls to free(3c).

#!/usr/sbin/dtrace -s/* leak1.d */#pragma D option quietpid$target:libc:malloc:entry{    self->nbytes = arg0;}pid$target:libc:malloc:return/self->nbytes/{    printf("%s: allocated %d bytes at address %lx\n",      probefunc, self->nbytes, arg1);    self->nbytes = 0;}pid$target:libc:free:entry{    printf("%s: freeing space at address %lx\n", probefunc, arg0);}

Running this shows:

#./leak1.d -c ./leakmalloc: allocated 10 bytes at address 411630malloc: allocated 20 bytes at address 412640malloc: allocated 30 bytes at address 412670free: freeing space at address 411630free: freeing space at address 412640malloc: allocated 10 bytes at address 411630malloc: allocated 20 bytes at address 412640malloc: allocated 30 bytes at address 4126a0free: freeing space at address 411630free: freeing space at address 412640malloc: allocated 10 bytes at address 411630malloc: allocated 20 bytes at address 412640malloc: allocated 30 bytes at address 4126d0free: freeing space at address 411630free: freeing space at address 412640malloc: allocated 10 bytes at address 411630malloc: allocated 20 bytes at address 412640malloc: allocated 30 bytes at address 412700...

So, it looks like every third malloc(3c) call is not getting freed. Let's get a stack trace for those calls.

Here's a DTrace script to do that.

#!/usr/sbin/dtrace -s/* leak2.d */#pragma D option quietpid$target:libc:malloc:entry/arg0 == 30/{    printf("malloc of 30 bytes occurred at:\n");    ustack();}

The script watches for 30 byte malloc(3c) calls. (Tracing every third call instead of 30 byte calls is left as an exercise).

And the output:

# ./leak2.d -c ./leakmalloc of 30 bytes occurred at:              libc.so.1`malloc              leak`sub3+0x18              leak`main+0x35              leak`_start+0x6cmalloc of 30 bytes occurred at:              libc.so.1`malloc              leak`sub3+0x18              leak`main+0x35              leak`_start+0x6c...

At this point we could go into the source code and see the malloc() call in sub3(), and determine where (and if), the free(3c) should occur. (For the code to this tiny example, send me email).

DTrace can also be used to print arbitrary structures that are passed to routines as arguments. At user level, you can #include header files and tell DTrace to invoke the C preprocessor to handle them. Here's an example D script which traces entry to a routine call foo() and prints out the first argument, which is a struct fubar *. The structure is defined in the header file, /root/example3.h. I'll put all of the code on github if there is interest.

#!/usr/sbin/dtrace -qCs/* example3.d */#include "/root/example3.h"pid$target::foo:entry{    print(*(struct fubar *)copyin(arg0, sizeof(struct fubar)));}

Running this with the example shows:

# ./example3.d -c "./example3 30"struct fubar {    long a = 0x1e    char *ptr = 0x4010e0    char [10] data = [ 't', 'h', 'e', 'r', 'e', '\012', '\0', '\0', 'H', '\37777777774' ]}#

One thing to note is that the value for ptr is a user-level address. To see what ptr points to, we would have to use the copyin() or copyinstr() action. What if we don't pass the example3 program an argument?

# ./example3.d -c "./example3"#

The probe doesn't trigger. This means the foo() function is not getting called.

Earlier, I mentioned that using print statements in a function that is called very often leads to a huge amount of output. One way to get around this is to aggregate the calls to the function, using for instance, the argument(s) passed to the function as keys. Something like:

# dtrace -n 'pid$target:::entry{@[probefunc] = count();}' -p 1996dtrace: description 'pid$target:::entry' matched 4846 probes  __nanosleep                       41  ioctl                             41  nanosleep                         41  sleep                             41#

Here, I am tracing function calls made by some arbitrary process. The four functions shown are being called roughly once a second (I waited about 40 seconds before typing control-c). To aggregate on the arguments passed to ioctl, I could do:

# dtrace -n 'pid$target::ioctl:entry{@[arg0, arg1, arg2] = count();}' -p 1996dtrace: description 'pid$target::ioctl:entry' matched 2 probes                5       1074030207  -2748781167336              10#

So, after about 10 seconds, ioctl was called with arguments 5,1074030207, and -2748781167336. The last 2 arguments should probably be printed in hex. The first argument to ioctl(2) is a file descriptor. Running pfiles(1) on pid 1996 would give us the file being used. If we wanted hex output, we could do:

# dtrace -n 'pid$target::ioctl:entry{@[arg0, arg1, arg2] = count();} END{printa("fd = %d, cmd = %lx, arg = %lx called %@d times\n", @);}' -p 1996dtrace: description 'pid$target::ioctl:entry' matched 3 probesCPU     ID                    FUNCTION:NAME  0      2                             :END fd = 5, cmd = 4004667f, arg = fffffd7fffdffd18 called 9 times#

So the ioctl(2) is always being done with the same arguments. It is a little interesting that it matched 2 probes. Let's see what they are:

# dtrace -l -n 'pid$target::ioctl:entry{}' -p 1996   ID   PROVIDER            MODULE                          FUNCTION NAME65335    pid1996       LM1`ld.so.1                             ioctl entry68108    pid1996         libc.so.1                             ioctl entry#

Once again, there is the standard C library call, and an instance in the runtime linker library.

Let's say you need to trace some arbitrary place in the code. For instance, you have some code that looks like the following:

...#ifdef DEBUG    fprintf(stderr, "data at line $d in file %s is %d\n",        __LINE__, __FILE__, x);#endif /* DEBUG */...

Instead of using #ifdef, you could add a user statically defined probe (USDT). The advantage to this is that to turn on the probe, you do not need to recompile the code. And when the probe is not enabled, the static probe is a few nop instructions, so there is basically 0 overhead if you are not using the probe. Statically defined probes are documented at Statically Defined Tracing for User Applications. Here is the portion of C with the USDT probe.

...    for (i = 0; strings[i] != NULL; i++) {      DTRACE_PROBE3(exampleusdt_prov, example, __LINE__, __FILE__, i);...

The name of the provider for the probe is exampleusdt_provpid (same form as the pid provider, with process id at the end of the provider name). The probe name is example, and the probe has 3 arguments, the line number in the source file where the probe triggered, the name of the source file, and the value of the index, i.

To build the new USDT probe(s), we'll need a D file that defines the probe(s).

# cat prov.dprovider exampleusdt_prov {    probe example();};#

To build this takes a few steps, shown here:

# gcc -m32 -c exampleusdt.c# dtrace -G -32 -s prov.d exampleusdt.o# dtrace -G -32 -s prov.d exampleusdt.o   <-- yes, twice # gcc -m32 -o exampleusdt exampleusdt.o prov.o

Let's see if the probe is there.

# dtrace -l -n 'exampleusdt_prov$target:::{}' -c "./exampleusdt hello"   ID   PROVIDER            MODULE                          FUNCTION NAME69671 exampleusdt_prov9950       exampleusdt                              sub3 example#

And here's a D script that uses the probe.

#!/usr/sbin/dtrace -s/* exampleusdt.d */#pragma D option quietexampleusdt_prov$target:::{       printf("probemod = %s, probefunc = %s, probename = %s\n",         probemod, probefunc, probename);      printf("line number = %d, filename = %s, i = %d\n",       arg0, copyinstr(arg1), arg2);}

And here is output.

# ./exampleusdt.d -c "./exampleusdt hello"found helloprobemod = exampleusdt, probefunc = sub3, probename = exampleline number = 30, filename = exampleusdt.c, i = 0probemod = exampleusdt, probefunc = sub3, probename = exampleline number = 30, filename = exampleusdt.c, i = 1probemod = exampleusdt, probefunc = sub3, probename = exampleline number = 30, filename = exampleusdt.c, i = 2probemod = exampleusdt, probefunc = sub3, probename = exampleline number = 30, filename = exampleusdt.c, i = 3#

The "found hello" in the above is coming from the application. The rest of the output is the probe firing.

What if you're working in an interpreted language (perl, python, javascript, node.js, java, etc.)? In general, there are probes for the engine itself (perl itself, the JVM, etc.), but not for the scripts you are writing. For this, you will need to use USDT probes. This is due to the interpretive nature of the language. DTrace can't know without more information about what, for instance, a perl script itself is doing. An example of doing this for a Node.JS script can be found at NodeDublin Dtrace Talk. Searching for other languages and DTrace will yield more information.



Post written by Mr. Max Bruning