Thursday, March 22, 2007

Dynamic Tracing in sun Solaris - DTrace

A simple document I prepared for understading DTrace Probes...


What is DTrace?

DTrace is a comprehensive dynamic tracing facility that is built into Solaris and can be used by administrators and developers to examine the behavior of both user programs and of the operating system itself. With DTrace you can explore your system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. It is safe to use on production systems and does not require restarting either the system or applications.

DTrace Overview

DTrace dynamically modifies the operating system kernel and user processes to record data at locations of interest, called probes. A probe is a location or activity to which DTrace can bind a request to perform a set of actions, like recording a stack trace, a timestamp, or the argument to a function. Probes are like programmable sensors scattered all over your Solaris system in interesting places. DTrace probes come from a set of kernel modules called providers, each of which performs a particular kind of instrumentation to create probes.

Project Objective

The major focus from the perspective of a project would be to
  • Provide a real-time FAQ answering questions as per request.
  • To solve common operational problems faced by Back Office.
  • To provide appropriate resons for aberrant behavior.
To sum up, the tool should act as a perfomance analysis tool helping Sun Solaris 10 users know where they are stuck and identify the internal details of their processes running.

Basic DTrace Functions

  • Examine the behavior of user programs and the Solaris OS and quickly identify the root causes of system and application bottlenecks.
  • Highlight trends and patterns to tune systems for best performance.
  • Track down performance problems across many layers of software.
  • Locate the cause of aberrant behavior.
  • Write reusable scripts for common or complex routines.

How to use DTrace?

DTrace includes a new scripting language called D which is designed specifically for dynamic tracing. With D it is easy to write scripts that dynamically turn on probes, collect the information, and process it. D scripts make it convenient for users to share their knowledge and troubleshooting methods with others

Introduction to D Scripts

Template

probe description
/predicate/ { actions }

Probe Description

provider:module:function:name

  • Provider: The name of the DTrace provider that is publishing this probe. The provider name typically corresponds to the name of the DTrace kernel module that performs the instrumentation to enable the probe.

  • Module: If this probe corresponds to a specific program location, the name of the module in which the probe is located. This name is either the name of a kernel module or the name of a user library.

  • Function: If this probe corresponds to a specific program location, the name of the program function in which the probe is located.

  • Name: The final component of the probe name is a name that gives you some idea of the probe's semantic meaning, such as BEGIN or END.

Predicate A predicate can be any D expression. Below are a few examples. The action is executed only when the predicate evaluates to true.

Actions The action section can contain a series of action commands separated by semi-colons (;)

Compilation and Instrumentation

  • How DTrace Command Executes? When you use the dtrace command you are invoking the compiler for the D language. Once your program is compiled, it is sent into the operating system kernel for execution by DTrace.There the probes that are named in your program are enabled and the corresponding provider performs whatever instrumentation is needed to activate them.
  • What about the Performance? All of the instrumentation in DTrace is completely dynamic: probes are enabled discretely only when you are using them.No instrumented code is present for inactive probes, so your system does not experience any kind of performance degradation when you are not using DTrace. Once your experiment is complete and the dtrace command exits, all of the probes you used are automatically disabled and their instrumentation is removed, returning your system to its exact original state. No effective difference exists between a system where DTrace is not active and one where the DTrace software is not installed.
  • Where does DTrace run and why should we prefer DTrace? The instrumentation for each probe is performed dynamically on the live running operating system or on user processes you select. The system is not paused in any way, and instrumentation code is added only for the probes that you enable. As a result, the probe effect of using DTrace is limited to exactly what you ask DTrace to do: no extraneous data is traced, no one big �tracing switch� is turned on in the system, and all of the DTrace instrumentation is designed to be as efficient as possible. These features enable you to use DTrace in production to solve real problems in real time.
  • Are D-scripts safe? DTrace D programs are compiled into a safe intermediate form that is used for execution when your probes fire. This intermediate form is validated for safety when your program is first examined by the DTrace kernel software. The DTrace execution environment also handles any run-time errors that might occur during your D program's execution, including dividing by zero, dereferencing invalid memory, and so on, and reports them to you. As a result, you can never construct an unsafe program that would cause DTrace to inadvertently damage the Solaris kernel or one of the processes running on your system.These safety features allow you to use DTrace in a production environment without worrying about crashing or corrupting your system. If you make a programming mistake, DTrace will report your error to you, disable your instrumentation, and you can correct your mistake and try again.

DTrace Limitations

  • Dtrace is not a good tool for post-mortem analysis. For example, one problem frequently encountered in Networking is the "where did my packet get queued up?"
  • dtrace will not be able to directly analyze the sequence of events that led to a memory leak or reference-count leak after the leak has occurred.

DTrace Code Snippets

The pid Provider

  • Here is the D script you can run to print all the functions that process id 1234 calls
/* The above line means that the script that follows needs to be
interpreted using dtrace. D uses C-style comments. */
pid1234:::entry
{}

You can run .d files using the command #dtrace -s file.d

  • The following example adds aggregates in the script you are developing to see a summary table of user functions
 pid1234:::entry
{
@count_table[probefunc]=count();
}

This script will collect information into a table and will continue to run until you press ^c (Control c). Once you stop the script, DTrace will print out the table of information. Notice that you do not need to write any code to print the table. DTrace automatically does this for you.

  • This script can easily be modified to collect a table with function and library name by changing the index to probemod, probefunc
 pid1234:::entry
{
@count_table[probemod,probefunc]=count();
}

  • Find how much time is being spent in each function. This can be done through the built in variable timestamp. You can create probes in the entry and return of the functions and calculate the time spent by the difference in timestamp from entry to return. The timestamp variable reports time in nanoseconds. Here is the modified script
pid1234:::entry
{
ts[probefunc] = timestamp;
}

pid$1:::return
{
@func_time[probefunc] = sum(timestamp - ts[probefunc]);
ts[probefunc] = 0;
}

Note that the ts[] is an array, and D has automatically declared and initialized it for you. It is good practice to save space by setting the variable to 0.
  • How to monitor a process from the time it starts until it ends. DTrace allows you to do this using the $target variable and the -c option. This script will count the number of times libc functions are called from a given application.
pid$target:libc::entry
{
@[probefunc]=count();
}

Save this script as libc_func.d and run it.

# libc_func.d -c "cat /etc/hosts"

You can easily replace "cat /etc/hosts" with the command of interest.

Sysinfo Provider

  • The following script counts the number of times various processes get to run in the CPU. Note that the sysinfo:::pswitch probe fires when a process is switched to run on the CPU. Remember to press ^c (Control c) after a few minutes.
sysinfo:::pswitch
{
@[execname] = count();
}

Syscall Provider

This is probably the most important provider to learn and use because system calls are the main communication channel between user level applications and the kernel. Knowing which system calls are being used among other information establishes metrics of system usage and identifies possible misbehavior. With the syscall provider you can easily identify who executes what and how much time a certain operation takes, helping you to identify the root cause of system misbehavior.

  • Script to find the stack trace when the program makes the write system call. Note that you need to run this with the -c option.
syscall::write:entry
{
@[ustack()]=count();
}
  • To list all the occurrences of the probe when it was fired and give information about the system calls at entry into the system that are performing a close system call, use the following script:

# dtrace -n syscall::close:entry

  • To start to identify the process which sent a kill signal to a particular process, use the following script

syscall::kill:entry
{
trace(pid);
trace(execname);
}
  • To determine how much time your webserver is spending at read, use the following script. This script can easily be modified for other processes
BEGIN
{
printf("size\ttime\n");
}
syscall::read:entry
/execname == "httpd"/
{
self->start = timestamp;
}
syscall::read:return
/self->start/
{
printf("%d\t%d\n", arg0, timestamp - self->start);
self->start = 0;
}

Proc Provider

This provider fires at processes and thread creation and termination as well as signals. It is a more sophisticated approach to the simple kill probe and will tell you which user sent a given signal (3head) to which process.
  • The following script prints out the process name, pid and uid when a new process is started in the system. Note that proc:::exec-success fires when a new process is started successfully.
proc:::exec-success
{
printf("%s(pid=%d) started by uid - %d\n",execname, pid, uid);
}
  • Trace all the signals sent to all the processes currently running on the system
proc:::signal-send
{
printf("%d was sent to %s by ", args[2], args[1]->pr_fname);
}
Note: We have to use dtrace -w for this example along with -s.
  • Add the conditional statement (/args[2] == SIGKILL/) into the script and send SIGKILL signals to different processes from different users.
proc:::signal-send
/args[2] == SIGKILL/
{
printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
}

Sched Provider

This provider dynamically traces scheduling events. Use it to understand when and why threads sleep, run, change priority, or wake other threads.
  • The following script determines the amount of time the CPU spends on I/O wait and working. It also breaks the I/O process and indicates that the data that was retrieved during the I/O wait time by sendmail. This script can easily be modified to fit your particular situation.
sched:::on-cpu
/execname == "soffice.bin"/
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@time[""] = sum(vtimestamp self->on);
self->on = 0;
}

I/O Provider

This provider looks into the disk input and output (I/O) subsystem. With the io provider you can get an in-depth understanding of iostat(1M) output.
  • Trace which files are being accessed on which device and to determine if the task being performed is a read or a write.
BEGIN
{
printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW");
}
io:::start
{
printf("%10s %58s %2s\n", args[1]->dev_statname, args[2]->fi_pathname,
args[0]->b_flags & B_READ ? "R" : "W");
}

No comments: