Sunday, January 30, 2022

DTrace Basics and Tracing Every MariaDB Server Function Call

In my previous post I've explained how to build MariaDB Server 10.8 from GitHub source on macOS. This kind of (totally unsupported by both MariaDB Corporation and Foundation) build may be used to study new interesting testing, troubleshooting and performance problem solving options that DTrace provides.

To start adding custom probes and doing something non-trivial one has to understand (or, better, master) some basics of the language used. Code in the sample support-files/dtrace/query-time.d program looked clear and familiar enough (at least for those who've seen bpftrace probes). It consists of several probes like this:

mysql*:::query-start
{
   self->query = copyinstr(arg0);
   self->connid = arg1;
   self->db    = copyinstr(arg2);
   self->who   = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4)));
   self->querystart = timestamp;
}

As we can read in the manual, the format for the probe specification is provider:module:function:name. An empty component in a probe specification matches anything. You can also describe probes with a pattern matching syntax that is similar to the syntax that is used by shell. 

In the case above  (USDT) provider name should start with mysql, we do not care about module and function (actually we maybe have USDT with the name in different functions), and we refer to the probe by name. 

If you are wondering to see other kinds of specific probes, you can check dtrace -l output:

Yuliyas-Air:~ Valerii$ sudo dtrace -l | head -20
dtrace: system integrity protection is on, some features will not be available

   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
...
   42    profile                                                     tick-1000
   43    profile                                                     tick-5000
   44  mach_trap                                        kern_invalid entry
   45  mach_trap                                        kern_invalid return
   46  mach_trap                    _kernelrpc_mach_vm_allocate_trap entry
Yuliyas-Air:~ Valerii$ sudo dtrace -l | tail -4
dtrace: system integrity protection is on, some features will not be available

254755   pid82428          mariadbd            do_command(THD*, bool) 5e4
254917   pid82428 libsystem_malloc.dylib                            malloc return
467101 security_exception639          csparser _ZN8Security7CFErrorC2Ev [Security::CFError::CFError()] throw-cf
467102 security_exception639          csparser _ZN8Security9UnixErrorC2Eib [Security::UnixError::UnixError(int, bool)] throw-unix

We should study different providers that DTrace supports:

  • dtrace - few special probes: BEGIN, END and ERROR.
  • profile - this provider includes probes that are associated with an interrupt that fires at some regular, specified time interval .profile-N probes basically fire N times per second on every CPU. tick-N probes fire on only one CPU per interval.
  • syscall - enables you to trace every system call entry and return.
  • pid* - enables tracing of any user process, as specified by its pid. It basically allows to trace entries (with entry probe name) and returns (with return probe name) to every user function, among other things.
  • ...

There are more providers actually, but those mentioned above allow to start reading and writing DTrace scripts for MariaDB server tracing and profiling, so let's stop listing them for now.

As we can see from this output 

Yuliyas-Air:~ Valerii$ dtrace
Usage: dtrace [-aACeFHlqSvVwZ] [-arch i386|x86_64] [-b bufsz] [-c cmd] [-D name[=def]]
        [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
        [-x opt[=val]]

        [-P provider [[ predicate ] action ]]
        [-m [ provider: ] module [[ predicate ] action ]]
        [-f [[ provider: ] module: ] func [[ predicate ] action ]]
        [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
        [-i probe-id [[ predicate ] action ]] [ args ... ]

        predicate -> '/' D-expression '/'
           action -> '{' D-statements '}'
...

for each probe we can define optional predicate (in slashes), expression that must evaluate to true (non-zero integer value) when probe fires for optional action (in curly brackets) to be executed. Action is a sequence of DTrace statements separated by semicolons. Note that DTrace does NOT support loops or complex conditional execution in the actions, for good reasons discussed elsewhere. We may also have optional arguments for the probe.

In our query-start probe above there is no predicate, so the action is executed unconditionally when we hit it. We use some built-in functions (like copyinsrt()) and variables, like arg0, arg1 and timestamp. We'll discuss them in details in some other post, but arg0 ... arg9 obviously refer to the first ten arguments of the probe and timestamp is the current value of a nanosecond timestamp counter.

In the action we see several assignments to self->something. These are thread-local variables that share a common name in your D code but refer to separate data storage associated with each operating system thread.

Now we can try to add some probes. Given this information:

Yuliyas-Air:~ Valerii$ ps aux | grep mariadbd
Valerii          37341   0.0  0.0  4267752    700 s003  S+   10:47AM   0:00.01 grep mariadbd
Valerii          82428   0.0  0.1  4834244   5960   ??  S    Wed09PM 116:47.57 /Users/Valerii/dbs/maria10.8/bin/mariadbd --no-defaults --basedir=/Users/Valerii/dbs/maria10.8 --datadir=/Users/Valerii/dbs/maria10.8/data --plugin-dir=/Users/Valerii/dbs/maria10.8/lib/plugin --log-error=/Users/Valerii/dbs/maria10.8/data/Yuliyas-Air.err --pid-file=Yuliyas-Air.pid

and knowing that function name portion of the current probe's description is avaiable ad probefunc built-in variable, we can try to trace the enter into the every tracable function in this MariaDB server process:

Yuliyas-Air:~ Valerii$ sudo dtrace -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available

dtrace: description 'pid82428:::entry ' matched 240404 probes
CPU     ID                    FUNCTION:NAME
  1 128445            cerror_nocancel:entry

Enter cerror_nocancel
  1 129499        _pthread_testcancel:entry
Enter _pthread_testcancel
  1 128444                    __error:entry
Enter __error
  1 129505    _pthread_cond_updateval:entry
Enter _pthread_cond_updateval
  1 129429         pthread_mutex_lock:entry
Enter pthread_mutex_lock
  1  36852                  my_hrtime:entry
Enter my_hrtime
  1 122312              clock_gettime:entry
Enter clock_gettime
  1 122657               gettimeofday:entry
Enter gettimeofday
  1 128412    __commpage_gettimeofday:entry
Enter __commpage_gettimeofday
  1 128413 __commpage_gettimeofday_internal:entry
Enter __commpage_gettimeofday_internal
  1 128343         mach_absolute_time:entry
Enter mach_absolute_time
:

I've used faimiliar C-style printf() function to print some custom string including the name of the function when we hit the probe. Note that dtrace attached probes to 240404 different functions without any problem, in default configuration. We immediately started to gte the output that shows function calls happening in background threads all the time. I've used more to be able to copy-paste small portion of the output and then stopped tracing with Ctrl+C after quiting from more.

Note that by default for each hit of the probe we get probe identifier, CPU that it fired on, function and probe name reported. Add -q option to suppress this and see only what we explicitly print in the probe:

Yuliyas-Air:~ Valerii$ sudo dtrace -q -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available


Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
Enter __commpage_gettimeofday
Enter __commpage_gettimeofday_internal
...

If we want to trace all processes with "mariadbd" as the name of the executable, we can add predicate to the probe that does not specify pid at all:

Yuliyas-Air:~ Valerii$ sudo dtrace -q -n ':::entry / execname == "mariadbd" / { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available


Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
...

So, we can produce a raw trace (to be stored or processed by some user level code outside of DTrace). But what if we would like to do some more advanced processing in the probe itself and share only the results? For example, we run some MariaDB tests and would like to know how many times specific function in the code was executed (if at all) in the process?

For this we have to use aggregations and rely on aggregate functions to do the job in kernel space. For now it's enough to know that @name[key] is the aggregation name indexed by the key (you can use several keys separated by commas) and count() aggregation function simly counts the number of times it was called. 

Another inportant detail to study to create useful DTrace scripts is scripts arguments substitution. When called with -s option dtrace prowides a set of built-in macro variables to the script, with nsame starting with $. Of them $0 ... $9 have the same meenings as in shell, so $1 is the first argument of the script.

With the above taken into account, we can use the following basic DTrace script to count function executions per name in any user process (quiet option has the same effect as -q):

Yuliyas-Air:~ Valerii$ cat /tmp/codecoverage.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::$2:entry {
        @calls[probefunc] = count();
}

This script substitutes the first argument for the process id in the pid provider and the second argument for the function name in the probe, thus we create one or more probes. The aggregation's collected results are printed by default when dtrace script ends, but there is also the printa() function to do this explicitly in the probes if needed. Let's try to apply this to MariaDB server 10.8 running as process 82428 and trace all functions (hence '*' as the explicite second argument):

Yuliyas-Air:~ Valerii$ sudo /tmp/codecoverage.d 82428 '*' >/tmp/res.txt
dtrace: system integrity protection is on, some features will not be available

^C
Yuliyas-Air:~ Valerii$ ls -l /tmp/res.txt
-rw-r--r--  1 Valerii  wheel  58772 Jan 28 20:53 /tmp/res.txt
Yuliyas-Air:~ Valerii$ head -10 /tmp/res.txt


  ACL_USER* find_by_username_or_anon<ACL_USER>(ACL_USER*, unsigned long, char const*, char const*, char const*, char const*, char                1
  ACL_USER::copy(st_mem_root*)                                      1
  Apc_target::init(st_mysql_mutex*)                                 1
  Binary_string::copy(char const*, unsigned long)                   1
  CONNECT::create_thd(THD*)                                         1
  Create_func_arg1::create_func(THD*, st_mysql_const_lex_string*, List<Item>*)                1
  Create_func_sleep::create_1_arg(THD*, Item*)                      1
  Current_schema_tracker::update(THD*, set_var*)                    1
Yuliyas-Air:~ Valerii$ tail -10 /tmp/res.txt
  buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool)            33895
  pthread_self                                                  36422
  mtr_t::memcpy_low(buf_block_t const&, unsigned short, void const*, unsigned long)            44143
  unsigned char* mtr_t::log_write<(unsigned char)48>(page_id_t, buf_page_t const*, unsigned long, bool, unsigned long)            44167
  mach_boottime_usec                                            45007
  mtr_t::modify(buf_block_t const&)                             45609
  pthread_mutex_unlock                                          56530
  pthread_mutex_lock                                            57295
  pthread_cond_broadcast                                        64197
  _platform_memmove$VARIANT$Haswell                             72767
Yuliyas-Air:~ Valerii$ cat /tmp/res.txt | grep do_command
  do_command(THD*, bool)                                            4

I've executed just several statements (4 if you check do_command() executions) over a short enough period of time, but some functions like pthread_mutex_lock were executed thousands of times in the background. Now we have a lame code coverage testing method that works at MariaDB server scale (unlike with bpftrace on Linux). There are performance implications (huge impact when I tried to do the same while running sysbench test with many concurrent threads) and limitations of internal buffer sizes that may cause some probes to be skipped etc, but basically it works!

DTrace is almost as beautiful as Côte d'Azur :)

 To summarize:

  1. We checked basic building blocks of simple DTrace one liners and scripts in some details, including probe structure, main providers, some built-in variables, thread-local variables referenced via self-> and aggregations.
  2. It is possible to add probes to each and every function in MariaDB server 10.8 and do both raw tracing and aggregation in the kernel context. Lame code coverage scripts do work.
  3. It is possible to create parametrized, generic scripts with command line arguments and other macros substituted.
  4. DTrace language is similar to what we see later implemented in bpftrace. After some checks of the manual one can easily switch from one tool to the other for basic tasks.
  5. Built-in variablers and functions may be named differently than in bpftrace.
  6. There are many more details to cover and interesting scripts to use with MariaDB server, so stay tuned!

No comments:

Post a Comment