In my previous post I've explained how to build MariaDB Server 10.8 from GitHub source on macOS. This kind of (totally unsupported by both MariaDB Corporation and Foundation) build may be used to study new interesting testing, troubleshooting and performance problem solving options that DTrace provides.
To start adding custom probes and doing something non-trivial one has to understand (or, better, master) some basics of the language used. Code in the sample support-files/dtrace/query-time.d program looked clear and familiar enough (at least for those who've seen bpftrace probes). It consists of several probes like this:
mysql*:::query-start
{
self->query = copyinstr(arg0);
self->connid = arg1;
self->db = copyinstr(arg2);
self->who = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4)));
self->querystart = timestamp;
}
As we can read in the manual, the format for the probe specification is provider:module:function:name. An empty component in a probe specification matches anything. You can also describe probes with a pattern matching syntax that is similar to the syntax that is used by shell.
In the case above (USDT) provider name should start with mysql, we do not care about module and function (actually we maybe have USDT with the name in different functions), and we refer to the probe by name.
If you are wondering to see other kinds of specific probes, you can check dtrace -l output:
Yuliyas-Air:~ Valerii$ sudo dtrace -l | head -20
dtrace: system integrity protection is on, some features will not be available
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
...
42 profile tick-1000
43 profile tick-5000
44 mach_trap kern_invalid entry
45 mach_trap kern_invalid return
46 mach_trap _kernelrpc_mach_vm_allocate_trap entry
Yuliyas-Air:~ Valerii$ sudo dtrace -l | tail -4
dtrace: system integrity protection is on, some features will not be available
254755 pid82428 mariadbd do_command(THD*, bool) 5e4
254917 pid82428 libsystem_malloc.dylib malloc return
467101 security_exception639 csparser _ZN8Security7CFErrorC2Ev [Security::CFError::CFError()] throw-cf
467102 security_exception639 csparser _ZN8Security9UnixErrorC2Eib [Security::UnixError::UnixError(int, bool)] throw-unix
We should study different providers that DTrace supports:
- dtrace - few special probes: BEGIN, END and ERROR.
- profile - this provider includes probes that are
associated with an interrupt that fires at some regular, specified
time interval .profile-N probes basically fire N times per second on every CPU. tick-N probes fire on only one CPU per interval.
- syscall - enables you to trace every system call entry and return.
- pid* - enables tracing of any user
process, as specified by its pid
. It basically allows to trace entries (with entry probe name) and returns (with return probe name) to every user function, among other things.
- ...
There are more providers actually, but those mentioned above allow to start reading and writing DTrace scripts for MariaDB server tracing and profiling, so let's stop listing them for now.
As we can see from this output
Yuliyas-Air:~ Valerii$ dtrace
Usage: dtrace [-aACeFHlqSvVwZ] [-arch i386|x86_64] [-b bufsz] [-c cmd] [-D name[=def]]
[-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
[-x opt[=val]]
[-P provider [[ predicate ] action ]]
[-m [ provider: ] module [[ predicate ] action ]]
[-f [[ provider: ] module: ] func [[ predicate ] action ]]
[-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
[-i probe-id [[ predicate ] action ]] [ args ... ]
predicate -> '/' D-expression '/'
action -> '{' D-statements '}'
...
for each probe we can define optional predicate (in slashes), expression that must evaluate to true (non-zero integer value) when probe fires for optional action (in curly brackets) to be executed. Action is a sequence of DTrace statements separated by semicolons. Note that DTrace does NOT support loops or complex conditional execution in the actions, for good reasons discussed elsewhere. We may also have optional arguments for the probe.
In our query-start probe above there is no predicate, so the action is executed unconditionally when we hit it. We use some built-in functions (like copyinsrt()) and variables, like arg0, arg1 and timestamp. We'll discuss them in details in some other post, but arg0 ... arg9 obviously refer to the first ten arguments of the probe and timestamp is the current value of a nanosecond timestamp counter.
In the action we see several assignments to self->something. These are thread-local variables that share a common name in your D code but refer to separate data storage associated with each operating system thread.
Now we can try to add some probes. Given this information:
Yuliyas-Air:~ Valerii$ ps aux | grep mariadbd
Valerii 37341 0.0 0.0 4267752 700 s003 S+ 10:47AM 0:00.01 grep mariadbd
Valerii 82428 0.0 0.1 4834244 5960 ?? S Wed09PM 116:47.57 /Users/Valerii/dbs/maria10.8/bin/mariadbd --no-defaults --basedir=/Users/Valerii/dbs/maria10.8 --datadir=/Users/Valerii/dbs/maria10.8/data --plugin-dir=/Users/Valerii/dbs/maria10.8/lib/plugin --log-error=/Users/Valerii/dbs/maria10.8/data/Yuliyas-Air.err --pid-file=Yuliyas-Air.pid
and knowing that function name portion of the current probe's description is avaiable ad probefunc built-in variable, we can try to trace the enter into the every tracable function in this MariaDB server process:
Yuliyas-Air:~ Valerii$ sudo dtrace -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available
dtrace: description 'pid82428:::entry ' matched 240404 probes
CPU ID FUNCTION:NAME
1 128445 cerror_nocancel:entry
Enter cerror_nocancel
1 129499 _pthread_testcancel:entry
Enter _pthread_testcancel
1 128444 __error:entry
Enter __error
1 129505 _pthread_cond_updateval:entry
Enter _pthread_cond_updateval
1 129429 pthread_mutex_lock:entry
Enter pthread_mutex_lock
1 36852 my_hrtime:entry
Enter my_hrtime
1 122312 clock_gettime:entry
Enter clock_gettime
1 122657 gettimeofday:entry
Enter gettimeofday
1 128412 __commpage_gettimeofday:entry
Enter __commpage_gettimeofday
1 128413 __commpage_gettimeofday_internal:entry
Enter __commpage_gettimeofday_internal
1 128343 mach_absolute_time:entry
Enter mach_absolute_time
:
I've used faimiliar C-style printf() function to print some custom string including the name of the function when we hit the probe. Note that dtrace attached probes to 240404 different functions without any problem, in default configuration. We immediately started to gte the output that shows function calls happening in background threads all the time. I've used more to be able to copy-paste small portion of the output and then stopped tracing with Ctrl+C after quiting from more.
Note that by default for each hit of the probe we get probe identifier, CPU that it fired on, function and probe name reported. Add -q option to suppress this and see only what we explicitly print in the probe:
Yuliyas-Air:~ Valerii$ sudo dtrace -q -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available
Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
Enter __commpage_gettimeofday
Enter __commpage_gettimeofday_internal
...
If we want to trace all processes with "mariadbd" as the name of the executable, we can add predicate to the probe that does not specify pid at all:
Yuliyas-Air:~ Valerii$ sudo dtrace -q -n ':::entry / execname == "mariadbd" / { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available
Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
...
So, we can produce a raw trace (to be stored or processed by some user level code outside of DTrace). But what if we would like to do some more advanced processing in the probe itself and share only the results? For example, we run some MariaDB tests and would like to know how many times specific function in the code was executed (if at all) in the process?
For this we have to use aggregations and rely on aggregate functions to do the job in kernel space. For now it's enough to know that @name[key] is the aggregation name indexed by the key (you can use several keys separated by commas) and count() aggregation function simly counts the number of times it was called.
Another inportant detail to study to create useful DTrace scripts is scripts arguments substitution. When called with -s option dtrace prowides a set of built-in macro variables to the script, with nsame starting with $. Of them $0 ... $9 have the same meenings as in shell, so $1 is the first argument of the script.
With the above taken into account, we can use the following basic DTrace script to count function executions per name in any user process (quiet option has the same effect as -q):
Yuliyas-Air:~ Valerii$ cat /tmp/codecoverage.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::$2:entry {
@calls[probefunc] = count();
}
This script substitutes the first argument for the process id in the pid provider and the second argument for the function name in the probe, thus we create one or more probes. The aggregation's collected results are printed by default when dtrace script ends, but there is also the printa() function to do this explicitly in the probes if needed. Let's try to apply this to MariaDB server 10.8 running as process 82428 and trace all functions (hence '*' as the explicite second argument):
Yuliyas-Air:~ Valerii$ sudo /tmp/codecoverage.d 82428 '*' >/tmp/res.txt
dtrace: system integrity protection is on, some features will not be available
^C
Yuliyas-Air:~ Valerii$ ls -l /tmp/res.txt
-rw-r--r-- 1 Valerii wheel 58772 Jan 28 20:53 /tmp/res.txt
Yuliyas-Air:~ Valerii$ head -10 /tmp/res.txt
ACL_USER* find_by_username_or_anon<ACL_USER>(ACL_USER*, unsigned long, char const*, char const*, char const*, char const*, char 1
ACL_USER::copy(st_mem_root*) 1
Apc_target::init(st_mysql_mutex*) 1
Binary_string::copy(char const*, unsigned long) 1
CONNECT::create_thd(THD*) 1
Create_func_arg1::create_func(THD*, st_mysql_const_lex_string*, List<Item>*) 1
Create_func_sleep::create_1_arg(THD*, Item*) 1
Current_schema_tracker::update(THD*, set_var*) 1
Yuliyas-Air:~ Valerii$ tail -10 /tmp/res.txt
buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) 33895
pthread_self 36422
mtr_t::memcpy_low(buf_block_t const&, unsigned short, void const*, unsigned long) 44143
unsigned char* mtr_t::log_write<(unsigned char)48>(page_id_t, buf_page_t const*, unsigned long, bool, unsigned long) 44167
mach_boottime_usec 45007
mtr_t::modify(buf_block_t const&) 45609
pthread_mutex_unlock 56530
pthread_mutex_lock 57295
pthread_cond_broadcast 64197
_platform_memmove$VARIANT$Haswell 72767
Yuliyas-Air:~ Valerii$ cat /tmp/res.txt | grep do_command
do_command(THD*, bool) 4
I've executed just several statements (4 if you check do_command() executions) over a short enough period of time, but some functions like pthread_mutex_lock were executed thousands of times in the background. Now we have a lame code coverage testing method that works at MariaDB server scale (unlike with bpftrace on Linux). There are performance implications (huge impact when I tried to do the same while running sysbench test with many concurrent threads) and limitations of internal buffer sizes that may cause some probes to be skipped etc, but basically it works!
DTrace is almost as beautiful as Côte d'Azur :) |
To summarize:
- We checked basic building blocks of simple DTrace one liners and scripts in some details, including probe structure, main providers, some built-in variables, thread-local variables referenced via self-> and aggregations.
- It is possible to add probes to each and every function in MariaDB server 10.8 and do both raw tracing and aggregation in the kernel context. Lame code coverage scripts do work.
- It is possible to create parametrized, generic scripts with command line arguments and other macros substituted.
- DTrace language is similar to what we see later implemented in bpftrace. After some checks of the manual one can easily switch from one tool to the other for basic tasks.
- Built-in variablers and functions may be named differently than in bpftrace.
- There are many more details to cover and interesting scripts to use with MariaDB server, so stay tuned!