The goal is the same as before - try to add dynamic probe(s) to trace query execution. More specifically, to capture text of the queries executed by clients of MySQL server. As bpftrace requires new kernel and just does not work on Ubuntu 16.04, for the demonstration I use my Fedora 29 box with kernel 5.2.x and, for a change, get queries from Fedora's own MariaDB 10.3.17 installed from rpm packages there.
I have both bcc and bpftrace installed also from packages:
[openxs@fc29 ~]$ rpm -qa | grep bccYou can check fine manual for the details, but even basic -h option provides enough to start, as long as you already know some terms and probes syntax:
python3-bcc-0.8.0-1.fc29.x86_64
bcc-0.8.0-1.fc29.x86_64
bcc-tools-0.8.0-1.fc29.x86_64
[openxs@fc29 ~]$ rpm -qa | grep bpf
bpftrace-0.9-0.fc29.x86_64
[openxs@fc29 ~]$
[openxs@fc29 ~]$ bpftrace -hIn our case we need to define uprobe for proper mysqld binary and trace the dispatch_command() function. Before we start, note that parameters of dispatch_command() in MariaDB 10.3 are not the same as in Percona Server 5.7 I've used in the previous post. Basically, this function starts as follows in sql/sql_parse.cc:
USAGE:
bpftrace [options] filename
bpftrace [options] -e 'program'
OPTIONS:
-B MODE output buffering mode ('line', 'full', or 'none')
-d debug info dry run
-dd verbose debug info dry run
-e 'program' execute this program
-h show this help message
-l [search] list probes
-p PID enable USDT probes on PID
-c 'CMD' run CMD and enable USDT probes on resulting process
-v verbose messages
--version bpftrace version
ENVIRONMENT:
BPFTRACE_STRLEN [default: 64] bytes on BPF stack per str()
BPFTRACE_NO_CPP_DEMANGLE [default: 0] disable C++ symbol demangling
EXAMPLES:
bpftrace -l '*sleep*'
list probes containing "sleep"
bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
trace processes calling sleep
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
count syscalls by process name
1570 bool dispatch_command(enum enum_server_command command, THD *thd,Note the third argument, packet. If the first argument, command. is SQL_QUERY, then packet contains the query text (as a zero-terminated string) for sure (it's also true for many other commands, but let me skip the details for now). That's why we'll use third argument in our uprobe to capture the SQL text.
1571 char* packet, uint packet_length, bool is_com_mult i,
1572 bool is_next_command)
Now, let's start the service and check the exact full path name for the mysql binary:
[openxs@fc29 ~]$ sudo service mariadb startThe first naive attempt to add the probe after cursory reading the documentation and checking few examples may look like this:
[sudo] password for openxs:
Redirecting to /bin/systemctl start mariadb.service
[openxs@fc29 ~]$ ps aux | grep mysqldmysql 9109 6.2 1.2 1699252 104108 ? Ssl 09:30 0:00 /usr/libexec/mysqld --basedir=/usr
openxs 9175 0.0 0.0 215744 892 pts/0 S+ 09:30 0:00 grep --color=auto mysqld
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:dispatch_command { printf("%s\n", str(arg2)); }'It seems in my case, unlike perf, bpftrace is not "aware" of C++ names or symbolic information in a separate -debuginfo package. So, I need mangled name:
Attaching 1 probe...
Could not resolve symbol: /usr/libexec/mysqld:dispatch_command
[openxs@fc29 ~]$ nm -na /usr/libexec/mysqld | grep dispatch_commandSurely there is no symbols in the binary from Fedora package, so I checked the binary (of the same version) that I've built myself (as usual) and assumed that neither parameters nor mangling approach could be different. So, the next attempt to add dynamic probe would look as follows:
nm: /usr/libexec/mysqld: no symbols
[openxs@fc29 ~]$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5190 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5190 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { printf("%s\n", str(arg2)); }'It worked and you see above the output I've got for the following session:
Attaching 1 probe...
show databases
show tables
t1
select @@version_comment limit 1
select user, host from mysql.user
^C
[openxs@fc29 ~]$ mysql -uroot testReading table information for completion of table and column namesYou can see some SQL statements generated when mysql command line connects, as well as packet value in some other packet ("t1") than COM_QUERY, probably. My probe had not even tried to check other parameters besides the supposed query text.
You can turn off this feature to get a quicker startup with -A
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.17-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [test]> select user, host from mysql.user;
+------+-----------+
| user | host |
+------+-----------+
| root | 127.0.0.1 |
| root | ::1 |
| root | fc29 |
| root | localhost |
+------+-----------+
4 rows in set (0.000 sec)
Now, the probe is defined on uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb - I've just used long, mangled version of the function name and full path name to the binary, and defined a dynamic probe (uprobe). There is no filter and the action for the probe is defined as { printf("%s\n", str(arg2)); } - that is, I print third argument (they are numbered starting from zero, arg0, arg1, arg2, ...) as a zero-terminated string. Without str() built in function I'd get just a pointer that could be printed as (unsigned) long integer, u64.
Basically, that's all. We have a quick and dirty way to capture all queries. No timing or anything, but it all depends on probe action that can use numerous built in variables and functions.
More "advanced" use of bpftrace, a lame attempt to capture time to execute query, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql = str(arg2); @start[@sql] = nsecs; }In this case I try to store time since probe start into the associative array with query text as an "index" and start time (in nanoseconds) as a value.Then I calculate the difference from current nsecs value upon function return, in a separate uretprobe. I've used global variables, @sql for the query text, and @start[] for the array. It even seems to work well for a single threaded load based on the above. But as soon as I try to use multiple concurrent threads:
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[@sql] != 0/ { printf("%s : %u64 ms\n", @sql, (nsecs - @start[@sql])/1000000); } '
Attaching 2 probes...
select sleep(3) : 300064 ms
select sleep(1) : 100064 ms
^C
@sql: select sleep(1)
@start[select sleep(3)]: 10666558704666
@start[select sleep(1)]: 10685614895675
[openxs@fc29 ~]$
[openxs@fc29 ~]$ for i in `seq 1 4`; do mysql -uroot test -e"select sleep($i)" & doneit becomes clear that global variables are really global and my outputs are all wrong.
A bit better version may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql[tid] = str(arg2); @start[tid] = nsecs; } uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[tid] != 0/ { printf("%s : %u64 %u64 ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } 'The output is for both sequential and concurrent execution of queries. I've used two associative arrays, @sql[] for queries and @start[] for start times, both indexed by tid - built in variable for thread id, that should not change, at least until we use pool-of-threads... You can also see that the tool by default outputs the content of all global associate arrays at the end, unless we free them explicitly.
Attaching 2 probes...
select sleep(1) : 1120764 100064 ms
: 1120764 064 ms
select sleep(2) : 949064 200064 ms
: 949064 064 ms
select sleep(3) : 1120864 300064 ms
: 1120864 064 ms
select sleep(4) : 1120664 400064 ms
: 1120664 064 ms
select sleep(1) : 1120764 100064 ms
: 1120764 064 ms
select sleep(2) : 949064 200064 ms
: 949064 064 ms
select sleep(3) : 1120664 300064 ms
: 1120664 064 ms
select sleep(4) : 1120864 400064 ms
: 1120864 064 ms
^C
@sql[9490]:
@sql[11206]:
@sql[11207]:
@sql[11208]:
@start[11207]: 13609305005933
@start[9490]: 13610305621499
@start[11206]: 13611305753596
@start[11208]: 13612305235313
[openxs@fc29 ~]$
* * *
This image of bpftrace internals is taken from the Brendan Gregg's post |
Instead of hunting for symbols with nm, you can just use "bpftrace -l -p $(pidof mysqld) | grep dispatch_command" and feed that output directly into the program string, that way you can automate it and run it without problem on other machines.
ReplyDelete