Thursday, February 10, 2022

How to Summarize gdb Backtrace with pt-pmp (and Flamegraph)

This is going to be more like a note to myself and my readers than a real blog post. But still I'd like o document the trick that I am applying for years already, in its most convenient form. 

Assuming you have a backtrace (or full backtrace) created from a core file or by attaching gdb to a live (and maybe hanging) process with many threads, like MySQL or MariaDB server, what is the best way to summarize it quickly, to understand what most of the threads are doing/hanging at? Like this huge backtrace with hundreds of threads:

openxs@ao756:~$ ls -l /tmp/backtrace1.txt
-rw-rw-r-- 1 openxs openxs 2817054 лют 10 17:02 /tmp/backtrace1.txt
openxs@ao756:~$ grep LWP /tmp/backtrace1.txt | wc -l
1915

Here it is. You have to download pt-pmp from Percona Toolkit. Then source the pt-pmp script and rely on the aggregate_stacktrace function from it, quite advanced in summarizing stack traces:

openxs@ao756:~$ which pt-pmp
/usr/bin/pt-pmp
openxs@ao756:~$ . /usr/bin/pt-pmp
openxs@ao756:~$ cat /tmp/backtrace1.txt | aggregate_stacktrace > /tmp/pmp1.txt 
openxs@ao756:~$ ls -l /tmp/pmp1.txt
-rw-rw-r-- 1 openxs openxs 34174 лют 10 18:07 /tmp/pmp1.txt
openxs@ao756:~$ head -5 /tmp/pmp1.txt
    598 poll(libc.so.6),vio_io_wait(viosocket.c:945),vio_socket_io_wait(viosocket.c:108),vio_read(viosocket.c:184),my_real_read(net_serv.cc:892),my_net_read_packet_reallen(net_serv.cc:1162),my_net_read_packet(net_serv.cc:1146),do_command(sql_parse.cc:1262),do_handle_one_connection(sql_connect.cc:1336),handle_one_connection(sql_connect.cc:1241),pfs_spawn_thread(pfs.cc:1862),start_thread(libpthread.so.0),clone(libc.so.6)
     82 pthread_cond_wait,wait(os0event.cc:158),wait_low(os0event.cc:158),os_event_wait_low(os0event.cc:158),sync_array_wait_event(sync0arr.cc:471),rw_lock_s_lock_spin(sync0rw.cc:373),rw_lock_s_lock_func(sync0rw.ic:290),pfs_rw_lock_s_lock_func(sync0rw.ic:290),buf_page_get_gen(buf0buf.cc:4905),btr_cur_search_to_nth_level(btr0cur.cc:1243),btr_pcur_open_low(btr0pcur.ic:467),row_ins_scan_sec_index_for_duplicate(btr0pcur.ic:467),row_ins_sec_index_entry_low(btr0pcur.ic:467),row_ins_sec_index_entry(row0ins.cc:3251),row_ins_index_entry(row0ins.cc:3297),row_ins_index_entry_step(row0ins.cc:3297),row_ins(row0ins.cc:3297),row_ins_step(row0ins.cc:3297),row_insert_for_mysql(row0mysql.cc:1414),ha_innobase::write_row(ha_innodb.cc:8231),handler::ha_write_row(handler.cc:6089),write_record(sql_insert.cc:1941),mysql_insert(sql_insert.cc:1066),mysql_execute_command(sql_parse.cc:4170),mysql_parse(sql_parse.cc:7760),dispatch_command(sql_parse.cc:1832),do_command(sql_parse.cc:1386),do_handle_one_connection(sql_connect.cc:1336),handle_one_connection(sql_connect.cc:1241),pfs_spawn_thread(pfs.cc:1862),start_thread(libpthread.so.0),clone(libc.so.6)
     55 pthread_cond_wait,wait(os0event.cc:158),wait_low(os0event.cc:158),os_event_wait_low(os0event.cc:158),lock_wait_suspend_thread(lock0wait.cc:347),row_mysql_handle_errors(row0mysql.cc:741),row_insert_for_mysql(row0mysql.cc:1428),ha_innobase::write_row(ha_innodb.cc:8231),handler::ha_write_row(handler.cc:6089),write_record(sql_insert.cc:1941),mysql_insert(sql_insert.cc:1066),mysql_execute_command(sql_parse.cc:4170),mysql_parse(sql_parse.cc:7760),dispatch_command(sql_parse.cc:1832),do_command(sql_parse.cc:1386),do_handle_one_connection(sql_connect.cc:1336),handle_one_connection(sql_connect.cc:1241),pfs_spawn_thread(pfs.cc:1862),start_thread(libpthread.so.0),clone(libc.so.6)
     38 pthread_cond_wait,wait(os0event.cc:158),wait_low(os0event.cc:158),os_event_wait_low(os0event.cc:158),sync_array_wait_event(sync0arr.cc:471),rw_lock_x_lock_func(sync0rw.cc:733),pfs_rw_lock_x_lock_func(sync0rw.ic:544),buf_page_get_gen(buf0buf.cc:4918),btr_cur_search_to_nth_level(btr0cur.cc:1243),row_ins_sec_index_entry_low(row0ins.cc:2946),row_ins_sec_index_entry(row0ins.cc:3251),row_ins_index_entry(row0ins.cc:3297),row_ins_index_entry_step(row0ins.cc:3297),row_ins(row0ins.cc:3297),row_ins_step(row0ins.cc:3297),row_insert_for_mysql(row0mysql.cc:1414),ha_innobase::write_row(ha_innodb.cc:8231),handler::ha_write_row(handler.cc:6089),write_record(sql_insert.cc:1941),mysql_insert(sql_insert.cc:1066),mysql_execute_command(sql_parse.cc:4170),mysql_parse(sql_parse.cc:7760),dispatch_command(sql_parse.cc:1832),do_command(sql_parse.cc:1386),do_handle_one_connection(sql_connect.cc:1336),handle_one_connection(sql_connect.cc:1241),pfs_spawn_thread(pfs.cc:1862),start_thread(libpthread.so.0),clone(libc.so.6)
     32 pthread_cond_wait,wait(os0event.cc:158),wait_low(os0event.cc:158),os_event_wait_low(os0event.cc:158),sync_array_wait_event(sync0arr.cc:471),rw_lock_x_lock_func(sync0rw.cc:733),pfs_rw_lock_x_lock_func(sync0rw.ic:544),buf_page_get_gen(buf0buf.cc:4918),btr_cur_search_to_nth_level(btr0cur.cc:1243),row_ins_sec_index_entry_low(row0ins.cc:3040),row_ins_sec_index_entry(row0ins.cc:3251),row_ins_index_entry(row0ins.cc:3297),row_ins_index_entry_step(row0ins.cc:3297),row_ins(row0ins.cc:3297),row_ins_step(row0ins.cc:3297),row_insert_for_mysql(row0mysql.cc:1414),ha_innobase::write_row(ha_innodb.cc:8231),handler::ha_write_row(handler.cc:6089),write_record(sql_insert.cc:1941),mysql_insert(sql_insert.cc:1066),mysql_execute_command(sql_parse.cc:4170),mysql_parse(sql_parse.cc:7760),dispatch_command(sql_parse.cc:1832),do_command(sql_parse.cc:1386),do_handle_one_connection(sql_connect.cc:1336),handle_one_connection(sql_connect.cc:1241),pfs_spawn_thread(pfs.cc:1862),start_thread(libpthread.so.0),clone(libc.so.6)

If a summary like above is not clear enough, we can surely go one step further and create a proper flame graph based on the output above:

openxs@ao756:~$ cat /tmp/pmp1.txt | awk '{print $2, $1}' | sed -e 's/,/;/g' | ~/git/FlameGraph/flamegraph.pl --countname="threads" --reverse - >/tmp/pmp1.svg  

Then with some creative zooming and search we can concentrate on waits:

Frames with "wait" substring in the function name are highlighted

That's all. As simple as it looks. Quick overview of the backtrace before you start digging there.

Friday, February 4, 2022

How to Add Probe Inside a Function and Access Local Variables with bpftrace

During several my talks on bpftrace I mentioned that it is possible to add probe to "every other" line of the code, same as with perf, but had never demonstrated how to do this. Looks like it's time to show both this and access to local variables declared inside the function we trace, as I am going to speak about bpftrace again at FOSDEM (and have to share something new and cool).

Consider the following debugging session with latest MariaDB server 10.8 built from GitHub source (as usual, for my own work I use only my own builds from current source) on Ubuntu 20.04:

openxs@ao756:~/dbs/maria10.8$ sudo gdb -p `pidof mariadbd`
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 200650
[New LWP 200652]
[New LWP 200653]
[New LWP 200654]
[New LWP 200655]
[New LWP 200656]
[New LWP 200659]
[New LWP 200662]
[New LWP 200663]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
0x00007f7b8424caff in __GI___poll (fds=0x557960733d38, nfds=3,
    timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) b do_command
Breakpoint 1 at 0x55795e867570: file /home/openxs/git/server/sql/sql_parse.cc, line 1197.
(gdb) c
Continuing.

Basically I've attached to running mariadbd process and set a breakpoint on do_command(), and then let it continue working. In another terminal I've connected to server and got the breakpoint hit immediately:

[New Thread 0x7f7b5dbf0700 (LWP 200680)]
[New Thread 0x7f7b6c084700 (LWP 200797)]
[Switching to Thread 0x7f7b6c084700 (LWP 200797)]

Thread 11 "mariadbd" hit Breakpoint 1, do_command (thd=0x7f7b4c001738,
    blocking=blocking@entry=true)
    at /home/openxs/git/server/sql/sql_parse.cc:1197
1197    {
(gdb) bt
#0  do_command (thd=0x7f7b4c001738, blocking=blocking@entry=true)
    at /home/openxs/git/server/sql/sql_parse.cc:1197
#1  0x000055795e97df07 in do_handle_one_connection (connect=<optimized out>,
    put_in_cache=true) at /home/openxs/git/server/sql/sql_connect.cc:1418
#2  0x000055795e97e23d in handle_one_connection (arg=arg@entry=0x55796079f798)
    at /home/openxs/git/server/sql/sql_connect.cc:1312
#3  0x000055795ecc7e8d in pfs_spawn_thread (arg=0x557960733dd8)
    at /home/openxs/git/server/storage/perfschema/pfs.cc:2201
#4  0x00007f7b84686609 in start_thread (arg=<optimized out>)
    at pthread_create.c:477
#5  0x00007f7b84259293 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

From this state in gdb we can trace execution step by step:

(gdb) n
[New Thread 0x7f7b5ebf2700 (LWP 200800)]
1213      if (thd->async_state.m_state == thd_async_state::enum_async_state::RESUMED)
(gdb) n
1229      thd->lex->current_select= 0;
(gdb) n
1237      if (!thd->skip_wait_timeout)
(gdb) n
1238        my_net_set_read_timeout(net, thd->get_net_wait_timeout());
(gdb) n
1241      thd->clear_error(1);
(gdb) n
1243      net_new_transaction(net);
(gdb) n
1246      thd->start_bytes_received= thd->status_var.bytes_received;
(gdb) n
1264      packet_length= my_net_read_packet(net, 1);
(gdb) n
1266      if (unlikely(packet_length == packet_error))
(gdb) n
1300      packet= (char*) net->read_pos;
(gdb) n
1309      if (packet_length == 0)                       /* safety */
(gdb) n
1316      packet[packet_length]= '\0';                  /* safety */
(gdb) n
1319      command= fetch_command(thd, packet);
(gdb) p packet
$1 = 0x7f7b4c008d08 "\003select @@version_comment limit 1"
(gdb) q
A debugging session is active.

        Inferior 1 [process 200650] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/openxs/dbs/maria10.8/bin/mariadbd, process 200650
[Inferior 1 (process 200650) detached]

In that session above I moved to the next step until I've got to the place in the code where local variable packet got a value, zero-terminated string. I know from the previous experience that it's value is the SQL statement to be executed (in most cases). So, I printed it to find out that mysql command line client starts with finding the server version comment to output.

Question is: can we do the same with bpftrace, attache the probe to the instruction at/around line 1316 of the sql/sql_parse.cc and print the value of the local variable named packet as a string? 

It should be possible, as we can do this with perf or even raw ftrace probes. This is how we can find all "tracable" lines in the code of do_command() function with perf:

openxs@ao756:~/dbs/maria10.8$ sudo perf probe -x /home/openxs/dbs/maria10.8/bin/mysqld --line do_command
<do_command@/home/openxs/git/server/sql/sql_parse.cc:0>
      0  dispatch_command_return do_command(THD *thd, bool blocking)
         {
      2    dispatch_command_return return_value;
           char *packet= 0;
           ulong packet_length;
           NET *net= &thd->net;
           enum enum_server_command command;
           DBUG_ENTER("do_command");

         #ifdef WITH_WSREP
           DBUG_ASSERT(!thd->async_state.pending_ops() ||
                       (WSREP(thd) &&
                        thd->wsrep_trx().state() == wsrep::transaction::s_abort>
         #else
           DBUG_ASSERT(!thd->async_state.pending_ops());
         #endif

           if (thd->async_state.m_state == thd_async_state::enum_async_state::R>
           {
             /*
              Resuming previously suspended command.
              Restore the state
             */
     23      command = thd->async_state.m_command;
     24      packet = thd->async_state.m_packet.str;
             packet_length = (ulong)thd->async_state.m_packet.length;
             goto resume;
           }

...
    104    packet= (char*) net->read_pos;
           /*
             'packet_length' contains length of data, as it was stored in packet
             header. In case of malformed header, my_net_read returns zero.
             If packet_length is not zero, my_net_read ensures that the returned
             number of bytes was actually read from network.
             There is also an extra safety measure in my_net_read:
             it sets packet[packet_length]= 0, but only for non-zero packets.
           */
    113    if (packet_length == 0)                       /* safety */
           {
             /* Initialize with COM_SLEEP packet */
    116      packet[0]= (uchar) COM_SLEEP;
    117      packet_length= 1;
           }
           /* Do not rely on my_net_read, extra safety against programming erro>
    120    packet[packet_length]= '\0';                  /* safety */


    123    command= fetch_command(thd, packet);
...
    221    DBUG_ASSERT(thd->m_digest == NULL);
           DBUG_ASSERT(thd->m_statement_psi == NULL);
         #ifdef WITH_WSREP
           if (packet_length != packet_error)
           {
             /* there was a command to process, and before_command() has been c>
             if (unlikely(wsrep_service_started))
    228        wsrep_after_command_after_result(thd);
           }
         #endif /* WITH_WSREP */
           DBUG_RETURN(return_value);

openxs@ao756:~/dbs/maria10.8$

So, basically we need a probe on line 120 to match the place where I printed the value in gdb. Let's create it with perf:

openxs@ao756:~/dbs/maria10.8$ sudo perf probe -x /home/openxs/dbs/maria10.8/bin/mariadbd 'do_command:120 packet:string'
Added new event:
  probe_mariadbd:do_command_L120 (on do_command:120 in /home/openxs/dbs/maria10.8/bin/mariadbd with packet:string)

You can now use it in all perf tools, such as:

        perf record -e probe_mariadbd:do_command_L120 -aR sleep 1

Now, we can check what user probe was really created by that command line:

openxs@ao756:~/dbs/maria10.8$ sudo cat /sys/kernel/tracing/uprobe_events
p:probe_mariadbd/do_command_L120 /home/openxs/dbs/maria10.8/bin/mariadbd:0x00000000007b761f packet_string=+0(%r10):string

So, the probe is attached to some address, 0x7b761f, in the mariadb binary. Looks like we refer to the register, r10, while trying to access local variable, packet.

Now let's delete this probe (we'll add it back with bpftrace soon) and try to make sense of that address:

openxs@ao756:~/dbs/maria10.8$ sudo perf probe --del 'probe_mariadbd:do_command*'
Removed event: probe_mariadbd:do_command_L120
openxs@ao756:~/dbs/maria10.8$ objdump -tT /home/openxs/dbs/maria10.8/bin/mariadbd | grep do_command
00000000007b7570 g     F .text  00000000000007e9              _Z10do_commandP3THDb
00000000007b7570 g    DF .text  00000000000007e9  Base        _Z10do_commandP3THDb

Note that based on the above the entry address of the do_command() function in the binary is 0x7b7570, a bit smaller than 0x7b761f. The difference is 0x7b761f - 0x7b7570 = 0xaf or 175 in decimal. 

Can we come up to the address for the probe  without perf? Yes, we can do it by checking the assembly code of the function, for example, in gdb:

openxs@ao756:~/dbs/maria10.8$ ps aux | grep 'mariadbd ' | grep -v grep
openxs    200650  0.0  2.4 1341032 94976 ?       Sl   січ30   0:05 /home/openxs/dbs/maria10.8/bin/mariadbd --no-defaults --basedir=/home/openxs/dbs/maria10.8 --datadir=/home/openxs/dbs/maria10.8/data --plugin-dir=/home/openxs/dbs/maria10.8/lib/plugin --log-error=/home/openxs/dbs/maria10.8/data/ao756.err --pid-file=ao756.pid --socket=/tmp/mariadb.sock --port=3309
openxs@ao756:~/dbs/maria10.8$ sudo gdb -p `pidof mariadbd`
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 200650
[New LWP 200652]
[New LWP 200653]
[New LWP 200654]
[New LWP 200655]
[New LWP 200662]
[New LWP 200663]
[New LWP 204349]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
0x00007f7b8424caff in __GI___poll (fds=0x557960733d38, nfds=3,
    timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) disass /m do_command
Dump of assembler code for function do_command(THD*, bool):
1142    static bool wsrep_command_no_result(char command)

1143    {
1144      return (command == COM_STMT_PREPARE          ||
   0x000055795e8679d0 <+1120>:  cmp    $0x16,%al
   0x000055795e8679d2 <+1122>:  je     0x55795e8679dc <do_command(THD*, bool)+1132>
   0x000055795e8679d4 <+1124>:  cmp    $0x1c,%al
   0x000055795e8679d6 <+1126>:  jne    0x55795e867b50 <do_command(THD*, bool)+1504>
   0x000055795e8679dc <+1132>:  mov    $0x1,%esi

1145              command == COM_STMT_FETCH            ||
1146              command == COM_STMT_SEND_LONG_DATA   ||
1147              command == COM_STMT_CLOSE);
1148    }
1149    #endif /* WITH_WSREP */
1150    #ifndef EMBEDDED_LIBRARY
1151    static enum enum_server_command fetch_command(THD *thd, char *packet)

1152    {
1153      enum enum_server_command
--Type <RET> for more, q to quit, c to continue without paging--
   0x000055795e867628 <+184>:   movzbl %al,%r15d

1154        command= (enum enum_server_command) (uchar) packet[0];
1155      DBUG_ENTER("fetch_command");

1156
1157      if (command >= COM_END ||
   0x000055795e86762c <+188>:   lea    -0x20(%r15),%edx
   0x000055795e867630 <+192>:   cmp    $0xd9,%edx
   0x000055795e867636 <+198>:   jbe    0x55795e8676f0 <do_command(THD*, bool)+384>
   0x000055795e86763c <+204>:   cmp    $0xff,%al
   0x000055795e86763e <+206>:   je     0x55795e8676f0 <do_command(THD*, bool)+384>

1158          (command >= COM_MDB_GAP_BEG && command <= COM_MDB_GAP_END))
1159        command= COM_END;                           // Wrong command

1160
1161      DBUG_PRINT("info",("Command on %s = %d (%s)",

1162                         vio_description(thd->net.vio), command,
1163                         command_name[command].str));
--Type <RET> for more, q to quit, c to continue without paging--
1164      DBUG_RETURN(command);

1165    }
1166
1167    /**
1168      Read one command from connection and execute it (query or simple command).
1169      This function is to be used by different schedulers (one-thread-per-connection,
1170      pool-of-threads)
1171
1172      For profiling to work, it must never be called recursively.
1173
1174      @param thd - client connection context
1175
1176      @param blocking - wait for command to finish.
1177                        if false (nonblocking), then the function might
1178                        return when command is "half-finished", with
1179                        DISPATCH_COMMAND_WOULDBLOCK.
1180                        Currenly, this can *only* happen when using
1181                        threadpool. The command will resume, after all outstanding
1182                        async operations (i.e group commit) finish.
--Type <RET> for more, q to quit, c to continue without paging--
1183                        Threadpool scheduler takes care of "resume".
1184
1185      @retval
1186        DISPATCH_COMMAND_SUCCESS - success
1187      @retval
1188        DISPATCH_COMMAND_CLOSE_CONNECTION  request of THD shutdown
1189              (s. dispatch_command() description)
1190      @retval
1191        DISPATCH_COMMAND_WOULDBLOCK - need to wait for asyncronous operations
1192                                      to finish. Only returned if parameter
1193                                      'blocking' is false.
1194    */
1195
1196    dispatch_command_return do_command(THD *thd, bool blocking)
1197    {
   0x000055795e867570 <+0>:     endbr64

1198      dispatch_command_return return_value;

1199      char *packet= 0;

1200      ulong packet_length;
--Type <RET> for more, q to quit, c to continue without paging--

1201      NET *net= &thd->net;

1202      enum enum_server_command command;

1203      DBUG_ENTER("do_command");

1204
1205    #ifdef WITH_WSREP
1206      DBUG_ASSERT(!thd->async_state.pending_ops() ||

1207                  (WSREP(thd) &&
1208                   thd->wsrep_trx().state() == wsrep::transaction::s_aborted));
1209    #else
1210      DBUG_ASSERT(!thd->async_state.pending_ops());
1211    #endif
1212
1213      if (thd->async_state.m_state == thd_async_state::enum_async_state::RESUMED)
   0x000055795e867574 <+4>:     push   %rbp
   0x000055795e867575 <+5>:     mov    %rsp,%rbp
   0x000055795e867578 <+8>:     push   %r15
--Type <RET> for more, q to quit, c to continue without paging--
   0x000055795e86757a <+10>:    push   %r14
   0x000055795e86757c <+12>:    push   %r13
   0x000055795e86757e <+14>:    mov    %esi,%r13d
   0x000055795e867581 <+17>:    push   %r12
   0x000055795e867583 <+19>:    push   %rbx
   0x000055795e867584 <+20>:    mov    %rdi,%rbx
   0x000055795e867587 <+23>:    sub    $0x78,%rsp
   0x000055795e86758b <+27>:    cmpl   $0x2,0x6270(%rdi)
   0x000055795e867592 <+34>:    je     0x55795e867770 <do_command(THD*, bool)+512>
   0x000055795e867598 <+40>:    mov    0x58(%rdi),%rax
   0x000055795e86759c <+44>:    cmpb   $0x0,0x279e(%rdi)
   0x000055795e8675a3 <+51>:    lea    0x290(%rdi),%r14

1214      {
1215        /*
1216         Resuming previously suspended command.
1217         Restore the state
1218        */
1219        command = thd->async_state.m_command;
   0x000055795e867770 <+512>:   mov    0x6278(%rdi),%r10
   0x000055795e867777 <+519>:   mov    0x6280(%rdi),%r12
   0x000055795e86777e <+526>:   movzbl %sil,%r8d
--Type <RET> for more, q to quit, c to continue without paging--
   0x000055795e867782 <+530>:   mov    %rdi,%rsi
   0x000055795e867785 <+533>:   mov    0x6274(%rdi),%r15d

1220        packet = thd->async_state.m_packet.str;

1221        packet_length = (ulong)thd->async_state.m_packet.length;

1222        goto resume;

1223      }
1224
1225      /*
1226        indicator of uninitialized lex => normal flow of errors handling
1227        (see my_message_sql)
1228      */
1229      thd->lex->current_select= 0;
   0x000055795e8675aa <+58>:    movq   $0x0,0xd38(%rax)

1230
1231      /*
1232        This thread will do a blocking read from the client which
1233        will be interrupted when the next command is received from
1234        the client, the connection is closed or "net_wait_timeout"
--Type <RET> for more, q to quit, c to continue without paging--
1235        number of seconds has passed.
1236      */
1237      if (!thd->skip_wait_timeout)
   0x000055795e8675b5 <+69>:    je     0x55795e867710 <do_command(THD*, bool)+416>

1238        my_net_set_read_timeout(net, thd->get_net_wait_timeout());
1239
1240      /* Errors and diagnostics are cleared once here before query */
1241      thd->clear_error(1);
1242
1243      net_new_transaction(net);
   0x000055795e8675dd <+109>:   mov    0x13f0(%rbx),%rax
   0x000055795e8675e4 <+116>:   mov    $0x1,%esi
   0x000055795e8675e9 <+121>:   mov    %r14,%rdi
   0x000055795e8675ec <+124>:   movl   $0x0,0x2f0(%rbx)

1244
1245      /* Save for user statistics */
1246      thd->start_bytes_received= thd->status_var.bytes_received;
   0x000055795e8675f6 <+134>:   mov    %rax,0x4040(%rbx)

1247
--Type <RET> for more, q to quit, c to continue without paging--
1248      /*
1249        Synchronization point for testing of KILL_CONNECTION.
1250        This sync point can wait here, to simulate slow code execution
1251        between the last test of thd->killed and blocking in read().
1252
1253        The goal of this test is to verify that a connection does not
1254        hang, if it is killed at this point of execution.
1255        (Bug#37780 - main.kill fails randomly)
1256
1257        Note that the sync point wait itself will be terminated by a
1258        kill. In this case it consumes a condition broadcast, but does
1259        not change anything else. The consumed broadcast should not
1260        matter here, because the read/recv() below doesn't use it.
1261      */
1262      DEBUG_SYNC(thd, "before_do_command_net_read");

1263
1264      packet_length= my_net_read_packet(net, 1);
   0x000055795e8675fd <+141>:   callq  0x55795ebda8a0 <my_net_read_packet(NET*, my_bool)>
   0x000055795e867602 <+146>:   mov    %rax,%r12

1265
--Type <RET> for more, q to quit, c to continue without paging--
1266      if (unlikely(packet_length == packet_error))
   0x000055795e867605 <+149>:   cmp    $0xffffffffffffffff,%rax
   0x000055795e867609 <+153>:   je     0x55795e8678d0 <do_command(THD*, bool)+864>

1267      {
1268        DBUG_PRINT("info",("Got error %d reading command from socket %s",

1269                           net->error,
1270                           vio_description(net->vio)));
1271
1272        /* Instrument this broken statement as "statement/com/error" */
1273        thd->m_statement_psi= MYSQL_REFINE_STATEMENT(thd->m_statement_psi,
   0x000055795e8678d0 <+864>:   mov    0x3b78(%rbx),%rdi

1274                                                     com_statement_info[COM_END].
1275                                                     m_key);
1276
1277
1278        /* Check if we can continue without closing the connection */
1279
1280        /* The error must be set. */
--Type <RET> for more, q to quit, c to continue without paging--
1281        DBUG_ASSERT(thd->is_error());

1282        thd->protocol->end_statement();
   0x000055795e8678ee <+894>:   mov    0x558(%rbx),%rdi
   0x000055795e8678f5 <+901>:   callq  0x55795e79f7e0 <Protocol::end_statement()>

1283
1284        /* Mark the statement completed. */
1285        MYSQL_END_STATEMENT(thd->m_statement_psi, thd->get_stmt_da());
1286        thd->m_statement_psi= NULL;
   0x000055795e86791b <+939>:   cmpb   $0x3,0x324(%rbx)
   0x000055795e867922 <+946>:   mov    $0x1,%r13d
   0x000055795e867928 <+952>:   movq   $0x0,0x3b78(%rbx)

1287        thd->m_digest= NULL;
   0x000055795e867933 <+963>:   movq   $0x0,0x3b30(%rbx)

1288
1289        if (net->error != 3)
   0x000055795e86793e <+974>:   jne    0x55795e86794a <do_command(THD*, bool)+986>

--Type <RET> for more, q to quit, c to continue without paging--
1290        {
1291          return_value= DISPATCH_COMMAND_CLOSE_CONNECTION;     // We have to close it.
1292          goto out;
1293        }
1294
1295        net->error= 0;
   0x000055795e867940 <+976>:   movb   $0x0,0x324(%rbx)

1296        return_value= DISPATCH_COMMAND_SUCCESS;

1297        goto out;
   0x000055795e867947 <+983>:   xor    %r13d,%r13d

1298      }
1299
1300      packet= (char*) net->read_pos;
   0x000055795e86760f <+159>:   mov    0x2b0(%rbx),%r10

1301      /*
1302        'packet_length' contains length of data, as it was stored in packet
1303        header. In case of malformed header, my_net_read returns zero.
1304        If packet_length is not zero, my_net_read ensures that the returned
--Type <RET> for more, q to quit, c to continue without paging--
1305        number of bytes was actually read from network.
1306        There is also an extra safety measure in my_net_read:
1307        it sets packet[packet_length]= 0, but only for non-zero packets.
1308      */
1309      if (packet_length == 0)                       /* safety */
   0x000055795e867616 <+166>:   test   %rax,%rax
   0x000055795e867619 <+169>:   je     0x55795e8676e0 <do_command(THD*, bool)+368>

1310      {
1311        /* Initialize with COM_SLEEP packet */
1312        packet[0]= (uchar) COM_SLEEP;
   0x000055795e8676e0 <+368>:   movb   $0x0,(%r10)

1313        packet_length= 1;
   0x000055795e8676e4 <+372>:   mov    $0x1,%r12d
   0x000055795e8676ea <+378>:   jmpq   0x55795e86761f <do_command(THD*, bool)+175>
   0x000055795e8676ef <+383>:   nop

1314      }
1315      /* Do not rely on my_net_read, extra safety against programming errors. */
--Type <RET> for more, q to quit, c to continue without paging--
1316      packet[packet_length]= '\0';                  /* safety */
   0x000055795e86761f <+175>:   movb   $0x0,(%r10,%r12,1)

1317
1318
1319      command= fetch_command(thd, packet);
   0x000055795e867624 <+180>:   movzbl (%r10),%eax

1320
1321    #ifdef WITH_WSREP
1322      DEBUG_SYNC(thd, "wsrep_before_before_command");

1323      /*
1324        If this command does not return a result, then we
1325        instruct wsrep_before_command() to skip result handling.
1326        This causes BF aborted transaction to roll back but keep
1327        the error state until next command which is able to return
1328        a result to the client.
1329      */
1330      if (unlikely(wsrep_service_started) &&
   0x000055795e867644 <+212>:   lea    0x1862105(%rip),%rcx        # 0x5579600c9750 <wsrep_service_started>
   0x000055795e86764b <+219>:   cmpb   $0x0,(%rcx)
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) quit
A debugging session is active.

        Inferior 1 [process 200650] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/openxs/dbs/maria10.8/bin/mariadbd, process 200650
[Inferior 1 (process 200650) detached]

At this stage we can stop debugging and quit from gdb. We see that line 1316 in the code (were packet is properly formed and ready for printing) matches (decimal) offset 175 from the beginning of the function. Now we can add that to the start of the function from objdump to find the address for the probe, 0x7b761f.

The only remaining detail is how to print the content of the register r10 as a zero terminated string. Just read fine bpftrace manual for this, and you can come up with the following:

openxs@ao756:~/dbs/maria10.8$ cd ~/git/bpftrace/
openxs@ao756:~/git/bpftrace$ cd build/
openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.8/bin/mariadbd:0x00000000007b761f { printf("Function: %s packet: %s\n", func, str(reg("r10"))); }'
[sudo] password for openxs:
Attaching 1 probe...
Function: do_command(THD*, bool) packet: select @@version_comment limit 1
Function: do_command(THD*, bool) packet: select 1 + 1
^C

This is what I've got from the probe while in another terminal I've connected and executed the following:

openxs@ao756:~/dbs/maria10.8$ bin/mysql
ERROR 2002 (HY000): Can't connect to local server through socket '/tmp/mysql.sock' (2)
openxs@ao756:~/dbs/maria10.8$ bin/mysql --socket=/tmp/mariadb.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.8.1-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select 1 + 1;
+-------+
| 1 + 1 |
+-------+
|     2 |
+-------+
1 row in set (0.000 sec)

So, as long as you know the address of instruction inside the function code, you can attach a probe to it. You can safely experiment, as if the address is surely wrong:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.8/bin/mariadbd:0x00000000007b761e { printf("Function: %s packet: %s\n", func, str(reg("r10"))); }'
Attaching 1 probe...
ERROR: Could not add uprobe into middle of instruction: /home/openxs/dbs/maria10.8/bin/mariadbd:_Z10do_commandP3THDb+174
openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.8/bin/mariadbd:0x00000000008b761f { printf("Function: %s packet: %s\n", func, str(reg("r10"))); }'
Attaching 1 probe...
ERROR: Could not add uprobe into middle of instruction: /home/openxs/dbs/maria10.8/bin/mariadbd:_ZN14partition_info25set_up_default_partitionsEP3THDP7handlerP14HA_CREATE_INFOj+799

the tool will tell you and do no harm.


Nice conbination of old Tower and new buildings in the City of London. Combining bpftrace with good old gdb is also nice.

To summarize:

  1.  It is possible to add a user probe to every assembly insturction inside the function with bpftrace (or perf, or ftrace, as long as you know what you are doing).
  2. It is possible to access CPU registers in the bpftrace probes.
  3. One can surely access, print etc local varaibles inside functions, as long as the address or register where it is stored is determined.
  4. It is surely easier and faster to do all the above with some knowledge of gdb and assembly language for your CPU.
  5. It is still fun to try to solve more complex problems with bpftrace, so more posts are expected. Stay tuned!