Sunday, January 30, 2022

DTrace Basics and Tracing Every MariaDB Server Function Call

In my previous post I've explained how to build MariaDB Server 10.8 from GitHub source on macOS. This kind of (totally unsupported by both MariaDB Corporation and Foundation) build may be used to study new interesting testing, troubleshooting and performance problem solving options that DTrace provides.

To start adding custom probes and doing something non-trivial one has to understand (or, better, master) some basics of the language used. Code in the sample support-files/dtrace/query-time.d program looked clear and familiar enough (at least for those who've seen bpftrace probes). It consists of several probes like this:

mysql*:::query-start
{
   self->query = copyinstr(arg0);
   self->connid = arg1;
   self->db    = copyinstr(arg2);
   self->who   = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4)));
   self->querystart = timestamp;
}

As we can read in the manual, the format for the probe specification is provider:module:function:name. An empty component in a probe specification matches anything. You can also describe probes with a pattern matching syntax that is similar to the syntax that is used by shell. 

In the case above  (USDT) provider name should start with mysql, we do not care about module and function (actually we maybe have USDT with the name in different functions), and we refer to the probe by name. 

If you are wondering to see other kinds of specific probes, you can check dtrace -l output:

Yuliyas-Air:~ Valerii$ sudo dtrace -l | head -20
dtrace: system integrity protection is on, some features will not be available

   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
...
   42    profile                                                     tick-1000
   43    profile                                                     tick-5000
   44  mach_trap                                        kern_invalid entry
   45  mach_trap                                        kern_invalid return
   46  mach_trap                    _kernelrpc_mach_vm_allocate_trap entry
Yuliyas-Air:~ Valerii$ sudo dtrace -l | tail -4
dtrace: system integrity protection is on, some features will not be available

254755   pid82428          mariadbd            do_command(THD*, bool) 5e4
254917   pid82428 libsystem_malloc.dylib                            malloc return
467101 security_exception639          csparser _ZN8Security7CFErrorC2Ev [Security::CFError::CFError()] throw-cf
467102 security_exception639          csparser _ZN8Security9UnixErrorC2Eib [Security::UnixError::UnixError(int, bool)] throw-unix

We should study different providers that DTrace supports:

  • dtrace - few special probes: BEGIN, END and ERROR.
  • profile - this provider includes probes that are associated with an interrupt that fires at some regular, specified time interval .profile-N probes basically fire N times per second on every CPU. tick-N probes fire on only one CPU per interval.
  • syscall - enables you to trace every system call entry and return.
  • pid* - enables tracing of any user process, as specified by its pid. It basically allows to trace entries (with entry probe name) and returns (with return probe name) to every user function, among other things.
  • ...

There are more providers actually, but those mentioned above allow to start reading and writing DTrace scripts for MariaDB server tracing and profiling, so let's stop listing them for now.

As we can see from this output 

Yuliyas-Air:~ Valerii$ dtrace
Usage: dtrace [-aACeFHlqSvVwZ] [-arch i386|x86_64] [-b bufsz] [-c cmd] [-D name[=def]]
        [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
        [-x opt[=val]]

        [-P provider [[ predicate ] action ]]
        [-m [ provider: ] module [[ predicate ] action ]]
        [-f [[ provider: ] module: ] func [[ predicate ] action ]]
        [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
        [-i probe-id [[ predicate ] action ]] [ args ... ]

        predicate -> '/' D-expression '/'
           action -> '{' D-statements '}'
...

for each probe we can define optional predicate (in slashes), expression that must evaluate to true (non-zero integer value) when probe fires for optional action (in curly brackets) to be executed. Action is a sequence of DTrace statements separated by semicolons. Note that DTrace does NOT support loops or complex conditional execution in the actions, for good reasons discussed elsewhere. We may also have optional arguments for the probe.

In our query-start probe above there is no predicate, so the action is executed unconditionally when we hit it. We use some built-in functions (like copyinsrt()) and variables, like arg0, arg1 and timestamp. We'll discuss them in details in some other post, but arg0 ... arg9 obviously refer to the first ten arguments of the probe and timestamp is the current value of a nanosecond timestamp counter.

In the action we see several assignments to self->something. These are thread-local variables that share a common name in your D code but refer to separate data storage associated with each operating system thread.

Now we can try to add some probes. Given this information:

Yuliyas-Air:~ Valerii$ ps aux | grep mariadbd
Valerii          37341   0.0  0.0  4267752    700 s003  S+   10:47AM   0:00.01 grep mariadbd
Valerii          82428   0.0  0.1  4834244   5960   ??  S    Wed09PM 116:47.57 /Users/Valerii/dbs/maria10.8/bin/mariadbd --no-defaults --basedir=/Users/Valerii/dbs/maria10.8 --datadir=/Users/Valerii/dbs/maria10.8/data --plugin-dir=/Users/Valerii/dbs/maria10.8/lib/plugin --log-error=/Users/Valerii/dbs/maria10.8/data/Yuliyas-Air.err --pid-file=Yuliyas-Air.pid

and knowing that function name portion of the current probe's description is avaiable ad probefunc built-in variable, we can try to trace the enter into the every tracable function in this MariaDB server process:

Yuliyas-Air:~ Valerii$ sudo dtrace -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available

dtrace: description 'pid82428:::entry ' matched 240404 probes
CPU     ID                    FUNCTION:NAME
  1 128445            cerror_nocancel:entry

Enter cerror_nocancel
  1 129499        _pthread_testcancel:entry
Enter _pthread_testcancel
  1 128444                    __error:entry
Enter __error
  1 129505    _pthread_cond_updateval:entry
Enter _pthread_cond_updateval
  1 129429         pthread_mutex_lock:entry
Enter pthread_mutex_lock
  1  36852                  my_hrtime:entry
Enter my_hrtime
  1 122312              clock_gettime:entry
Enter clock_gettime
  1 122657               gettimeofday:entry
Enter gettimeofday
  1 128412    __commpage_gettimeofday:entry
Enter __commpage_gettimeofday
  1 128413 __commpage_gettimeofday_internal:entry
Enter __commpage_gettimeofday_internal
  1 128343         mach_absolute_time:entry
Enter mach_absolute_time
:

I've used faimiliar C-style printf() function to print some custom string including the name of the function when we hit the probe. Note that dtrace attached probes to 240404 different functions without any problem, in default configuration. We immediately started to gte the output that shows function calls happening in background threads all the time. I've used more to be able to copy-paste small portion of the output and then stopped tracing with Ctrl+C after quiting from more.

Note that by default for each hit of the probe we get probe identifier, CPU that it fired on, function and probe name reported. Add -q option to suppress this and see only what we explicitly print in the probe:

Yuliyas-Air:~ Valerii$ sudo dtrace -q -n 'pid82428:::entry { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available


Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
Enter __commpage_gettimeofday
Enter __commpage_gettimeofday_internal
...

If we want to trace all processes with "mariadbd" as the name of the executable, we can add predicate to the probe that does not specify pid at all:

Yuliyas-Air:~ Valerii$ sudo dtrace -q -n ':::entry / execname == "mariadbd" / { printf ("\nEnter %s", probefunc); }' | more
dtrace: system integrity protection is on, some features will not be available


Enter cerror_nocancel
Enter _pthread_testcancel
Enter __error
Enter _pthread_cond_updateval
Enter pthread_mutex_lock
Enter my_hrtime
Enter clock_gettime
Enter gettimeofday
...

So, we can produce a raw trace (to be stored or processed by some user level code outside of DTrace). But what if we would like to do some more advanced processing in the probe itself and share only the results? For example, we run some MariaDB tests and would like to know how many times specific function in the code was executed (if at all) in the process?

For this we have to use aggregations and rely on aggregate functions to do the job in kernel space. For now it's enough to know that @name[key] is the aggregation name indexed by the key (you can use several keys separated by commas) and count() aggregation function simly counts the number of times it was called. 

Another inportant detail to study to create useful DTrace scripts is scripts arguments substitution. When called with -s option dtrace prowides a set of built-in macro variables to the script, with nsame starting with $. Of them $0 ... $9 have the same meenings as in shell, so $1 is the first argument of the script.

With the above taken into account, we can use the following basic DTrace script to count function executions per name in any user process (quiet option has the same effect as -q):

Yuliyas-Air:~ Valerii$ cat /tmp/codecoverage.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
pid$1::$2:entry {
        @calls[probefunc] = count();
}

This script substitutes the first argument for the process id in the pid provider and the second argument for the function name in the probe, thus we create one or more probes. The aggregation's collected results are printed by default when dtrace script ends, but there is also the printa() function to do this explicitly in the probes if needed. Let's try to apply this to MariaDB server 10.8 running as process 82428 and trace all functions (hence '*' as the explicite second argument):

Yuliyas-Air:~ Valerii$ sudo /tmp/codecoverage.d 82428 '*' >/tmp/res.txt
dtrace: system integrity protection is on, some features will not be available

^C
Yuliyas-Air:~ Valerii$ ls -l /tmp/res.txt
-rw-r--r--  1 Valerii  wheel  58772 Jan 28 20:53 /tmp/res.txt
Yuliyas-Air:~ Valerii$ head -10 /tmp/res.txt


  ACL_USER* find_by_username_or_anon<ACL_USER>(ACL_USER*, unsigned long, char const*, char const*, char const*, char const*, char                1
  ACL_USER::copy(st_mem_root*)                                      1
  Apc_target::init(st_mysql_mutex*)                                 1
  Binary_string::copy(char const*, unsigned long)                   1
  CONNECT::create_thd(THD*)                                         1
  Create_func_arg1::create_func(THD*, st_mysql_const_lex_string*, List<Item>*)                1
  Create_func_sleep::create_1_arg(THD*, Item*)                      1
  Current_schema_tracker::update(THD*, set_var*)                    1
Yuliyas-Air:~ Valerii$ tail -10 /tmp/res.txt
  buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool)            33895
  pthread_self                                                  36422
  mtr_t::memcpy_low(buf_block_t const&, unsigned short, void const*, unsigned long)            44143
  unsigned char* mtr_t::log_write<(unsigned char)48>(page_id_t, buf_page_t const*, unsigned long, bool, unsigned long)            44167
  mach_boottime_usec                                            45007
  mtr_t::modify(buf_block_t const&)                             45609
  pthread_mutex_unlock                                          56530
  pthread_mutex_lock                                            57295
  pthread_cond_broadcast                                        64197
  _platform_memmove$VARIANT$Haswell                             72767
Yuliyas-Air:~ Valerii$ cat /tmp/res.txt | grep do_command
  do_command(THD*, bool)                                            4

I've executed just several statements (4 if you check do_command() executions) over a short enough period of time, but some functions like pthread_mutex_lock were executed thousands of times in the background. Now we have a lame code coverage testing method that works at MariaDB server scale (unlike with bpftrace on Linux). There are performance implications (huge impact when I tried to do the same while running sysbench test with many concurrent threads) and limitations of internal buffer sizes that may cause some probes to be skipped etc, but basically it works!

DTrace is almost as beautiful as Côte d'Azur :)

 To summarize:

  1. We checked basic building blocks of simple DTrace one liners and scripts in some details, including probe structure, main providers, some built-in variables, thread-local variables referenced via self-> and aggregations.
  2. It is possible to add probes to each and every function in MariaDB server 10.8 and do both raw tracing and aggregation in the kernel context. Lame code coverage scripts do work.
  3. It is possible to create parametrized, generic scripts with command line arguments and other macros substituted.
  4. DTrace language is similar to what we see later implemented in bpftrace. After some checks of the manual one can easily switch from one tool to the other for basic tasks.
  5. Built-in variablers and functions may be named differently than in bpftrace.
  6. There are many more details to cover and interesting scripts to use with MariaDB server, so stay tuned!

Thursday, January 27, 2022

First steps with MariaDB Server and DTrace on macOS

FOSDEM 2022 is going to happen on the next weekend and I am still missing blog posts supporting my unusual upcoming talk there, this time devoted to building and using MariaDB server on macOS. So tonight I am going to contribute something new to refer to during my talk or followup questions. I'll try to document my way to build MariaDB Server from GitHub source on old (early 2015) MacBook Air running macOS 10.13.6 High Sierra. I am also going to show why one may want to use macOS there instead of installing Linux and having way more usual environment on the same decent hardware.

I've inherited this Air from my daughter in September, 2021 after she had got a new, M1-based, and initially used it mostly for Zoom calls and web-browsing. Soon I recalled that I've used MacBook for more than 3 years in the past while working for Sun and Oracle, and it was my main working platform not only for content consumption, emails and virtual machines. It was regularly used for bugs verification, MySQL builds and tests of all kinds. It would be a waste of capable OS and good hardware (still more powerful formally than all my other machines but Fedora 33 desktop) NOT to try to use it properly.

That's why after few software upgrades to end up with more recent 10.13 minor release:

Yuliyas-Air:maria10.6 Valerii$ uname -a
Darwin Yuliyas-Air 17.7.0 Darwin Kernel Version 17.7.0: Mon Aug 31 22:11:23 PDT 2020; root:xnu-4570.71.82.6~1/RELEASE_X86_64 x86_64

I read the manual, registered as a developer, downloaded and installed proper version of XCode and decided to proceed with MacPorts.

Then I updated ports tree and proceed by the manual, with installing git and other surely needed tools and packages:

sudo port install git cmake jemalloc judy openssl boost gnutls

That was just the beginning, and eventually, with dependencies, port updates, problems and workarounds, I ended up like this (and that works for building current code of 10.1 to 10.8 for sure, with some ports maybe not needed or used for other builds, but who cares):

Yuliyas-Air:maria10.8 Valerii$ port installed
The following ports are currently installed:
  autoconf @2.71_1 (active)
  automake @1.16.5_0 (active)
  bison @3.8.2_0
  bison @3.8.2_2 (active)
  bison-runtime @3.8.2_0 (active)
  boehmgc @8.0.6_0 (active)
  boost @1.76_0 (active)
  boost171 @1.71.0_3+no_single+no_static+python39 (active)
  boost176 @1.76.0_2+no_single+no_static+python39
  boost176 @1.76.0_3+no_single+no_static+python39 (active)
  bzip2 @1.0.8_0 (active)
  cmake @3.21.4_0
  cmake @3.22.1_0 (active)
  curl @7.80.0_0+ssl (active)
  curl-ca-bundle @7.80.0_0
  curl-ca-bundle @7.80.0_1 (active)
  cyrus-sasl2 @2.1.27_5+kerberos (active)
  db48 @4.8.30_4 (active)
  expat @2.4.1_0
  expat @2.4.2_0
  expat @2.4.3_0 (active)
  gdb @11.1_0 (active)
  gdbm @1.22_0 (active)
  gettext @0.21_0 (active)
  gettext-runtime @0.21_0 (active)
  gettext-tools-libs @0.21_0 (active)
  git @2.34.1_1+credential_osxkeychain+diff_highlight+doc+pcre+perl5_28
  git @2.34.1_2+credential_osxkeychain+diff_highlight+doc+pcre+perl5_28 (active)
  gmp @6.2.1_0 (active)
  gnutls @3.6.16_1
  gnutls @3.6.16_2 (active)
  icu @67.1_4 (active)
  jemalloc @5.2.1_1 (active)
  judy @1.0.5_1 (active)
  kerberos5 @1.19.2_1 (active)
  libarchive @3.5.2_1 (active)
  libb2 @0.98.1_1 (active)
  libcomerr @1.45.6_0 (active)
  libcxx @5.0.1_4 (active)
  libedit @20210910-3.1_1 (active)
  libevent @2.1.12_1 (active)
  libffi @3.4.2_2 (active)
  libiconv @1.16_1 (active)
  libidn @1.38_0 (active)
  libidn2 @2.3.2_0
  libidn2 @2.3.2_1 (active)
  libpsl @0.21.1-20210726_2 (active)
  libtasn1 @4.18.0_0 (active)
  libtextstyle @0.21_0 (active)
  libtool @2.4.6_13 (active)
  libunistring @0.9.10_0
  libunistring @1.0_0 (active)
  libuv @1.42.0_1
  libuv @1.43.0_0 (active)
  libxml2 @2.9.12_1 (active)
  libxslt @1.1.34_6 (active)
  lmdb @0.9.29_0 (active)
  luajit @2.1.0-beta3_5 (active)
  lz4 @1.9.3_1 (active)
  lzma @4.65_1 (active)
  lzo2 @2.10_0 (active)
  m4 @1.4.19_1 (active)
  mariadb @5.5.68_0 (active)
  mariadb-server @5.5.68_0 (active)
  mysql57 @5.7.36_1 (active)
  mysql_select @0.1.2_4 (active)
  ncurses @6.3_0 (active)
  nettle @3.7.3_0 (active)
  openssl @3_1
  openssl @3_2 (active)
  openssl3 @3.0.0_6+legacy
  openssl3 @3.0.1_0+legacy (active)
  openssl11 @1.1.1l_5 (active)
  p5-dbd-mysql @4.50.0_0 (active)
  p5.28-authen-sasl @2.160.0_0 (active)
  p5.28-cgi @4.530.0_0 (active)
  p5.28-clone @0.450.0_0 (active)
  p5.28-dbd-mysql @4.50.0_0+mysql57 (active)
  p5.28-dbi @1.643.0_0 (active)
  p5.28-digest-hmac @1.40.0_0 (active)
  p5.28-digest-sha1 @2.130.0_4 (active)
  p5.28-encode @3.160.0_0 (active)
  p5.28-encode-locale @1.50.0_0 (active)
  p5.28-error @0.170.290_0 (active)
  p5.28-gssapi @0.280.0_3 (active)
  p5.28-html-parser @3.760.0_0 (active)
  p5.28-html-tagset @3.200.0_4 (active)
  p5.28-http-date @6.50.0_0 (active)
  p5.28-http-message @6.350.0_0 (active)
  p5.28-io-html @1.4.0_0 (active)
  p5.28-io-socket-ssl @2.72.0_0
  p5.28-io-socket-ssl @2.73.0_0 (active)
  p5.28-lwp-mediatypes @6.40.0_0 (active)
  p5.28-mozilla-ca @20211001_0 (active)
  p5.28-net-libidn @0.120.0_5 (active)
  p5.28-net-smtp-ssl @1.40.0_0 (active)
  p5.28-net-ssleay @1.900.0_4 (active)
  p5.28-term-readkey @2.380.0_0 (active)
  p5.28-time-local @1.300.0_0 (active)
  p5.28-timedate @2.330.0_0 (active)
  p5.28-uri @5.100.0_0 (active)
  p11-kit @0.24.0_1 (active)
  pcre2 @10.39_0 (active)
  perl5.28 @5.28.3_4 (active)
  perl5.30 @5.30.3_3 (active)
  pkgconfig @0.29.2_0 (active)
  popt @1.18_1 (active)
  python3_select @0.0_2 (active)
  python39 @3.9.9_0+lto+optimizations
  python39 @3.9.10_0+lto+optimizations (active)
  python_select @0.3_9 (active)
  readline @8.1.000_0 (active)
  readline-5 @5.2.014_2 (active)
  rsync @3.2.3_1 (active)
  sqlite3 @3.37.0_0
  sqlite3 @3.37.1_0
  sqlite3 @3.37.2_0 (active)
  sysbench @1.0.20_0 (active)
  tcp_wrappers @20_4 (active)
  texinfo @6.8_0 (active)
  umem @1.0.1_1 (active)
  xxhashlib @0.8.1_0
  xxhashlib @0.8.1_1 (active)
  xz @5.2.5_0 (active)
  zlib @1.2.11_0 (active)
  zstd @1.5.0_0
  zstd @1.5.1_0 (active)
Yuliyas-Air:maria10.8 Valerii$

This is surely not the minimal needed set of ports. I've highlighted a couple (like openssl11) that were really needed to build and install 10.8 successfully eventually.

Then I cloned the code with usual steps and ended up with this:

Yuliyas-Air:server Valerii$ git log -1
commit c1cef1afa9962544de4840c9a796ae0a9b5e92e6 (HEAD -> 10.8, origin/bb-10.8-wlad, origin/bb-10.8-release, origin/HEAD, origin/10.8)
Merge: db2013787d2 9d93b51effd
Author: Vladislav Vaintroub <wlad@mariadb.com>
Date:   Wed Jan 26 13:57:00 2022 +0100

    Merge remote-tracking branch 'origin/bb-10.8-wlad' into 10.8
Yuliyas-Air:server Valerii$ git submodule update --init --recursive
Submodule path 'libmariadb': checked out 'ddb031b6a1d8b6e26a0f10f454dc1453a48a6ca8'
Yuliyas-Air:server Valerii$ cd buildtmp/
Yuliyas-Air:buildtmp Valerii$ rm -rf *
Yuliyas-Air:buildtmp Valerii$ cmake .. -DCMAKE_INSTALL_PREFIX=/Users/Valerii/dbs/maria10.8 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CONFIG=mysql_release -DFEATURE_SET=community -DWITH_EMBEDDED_SERVER=OFF -DPLUGIN_TOKUDB=NO -DWITH_SSL=/opt/local/libexec/openssl11 -DENABLE_DTRACE=1
...
-- The following OPTIONAL packages have not been found:

 * PMEM
 * Snappy

-- Configuring done
CMake Warning (dev):
  Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
  --help-policy CMP0042" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  MACOSX_RPATH is not specified for the following targets:

   libmariadb

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/Valerii/git/server/buildtmp
Yuliyas-Air:buildtmp Valerii$

Note that I've used somewhat nontrivial cmake command line for out of source build, and the output above was not from the clean state, but from the state after 10.1 to 10.7 where all checked out and built successfully, one by one, with problems found and resolved (more on that on my slides and during the talk).

Two key options in that command line are -DWITH_SSL=/opt/local/libexec/openssl11 to use supported OpenSSL version 1.1 (even though for 10.8 default version 3 should already work too) and -DENABLE_DTRACE=1 to enable static DTrace probes in the code of MariaDB that are still there. The rest are typical for my builds and blog posts.

Then I proceeded with usual make to end up with the first problem:

Yuliyas-Air:buildtmp Valerii$ make
...
[ 46%] Built target heap
[ 46%] Building CXX object storage/innobase/CMakeFiles/innobase.dir/btr/btr0btr.cc.o
In file included from /Users/Valerii/git/server/storage/innobase/btr/btr0btr.cc:28:
In file included from /Users/Valerii/git/server/storage/innobase/include/btr0btr.h:31:
In file included from /Users/Valerii/git/server/storage/innobase/include/dict0dict.h:32:
In file included from /Users/Valerii/git/server/storage/innobase/include/dict0mem.h:45:
In file included from /Users/Valerii/git/server/storage/innobase/include/buf0buf.h:33:
/Users/Valerii/git/server/storage/innobase/include/fil0fil.h:1497:11: error:
      'asm goto' constructs are not supported yet
  __asm__ goto("lock btsl $31, %0\t\njnc %l1" : : "m" (n_pending)
          ^
1 error generated.
make[2]: *** [storage/innobase/CMakeFiles/innobase.dir/btr/btr0btr.cc.o] Error 1
make[1]: *** [storage/innobase/CMakeFiles/innobase.dir/all] Error 2
make: *** [all] Error 2

caused by the fact that clang 10.0.0 from XCode: 

Yuliyas-Air:maria10.8 Valerii$ clang --version
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

does NOT support asm goto used in MariaDB Server code since 10.6 (while it was supposed to support it). I've applied a lame fix (see MDEV-27402 for more details and final diff later) and proceeded:

Yuliyas-Air:buildtmp Valerii$ make
...

[ 97%] Building CXX object client/CMakeFiles/mariadb.dir/mysql.cc.o
/Users/Valerii/git/server/client/mysql.cc:2853:59: error: expected expression
  rl_attempted_completion_function= (rl_completion_func_t*)&new_mysql_co...
                                                          ^
/Users/Valerii/git/server/client/mysql.cc:2853:38: error: use of undeclared
      identifier 'rl_completion_func_t'; did you mean 'rl_completion_matches'?
  rl_attempted_completion_function= (rl_completion_func_t*)&new_mysql_co...
                                     ^~~~~~~~~~~~~~~~~~~~
                                     rl_completion_matches
/usr/include/editline/readline.h:202:16: note: 'rl_completion_matches' declared
      here
char           **rl_completion_matches(const char *, rl_compentry_func_t *);
                 ^
/Users/Valerii/git/server/client/mysql.cc:2854:33: error: assigning to
      'Function *' (aka 'int (*)(const char *, int)') from incompatible type
      'rl_compentry_func_t *' (aka 'char *(*)(const char *, int)'): different
      return type ('int' vs 'char *')
  rl_completion_entry_function= (rl_compentry_func_t*)&no_completion;
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/Valerii/git/server/client/mysql.cc:2856:3: error: no matching function
      for call to 'rl_add_defun'
  rl_add_defun("magic-space", (rl_command_func_t *)&fake_magic_space, -1);
  ^~~~~~~~~~~~
/usr/include/editline/readline.h:195:7: note: candidate function not viable: no
      known conversion from 'rl_command_func_t *' (aka 'int (*)(int, int)') to
      'Function *' (aka 'int (*)(const char *, int)') for 2nd argument
int              rl_add_defun(const char *, Function *, int);
                 ^
4 errors generated.
make[2]: *** [client/CMakeFiles/mariadb.dir/mysql.cc.o] Error 1
make[1]: *** [client/CMakeFiles/mariadb.dir/all] Error 2
make: *** [all] Error 2
Yuliyas-Air:buildtmp Valerii$

This readline-related problem was also reported (see MDEV-27579) and I fixed it with a lame patch, and eventually I was able to build successfully:

...

[100%] Linking C executable wsrep_check_version
[100%] Built target wsrep_check_version
Yuliyas-Air:buildtmp Valerii$ make install && make clean
...
-- Installing: /Users/Valerii/dbs/maria10.8/share/aclocal/mysql.m4
-- Installing: /Users/Valerii/dbs/maria10.8/support-files/mysql.server
Yuliyas-Air:buildtmp Valerii$ echo $?
0

The final diff is like this:

Yuliyas-Air:server Valerii$ git diff -u
diff --git a/client/mysql.cc b/client/mysql.cc
index 6612b273d17..902589f2e83 100644
--- a/client/mysql.cc
+++ b/client/mysql.cc
@@ -2849,7 +2849,7 @@ static void initialize_readline ()
   rl_terminal_name= getenv("TERM");

   /* Tell the completer that we want a crack first. */
-#if defined(USE_NEW_READLINE_INTERFACE)
+#if defined(USE_NEW_READLINE_INTERFACE) && !defined(__APPLE_CC__)
   rl_attempted_completion_function= (rl_completion_func_t*)&new_mysql_completion;
   rl_completion_entry_function= (rl_compentry_func_t*)&no_completion;

@@ -2859,7 +2859,9 @@ static void initialize_readline ()
   setlocale(LC_ALL,""); /* so as libedit use isprint */
 #endif
   rl_attempted_completion_function= (CPPFunction*)&new_mysql_completion;
+#if !defined(__APPLE_CC__)
   rl_completion_entry_function= &no_completion;
+#endif
   rl_add_defun("magic-space", (Function*)&fake_magic_space, -1);
 #else
   rl_attempted_completion_function= (CPPFunction*)&new_mysql_completion;
diff --git a/storage/innobase/include/fil0fil.h b/storage/innobase/include/fil0fil.h
index 34a53746b42..a795313116f 100644
--- a/storage/innobase/include/fil0fil.h
+++ b/storage/innobase/include/fil0fil.h
@@ -1489,7 +1489,7 @@ inline void fil_space_t::reacquire()
 inline bool fil_space_t::set_stopping_check()
 {
   mysql_mutex_assert_owner(&fil_system.mutex);
-#if defined __clang_major__ && __clang_major__ < 10
+#if (defined __clang_major__ && __clang_major__ < 10) || defined __APPLE_CC__
   /* Only clang-10 introduced support for asm goto */
   return n_pending.fetch_or(STOPPING, std::memory_order_relaxed) & STOPPING;
 #elif defined __GNUC__ && (defined __i386__ || defined __x86_64__)
diff --git a/storage/rocksdb/rocksdb b/storage/rocksdb/rocksdb
--- a/storage/rocksdb/rocksdb
+++ b/storage/rocksdb/rocksdb
@@ -1 +1 @@
-Subproject commit bba5e7bc21093d7cfa765e1280a7c4fdcd284288
+Subproject commit bba5e7bc21093d7cfa765e1280a7c4fdcd284288-dirty
Yuliyas-Air:server Valerii$

The last diff in rocksdb submodule is related to the "missing zstd headers" problem I fined with yet another lame patch while building 10.7, see MDEV-27619

After usual mysql_install_db and startup, I've ended up with a shiny new MariaDB 10.8.0 up and running on macOS 10.13.6:

Yuliyas-Air:maria10.8 Valerii$ bin/mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.8.0-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show variables like 'version%';
+-------------------------+------------------------------------------+
| Variable_name           | Value                                    |
+-------------------------+------------------------------------------+
| version                 | 10.8.0-MariaDB                           |
| version_comment         | MariaDB Server                           |
| version_compile_machine | x86_64                                   |
| version_compile_os      | osx10.13                                 |
| version_malloc_library  | system                                   |
| version_source_revision | c1cef1afa9962544de4840c9a796ae0a9b5e92e6 |
| version_ssl_library     | OpenSSL 1.1.1l  24 Aug 2021              |
+-------------------------+------------------------------------------+
7 rows in set (0.001 sec)

I even checked few MTR test suites, and many tests pass:

Yuliyas-Air:mysql-test Valerii$ ./mtr --suite=rocksdb
Logging: ./mtr  --suite=rocksdb
VS config:
vardir: /Users/Valerii/dbs/maria10.8/mysql-test/var
Checking leftover processes...
 - found old pid 89161 in 'mysqld.1.pid', killing it...
   process did not exist!
Removing old var directory...
Creating var directory '/Users/Valerii/dbs/maria10.8/mysql-test/var'...
Checking supported features...
MariaDB Version 10.8.0-MariaDB
 - SSL connections supported
Using suites: rocksdb
Collecting tests...
Installing system database...
...
rocksdb.index_merge_rocksdb 'write_committed' [ pass ]    911
rocksdb.shutdown 'write_prepared'        [ pass ]   1895
rocksdb.index_merge_rocksdb 'write_prepared' [ pass ]    923
rocksdb.mariadb_misc_binlog 'write_committed' [ pass ]     43
rocksdb.mariadb_misc_binlog 'write_prepared' [ pass ]     43
...
rocksdb.issue495 'write_committed'       [ pass ]     33
rocksdb.partition 'write_committed'      [ fail ]
        Test ended at 2022-01-27 22:41:15

CURRENT_TEST: rocksdb.partition
mysqltest: At line 67: query 'ALTER TABLE t1 REBUILD PARTITION p0, p1' failed: ER_METADATA_INCONSISTENCY (4064): Table 'test.t1#P#p0#TMP#' does not exist, but metadata information exists inside MyRocks. This is a sign of data inconsistency. Please check if './test/t1#P#p0#TMP#.frm' exists, and try to restore it if it does not exist.

The result from queries just before the failure was:
< snip >
DROP TABLE IF EXISTS employees_hash_1;
DROP TABLE IF EXISTS t1_hash;
DROP TABLE IF EXISTS employees_linear_hash;
DROP TABLE IF EXISTS t1_linear_hash;
DROP TABLE IF EXISTS k1;
DROP TABLE IF EXISTS k2;
DROP TABLE IF EXISTS tm1;
DROP TABLE IF EXISTS tk;
DROP TABLE IF EXISTS ts;
DROP TABLE IF EXISTS ts_1;
DROP TABLE IF EXISTS ts_3;
DROP TABLE IF EXISTS ts_4;
DROP TABLE IF EXISTS ts_5;
DROP TABLE IF EXISTS trb3;
DROP TABLE IF EXISTS tr;
DROP TABLE IF EXISTS members_3;
DROP TABLE IF EXISTS clients;
DROP TABLE IF EXISTS clients_lk;
DROP TABLE IF EXISTS trb1;
CREATE TABLE t1 (i INT, j INT, k INT, PRIMARY KEY (i)) ENGINE = ROCKSDB PARTITION BY KEY(i) PARTITIONS 4;

More results from queries before failure can be found in /Users/Valerii/dbs/maria10.8/mysql-test/var/log/partition.log

 - saving '/Users/Valerii/dbs/maria10.8/mysql-test/var/log/rocksdb.partition-write_committed/' to '/Users/Valerii/dbs/maria10.8/mysql-test/var/log/rocksdb.partition-write_committed/'

Only  148  of 555 completed.
--------------------------------------------------------------------------
The servers were restarted 23 times
Spent 154.726 of 206 seconds executing testcases

Failure: Failed 1/27 tests, 96.30% were successful.

Failing test(s): rocksdb.partition

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

69 tests were skipped, 2 by the test itself.

mysql-test-run: *** ERROR: there were failing test cases

But the real reason to build MariaDB server on macOS (other than "because I can") was not even RocksDB testing, but this:

Yuliyas-Air:mysql-test Valerii$ dtrace
Usage: dtrace [-aACeFHlqSvVwZ] [-arch i386|x86_64] [-b bufsz] [-c cmd] [-D name[=def]]
        [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
        [-x opt[=val]]

        [-P provider [[ predicate ] action ]]
        [-m [ provider: ] module [[ predicate ] action ]]
        [-f [[ provider: ] module: ] func [[ predicate ] action ]]
        [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
        [-i probe-id [[ predicate ] action ]] [ args ... ]

        predicate -> '/' D-expression '/'
           action -> '{' D-statements '}'

        -arch Generate programs and Mach-O files for the specified architecture

        -a  claim anonymous tracing state
        -A  generate plist(5) entries for anonymous tracing
        -b  set trace buffer size
        -c  run specified command and exit upon its completion
        -C  run cpp(1) preprocessor on script files
        -D  define symbol when invoking preprocessor
        -e  exit after compiling request but prior to enabling probes
        -f  enable or list probes matching the specified function name
        -F  coalesce trace output by function
        -h  generate a header file with definitions for static probes
        -H  print included files when invoking preprocessor
        -i  enable or list probes matching the specified probe id
        -I  add include directory to preprocessor search path
        -l  list probes matching specified criteria
        -L  add library directory to library search path
        -m  enable or list probes matching the specified module name
        -n  enable or list probes matching the specified probe name
        -o  set output file
        -p  grab specified process-ID and cache its symbol tables
        -P  enable or list probes matching the specified provider name
        -q  set quiet mode (only output explicitly traced data)
        -s  enable or list probes according to the specified D script
        -S  print D compiler intermediate code
        -U  undefine symbol when invoking preprocessor
        -v  set verbose mode (report stability attributes, arguments)
        -V  report DTrace API version
        -w  permit destructive actions
        -W  wait for specified process and exit upon its completion
        -x  enable or modify compiler and tracing options
        -Z  permit probe descriptions that match zero probes
Yuliyas-Air:mysql-test Valerii$

I remember how cool dtrace was since workin for Sun and after last re-install of Fedora I ended up with my Illumos VM gone and no recent FreeBSD VM anyway,. so basically there was no dtrace at hand until I've got this MacBook.

So, unlike MySQL 8.0, MariaDB server still contains USDT (DTrace probes) and I've built my version with them enabled. Let's quickly check how they can be used. There are few sample D files in the source:

Yuliyas-Air:server Valerii$ ls support-files/dtrace/
locktime.d                      query-rowops.d
query-execandqc.d               query-time.d
query-filesort-time.d           statement-time.d
query-network-time.d            statement-type-aggregate.d
query-parse-time.d
Yuliyas-Air:server Valerii$ cat support-files/dtrace/query-time.d
#!/usr/sbin/dtrace -s
#
# Copyright (c) 2009 Sun Microsystems, Inc.
# Use is subject to license terms.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA
#
# Shows basic query execution time, who execute the query, and on what database

#pragma D option quiet

dtrace:::BEGIN
{
   printf("%-20s %-20s %-40s %-9s\n", "Who", "Database", "Query", "Time(ms)");
}

mysql*:::query-start
{
   self->query = copyinstr(arg0);
   self->connid = arg1;
   self->db    = copyinstr(arg2);
   self->who   = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4)));
   self->querystart = timestamp;
}

mysql*:::query-done
{
   printf("%-20s %-20s %-40s %-9d\n",self->who,self->db,self->query,
          (timestamp - self->querystart) / 1000000);
}
Yuliyas-Air:server Valerii$

We can try to run it (with yet another hack to do that I leave to the curious reader who is really going to try this) and then execute some queries against MariaDB from another terminal:

Yuliyas-Air:server Valerii$ sudo support-files/dtrace/query-time.d
dtrace: system integrity protection is on, some features will not be available

Who                  Database             Query                                    Time(ms)
Valerii@localhost                         select @@version_comment limit 1         0
Valerii@localhost                         select 1+1                               0
Valerii@localhost                         select sleep(4)                          4004
Valerii@localhost                         select @@version_comment limit 1         0
Valerii@localhost                         create database sbtest                   1
Valerii@localhost    sbtest               CREATE TABLE sbtest1(
  id INTEGER NOT NULL AUTO_INCREMENT,
  k INTEGER DEFAULT '0' NOT NULL,
  c CHAR(120) DEFAULT '' NOT NULL,
  pad CHAR(60) DEFAULT '' NOT NULL,
  PRIMARY KEY (id)
) /*! ENGINE = innodb */ 91
Valerii@localhost    sbtest               INSERT INTO sbtest1(k, c, pad) VALUES(366941, '31451373586-15688153734-79729593694-96509299839-83724898275-86711833539-78981337422-35049690573-51724173961-87474696253', '98996621624-36689827414-04092488557-09587706818-65008859162'),(277750, '21472970079-7 181
Valerii@localhost    sbtest               INSERT INTO sbtest1(k, c, pad) VALUES(124021, '80697810288-90543941719-80227288793-55278810422-59841440561-49369413842-83550451066-12907725305-62036548401-86959403176', '65708342793-83311865079-53224065384-18645733125-16333693298'),(660496, '07381386584-5 31
Valerii@localhost    sbtest               INSERT INTO sbtest1(k, c, pad) VALUES(425080, '48883413333-17783399741-03981526516-97596354402-27141206678-83563692683-30244461835-25263435890-49140039573-28211133426', '81560227417-96691828090-72817141653-15106797886-43970285630'),(322421, '68618246702-7 33
...

to end up with a lightweight general query log with query execution time repported in milliseconds. Note some statement from, yes, sysbench test preapre :)

Nice view. macOS is also nice and useful platform to run MariaDB server for fun :)

Thta's probably enough for a simgle post. Now go and get that from your MySQL 8 on any OS! USDTs rule!

To summarize:

  1. It is surely possible to build even latest ang greates still alpha MariaDB 10.8 on macOS even as old as 10.13.6, and get usable result.
  2. One of the main benefits of macOS is DTrace support.
  3. It is still possible to use a limited set of USDTs once added to MySQL and now gone in all current versions of MariaDB server.
  4. That rocksdb.partition test failure is to be studied an maybe reported as a MDEV
  5. DTrace is surely more capable than wehat those examples in the source code show. More DTrace-related posts are coming this year. Stay tuned!

Saturday, January 15, 2022

Accessing Complex Structures in MariaDB Server Code in bpftrace Probes - First Steps

I had already written many blog posts about bpftrace. All my public talks devoted to bpftrace included a slide about "problems", and one of them usually sounded like this:

...access to complex structures (bpftrace needs headers)...

So, today I decided to demonstrate how to resolve this minor problem. As an example I tried to reproduce gdb debugging steps from this older post. I want to trace and report table and row level locks set during execution of various SQL statements against InnoDB tables in MariaDB Server 10.6.6 built from more or less current GitHub code. To reduce the performance impact I'll use recent version of bpftrace build from GitHub source:

openxs@ao756:~/git/bpftrace/build$ src/bpftrace --version
bpftrace v0.14.0-50-g4228

First thing to find out (given the same test case as in that older blog post) is what functions to add uprobe on and what information is available. Let's start with gdb and set a couple of breakpoints on lock_tables and lock_row_lock functions:

openxs@ao756:~/dbs/maria10.6$ sudo gdb -p `pidof mariadbd`
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 14850
[New LWP 14852]
[New LWP 14853]
[New LWP 14854]
[New LWP 14855]
[New LWP 14859]
[New LWP 14861]
[New LWP 14862]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
0x00007f126bc04aff in __GI___poll (fds=0x55ef5e36c838, nfds=3,
    timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) b lock_table
Breakpoint 1 at 0x55ef5c44a840: lock_table. (4 locations)
(gdb) b lock_rec_lock
Breakpoint 2 at 0x55ef5c1c4896: lock_rec_lock. (2 locations)
(gdb) c
Continuing.
[New Thread 0x7f1245ffd700 (LWP 14899)]
[New Thread 0x7f1246fff700 (LWP 14900)]
[New Thread 0x7f1268518700 (LWP 14904)]
[Switching to Thread 0x7f1268518700 (LWP 14904)]

Thread 11 "mariadbd" hit Breakpoint 1, lock_table (table=0x7f1218065b20,
    mode=LOCK_IS, thr=0x7f121806cc10)
    at /home/openxs/git/server/storage/innobase/lock/lock0lock.cc:3481
warning: Source file is more recent than executable.
3481    {
(gdb) p table
$1 = (dict_table_t *) 0x7f1218065b20
(gdb) p table->name
$2 = {m_name = 0x7f1218020908 "test/tt", static part_suffix = "#P#"}
(gdb) c
Continuing.

Thread 11 "mariadbd" hit Breakpoint 1, lock_table (table=0x7f1218065b20,
    mode=LOCK_IS, thr=0x7f121806cc10)
    at /home/openxs/git/server/storage/innobase/include/que0que.ic:37
warning: Source file is more recent than executable.
37              return(thr->graph->trx);
(gdb) c
Continuing.

Thread 11 "mariadbd" hit Breakpoint 2, lock_rec_lock (impl=false, mode=2,
    block=0x7f12480325a0, heap_no=2, index=0x7f1218066f90, thr=0x7f121806cc10)
    at /home/openxs/git/server/storage/innobase/include/que0que.ic:37
37              return(thr->graph->trx);
(gdb) p index
$3 = (dict_index_t *) 0x7f1218066f90
(gdb) p index->name
$4 = {m_name = 0x7f1218067120 "PRIMARY"}
(gdb) p index->table->name
$5 = {m_name = 0x7f1218020908 "test/tt", static part_suffix = "#P#"}
(gdb) p index->table
$6 = (dict_table_t *) 0x7f1218065b20
(gdb) q
A debugging session is active.

        Inferior 1 [process 14850] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/openxs/dbs/maria10.6/bin/mariadbd, process 14850
[Inferior 1 (process 14850) detached]
openxs@ao756:~/dbs/maria10.6$

From the above we already see that breakpoints are set in more than one place (so we expect more that one function name to match for the uprobe). We also see some structures used as functions arguments to access, and so we need a way to define them for tracing.

Now, if we try something lame to add a probe (should be done as root or via sudo) for just lock_tables:

openxs@ao756:~/git/bpftrace/build$ src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table { printf("lock_table: %p, %d, %p\n", arg0, arg1, arg2); }'
ERROR: bpftrace currently only supports running as the root user.
openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table { printf("lock_table: %p, %d, %p\n", arg0, arg1, arg2); }'
[sudo] password for openxs:
Attaching 20 probes...
^C

we can already suspect something bad, as we ended up with 20 probes. We can even list them in a readable way:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -l 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table' | c++filt
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table(dict_table_t*, lock_mode, que_thr_t*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_tables(THD*, TABLE_LIST*, unsigned int, unsigned int)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_names(THD*, DDL_options_st const&, TABLE_LIST*, TABLE_LIST*, unsigned long, unsigned int)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_names(THD*, DDL_options_st const&, TABLE_LIST*, TABLE_LIST*, unsigned long, unsigned int) [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_create(dict_table_t*, unsigned int, trx_t*, ib_lock_t*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_for_trx(dict_table_t*, trx_t*, lock_mode)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_has_locks(dict_table_t*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_resurrect(dict_table_t*, trx_t*, lock_mode)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_resurrect(dict_table_t*, trx_t*, lock_mode) [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_lock_list_init(ut_list_base<ib_lock_t, ut_list_node<ib_lock_t> lock_table_t::*>*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_print(_IO_FILE*, ib_lock_t const*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_print(_IO_FILE*, ib_lock_t const*) [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_wsrep(dict_table_t*, lock_mode, que_thr_t*, trx_t*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_wsrep(dict_table_t*, lock_mode, que_thr_t*, trx_t*) [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_dequeue(ib_lock_t*, bool)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_dequeue(ib_lock_t*, bool) [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_dequeue(ib_lock_t*, bool) [clone .constprop.0]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table_dequeue(ib_lock_t*, bool) [clone .constprop.0] [clone .cold]
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_tables_precheck(THD*, TABLE_LIST*)
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_tables_open_and_lock_tables(THD*, TABLE_LIST*)
openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -l 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table' | wc -l
20

So, as expected, out probe maptaches any function with "lock_tables" in its name, and using the first row of the output (demangled function signature) does NOT help:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -l 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table' | head -1 | c++filt
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table(dict_table_t*, lock_mode, que_thr_t*)
openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -l 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table(dict_table_t*, lock_mode, que_thr_t*)'
stdin:1:1-59: ERROR: syntax error, unexpected (, expecting {
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table(dict_table_t*, lock_mode, que_thr_t*)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can NOT use demandled signature, but we can use mangled equivalent:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -l 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_table' | head -1
uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:_Z10lock_tableP12dict_table_t9lock_modeP9que_thr_t

Like this:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e 'uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:_Z10lock_tableP12dict_table_t9lock_modeP9que_thr_t
  { printf("lock_table: %p, %d, %p\n", arg0, arg1, arg2); }'

Attaching 1 probe...
lock_table: 0x7f1218065b20, 0, 0x7f121806cc10
lock_table: 0x7f121801d690, 4, 0x7f1218068d60
lock_table: 0x7f121801d690, 1, 0x7f1218068d60
lock_table: 0x7f122400d350, 1, 0x7f121807a158
lock_table: 0x7f122400d350, 1, 0x7f121807a158
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
lock_table: 0x7f12240101e0, 1, 0x7f121807a900
^C

I've got the above output whilke execvuting rhis SQL statement:

MariaDB [test]> insert into t(val) select 100 from tt;
Query OK, 4 rows affected (0.038 sec)
Records: 4  Duplicates: 0  Warnings: 0

A bit more table level lock requests than one would expect, and from more than one thread. This can be explained, but the real problem is that we see lock mode, locking thread address but NOT the actual name of the locked table (that is hidden inside a complex structure). Fro0m gdb output we know that the first argument (arg0) of the function is a pointer to dict_table_t structure, but how is it defined? 

I do not know InnoDB source code by heart, so I have to search (with some assumptions in mind this is easier):

openxs@ao756:~/git/server$ grep -rn 'struct dict_table_t' * | grep '\.h'
storage/innobase/include/row0import.h:34:struct dict_table_t;
storage/innobase/include/srv0start.h:33:struct dict_table_t;
storage/innobase/include/dict0mem.h:1788:struct dict_table_t {
storage/innobase/include/dict0types.h:39:struct dict_table_t;

I've highlighted the line where structure definition begins, so I can see what's there (only the much later important part is quoted):

...
public:
        /** Id of the table. */
        table_id_t                              id;
        /** dict_sys.id_hash chain node */
        dict_table_t*                           id_hash;
        /** Table name in name_hash */
        table_name_t                            name;
...

Even the name is not a simple scalar data type, so we can find the definition in te code (I had not cared much to make sure this is really a 10.6 branhc, assuming that some basic things do not change that often in MariaDB, optimistic approach):

openxs@ao756:~/git/server$ vi +102 storage/innobase/include/dict0types.h

struct table_name_t
{
        /** The name in internal representation */
        char*   m_name;

        /** Default constructor */
        table_name_t() {}
        /** Constructor */
        table_name_t(char* name) : m_name(name) {}

        /** @return the end of the schema name */
        const char* dbend() const
        {
                const char* sep = strchr(m_name, '/');
                ut_ad(sep);
                return sep;
        }

        /** @return the length of the schema name, in bytes */
        size_t dblen() const { return size_t(dbend() - m_name); }

        /** Determine the filename-safe encoded table name.
        @return the filename-safe encoded table name */
        const char* basename() const { return dbend() + 1; }

        /** The start of the table basename suffix for partitioned tables */
        static const char part_suffix[4];

        /** Determine the partition or subpartition name suffix.
        @return the partition name
        @retval NULL    if the table is not partitioned */
        const char* part() const { return strstr(basename(), part_suffix); }

        /** @return whether this is a temporary or intermediate table name */
        inline bool is_temporary() const;
};

I do not care about fun ction members, just data stored in the strucute (highlughted). Moreover, read this part of bpftrace reference carefully:

You can define your own structs when needed. In some cases, kernel structs are not declared in the kernel headers package, and are declared manually in bpftrace tools (or partial structs are: enough to reach the member to dereference).

Thing is, while we theoretically can use #include in bpftrace, structured used in MariaDB server are defined in so many heards amd are so specific and sometimes deeply nested, that usually it's easier to use properly sised placeholderns in explicit struct  definitions. You can get proper sizes from gdb:

(gdb) p sizeof(table_id_t)
$1 = 8
(gdb) p sizeof(dict_table_t *)
$2 = 8
(gdb) p sizeof(long long)
$3 = 8

Taking the above into account, we can try to define structured like these:

struct table_name_t
{
        char*   m_name;
        char part_suffix[4];
}

struct dict_table_t {
        long long                              id;
        struct dict_table_t*                           id_hash;
        struct table_name_t                            name;
}

in our bpftrace code to be able to pproperly derefence the table name as a null-terminated string:

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e '
>
> struct table_name_t
> {
        char*   m_name;
        char part_suffix[4];
> }
>
> struct dict_table_t {
        long long                              id;
        struct dict_table_t*                   id_hash;
        struct table_name_t                    name;
> }
>
> uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:_Z10lock_tableP12dict_table_t9lock_modeP9que_thr_t
>   { printf("lock_table: %s, %d, %p\n", str(((struct dict_table_t *)arg0)->name.m_name), arg1, arg2); }'
Attaching 1 probe...
lock_table: test/tt, 0, 0x7f121806cc10
lock_table: test/t, 4, 0x7f1218068d60
lock_table: test/t, 1, 0x7f1218068d60
lock_table: mysql/innodb_table_stats, 1, 0x7f122800e7d8
lock_table: mysql/innodb_table_stats, 1, 0x7f122800e7d8
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
lock_table: mysql/innodb_index_stats, 1, 0x7f122800ef80
^C

Looks good enough as a proof of concept already. We see how to define simplified structured containing just enough information to find and derefernce items of complex nested structures used all over the MariaDB server code.

I'd surely want to add a probe to lock_row_lock too, and for this I need some more gdb outputs from the breakpoint set:

Thread 11 "mariadbd" hit Breakpoint 2, lock_rec_lock (impl=false, mode=2,
    block=0x7f12480325a0, heap_no=2, index=0x7f1218066f90, thr=0x7f121806cc10)
    at /home/openxs/git/server/storage/innobase/include/que0que.ic:37
37              return(thr->graph->trx);
(gdb) p index
$3 = (dict_index_t *) 0x7f1218066f90
(gdb) p index->name
$4 = {m_name = 0x7f1218067120 "PRIMARY"}
(gdb) p index->table->name
$5 = {m_name = 0x7f1218020908 "test/tt", static part_suffix = "#P#"}
(gdb) p index->table
$6 = (dict_table_t *) 0x7f1218065b20
(gdb) p index->id
$7 = 48
(gdb) p sizeof(index->id)
$8 = 8
(gdb) p sizeof(index->heap)
$9 = 8
(gdb) p sizeof(index->name)
$10 = 8

(gdb) p impl
$11 = false
(gdb) p sizeof(impl)
$12 = 1
(gdb) p sizeof(mode)
$13 = 4

I can surely find the defintion of dict_index_t structure and work based on it:

struct dict_index_t {
  /** Maximum number of fields */
  static constexpr unsigned MAX_N_FIELDS= (1U << 10) - 1;

        index_id_t      id;     /*!< id of the index */
        mem_heap_t*     heap;   /*!< memory heap */
        id_name_t       name;   /*!< index name */
        dict_table_t*   table;  /*!< back pointer to table */
...

but sizes above are actually enough to come up with a proper code like this:

sudo src/bpftrace -e '

struct table_name_t
{
        char*   m_name;
        char part_suffix[4];
}

struct dict_table_t {
        long long                              id;
        struct dict_table_t*                           id_hash;
        struct table_name_t                            name;
}

struct dict_index_t {
        long long        id;
        long long        heap;
        char*            name;
        struct dict_table_t*    table;
}

uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:_Z10lock_tableP12dict_table_t9lock_modeP9que_thr_t
  { printf("lock_table: %s, mode: %d, thread: %p\n",
      str(((struct dict_table_t *)arg0)->name.m_name),
      arg1,
      arg2); }

uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_rec_lock
  { printf("lock_rec_lock: impl (%d) mode %d index %s rec of %s, thread: %p\n",
      arg0,
      arg1,
      str(((struct dict_index_t *)arg4)->name),
      str(((struct dict_index_t *)arg4)->table->name.m_name),
      arg5); }

that produces the following: 

openxs@ao756:~/git/bpftrace/build$ sudo src/bpftrace -e '
>
> struct table_name_t
> {
        char*   m_name;
        char part_suffix[4];
> }
>
> struct dict_table_t {
        long long                              id;
        struct dict_table_t*                           id_hash;
        struct table_name_t                            name;
> }
>
> struct dict_index_t {
        long longid;
        long longheap;
        char*name;
        struct dict_table_t*table;
> }
>
> uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:_Z10lock_tableP12dict_table_t9lock_modeP9que_thr_t
  { printf("lock_table: %s, mode: %d, thread: %p\n",
      str(((struct dict_table_t *)arg0)->name.m_name),
      arg1,
      arg2); }
>
> uprobe:/home/openxs/dbs/maria10.6/bin/mariadbd:lock_rec_lock
  { printf("lock_rec_lock: impl (%d) mode %d index %s rec of %s, thread: %p\n",
      arg0,
      arg1,
      str(((struct dict_index_t *)arg4)->name),
      str(((struct dict_index_t *)arg4)->table->name.m_name),
      arg5); }
>
> '
Attaching 3 probes...
lock_table: test/tt, mode: 0, thread: 0x7f121806cc10
lock_rec_lock: impl (0) mode 2 index PRIMARY rec of test/tt, thread: 0x7f121806cc10
lock_table: test/t, mode: 4, thread: 0x7f1218068d60
lock_table: test/t, mode: 1, thread: 0x7f1218068d60
lock_rec_lock: impl (0) mode 2 index PRIMARY rec of test/tt, thread: 0x7f121806cc10
lock_rec_lock: impl (0) mode 2 index PRIMARY rec of test/tt, thread: 0x7f121806cc10
lock_rec_lock: impl (0) mode 2 index PRIMARY rec of test/tt, thread: 0x7f121806cc10
lock_rec_lock: impl (0) mode 2 index PRIMARY rec of test/tt, thread: 0x7f121806cc10
lock_table: mysql/innodb_table_stats, mode: 1, thread: 0x7f123400ebc8
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_table_stats, thread: 0x7f123400ebc8
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_table_stats, thread: 0x7f123400ebc8
lock_table: mysql/innodb_table_stats, mode: 1, thread: 0x7f123400ebc8
lock_rec_lock: impl (0) mode 1026 index PRIMARY rec of mysql/innodb_table_stats, thread: 0x7f123400ebc8
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 1026 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 1026 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 3 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
lock_table: mysql/innodb_index_stats, mode: 1, thread: 0x7f123400f370
lock_rec_lock: impl (0) mode 1026 index PRIMARY rec of mysql/innodb_index_stats, thread: 0x7f123400f370
^C

while this SQL is executed, for example:

MariaDB [test]> insert into t(val) select 100 from tt;
Query OK, 4 rows affected (0.080 sec)
Records: 4  Duplicates: 0  Warnings: 0

Please, check that older post from proper interpretation of the output, including contants used to represent lock mode etc. The idea here was to show that bpftrace understands a subset of C/C++ struct statement and show the ways to create such simplified strucutres to be able to dereference and access deeply nexted arguments if needed for tracing.

A bit more advanced code may show more details about the individual rows locked...

* * *

To summarize:

  1. With some efforts like source code checks and gdb breakpoints/prints one can eventually figure out what structures to define to be able to access members of complex structures typical for MariaDB and MySQL server code in the bpftrace code.
  2. Inlcuding existing headers is NOT possible and is hardly practical for complex software like MariaDB that uses C++ nowadays.
  3. bpftrace, with some efforts like those presented above, allows to study what happens in complex server operations (like InnoDB locking) easily, safely and with minimal impact on a running system.
  4. bpftrace is cool, but you already know that!