Monday, January 20, 2020

Dynamic Tracing of MariaDB Server With bcc trace - Basic Example

This is a yet another blog post in my series about dynamic tracing of MySQL server (and friends) on Linux. Logically it had to appear after this one about perf and another one about bpftrace. For older Linux systems or when you are in a hurry with customer and have no time to upgrade, build from source etc perf just works and is really flexible (but it comes with some cost of writing many samples to disk and then processing them). For happy users of Linux with kernels 4.9+ (the newer the better), like recent Ubuntu, RHEL 8, Debian 9+ or Fedora there entire world of new efficient tracing with bpftrace is open and extending with every new kernel release.

For those in between, like me with this Ubuntu 16.04:
openxs@ao756:~/git/bcc/build$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
openxs@ao756:~/git/bcc/build$ uname -a
Linux ao756 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
the fancy world of eBPF and more efficient dynamic tracing is still mostly open, as we can try to use BCC tools. BCC is a toolkit for creating efficient kernel tracing and manipulation programs that includes several potentially useful tools and examples for MySQL DBAs. It makes use of extended BPF (Berkeley Packet Filters), formally known as eBPF.

I have a draft of this blog post hanging around since October 2019, but every time I tried to complete it I was not happy with the content. I wanted to get back, test more, try to present more tools, find out how to be able to access structure members in probes as easy as I can do it with gdb or perf, but then I hit some problem and put the draft aside...

When I started again some time later I often hit some new problem, so today I just decided to finally write down what I already know for sure, and provide at least a very basic example of dynamic tracing along the lines of those used in earlier posts (capturing queries executed by different threads using dynamic probes).

The first problem in case of Ubuntu 16.04 is to get the binaries of BCC tools. One of the ways is to build from GitHub source. The INSTALL.md document is clear enough when describing build dependencies and steps:
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
but still there is something to note. In recent versions you surely have to update the libbpf submodule, or you'll end up with compilation errors at early stage. My steps today were the following:
openxs@ao756:~/dbs/maria10.3$ cd ~/git/bcc/
openxs@ao756:~/git/bcc$ git pull
Already up-to-date.
openxs@ao756:~/git/bcc$ git log -1
commit dce8e9daf59f44dec4e3500d39a82a8ce59e43ba
Author: Yonghong Song <yhs@fb.com>
Date:   Fri Jan 17 22:06:52 2020 -0800

    sync with latest libbpf repo

    sync libbpf submodule upto the following commit:
        commit 033ad7ee78e8f266fdd27ee2675090ccf4402f3f
        Author: Andrii Nakryiko <andriin@fb.com>
        Date:   Fri Jan 17 16:22:23 2020 -0800

            sync: latest libbpf changes from kernel

    Signed-off-by: Yonghong Song <yhs@fb.com>

openxs@ao756:~/git/bcc$ git submodule init
openxs@ao756:~/git/bcc$ git submodule update
Submodule path 'src/cc/libbpf': checked out '033ad7ee78e8f266fdd27ee2675090ccf4402f3f'
Now I can proceed to build subdirectory and complete the build:
openxs@ao756:~/git/bcc/build$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr...
openxs@ao756:~/git/bcc/build$ make
...
[ 99%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/test_usdt_probes.cc.o
[100%] Building CXX object tests/cc/CMakeFiles/test_libbcc.dir/utils.cc.o
[100%] Linking CXX executable test_libbcc
[100%] Built target test_libbcc
It's always interesting to check if tests pass:
openxs@ao756:~/git/bcc/build$ make test
Running tests...
Test project /home/openxs/git/bcc/build
      Start  1: style-check
 1/40 Test  #1: style-check ......................   Passed    0.01 sec
      Start  2: c_test_static
 2/40 Test  #2: c_test_static ....................   Passed    0.30 sec
...
40/40 Test #40: lua_test_standalone ..............***Failed    0.06 sec

75% tests passed, 10 tests failed out of 40

Total Test time (real) = 450.78 sec

The following tests FAILED:
          3 - test_libbcc (Failed)
          4 - py_test_stat1_b (Failed)
          5 - py_test_bpf_log (Failed)
          6 - py_test_stat1_c (Failed)
          7 - py_test_xlate1_c (Failed)
          8 - py_test_call1 (Failed)
         16 - py_test_brb (Failed)
         17 - py_test_brb2 (Failed)
         18 - py_test_clang (Failed)
         40 - lua_test_standalone (Failed)
Errors while running CTest
Makefile:105: recipe for target 'test' failed
make: *** [test] Error 8
I've always had some tests failed, and one day I probably have to report the issue for the project, but for the purpose of this post (based on previous experience with older code) I'll get at least trace tool working as expected. So, I decided to process with installation:
openxs@ao756:~/git/bcc/build$ sudo make install
...
-- Up-to-date: /usr/share/bcc/tools/old/stackcount
-- Up-to-date: /usr/share/bcc/tools/old/oomkill
The tools are installed by default to /usr/share/bcc/tools.

For adding dynamic probes I'll use trace tool  that probes functions you specify and displays trace messages if a particular condition is met. You can control the message format to display function
arguments and return values.

Brendan Gregg explains the usage of this and other tools here with a lot of details. I'll just add a nice chart from that page here:
There is a separate tutorial with examples. You may want to check section for trace tool there.

For the purpose of this blog post I think it's enough to quickly check help output:
openxs@ao756:~/git/bcc/build$ sudo /usr/share/bcc/tools/trace
usage: trace [-h] [-b BUFFER_PAGES] [-p PID] [-L TID] [-v] [-Z STRING_SIZE]
             [-S] [-M MAX_EVENTS] [-t] [-u] [-T] [-C] [-c CGROUP_PATH]
             [-n NAME] [-f MSG_FILTER] [-B] [-s SYM_FILE_LIST] [-K] [-U] [-a]
             [-I header]
             probe [probe ...]
trace: error: too few arguments
and note the following basic syntax is used to define probes (see man /usr/share/bcc/man/man8/trace.8.gz after buuilding the tools from source as described above):
PROBE SYNTAX
       The general probe syntax is as follows:

       [{p,r}]:[library]:function[(signature)]      [(predicate)]     ["format
       string"[, arguments]]

       {t:category:event,u:library:probe}  [(predicate)]  ["format   string"[,
       arguments]]

       {[{p,r}],t,u}
              Probe  type  -  "p" for function entry, "r" for function return,
              "t" for kernel tracepoint, "u" for USDT probe. The default probe
              type is "p".
...
Fir simplicity here we do not consider conditional probes, so predicate is skipped. At the moment we are not interested in kernel or or user defined static tracepoints (the are not defined in default recent builds of MySQL or MariaDB server anway and require -DENABLE_DTRACE=ON to be explictly added to cmake command line used). For user defined dynamic probes in the mysqld process we need p (for probe at function entry) and maybe r (for function return).

We need to refer to library, and in our case this is a full path name to the mysqld binary (or just mysqld if it's in PATH). We also need to refer to some function by names. Quick test will show you that by default trace does NOT accept plain function names in MySQL or MariaDB code (as perf does), and require mangled ones to be used (same as bpftrace). We can find the names with nm command:
openxs@ao756:~/git/bcc/build$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5180 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5180 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
In the example above I was specifically looking for dispatch_command() function of MariaDB server version 10.3.x that I assume (see previous post) has a string with SQL statement as the third argument, packet. So, I can refer to this function in probe as "_Z16dispatch_command19enum_server_commandP3THDPcjbb".

The "format string" that define how to output arguments of probe  is a printf-style format string.  You  can  use the following format specifiers: %s, %d%u, ...  with the same semantics as printf's. In our case for zero-terminating string we'll use "%s".

Arguments of the function traced are named arg1, arg2, ... argN (unless we provide a signature for function), and are numbered starting from 1. So, in out case we can add a probe to print third argument of the dispatch_command() function upon entry as follows:
openxs@ao756:~/dbs/maria10.3$ sudo /usr/share/bcc/tools/trace -T 'p:/home/openxs/dbs/maria10.3/bin/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb "%s" arg3'
[sudo] password for openxs:
TIME     PID     TID     COMM            FUNC             -
16:16:53 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select @@version_comment limit 1
16:17:02 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 1
16:17:05 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb select 2
16:17:07 26585   29133   mysqld          _Z16dispatch_command19enum_server_commandP3THDPcjbb
^C
I've got the output above for this sample session:
openxs@ao756:~/dbs/maria10.3$ bin/mysql -uroot --socket=/tmp/mariadb.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 16
Server version: 10.3.22-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0,000 sec)

MariaDB [(none)]> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0,001 sec)

MariaDB [(none)]> exit
Bye
Note that I've added -T option to the command line to output timestamp.
  
It's a bit more complex with recent versions of Percona Server or MySQL (or any function parameters that are complex structures). It's also more complex if we want to process prepared statements and process packet content depending on the first argument (the above is correct only for COM_QUERY) and so on. But these are basic steps to get a log of SQL queries with timestamps by adding a dynamic probe with BCC trace tool. Enjoy!

Saturday, January 18, 2020

Fun with Bugs #92 - On MySQL Bug Reports I am Subscribed to, Part XXVI

I'd like to continue reviewing MySQL bug reports from Community users that I considered interesting and subscribed to. Unlike in the previous post in this series, I am not going to check test cases on any competitor product, but will use only recently released MySQL 5.7.29 and 8.0.19 for checks, if any. This time I'll concentrate on bugs reported in November 2019.

As usual, I mostly care about optimizer, InnoDB and replication related bugs. Here is the list:
  • Bug #97476 - "Range optimizer skips rows". This bug reported by Ilya Raudsepp looks like a clear regression in MySQL 8.0.x comparing to MySQL 5.7.x at least. I get the following correct results with 5.7.29:
    mysql> SELECT t.id
        -> FROM Test t
        -> JOIN (
        ->     SELECT item_id, MAX(created_at) AS created_at
        ->     FROM Test t
        ->     WHERE (platform_id = 2) AND (item_id IN (3,2,111)) AND (type = 'Default')
        ->     GROUP BY item_id
        -> ) t2 ON t.item_id = t2.item_id
        ->   t.item_id = t2.item_id
        ->   AND t.created_at = t2.created_at
        ->   AND t.type = 'Default'
        -> WHERE t.platform_id = 2;
    +----+
    | id |
    +----+
    |  6 |
    |  3 |
    |  5 |
    +----+
    3 rows in set (0,03 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,02 sec)
  • Bug #97531 - "5.7 replication breakage with syntax error with GRANT management". This tricky bug reported by Simon Mudd applies also to MySQL 8.0.x. It is closed as fixed, but the fix had not made it to recent 5.7.29 and 8.0.19 releases, so you'll have to wait for few more months.
  • Bug #97552 - "Regression: LEFT JOIN with Impossible ON condition performs slowly". Yet another optimizer regression in MySQL 8 (comparing to 5.7.x) that is fixed only in MySQL 8.0.20+. The bug was reported by Fredric Johansson.
  • Bug #97648 - "Bug in order by clause in union clause". Yet another regression (at least from user's point of view) in recent MySQL 5.7.x and 8.0.x comparing to 5.6.x. This time without a "regression" tag. The bug was reported by Andrei Mart.
  • Bug #97662 - "MySQL v8.0.18 FIPS mode is no longer supported". According to Ryan L, MySQL 8.0.18+ is no longer supporting ssl_fips_mode=STRICT, as OpenSSL 1.1.1 is not FIPS-compatible and MySQL Server must be compiled using OpenSSL 1.1.1 or higher. That's interesting. Check also this link.
  • Bug #97682 - "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception". This regression (comparing to MySQL 5.7) was reported by Jericho Rivera. It is fixed in MySQL 8.0.20. The patch was provided by Kamil Holubicki.
  • Bug #97692 - "Querying information_schema.TABLES issue". I do not see any documented attempt to check on MySQL 8.0, so I had to add a comment to the bug report. From what I see, in MySQL 8.0.19 we still get different (empty) result from the second query, but at least now we have a warning:
    mysql> SELECT ts.TABLE_SCHEMA
        -> FROM information_schema.TABLES ts
        -> WHERE ts.TABLE_TYPE ='VIEW'
        -> AND ts.TABLE_SCHEMA NOT IN ('sys')
        -> AND ts.TABLE_COMMENT LIKE '%invalid%';
    +--------------+
    | TABLE_SCHEMA |
    +--------------+
    | test         |
    +--------------+
    1 row in set, 1 warning (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Warning
       Code: 1356
    Message: View 'test.v' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
    1 row in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 8.0.19    |
    +-----------+
    1 row in set (0,00 sec)
    The bug was reported by Vinicius Malvestio Grippa.
  • Bug #97693 - "ALTER USER user IDENTIFIED BY 'password' broken by invalid authentication_string". The bug was reported by Nikolai Ikhalainen. MySQL 8.0.19 is still affected.
  • Bug #97694 - "MySQL 8.0.18 fails on STOP SLAVE/START SLAVE stress test". For some reason I do not see any documented attempt to verify this on MySQL 5.7 also. The bug was reported by Przemysław Skibiński, who also suggested a fix.
  • Bug #97734 - "Document the correct method to stop slaving with MTS without a warning or error". I can only agree with this request from Buchan Milne. Please. do :)
  • Bug #97735 - "ALTER USER IF EXISTS ... WITH_MAX_USER_CONNECTIONS 9999 not applied correctly". yet another bug report by Simon Mudd in this list. For some reason, again, I do not see any documented attempt to verify the bug on MySQL 8.0.x, while there is no clear reason to think it is not affected.
  • Bug #97742 - "bad item ref from correlated subquery to outer distinct table". This bug was reported by Song Zhibai, who also had contributed a patch. Based on further comments from  Øystein Grøvlen and these results:
    mysql> EXPLAIN SELECT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    | id | select_type        | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    |  1 | PRIMARY            | t1    | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    3 |   100.00 | NULL        |
    |  2 | DEPENDENT SUBQUERY | t2    | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
    +----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
    2 rows in set, 2 warnings (0,00 sec)

    mysql> show warnings\G
    *************************** 1. row ***************************
      Level: Note
       Code: 1276
    Message: Field or reference 'f2' of SELECT #2 was resolved in SELECT #1
    *************************** 2. row ***************************
      Level: Note
       Code: 1003
    Message: /* select#1 */ select `test`.`t1`.`f3` AS `f3` from `test`.`t1` having (/* select#2 */ select 1 from `test`.`t2` having `test`.`t1`.`f2` limit 1)
    2 rows in set (0,00 sec)

    mysql> select version();
    +-----------+
    | version() |
    +-----------+
    | 5.7.29    |
    +-----------+
    1 row in set (0,00 sec)
    I'd say that MySQL 5.7.x is also affected, but for some reason nobody documented any attempt to verify it there. So, I've added a comment.
  • Bug #97777 - "separate global variables (from hot variables) using linker script (ELF)". Beautiful bug report from Daniel Black.With a lot of details, perf and readelf outputs and patch contributed. See also his Bug #97822 - "buf_page_get_gen buf_pool->stat.n_page_gets++ is a cpu waste", with perf analysis up to a single assembler instruction level and fix suggested.
  • Bug #97825 - "dd_mdl_acquire in dd_table_open with dict_sys->mutex hold may cause deadlock". Here I am really puzzled by no visible attempt to check the arguments of bug reporter, Dave Do, who tried to perform lock order analysis by code review. All we see as a result is this:
    "Lock order could be different, but it is irrelevant, since these are locks on totally different levels and can't, in themselves, cause any deadlock."
    What a great argument! Not a bug, surely... We trust you.
"What bugs are you talking about? I have no bugs, neither does MySQL 8!"
To summarize:
  1. MySQL 8 introduces some optimizer (and some other) regressions. They seem to be fixed fast enough, but I wonder why only Community users were able to find them not Oracle's QA...
  2. MySQL 8.0.19 is surely great, but I see many serious bugs fixed o0nly in 8.0.20+.
  3. Percona, Booking and Facebook engineers still continue contributing high quality bug reports, comments/verification details and patches. Oracle is lucky to have such nice partners in making MySQL better.
  4. I still see problems with following proper verification procedures and documenting the results. Too often the bug reported for 8.0.x is NOT checked on 5.7.x as well, regression tag is not set, and so on. Sometimes reports are closed as "Not a bug" without any attempt to follow the analysis provided or prove the point. This is sad and wrong.

Sunday, January 12, 2020

Fun with Bugs #91 - On MySQL Bug Reports I am Subscribed to, Part XXV

Not sure if it's still interesting to anybody else, but MySQL users keep finding and reporting new problems that may be caused by genuine bugs in the code. I keep checking these reports and subscribing to those I consider interesting. Let me start blogging in the New Year of 2020 with a review of some replication, InnoDB and (many!) optimizer bugs reported in September and October, 2019.

As usual, I start from the oldest and care to mention bug reporters by names and add links to their other bug reports, if any. So, here is the new list:
  • Bug #96827 - "mysqlbinlog needs options to abort if invalid events are found on in-use binlogs". I had never checked myself, but I see no reasons not to trust Yoshinori Matsunobu in this case, based on code fragments shared. All current MySQL versions, from 5.6.x to 8.0.x, are affected. From what I see here, MariaDB is also affected.
  • Bug #96853 - "Inconsistent super_read_only status when changing the variable is blocked". Nice bug report by Przemyslaw Malkowski from Percona. For some reason I do not see clear statement if MySQL 8 is affected.
  • Bug #96874 - "The write_notifier_mutex in log_sys is useless". This bug was reported by Chen Zongzhi on MySQL 8.0.17 (see also hist another similar Bug #97358 - "The log.flush_notifier_mutex in log_sys is useless"), but "Version" filed is empty even though the bug is "Verified". This is NOT acceptable.
  • Bug #96946 - "Outer reference in join condition isn't allowed". This bug (that affects all MySQL versions) was reported by Laurents Meyer. See also older Bug #35242 (still "Verified" and affects MariaDB 10.3.x as well).
  • Bug #96950 - "CONCAT() can generate corrupted output". I wish we'd see the exact test case, but at least based on code review this bug (reported by Jay Edgar) was verified for MySQL 5.6 and 5.7. I see the same code in MariaDB, unfortunately.
  • Bug #97001 - "Dangerous optimization reconsidering_access_paths_for_index_ordering". The problems is with queries like this:
    SELECT ... WHERE [secondary key conditions] ORDER BY `id` ASC LIMIT n
    and bug reporter, Jeremy Cole, listed a lot of potentially related older bug reports. he had also suggested a patch. I'd be happy to see the fix in MySQL soon.
  • Bug #97113 - "BIT column serialized incorrectly in CASE expression". This bug report was created by Bradley Grainger. It is stated there that MySQL 5.7 (not only 8.0) is affected, but "Version:" field of a verified bug does NOT list 5.7.x. Absolutely wrong way of bugs processing. MariaDB also seems to be inconsistent, even though the result for one of the queries is different:
    MariaDB [test]> SELECT CASE WHEN name IS NOT NULL THEN value ELSE NULL END FROM query_bit;
    Field   1:  `CASE WHEN name IS NOT NULL THEN value ELSE NULL END`
    Catalog:    `def`
    Database:   ``
    Table:      ``
    Org_table:  ``
    Type:       NEWDECIMAL
    Collation:  binary (63)
    Length:     2
    Max_length: 1
    Decimals:   0
    Flags:      BINARY NUM


    +-----------------------------------------------------+
    | CASE WHEN name IS NOT NULL THEN value ELSE NULL END |
    +-----------------------------------------------------+
    | 1                                                   |
    +-----------------------------------------------------+
    1 row in set (0.021 sec)
  • Bug #97150 - "rwlock: refine lock->recursive with C11 atomics". Patch for MySQL 8.0.x was contributed by Cai Yibo. See also his another contribution, Bug #97228 - "rwlock: refine lock->lock_word with C11 atomics".
  • Bug #97299 - "Improve the explain informations for Hash Joins". Simple EXPLAIN (unlike the one with format=tree) does not give a hint that new MySQL 8.0.18+ features, hash join, was used. Simple and useful feature request from Tibor Korocz.
  • Bug #97345 - "IO Thread not detecting failed master with relay_log_space_limit." Nice bug report from Jean-François Gagné, but no documented attempt to check if MySQL 5.6.x and 8.0.x are also affected.
  • Bug #97347 - "In some cases queries with ST_CONTAINS do not return any results". Simple and easy to check bug report from Christian Koinig. Note that based on quick test MariaDB is NOT affected:
    MariaDB [test]> select version(), count(*) FROM test
        -> WHERE ST_CONTAINS(
        ->  geo_footprint,
        ->  ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[15.11333480819996
    6,48.1337532388],[15.113329984100005,48.1337371609],[15.113411697200036,48.13371
    66354],[15.113673777399981,48.1336819199],[15.114544787600039,48.1335464618],[15
    .115336574000025,48.1334415189],[15.116374992200008,48.1332937084],[15.117266346
    799966,48.1330924824],[15.11769786879995,48.1329803459],[15.118129375299986,48.1
    32868199],[15.118515258099933,48.1327388086],[15.118597296700045,48.1327141533],
    [15.118635348899943,48.132702717],[15.11867907729993,48.1327796282],[15.11876276
    890007,48.13290987],[15.118805112699988,48.1330357889],[15.118850101500016,48.13
    33486685],[15.118823191700017,48.1334777297],[15.118820984299987,48.1334784295],
    [15.118821076099948,48.1334779691],[15.113334808199966,48.1337532388],[15.113334
    808199966,48.1337532388]]]}'));
    +--------------------+----------+
    | version()          | count(*) |
    +--------------------+----------+
    | 10.3.7-MariaDB-log |        1 |
    +--------------------+----------+
    1 row in set (0.003 sec)
  • Bug #97372 - "Constructor Query_event must check enough space". Contribution to 5.7 and 8.0 by Pengbo Shi. Waiting for the OCI signed by the contributor...
  • Bug #97418 - "MySQL chooses different execution plan in 5.7". Interesting bug report from Vinodh Krish. I am again not sure if versions affected match the results of tests presented here.
  • Bug #97421 - "Replace into affected row count not match with events in binlog". Not sure if MySQL 8 was checked, but MariaDB 10.3.7 also uses single Update_rows event in the binary log. Thanks to Ke Lu for noticing and reporting this!
Also, on a separate note, this claim of MySQL 8.0 performance regression from Mark CallaghanBug #86215, is still being analyzed it seems. No further comments for 2.5 years already!

Autumn of 2019 was fruitful. A lot of interesting MySQL bug reports also, not just grapes on my balcony...
To summarize:
  1. For some reason I often do not see explicit documented attempts by Oracle MySQL engineers from the bugs verification team to check bug on different MySQL versions. Sometimes obviously affected version (like MySQL 8.0.x) is not listed in the field. So "Version" field becomes useless This is absolutely wrong. Maybe I should submit yet another talk to some conference on how to process bugs properly?
  2. Some regression bugs are still not marked with "regression" tag when verified.
  3. MySQL optimizer still requires a lot of work to become decent.
  4. I see a lot of new interesting new bug reports both from well known old community members and users I had never noticed before by name. This is great and proves that MySQL is still alive and use all kinds of contributions from Community.
Next time I'll review interesting bugs reported in November and December, 2019. Stay tuned!

Saturday, November 30, 2019

Fun with Bugs #90 - On MySQL Bug Reports I am Subscribed to, Part XXIV

Previous post in this series was published 3 months ago and the last Bug #96340 from it is already closed as fixed in upcoming MySQL 8.0.19. I've picked up 50+ more bugs to follow since that time, so I think I should send quick status update about interesting public MySQL bug reports that are still active.

As usual I concentrate mostly on InnoDB, replication and optimizer bugs. Here is the list, starting from the oldest:
  • Bug #96374  - "binlog rotation deadlock when innodb concurrency limit setted". This bug was reported by Jia Liu, who used gdb to show threads deadlock details. I admit that recently more bug reporters use gdb and sysbench with custom(ized) Lua scripts to prove the point, and I am happy to see this happening!
  • Bug #96378 - "Subquery with parameter is exponentially slower than hard-coded value". In my primitive test with user variables replaced by constants (on MariaDB 10.3.7) I get the same plan for the query, so I am not 100% sure that the analysis by my dear friend Sinisa Milivojevic was right and it's about optimization (and not comparing values with different collations, for example). But anyway, this problem reported by Jeff Johnson ended up as a verified feature request. Let's see what may happen to it next.
  • Bug #96379 - "First query successful, second - ERROR 1270 (HY000): Illegal mix of collations ". This really funny bug was reported by Владислав Сокол.
  • Bug #96400 - "MTS STOP SLAVE takes over a minute when master crashed during event logging". Nice bug report by Przemyslaw Malkowski from Percona, who used sysbench and dbdeployer to demonstrate the problem. Later Przemysław Skibiński (also from Percona) provided a patch to resolve the problem.
  • Bug #96412 - "Mess usages of latch meta data for InnoDB latches (mutex and rw_lock)". Fungo Wang had to make a detailed code analysis to get this bug verified. I am not sure why it ended up with severity S6 (Debug Builds) though.
  • Bug #96414 - "CREATE TABLE events in wrong order in a binary log.". This bug was reported by Iwo P. His test case to demonstarte the problem included small source code modification, but (unlike with some other bug reports) this had NOT prevented accepting it as a true, verified bug. The bug not affect MySQL 8.0.3+ thanks to WL#6049 "Meta-data locking for FOREIGN KEY tables" implemented there.
  • Bug #96472 - "Memory leak after 'innodb.alter_crash'". Yet another bug affecting only MySQL 4.7 and not MySQL 8.0. It was reported by Yura Sorokin from Percona.
  • Bug #96475 - "ALTER TABLE t IMPORT TABLESPACE blocks SELECT on I_S.tables.".  Clear and simple "How to repeat" instructions (using dbdeployer) by Jean-François Gagné. See also his related Bug #96477 - "FLUSH TABLE t FOR EXPORT or ALTER TABLE t2 IMPORT TABLESPACE broken in 8.0.17" for MySQL 8. The latter is a regression bug (without a regression tag), and I just do not get how the GA releases with such new bugs introduced may happen.
  • Bug #96504 - "Refine atomics and barriers for weak memory order platform". Detailed analysis, with links to code etc from Cai Yibo.
  • Bug #96525 - "Huge malloc when open file limit is high". Looks more like a systemd problem (in versions < 240) to me. Anyway, useful report from Andreas Hasenack.
  • Bug #96615 - "mysql server cannot handle write operations after set system time to the past". A lot of arguments were needed to get this verified, but Shangshang Yu was not going to give up. First time I see gstack used in the bug report to get a stack trace quickly. It's a part of gdb RPM on CentOS 6+. I have to try it vs gdb and pstack one day and decide what is the easiest and most efficient way to get backtraces of all threads in production...
  • Bug #96637 - "Clone fails on just upgraded server from 5.7". I had not used MySQL 8 famous clone plugin yet in practice, but I already know that it has bugs. This bug was reported by Satya Bodapati, who also suggested a patch.
  • Bug #96644 - "Set read_only on a master waiting for semi-sync ACK blocked on global read lock". Yet another problem (documented limitation) report from Przemyslaw Malkowski. Not sure why it was not verified on MySQL 8.0. Without a workaround to set master to read only it is unsafe to use long rpl_semi_sync_master_timeout values, as we may end up with that long downtime.
  • Bug #96677 - ""SELECT ... INTO var_name FOR UPDATE" not working in MySQL 8". This regression bug was reported by Vinodh Krish. Some analysis and patch were later suggested by Zsolt Parragi.
  • Bug #96690 - "sql_require_primary_key should not apply to temporary tables". This bug was also reported by Przemyslaw Malkowski from Percona. It ended up as a verified feature request, but not everyone in community is happy with this. Let me quote:
    "[30 Aug 8:08] Jean-François Gagné
    Could we know what was the original severity of this bug as reported by Przemyslaw ? This is now hidden as it has been reclassified as S4 (Feature Request).

    From my point of view, this is actually a bug, not a feature request and it should be classified as S2. A perfectly working application would break for no reason when a temporary table does not have a Primary Key, so this is actually a big hurdle for using sql_require_primary_key, hence serious bug in the implementation of this otherwise very nice and useful feature.
    "
That's all about bugs I've subscribed to in summer.
Winter is coming, so why not to remember nice warm sunny days and interesting MySQL bugs reported back then.
To summarize:
  1. We still see some strange "games" played during bugs processing and trend to decrease severity of reports. I think this is a waste of time for both Oracle engineers and community bug reporters.
  2. I am still not sure if Oracle's QA does not use ASan or just ignore problems reported for MTR test cases. Anyway, Percona engineers do this for them, and report related bugs :)
  3. dbdeployer and sysbench are really popular among MySQL bug reporters recently!
  4. Importing of InnoDB tablespaces is broken in MySQL 8.0.17+ at least.
  5. There are many interesting MySQL bugs reported during last 3 months, so I epxect more posts in this series soon.

Sunday, November 17, 2019

Dynamic Tracing of MySQL Server with Timestamps Using gdb

Some time ago I wanted a customer to trace some MariaDB function execution and make sure that when it is executed I get both timestamp of execution and some of the arguments printed into some log file. Our InnoDB guru in MariaDB, Marko Mäkelä, suggested to use gdb, set breakpoint on the function and use its command ... end syntax to print whatever we needed, and log the output.

Adding a timestamp to each breakpoint "command" execution was the next step, and for this I've suggested to use plain "shell date" command. Redirecting output to the same file gdb uses to log everything indeed did the trick. Let me document it here for readers and myself. I'll consider the example of logging SQL queries I've used in the previous blog post on dynamic tracing with perf and use recent Percona Server 5.7.28-31 on Ubuntu 16.04 for tests.

Skipping the details already explained in that previous post, let's assume I want to set a breakpoint at the dispatch_command() function and print the value of com_data->com_query.query every time it hit it. This is what I did with gdb after attaching it to the proper mysqld process:
...
[New LWP 6562]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set height 0
(gdb) set log on
Copying output to gdb.txt.
(gdb) b dispatch_command
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell date >> ./gdb.txt
>p com_data->com_query.query
>continue
>end
(gdb) continue
Continuing.

Key detail here is to use command and refer to the breakpoint by number, 1, and then append the output of the date command to the gdb.txt file in the current directory that is used by default to log the gdb output.

Then in another shell I executed the following:
openxs@ao756:~$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28-31-log Percona Server (GPL), Release '31', Revision 'd14ef86'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.04 sec)

mysql> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0.04 sec)

mysql> shutdown;
Query OK, 0 rows affected (0.04 sec)

mysql> quit
Bye

I had the output in the gdb window, but what's more important is that I've got it in the gdb.txt file as well. The content looked as follows:
openxs@ao756:~$ ls -l gdb.txt
-rw-r--r-- 1 root root 7123 лис 17 19:21 gdb.txt
openxs@ao756:~$ cat gdb.txt
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
Continuing.
[New Thread 0x7fc566411700 (LWP 6611)]
[Switching to Thread 0x7fc566411700 (LWP 6611)]

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (thd=thd@entry=
    0x7fc548631000, com_data=com_data@entry=0x7fc566410da0, command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc: No such file or directory.
неділя, 17 листопада 2019 19:20:47 +0200
$1 = 0x7fc54865b021 "show databases"
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:20:47 +0200
$2 = 0x7fc54865b021 "show tables"
...
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:20:47 +0200
$9 = 0x7fc54865b021 "select @@version_comment limit 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:01 +0200
$10 = 0x7fc54865b021 "select 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:08 +0200
$11 = 0x7fc54865b021 "select 2"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:12 +0200
$12 = 0x7fc54865b021 "shutdown"
[Thread 0x7fc566411700 (LWP 6611) exited]

Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fc56645b780 (LWP 6525)]
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
Detaching from program: /usr/sbin/mysqld, process 6525
openxs@ao756:~$
That's it. For basic tracing of any function calls in MySQL or MariaDB server (or any binary with debuginfo available) including timestamps and any arguments, variables or expressions printed you do not strictly need anything but gdb. With different format options for date command you can format timestamp any way you need, for example, you can remove day and month names etc and get even nanosecond precision if you prefer:
openxs@ao756:~$ for i in `seq 1 20`; do sleep 0.1; date +'%H:%m:%S.%N'; done
20:11:41.126205172
20:11:41.230087681
...
20:11:42.996521734
20:11:43.100229347
openxs@ao756:~$
* * * 
Sometimes event basic tools allow to get useful results. This photo was made with my dumb Nokia phone while running. Just a stop for a moment. Nice and useful result, IMHO, same as with gdb trick discussed here.
To summarize, for basic dynamic tracing with timestamps and arbitrary information printed you can use a command line debugger, like gdb. In some cases this simple approach is quite useful.

This kind of tracing comes with a cost, both in terms of performance impact (awful) and additional steps to parse the output (we do not really care about all that breakpoint details, we justneed a timestamp and some values printed). eBPF and related dynamic tracing tools (bcc trace and bpftrace) may help with both problems, as I am going to demonstrate in upcoming blog posts. Stay tuned!

Sunday, November 10, 2019

Time in Performance Schema

I've seen questions like this:
"Is there a way to know when (date and time) the last statement captured in ... was actually ran?"
more than once in some discussions (and customer issues) related to Performance Schema. I've seen answers provided by my colleagues and me after some limited testing. I've also noticed statements that it may not be possible.

Indeed, examples of wall clock date and time in the output of queries from the performance_schema are rare (and usually come from tables in the information_schemasys.format_time() function converts time to a nice, human readable format, but it still remains relative - it is not a date and time when something recorded in performance_schema happened.

In this post I'd like to document the answer I've seen and have in mind (and steps to get it) here, to save time for readers and myself faced with similar questions in the future. I'll also show the problem with this answer that I've noticed after testing for more than few minutes.

Let's start with simple setup of testing environment. In my case it is good old MariaDB 10.3.7 running on this netbook under Windows. First, let's check if Performance Schema is enabled:
 MariaDB [test]> select version(), @@performance_schema;
+--------------------+----------------------+
| version()          | @@performance_schema |
+--------------------+----------------------+
| 10.3.7-MariaDB-log |                    1 |
+--------------------+----------------------+
1 row in set (0.233 sec)
Then let's enable recording of time for everything and enable all consumers:
MariaDB [test]> update performance_schema.setup_instruments set enabled='yes', timed='yes';
Query OK, 459 rows affected (0.440 sec)
Rows matched: 707  Changed: 459  Warnings: 0

MariaDB [test]> update performance_schema.setup_consumers set enabled='yes';
Query OK, 8 rows affected (0.027 sec)
Rows matched: 12  Changed: 8  Warnings: 0
Now we can expect recently executed statements to be recorded, like this:
MariaDB [test]> select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current\G*************************** 1. row ***************************
      now(): 2019-11-03 17:42:51
   event_id: 46
timer_start: 22468653162059216
  timer_end: 22468697203533224
   sql_text: select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current
1 row in set (0.045 sec)
Good, but how we can get a real time when the statement was executed (like now() reports)? We all know from the fine manual that timer_start and timer_end values are in "picoseconds". So we can easily convert them into seconds (or whatever units we prefer):
MariaDB [test]> select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current\G
*************************** 1. row ***************************
                    now(): 2019-11-03 17:54:02
                 event_id: 69
timer_start/1000000000000: 23138.8159
                 sql_text: select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current
1 row in set (0.049 sec)
This value is related startup time one might assume, and we indeed can expect that timer in Performance Schema is initialized at some very early stage of startup. But how to get date and time of server startup with SQL statement?

This also seems to be easy, as we have a global status variable called Uptime measured in seconds. Depending on fork and version used we can get the value of Uptime either from the Performance Schema (in MySQL 5.7+) or from the Information Schema (in MariaDB and older MySQL versions). For example:
MariaDB [test]> select variable_value from information_schema.global_status where variable_name = 'Uptime';
+----------------+
| variable_value |
+----------------+
| 23801          |
+----------------+
1 row in set (0.006 sec)
So, server startup time is easy to get with a date_sub() function:
MariaDB [test]> select @start := date_sub(now(), interval (select variable_value from information_schema.global_status where variable_name = 'Uptime') second) as start;
+----------------------------+
| start                      |
+----------------------------+
| 2019-11-03 11:28:18.000000 |
+----------------------------+
1 row in set (0.007 sec)
In the error log of MariaDB server I see:
2019-11-03 11:28:18 0 [Note] mysqld (mysqld 10.3.7-MariaDB-log) starting as process 5636 ...
So, I am sure the result is correct. Now, if we use date_add() to add timer value converted to seconds, for example to the server startup time, we can get the desired answer, date and time when the statement recorded in performance_schema was really executed:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 657
                  ts: 2019-11-03 18:24:00.501654
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:05
timediff(now(), @ts): 00:00:04.498346
1 row in set (0.002 sec)
I was almost ready to publish this blog post a week ago, before paying more attention to the result (that used to be perfectly correct in earlier simple tests) and executing a variation of statement presented above. The problem I noticed is that when Uptime of the server is not just few minutes (as it often happens in quick test environments), but hours or days, timestamp that we get for a recent event from the performance_schema using the approach suggested may differ from current timestamp notably (we see 4.5+ seconds difference highlighted above). Moreover, this difference seem to fluctuate:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 682
                  ts: 2019-11-03 18:24:01.877763
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:07
timediff(now(), @ts): 00:00:05.122237
1 row in set (0.002 sec)
and tend to grow with Uptime. This make the entire idea of converting timer_start and timer_end Performance Schema "counters" in "picoseconds" questionable and unreliable for the precise real timestamp matching and comparing with other timestamp information sources in production.
Same as with this photo of sunset at Brighton taken with my Nokia dumb phone back in June, I do not see a clear picture of time measurement in Performance Schema...

After spending some more time thinking about this I decided to involve MySQL team somehow and created the feature request, Bug #97558 - "Add function (like sys.format_time) to convert TIMER_START to timestamp", that ended up quickly "Verified" (so I have small hope that I had not missed anything really obvious - correct me if I am wrong). I'd be happy to see further comments there and, surely, the function I asked about implemented. But I feel there is some internal problem with this and some new feature at server side may be needed to take the "drift" of time in Performance Schema into account.

There is also a list of currently open questions that I may try to answer in followup posts:
  1. Is the problem of time drift I noticed a MariaDB 10.3.7-specific, or recent MySQL 5..x and 8.0.x are also affected?
  2. Is this difference in time growing monotonically with time or really fluctuating?
  3. When exactly Performance Schema time "counter" starts, where is it in the code?
  4. Are there any other, better or at least more precise and reliable ways to get timestamps of some specific events that happen during MySQL server work? I truly suspect that gdb and especially dynamic tracing on Linux with tools like bpftrace may give us more reliable results...

Monday, October 7, 2019

Dynamic Tracing of MariaDB Server With bpftrace - Basic Example

Unlike the previous post, this one is not just a comment to some slides from the "Tracing and Profiling MySQL" talk at Percona Live Europe 2019. I am going to add the details that were missing there (as I was in a hurry and had forgotten to copy/paste proper outputs while testing). I am going to show how to add dynamic probe with "latest and greatest" bpftrace tool.

The goal is the same as before - try to add dynamic probe(s) to trace query execution. More specifically, to capture text of the queries executed by clients of MySQL server. As bpftrace requires new kernel and just does not work on Ubuntu 16.04, for the demonstration I use my Fedora 29 box with kernel 5.2.x and, for a change, get queries from Fedora's own MariaDB 10.3.17 installed from rpm packages there.

I have both bcc and bpftrace installed also from packages:
[openxs@fc29 ~]$ rpm -qa | grep bcc
python3-bcc-0.8.0-1.fc29.x86_64
bcc-0.8.0-1.fc29.x86_64
bcc-tools-0.8.0-1.fc29.x86_64
[openxs@fc29 ~]$ rpm -qa | grep bpf
bpftrace-0.9-0.fc29.x86_64
[openxs@fc29 ~]$
You can check fine manual for the details, but even basic -h option provides enough to start, as long as you already know some terms and probes syntax:
[openxs@fc29 ~]$ bpftrace -h
USAGE:
    bpftrace [options] filename
    bpftrace [options] -e 'program'

OPTIONS:
    -B MODE        output buffering mode ('line', 'full', or 'none')
    -d             debug info dry run
    -dd            verbose debug info dry run
    -e 'program'   execute this program
    -h             show this help message
    -l [search]    list probes
    -p PID         enable USDT probes on PID
    -c 'CMD'       run CMD and enable USDT probes on resulting process
    -v             verbose messages
    --version      bpftrace version

ENVIRONMENT:
    BPFTRACE_STRLEN           [default: 64] bytes on BPF stack per str()
    BPFTRACE_NO_CPP_DEMANGLE  [default: 0] disable C++ symbol demangling

EXAMPLES:
bpftrace -l '*sleep*'
    list probes containing "sleep"
bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
    trace processes calling sleep
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    count syscalls by process name
In our case we need to define uprobe for proper mysqld binary and trace the dispatch_command() function. Before we start, note that parameters of dispatch_command() in MariaDB 10.3 are not the same as in Percona Server 5.7 I've used in the previous post. Basically, this function starts as follows in sql/sql_parse.cc:
   1570 bool dispatch_command(enum enum_server_command command, THD *thd,
   1571                       char* packet, uint packet_length, bool is_com_mult        i,
   1572                       bool is_next_command)
Note the third argument, packet. If the first argument, command. is SQL_QUERY, then packet contains the query text (as a zero-terminated string) for sure (it's also true for many other commands, but let me skip the details for now). That's why we'll use third argument in our uprobe to capture the SQL text.

Now, let's start the service and check the exact full path name for the mysql binary:
[openxs@fc29 ~]$ sudo service mariadb start
[sudo] password for openxs:
Redirecting to /bin/systemctl start mariadb.service
[openxs@fc29 ~]$ ps aux | grep mysqldmysql     9109  6.2  1.2 1699252 104108 ?      Ssl  09:30   0:00 /usr/libexec/mysqld --basedir=/usr
openxs    9175  0.0  0.0 215744   892 pts/0    S+   09:30   0:00 grep --color=auto mysqld
The first naive attempt to add the probe after cursory reading the documentation and checking few examples may look like this:
 [openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:dispatch_command { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
Could not resolve symbol: /usr/libexec/mysqld:dispatch_command
It seems in my case, unlike perf, bpftrace is not "aware" of C++ names or symbolic information in a separate -debuginfo package. So, I need mangled name:
[openxs@fc29 ~]$ nm -na /usr/libexec/mysqld | grep dispatch_command
nm: /usr/libexec/mysqld: no symbols
[openxs@fc29 ~]$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5190 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5190 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
Surely there is no symbols in the binary from Fedora package, so I checked the binary (of the same version) that I've built myself (as usual) and assumed that neither parameters nor mangling approach could be different. So, the next attempt to add dynamic probe would look as follows:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
show databases
show tables
t1
select @@version_comment limit 1
select user, host from mysql.user
^C
It worked and you see above the output I've got for the following session:
[openxs@fc29 ~]$ mysql -uroot testReading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.17-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select user, host from mysql.user;
+------+-----------+
| user | host      |
+------+-----------+
| root | 127.0.0.1 |
| root | ::1       |
| root | fc29      |
| root | localhost |
+------+-----------+
4 rows in set (0.000 sec)
You can see some SQL statements generated when mysql command line connects, as well as packet value in some other packet ("t1") than COM_QUERY, probably. My probe had not even tried to check other parameters besides the supposed query text.

Now, the probe is defined on uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb - I've just used long, mangled version of the function name and full path name to the binary, and defined a dynamic probe (uprobe). There is no filter and the action for the probe is defined as { printf("%s\n", str(arg2)); } - that is, I print third argument (they are numbered starting from zero, arg0, arg1, arg2, ...) as a zero-terminated string. Without str() built in function I'd get just a pointer that could be printed as (unsigned) long integer, u64.

Basically, that's all. We have a quick and dirty way to capture all queries. No timing or anything, but it all depends on probe action that can use numerous built in variables and functions.

More "advanced" use of bpftrace, a lame attempt to capture time to execute query, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql = str(arg2); @start[@sql] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[@sql] != 0/ { printf("%s : %u64 ms\n", @sql, (nsecs - @start[@sql])/1000000); } '

Attaching 2 probes...
select sleep(3) : 300064 ms
select sleep(1) : 100064 ms
^C

@sql: select sleep(1)


@start[select sleep(3)]: 10666558704666
@start[select sleep(1)]: 10685614895675


[openxs@fc29 ~]$
In this case I try to store time since probe start into the associative array with query text as an "index" and start time (in nanoseconds) as a value.Then I calculate the difference from current nsecs value upon function return, in a separate uretprobe. I've used global variables, @sql for the query text, and @start[] for the array. It even seems to work well for a single threaded load based on the above. But as soon as I try to use multiple concurrent threads:
[openxs@fc29 ~]$ for i in `seq 1 4`; do mysql -uroot test -e"select sleep($i)" & done
it becomes clear that global variables are really global and my outputs are all wrong.

A bit better version may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql[tid] = str(arg2); @start[tid] = nsecs; }                                                                               uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[tid] != 0/ { printf("%s : %u64 %u64 ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } '
Attaching 2 probes...
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120864 300064 ms
 : 1120864 064 ms
select sleep(4) : 1120664 400064 ms
 : 1120664 064 ms
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120664 300064 ms
 : 1120664 064 ms
select sleep(4) : 1120864 400064 ms
 : 1120864 064 ms
^C

@sql[9490]:
@sql[11206]:
@sql[11207]:
@sql[11208]:


@start[11207]: 13609305005933
@start[9490]: 13610305621499
@start[11206]: 13611305753596
@start[11208]: 13612305235313


[openxs@fc29 ~]$
The output is for both sequential and concurrent execution of queries. I've used two associative arrays, @sql[] for queries and @start[] for start times, both indexed by tid - built in variable for thread id, that should not change, at least until we use pool-of-threads... You can also see that the tool by default outputs the content of all global associate arrays at the end, unless we free them explicitly.

* * *
This image of bpftrace internals is taken from the Brendan Gregg's post
bpftrace commands may be much more complicated and it may make sense to to store them in a separate file. The tool is near ideal of "quick and dirty" tests, and one day I'll write a way more complete posts with much better examples. But having a way to capture, filter and summarize queries in kernel space and send only the relevant results to a user space, all that in a safe manner, is really cool!