Monday, October 7, 2019

Dynamic Tracing of MariaDB Server With bpftrace - Basic Example

Unlike the previous post, this one is not just a comment to some slides from the "Tracing and Profiling MySQL" talk at Percona Live Europe 2019. I am going to add the details that were missing there (as I was in a hurry and had forgotten to copy/paste proper outputs while testing). I am going to show how to add dynamic probe with "latest and greatest" bpftrace tool.

The goal is the same as before - try to add dynamic probe(s) to trace query execution. More specifically, to capture text of the queries executed by clients of MySQL server. As bpftrace requires new kernel and just does not work on Ubuntu 16.04, for the demonstration I use my Fedora 29 box with kernel 5.2.x and, for a change, get queries from Fedora's own MariaDB 10.3.17 installed from rpm packages there.

I have both bcc and bpftrace installed also from packages:
[openxs@fc29 ~]$ rpm -qa | grep bcc
[openxs@fc29 ~]$ rpm -qa | grep bpf
[openxs@fc29 ~]$
You can check fine manual for the details, but even basic -h option provides enough to start, as long as you already know some terms and probes syntax:
[openxs@fc29 ~]$ bpftrace -h
    bpftrace [options] filename
    bpftrace [options] -e 'program'

    -B MODE        output buffering mode ('line', 'full', or 'none')
    -d             debug info dry run
    -dd            verbose debug info dry run
    -e 'program'   execute this program
    -h             show this help message
    -l [search]    list probes
    -p PID         enable USDT probes on PID
    -c 'CMD'       run CMD and enable USDT probes on resulting process
    -v             verbose messages
    --version      bpftrace version

    BPFTRACE_STRLEN           [default: 64] bytes on BPF stack per str()
    BPFTRACE_NO_CPP_DEMANGLE  [default: 0] disable C++ symbol demangling

bpftrace -l '*sleep*'
    list probes containing "sleep"
bpftrace -e 'kprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'
    trace processes calling sleep
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    count syscalls by process name
In our case we need to define uprobe for proper mysqld binary and trace the dispatch_command() function. Before we start, note that parameters of dispatch_command() in MariaDB 10.3 are not the same as in Percona Server 5.7 I've used in the previous post. Basically, this function starts as follows in sql/
   1570 bool dispatch_command(enum enum_server_command command, THD *thd,
   1571                       char* packet, uint packet_length, bool is_com_mult        i,
   1572                       bool is_next_command)
Note the third argument, packet. If the first argument, command. is SQL_QUERY, then packet contains the query text (as a zero-terminated string) for sure (it's also true for many other commands, but let me skip the details for now). That's why we'll use third argument in our uprobe to capture the SQL text.

Now, let's start the service and check the exact full path name for the mysql binary:
[openxs@fc29 ~]$ sudo service mariadb start
[sudo] password for openxs:
Redirecting to /bin/systemctl start mariadb.service
[openxs@fc29 ~]$ ps aux | grep mysqldmysql     9109  6.2  1.2 1699252 104108 ?      Ssl  09:30   0:00 /usr/libexec/mysqld --basedir=/usr
openxs    9175  0.0  0.0 215744   892 pts/0    S+   09:30   0:00 grep --color=auto mysqld
The first naive attempt to add the probe after cursory reading the documentation and checking few examples may look like this:
 [openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:dispatch_command { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
Could not resolve symbol: /usr/libexec/mysqld:dispatch_command
It seems in my case, unlike perf, bpftrace is not "aware" of C++ names or symbolic information in a separate -debuginfo package. So, I need mangled name:
[openxs@fc29 ~]$ nm -na /usr/libexec/mysqld | grep dispatch_command
nm: /usr/libexec/mysqld: no symbols
[openxs@fc29 ~]$ nm -na /home/openxs/dbs/maria10.3/bin/mysqld | grep dispatch_command
00000000004a1eef t _Z16dispatch_command19enum_server_commandP3THDPcjbb.cold.344
00000000005c5190 T _Z16dispatch_command19enum_server_commandP3THDPcjbb
00000000005c5190 t _Z16dispatch_command19enum_server_commandP3THDPcjbb.localalias.256
Surely there is no symbols in the binary from Fedora package, so I checked the binary (of the same version) that I've built myself (as usual) and assumed that neither parameters nor mangling approach could be different. So, the next attempt to add dynamic probe would look as follows:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { printf("%s\n", str(arg2)); }'
Attaching 1 probe...
show databases
show tables
select @@version_comment limit 1
select user, host from mysql.user
It worked and you see above the output I've got for the following session:
[openxs@fc29 ~]$ mysql -uroot testReading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 19
Server version: 10.3.17-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select user, host from mysql.user;
| user | host      |
| root | |
| root | ::1       |
| root | fc29      |
| root | localhost |
4 rows in set (0.000 sec)
You can see some SQL statements generated when mysql command line connects, as well as packet value in some other packet ("t1") than COM_QUERY, probably. My probe had not even tried to check other parameters besides the supposed query text.

Now, the probe is defined on uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb - I've just used long, mangled version of the function name and full path name to the binary, and defined a dynamic probe (uprobe). There is no filter and the action for the probe is defined as { printf("%s\n", str(arg2)); } - that is, I print third argument (they are numbered starting from zero, arg0, arg1, arg2, ...) as a zero-terminated string. Without str() built in function I'd get just a pointer that could be printed as (unsigned) long integer, u64.

Basically, that's all. We have a quick and dirty way to capture all queries. No timing or anything, but it all depends on probe action that can use numerous built in variables and functions.

More "advanced" use of bpftrace, a lame attempt to capture time to execute query, may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql = str(arg2); @start[@sql] = nsecs; }
uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[@sql] != 0/ { printf("%s : %u64 ms\n", @sql, (nsecs - @start[@sql])/1000000); } '

Attaching 2 probes...
select sleep(3) : 300064 ms
select sleep(1) : 100064 ms

@sql: select sleep(1)

@start[select sleep(3)]: 10666558704666
@start[select sleep(1)]: 10685614895675

[openxs@fc29 ~]$
In this case I try to store time since probe start into the associative array with query text as an "index" and start time (in nanoseconds) as a value.Then I calculate the difference from current nsecs value upon function return, in a separate uretprobe. I've used global variables, @sql for the query text, and @start[] for the array. It even seems to work well for a single threaded load based on the above. But as soon as I try to use multiple concurrent threads:
[openxs@fc29 ~]$ for i in `seq 1 4`; do mysql -uroot test -e"select sleep($i)" & done
it becomes clear that global variables are really global and my outputs are all wrong.

A bit better version may look like this:
[openxs@fc29 ~]$ sudo bpftrace -e 'uprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb { @sql[tid] = str(arg2); @start[tid] = nsecs; }                                                                               uretprobe:/usr/libexec/mysqld:_Z16dispatch_command19enum_server_commandP3THDPcjbb /@start[tid] != 0/ { printf("%s : %u64 %u64 ms\n", @sql[tid], tid, (nsecs - @start[tid])/1000000); } '
Attaching 2 probes...
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120864 300064 ms
 : 1120864 064 ms
select sleep(4) : 1120664 400064 ms
 : 1120664 064 ms
select sleep(1) : 1120764 100064 ms
 : 1120764 064 ms
select sleep(2) : 949064 200064 ms
 : 949064 064 ms
select sleep(3) : 1120664 300064 ms
 : 1120664 064 ms
select sleep(4) : 1120864 400064 ms
 : 1120864 064 ms


@start[11207]: 13609305005933
@start[9490]: 13610305621499
@start[11206]: 13611305753596
@start[11208]: 13612305235313

[openxs@fc29 ~]$
The output is for both sequential and concurrent execution of queries. I've used two associative arrays, @sql[] for queries and @start[] for start times, both indexed by tid - built in variable for thread id, that should not change, at least until we use pool-of-threads... You can also see that the tool by default outputs the content of all global associate arrays at the end, unless we free them explicitly.

* * *
This image of bpftrace internals is taken from the Brendan Gregg's post
bpftrace commands may be much more complicated and it may make sense to to store them in a separate file. The tool is near ideal of "quick and dirty" tests, and one day I'll write a way more complete posts with much better examples. But having a way to capture, filter and summarize queries in kernel space and send only the relevant results to a user space, all that in a safe manner, is really cool!

Sunday, October 6, 2019

Dynamic Tracing of MySQL Server With perf probe - Basic Example

I am going to write a series of blog posts based on my talks and experiences at Percona Live Europe 2019. The first one would be a kind of extended comment for a couple of slides from the "Tracing and Profiling MySQL" talk.

We can surely wait until Performance Schema instruments every other line of code or at least every important stage and wait in every storage engine we care about, but there is no real need for that. If you run any version of MySQL under Linux with more or less recent kernel (anything newer than 4.1 is good enough, in general), you can easily use dynamic tracing for any application (at least if there is symbolic information for the binaries), any time. As Brendan Gregg put it here:
"One benefit of dynamic tracing is that it can be enabled on a live system without restarting anything. You can take an already-running kernel or application and then begin dynamic instrumentation, which (safely) patches instructions in memory to add instrumentation. That means there is zero overhead or tax for this feature until you begin using it. One moment your binary is running unmodified and at full speed, and the next, it's running some extra instrumentation instructions that you dynamically added. Those instructions should eventually be removed once you've finished using your session of dynamic tracing."
One of the ways to use dynamic tracing (that is supported for a long time) is a perf profiler and its probe command. In the simplest case that I am going to illustrate here, probe is defined for a function defined in the code and refers to it by name. You can refer to the name of local variable, function parameter, local data structure member in the probe etc, and record the values of them with other probe data.

For a simple example let me consider recent Percona Server 5.7.x running on recent Ubuntu 16.04 with kernel 4.4.x. Let's assume I want to trace all calls to the dispatch_command() function and record every query every connection processes that way.

Skipping the details for now, let's assume I've found out (with gdb in my case, but it can be code review as well) that when this function is called I can see the query user wants to execute in the com_data structure passed via a pointer to the function:
(gdb) p com_data->com_query.query
$4 = 0x7fb0dba8d021 "select 2"
Based on this information and having -dbg package also installed for Percona Server I can add a probe dynamically any time using the following simple command (--add option is assumed by default):
openxs@ao756:~$ sudo perf probe -x /usr/sbin/mysqld 'dispatch_command com_data->com_query.query:string'
Added new event:
  probe_mysqld:dispatch_command (on dispatch_command in /usr/sbin/mysqld with query=com_data->com_query.query:string)

You can now use it in all perf tools, such as:

        perf record -e probe_mysqld:dispatch_command -aR sleep 1
In this probe I refer to the specific binary with -x option and full path name, and the function in that binary by name, and I say that I'd like to record the value of com_data->com_query.query as a zero-terminated string. Now I can use any variation of perf record command (with -F option to define sampling frequency, -g option to capture stack traces etc, see more here) and my probe will be one of the events captured.

For this simple example of tracing I'll use -e option to capture only the events related to the probe I defined. Probe name for this simple case by default consists of the binary name, colon (':') separator and function name. I'll use -R option to collect raw sample records and . I've also added -a option to collect samples on all CPUs. You can see the hint for possible command in the output above. 

So, I can record related events with default frequency as follows:
openxs@ao756:~$ sudo perf record -e 'probe_mysqld:dispatch_command*' -aR
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.676 MB (3 samples) ]
I let it work for some time in the foreground and then pressed Ctrl-C to stop collecting. Now I can check raw sample records with perf script command:
openxs@ao756:~$ sudo perf script >/tmp/queries.txt
openxs@ao756:~$ cat /tmp/queries.txt
          mysqld 31340 [001]  3888.849079: probe_mysqld:dispatch_command: (be9250) query="select 100"
          mysqld 31340 [001]  3891.648739: probe_mysqld:dispatch_command: (be9250) query="select user, host from mysql.user"
          mysqld 31340 [001]  3895.890141: probe_mysqld:dispatch_command: (be9250) query="select 2"
This is the detailed trace, with additional information (exact text of the query executed) added as requested. Output also included PID of the binary, CPU the sample was taken from and a timestamp.

When I am done with tracing, I can delete the probe with --del option referring it by name:
openxs@ao756:~$ sudo perf probe --del dispatch_commandRemoved event: probe_mysqld:dispatch_command
The (small, more on that later) overhead for tracing was added dynamically, only for the exact information I needed and only for the period of tracing. After the dynamic probe is removed we have exactly the same binary as originally started running with zero extra overhead. Now do this with Performance Schema :)

* * *
Slides are available at

More details on the way other tools mentioned during the talk can be used by MySQL DBAs are coming soon in this blog. Stay tuned!

Saturday, August 31, 2019

Fun with Bugs #89 - On MySQL Bug Reports I am Subscribed to, Part XXIII

I have to celebrate the anniversary of my last day in Oracle (that was 7 years ago!) somehow, and I think writing yet another blog post about Oracle MySQL bugs is a good way to do this. I am actually surprised (and happy) that public bugs database is still alive, maintained and considered important in Oracle, and I know who in Oracle was working hard all these years for this to happen!

In my previous post in this series I've stopped on Bug #95954 and had not completed review of interesting MySQL bug reports that I've subscribed to in June 2019. So, below I start with the next bug in my list, complete review for June and cover some bugs reported in July. There were many.
  • Bug #95957 - "IN operator issue when comparing signed column and the column cast to unsigned". This bug was reported by Manuel Rigger. As far as I can see, MariaDB 10.3 is not affected:
    MariaDB [test]> CREATE TABLE t0(c0 INT);
    Query OK, 0 rows affected (0.518 sec)

    MariaDB [test]> INSERT INTO t0(c0) VALUES(-1);
    Query OK, 1 row affected (0.196 sec)

    MariaDB [test]> SELECT t0.c0 IN (1, CAST(t0.c0 AS UNSIGNED)) from t0;
    | t0.c0 IN (1, CAST(t0.c0 AS UNSIGNED)) |
    |                                     0 |
    1 row in set, 1 warning (0.207 sec)

    MariaDB [test]> show warnings\G
    *************************** 1. row ***************************
      Level: Note
       Code: 1105
    Message: Cast to unsigned converted negative integer to it's positive complement

    1 row in set (0.013 sec)
  • Bug #96001 - "No warning when creating foreign key in MyISAM tables". I am really surprised that this "documented feature" was accepted as a S3 bug when reported by Przemyslaw Malkowski from Percona. But this happened. I also prefer NOT to have unsupported syntax just ignored silently and would like to see a warning (or error in strict mode).
  • Bug #96002 - "'variable log_bin_trust_function_creators' -variable is "hidden"." Consistency is important, as well as correct documentation. So I was happy to see this bug report from Peter Laursen. 75(!) of his bug reports are still active, by the way, some were reported more than 10 years ago. I wonder if anyone is going to check (if not fix) them any time soon.
  • Bug #96079 - "large_tests.innodb_innochecksum_3gb test failing with debug build." Yet another MTR test failure reported by Lalit Choudhary from Percona.
  • Bug #96100 - "Generated column cause a heap-use-after-free error". Probably ASan builds are not tested as carefully in Oracle as they are by some community members, like Zkong Kong who reported this bug. Otherwise they would mark this bug report as a duplicate of some known internal bug.
  • Bug #96108 - "To run mtr "innodb.log_flush_order" MySql Server would be always core down". I've listed the bug as an example that even though code modification is needed to reproduce the crash (reported by Juncai Meng) literally, it was accepted and "Verified". In other reports this sometimes not happen, and the point is that it is surely not a rule carved in stone in Oracle to NOT accept bugs if test case involves code modification. Remember that and fight for your reports if needed.
  • Bug #96128 - "Doc: documentation is inaccurate when InnoDB starts with innodb_read_only". Correct manual matters a lot, so nice to have it corrected in this case by Calvin Sun.
  • Bug #96134 - "Please provide control functions for the IO Thread." I'd be also happy to see a way to control replication I/O thread progress and read logs only up to some position or GTID, per channel. Thanks Jean-François Gagné for this nice feature request.
  • Bug #96142 - "Inconsistent error on slave for Update event on table with non-exists partition". Yet another by report from by Lalit Choudhary. Good to see multiple versions check.
  • Bug #96148 - "using Invisible Index when slave apply EVENT". This bug was reported by Songlei Wang. Consistency matters, so if the index is invisible it should remain invisible for replication SQL thread as well. See also his another Bug #96150 - "'show slave status' show the Inaccurate Last_IO_Error message".
  • Bug #96167 - "Many header files now missing from devel package". As noted by Manuel Ung, now it is impossible to build plugins and UDFs unless users download the source tree, and then copy the headers to the appropriate places. Packaging in hard.
  • Bug #96178 - "mysqldump leaks memory when selected tables are dumped with --order-by-primary". Abhinav Sharma proposed a simple MTR test case to run on ASan build, and suggested a fix. Very nice bug report. Unfortunately I do not see any statements about the results of checking MySQL 8.0.x.
  • Bug #96192 - "Possible race condition with binlog-transaction-dependency-tracking". Bug reporter, Herman Lee, complained about one place in the code where race condition may happen even after the fix for one MySQL bug. Does it really matter for bug verification if he found more places? I fail to see a reason to keep he bug in "Need Feedback" status, when code review is enough to confirm there is a problem in that one part of the code clearly identified.
  • Bug #96196 - "performance_schema_accounts_size and p_s_hosts_size limited by 16384", Nice bug that can be confirmed by code review or just opening many connections was reported by Nikolai Ikhalainen from Percona. Autoscaling is broken/limited in this case.
  • Bug #96340 - "Slow startup for mysql 8.0 with many tables due to the tablespace files scan". This is actually a regression comparing to 5.7, but it's visible on slow disks. It's nice to see useful discussion, explanations and patch suggested (by Sunny Bains) in this bug report created by Lalit Choudhary.
I started this summer in a beautiful Barcelona. This blog posts ends it for me. One of the good changes leaving Oracle seven years ago introduced in my life is a real freedom to work from anywhere and travel as often as I really want, both for work and for fun.
To summarize:
  1. I am happy to see MySQL public bugs database still up and widely used, even though I do not work on it directly for 7 years already. It's a key service and media for cooperation for the entire MySQL Community! Just check how it happens in Bug #96340.
  2. Consistency matters.
  3. I still see cases when the time is wasted at bugs verification stage.
  4. MySQL fine manual still have details explained incorrectly, and this is unfortunate.
  5. Sometime I wonder why Percona engineers and other MySQL Community users manage to find even MTR test failures faster than anyone in Oracle cares to report and fix them. They all know magic (like actually running all tests on debug builds and checking the results, maybe). Or, maybe, they care?
* * *
Shameless self-promotion at the end. First time since 2015 I am going to attend Percona Live conference and speak there. Ticket prices increase on September 1, so using code CMESPEAK-VALERII you’ll get the best deal right now.

Saturday, August 10, 2019

Fun with Bugs #88 - On MySQL Bug Reports I am Subscribed to, Part XXII

It's Saturday night. I have a week of vacation ahead that I am going to spend at home, working on things I usually do not have time for. I already did something useful (created a couple of test cases for MariaDB bugs here and there, among other things), so why not to get some fun and continue my previous review of recent interesting MySQL bug report...

So, here is the list of 15 more MySQL community bug reports that I've subscribed to back in May and June 2019:
  • Bug #95491 - "The unused wake up in simulator AIO of function reserve_slot". I am impressed by the fact that these days many people just study InnoDB code, find problematic parts, improve them, test and prove the improvement, and submit patches to Oracle. Chen Zongzhi, who reported this bug, is one of such great people.
  • Bug #95496 - "Replication of memory tables". As pointed out by Iwo P, MySQL manual describes what happens to MEMORY tables replicated from master to slave in details, including the case of master restart. But one detail is missing - when slave with MEMORY tables restarts, even with super_read_only set, it also generates even to delete data from the table into its own binary log, with GTID, and this leads to problems in case of failover.
  • Bug #95507 - "innodb_stats_method is not honored when innodb_stats_persistent=ON". This bug was reported by my colleague Sergei Petrunia. See also related MDEV-19574  (still "Open").
  • Bug #95547 - "MySQL restarts automatically after ABORT_SERVER due to disk full". I am not sure this is a replication bug, more a problem of too primitive mysqld_safe or systemd startup scripts that do not check free space on disk or reason why mysqld process committed suicide. Anyway, this report by Przemyslaw Malkowski from Percona was accepted as a replication bug.
  • Bug #95582 - "DDL using bulk load is very slow under long flush_list". As found and proved by Fungo Wang, due to optimized away redo log writing for bulk load, to make sure such DDL is crash safe, dirtied pages must be flushed to disk synchronously. Currently this is done inefficiently, as shown in this bug report. See also a year old (and still "Open") Bug #92099 - "provide an option to record redo log for bulk loading" from the same reporter, where he asks for a way to get redo logging back (but still benefit from other "online" ALTER features). MariaDB has that option after the fix of MDEV-16809.
  • Bug #95612 - "replication breakage due to Can not lock user management caches for processing". Simon Mudd noted that in some cases simple DROP USER IF EXISTS ... breaks parallel replication in MySQL 8.0.16. Good to know that START SLAVE allows to continue :)
  • Bug #95698 - "JSON columns are returned with binary charset". This bug was reported by Daniel Book. Note that according to the original issue, DBD::MariaDB (I had no idea this exists!) works properly with MariaDB 10.3 or 10.4. I had not checked.
  • Bug #95734 - "Failing SET statement in a stored procedure changes variables". This really weird and unexpected bug was reported by Przemysław Skibiński.
  • Bug #95863 - "MTS ordered commit may cause deadlock ". Great finding by Lalit Choudhary from Percona!
  • Bug #95895 - "Shutdown takes long time when table uses AUTOINC and FOREIGN KEY constraints". It was reported by Satya Bodapati from Percona, who later contributed as patch. See also his related Bug #95898 - "InnoDB releases memory of table only on shutdown if table has FK constraint". I truly hope to see these fixed in next minor releases of MySQL. Note that originally bug (PS-5639) was reported by Sveta Smirnova.
  • Bug #95928 - "InnoDB crashes on compressed and encrypted table when changing FS block size". Make sure to check and write original filesystem block size somewhere... This bug was reported by Sergei Glushchenko from Percona.
  • Bug #95929 - "Duplicate entry for key 'PRIMARY' when querying information_schema.TABLES". This nice feature of new data dictionary of MySQL 8.0.16 under concurrent load was found by Manuel Rigger.
  • Bug #95934 - "innodb_flush_log_at_trx_commit = 0 get performance regression on 8 core machine". If I had access to 8 cores box, I'd try to check this sysbench-based complete test case immediately. I should do this on my quadcore box at least, next week. But here we see some 12 days just wasted and then bug report from Chen Zongzh ended up "Analyzing", till today, with no further feedback.
  • Bug #95954 - "CAST of negative function return value to UNSIGNED malfunctions with BIGINT". Yet another weird bug found by Manuel Rigger. There are patches (separate for 5.6, 5.7 and 8.0) contributed by Oleksandr Peresypkin.
Montjuïc is a place to enjoy nice view of Barcelona and forget about MySQL bugs. So, I could miss some interesting ones at the end of May, 2019...
To summarize:
  1. Percona engineers still contribute a lot of useful and interesting MySQL bug reports, often - with patches.
  2. I am impressed by quality of MySQL bug reports in 2019. Often we see detailed analysis, proper complete test cases and patches.
  3. There is a recent fork of DBD::mysql for MariaDB (and MySQL), DBD::MariaDB!
  4. I badly need some new hardware for running benchmarks and tests from recent bug reports...
  5. I should spend more time in May in Barcelona :)

Sunday, August 4, 2019

Fun with Bugs #87 - On MySQL Bug Reports I am Subscribed to, Part XXI

After a 3 months long break I'd like to continue reviewing MySQL bug reports that I am subscribed to. This issue is devoted to bug reports I've considered interesting to follow in May, 2019:
  • Bug #95215 - "Memory lifetime of variables between check and update incorrectly managed". As demonstrated by Manuel Ung, there is a problem with all InnoDB MYSQL_SYSVAR_STR variables that can be dynamically updated. Valgrind allows to highlight it.
  • Bug #95218 - "Virtual generated column altered unexpectedly when table definition changed". This weird bug (that does not seem to be repeatable on MariaDB 10.3.7 with proper test case modifications like removing NOT NULL and collation settings from virtual column) was reported by Joseph Choi. Unfortunately we do not see any documented attempt to check if MySQL 8.0.x is also affected. My quick test shows MySQL 8.0.17 is NOT affected, but I'd prefer to see check copy/pasted as a public comment to the bug.
  • Bug #95230 - "SELECT ... FOR UPDATE on a gap in repeatable read should be exclusive lock". There are more chances to get a deadlock with InnoDB than one might expect... I doubt this report from Domas Mituzas is a feature request. It took him some extra efforts to insist on the point and get it verified even as S4.
  • Bug #95231 - "LOCK=SHARED rejected contrary to specification". This bug report from Monty Solomon ended up as a documentation request. The documentation and the implementation are not aligned, and it was decided NOT to change the parser to match documented syntax. But why it is still "Verified" then? Should it take months to correct the fine manual?
  • Bug #95232 - "The text of error message 1846 and the online DDL doc table should be updated". Yet another bug report from Monty Solomon. Some (but not ALL) partition specific ALTER TABLE operations do not yet support LOCK clause.
  • Bug #95233 - "check constraint doesn't consider IF function that returns boolean a boolean fun". As pointed out by Daniel Black, IF() function in a check constraint isn't considered a boolean type. He had contributed a patch to fix this, but based on comments it's not clear if it's going to be accepted and used "as is". The following test shows that MariaDB 10.3 is not affected:
    C:\Program Files\MariaDB 10.3\bin>mysql -uroot -proot -P3316 test
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 9
    Server version: 10.3.7-MariaDB-log binary distribution

    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    MariaDB [test]> create table t1 (source enum('comment','post') NOT NULL, comment_id int unsigned, post_id int unsigned);
    Query OK, 0 rows affected (0.751 sec)
    MariaDB [test]> alter table t1 add check(IF(source = 'comment', comment_id IS NOT NULL AND post_id IS NULL, post_id IS NOT NULL AND comment_id IS NULL));
    Query OK, 0 rows affected (1.239 sec)
    Records: 0  Duplicates: 0  Warnings: 0
  • Bug #95235 - "ABRT:Can't generate a unique log-filename binlog.(1-999), while rotating the bin". Yet another bug report from Daniel Black. When MySQL 8.0.16 is built with gcc 9.0.x abort is triggered in the MTR suite on the binlog.binlog_restart_server_with_exhausted_index_value test.
  • Bug #95249 - "stop slave permanently blocked". This bug was reported by Wei Zhao, who had contributed a patch.
  • Bug #95256 - "MySQL 8.0.16 SYSTEM USER can be changed by DML". MySQL 8.0.16 had introduced an new privilege, SYSTEM_USER. MySQL manual actually says:
    "The protection against modification by regular accounts that is afforded to system accounts by the SYSTEM_USER privilege does not apply to regular accounts that have privileges on the mysql system schema and thus can directly modify the grant tables in that schema. For full protection, do not grant mysql schema privileges to regular accounts."
    But the report that a user with a privilege to execute DML on the mysql.GLOBAL_GRANTS table from Zhao Jianwei was accepted and verified. I hope Oracle engineers will finally make up their mind and decide either to fix this or to close this report as "Not a bug". I've subscribed in a hope for some fun around this decision making.
  • Bug #95269 - "binlog_row_image=minimal causes assertion failure". This assertion failure happens in debug build when one of standard MTR test cases, rpl.rpl_gis_ddl or rpl.rpl_window_functions is executed with --binlog-row-image=minimal option. In such cases I always wonder what is the reason for a failure NOT to be noted by Oracle MySQL QA and somehow fixed before Community users notice it? Either they don't run tests on debug builds with all possible combinations, or do not care to fix such failures (and thus should suffer from known failures in other test runs). I do not like any of these options, honestly. The bug was reported by Song Libing.
  • Bug #95272 - "Potential InnoDB SPATIAL INDEX corruption during root page split". This bug was reported by Albert Hu based on Valgrind report when running the test innodb.instant_alter. Do they run MTR tests under on Valgrind or ASan builds in Oracle? I assume they do, but then why Community users are reporting such cases first? Note that related MariaDB's bug, MDEV-13942, is fixed in 10.2.24+ and 10.3.15+.
  • Bug #95285 - "InnoDB: Page [page id: space=1337, page number=39] still fixed or dirty". This assertion failure that happens during normal shutdown was reported by LUCA TRUFFARELLI. There are chances that this is a regression bug (without a regression tag), as it does not happen for reporter on MySQL 5.7.21.
  • Bug #95319 - "SHOW SLAVE HOST coalesces certain server_id's into one". This bug was reported by Lalit Choudhary from Percona based on original findings by Glyn Astill.
  • Bug #95416 - "ZERO Date is both NULL and NOT NULL". This funny bug report was submitted Morgan Tocker. Manual actually explains that it's intended behavior (MariaDB 10.3.7 works the same way as MySQL), but it's still funny and unexpected, and the bug report remains "Verified".
  • Bug #95478 - "CREATE TABLE LIKE does not honour ROW_FORMAT." I'd like to add "...when it was not defined explicitly for the original table". The problem was reported by Jean-François Gagné and ended up as a verified feature request. See also this my post on the details of where row_format is stored and is not stored for InnoDB tables...
  • Bug #95484 - "EXCHANGE PARTITION works wrong/werid with different ROW_FORMAT". Another bug report by Jean-François Gagné related to the previous one. He had shown that it's actually possible to get partitions with different row formats in the same InnoDB table in MySQL 5.7.26, but not in the most natural way. It seems the problem may be fixed in 5.7.27 (by the fix for another, internally reported bug), but the bug remains "Verified".
There are some more bugs reported in May 2019 that I was interested in, but let me stop for now. Later in May I've got a chance to spend some days off in Barcelona, without any single MySQL bug report opened for day.

I like this view of Barcelona way more than any MySQL bugs review, including this one.
To summarize:
  1. Oracle engineers who process bugs still sometimes do not care to check if all supported major versions are affected and/or share the results of such checks in public. Instead, some of them care to argue about severity of the bug report, test case details etc.
  2. We still see bug reports that originates from existing, well known MTR test cases runs under Valgrind or in debug builds with some non-default options set. I do not have any good reason in mind to explain why these are NOT reported by Oracle's internal QA first.
  3. Surely some regression bugs still get verified without the regression tag added.
I truly hope my talk "Problems with Oracle's Way of MySQL Bugs Database Maintenance" will be accepted for Percona Live Europe 2019 conference (at least as a lightning talk) and I'll get another chance to speak about the problems highlighted above, and more. There are some "metabugs" in the way Oracle handles MySQL bug report, and these should be discussed and fixed, for the benefits of MySQL quality and all MySQL users and customers.

Saturday, July 27, 2019

Fun with Bugs #86 - On Some Public Bugs Fixed in MySQL 5.7.27

This week new minor releases of MySQL server were announced. Of them all I am mostly interested in MySQL 5.7.27. I plan to concentrate on InnoDB, replication and optimizer bugs reported in public MySQL bugs database and fixed in MySQL 5.7.27. As usual I name bug reporters explicitly and give links to their remaining currently active bug reports, if any.

Let me start with InnoDB bug fixes:
  • Bug #94699 - "Mysql deadlock and bugcheck on aarch64 under stress test". This bug report with a fix for insufficient memory barriers in the rw-lock implementation was contributed by Cai Yibo.
  • Bug #93708 - "Page Cleaner will sleep for long time if clock changes". This bug caused long delays on shutdown. It was reported by Marcelo Altmann. It took some efforts from MySQL Community to have it accepted as a real bug, but now it's fixed!
  • Bug #67526 - "Duplicate key error on auto-inc PK with mixed auto_increment_increment clients". This regression bug was reported back in 2012 by Rob Lineweaver and affected all MySQL versions starting from 5.5.28. Release notes states that the fix was to revert the patch that fixed Bug #65225 - "InnoDB miscalculates auto-increment after changing auto_increment_increment" reported by  Elena Stepanova. I am not sure what had really happened without checking the source code or at least running test cases from both bugs on 5.7.27.
  • Bug #94383 - "simple ALTER cause unnecessary InnoDB index rebuilds, 5.7.23 or later 5.7 rlses". This bug was reported by Mikhail Izioumtchenko, who had contributed diagnostics patch.
  • Bug #93670 - "virtual generated column index data inconsistency". I list this bug that happens when foreign keys are involved as InnoDB one, as no other engine in MySQL 5.7.27 supports foreign keys anyway. The bug was reported by Rui Xu.
Now let me continue with replication bugs:
  • Bug #93771 - "Contribution: Add log_bin_implicit_delete setting". This report was created for the patch contributed by Daniël van Eeden (who had also got a public credit for this contribution in a blog post) and is now closed and documented as fixed. But only part of the original patch was used, adding a comment to the binary log. It's strange to see this happening without any comments on why the rest of the patch was not used.
  • Bug #93440 - "Noop UPDATE query is logged to binlog after read_only flag is set". The problem is actually in a new GTID being generated in this case. The bug was reported by Artem Danilov.
  • Bug #92398 - "point in time recovery using mysqlbinlog broken with temporary table -> errors". This bug was reported by Shane Bester himself. I do not see any complete test case in the public bug report, for some reason. But if you search for private bug number (28642318) at GitHub you can easily identify related commit and changes in the mysql-test/extra/binlog_tests/drop_temp_table.test.
  • Bug #89987 - "super_read_only with master_info_repository=TABLE breaks replication on restart". It was reported by Clemens Weiß and fixed in 5.7.27, but it took some efforts and public comment by Jean-François Gagné to get this clarified.
  • Bug #93395 - "ALTER USER succeeds on master but fails on slave". This bug was reported by Jean-François Gagné and is still marked as "Verified" and is NOT explicitly listed in the release notes. But read this:
    "CREATE USER and ALTER USER did not check the validity of a hashed authentication string when used with IDENTIFIED WITH auth_plugin AS 'hash_string' syntax. (Bug #29395944)"
    The description of the bug sounds very similar, and according to the release notes it's fixed in 5.7.27. How comes the bug above is NOT closed and listed as at least related? It seems somebody does NOT look for duplicates, either when copying public bugs into the internal bugs database, or when closing the bug as fixed. What we have as a result looks like a lack of proper documenting. As bug reporter noted in the comment:
    "This is the 2nd attribution omission I am finding in 5.7.27 release notes (the other one was on Bug#95484). Is this a new policy not to update public bugs with fixed version and not to mention the public bugs in the release notes ?"
    This situation is wrong. It looks awkward like this "house" in Brighton:

and attracts attention. I hope Oracle engineers will care to add public comments to these two bugs to clarify what really happened and why.

Finally a couple of optimizer bugs were also fixed:
  • Bug #92809 - "Inconsistent ResultSet for different Execution Plans". The bug was reported by Juan Arruti. It took a lot of efforts from several Percona engineers to force it to be verified. Complete analysis (by Yura Sorokin) identified the root cause and the fact that the problem is already fixed in MySQL 8.0.x while MySQL 5.6.x is also affected (and seems NOT to be fixed). Good job by MySQL Community, somewhat questionable assistance from Oracle's engineer involved.
  • Bug #90398 - "Duplicate entry for key '<group_key>' error". It was reported by Yoseph Phillips. The fix is actually a more useful error message, nothing more.
That's all I have to say about the release. To summarize:
  1. MySQL 5.7.27 includes fixes to several serious InnoDB and replication bugs, so consider upgrade.
  2. Percona engineers contribute a lot to MySQL, both in form of bug reports, patches and by helping other community users to make their point and get their bugs fixed fast.
  3. There are things to improve in a way Oracle engineers handle bugs processing (verification, checking for duplicates, proper documenting of public bug reports that are fixed).
Some items above are the same as in my summary for the previous 5.7.26 release, isn't it?

Sunday, May 26, 2019

MySQL Support Engineer's Chronicles, Issue #10

As promised, I am trying to write one blog post in this series per week. So, even though writing about InnoDB row formats took a lot of time and efforts this weekend, I still plan to summarize my findings, questions, discussions, bugs and links I've collected over this week.

I've shared two links this week on Facebook that got a lot of comments (unlike links to my typical blog posts). The first one was to Marko Mäkelä's blog post at, "InnoDB Quality Improvements in MariaDB Server". I do not see any comments (or any obvious way to comment) there, but the comments I've got at Facebook were mostly related to the statement that  
"We are no longer merging new MySQL features with MariaDB..."
noted in the text by Mark Callaghan and to the idea that "InnoDB" is a trademark of Oracle, so using it to refer to a fork (that is incompatible with the "upstream" InnoDB in too many ways since MariaDB 10.1 probably) is wrong, as stated by Matt Lord and Sunny Bains. People in the comments mostly agree that a new name makes sense (there are more reasons to give it now anyway than in the case of XtraDB by Percona), and we had a lot of nice and funny suggestions on Slack internally (FudDB was not among them, this is a registered trademark of Miguel Angel Nieto for many years already). We shell see how this may end up, but I would not be surprised by a new name announced soon. I suggest you to read comments in any case if you have a Facebook account, many of them are interesting.

The second link was to Kaj Arnö's post at, "On Contributions, Pride and Cockiness". It's worth checking just because of Monty's photo there. Laurynas Biveinis stated in the comments that any comparison of number of pull requests (open and processed) is meaningless when development model used by other parties is different (closed, with contributions coming mostly via bug reports in case of Oracle, or all changes, external and internal, coming via pull requests in case of Percona). MariaDB uses a mix of a kind, where some contributions from contractors come via pull requests, while engineers from MariaDB Corporation work on GitHub sources of MariaDB Server directly. Anyway (meaningless statistics aside), MariaDB seems to be the easiest target for contributions from Community at the moment, and nobody argued against that. My followers also agreed that the same workflow for internal and external contributions is a preferred development model in ideal world.

This kind of public discussions of (serious and funny) MySQL-related matters on Facebook (along with public discussions on MySQL bugs) make me think the way I use my Facebook page is proper and good for the mankind.

Now back to notes made while working on Support issues. This week I had to explain one case when MariaDB server was shut down normally (but unexpectedly for DBA):
2019-05-22 10:37:55 0 [Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown
This Percona blog post summarizes different ways to find a process which sent a HUP/KILL/TERM or other signal to the mysqld process. I've used SystemTap-based solution like suggested in that blog post in the past successfully. In this context I find this summary of the ways to force MySQL to fail useful. for all kinds of testing. SELinux manual is also useful to re-read at times.

This week I've spent a lot of time and some efforts trying to reproduce the error (1942 and/or 1940 if anyone cares) on Galera node acting as an async replication slave. These efforts ended up with a bug report, MDEV-19572. Surely the idea to replicate MyISAM tables outside of mysql database to Galera cluster is bad at multiple levels, but why the error after running for a long time normally? In the process of testing I was reading various remotely related posts, so checked this and that... I also hit other problems in the process. Like this crash that happened probably while sending some signal to the node unintentionally:
190523 17:19:46 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.23-MariaDB-log
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467240 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
sql/*, TABLE_LIST*, bool, unsigned long))[0x7f64757c9ec7]
The manual page at contains
information that should help you find out what is causing the crash.
I was not able so far top find the exact some backtrace in any known MariaDB bug, so one day I'll have to try to reproduce this crash as well.

I try to check some MariaDB ColumnStore issues from time ot time, for a change, and this week I ended up reading this KB page while trying to understand how much we can control placement of data there.

Finally, for the records, this is the way to "fix" InnoDB statistics if needed (and the need is real as you can find out from Bug #95507 - "innodb_stats_method is not honored when innodb_stats_persistent=ON" reported by my colleague Sergei Petrunia):
update mysql.innodb_index_stats set last_update=now(), stat_value=445000000 where database_name='test' and table_name='t1' and index_name='i1' and stat_name='n_diff_pfx01';
I like to return to familiar nice places and topics, like Regent's Canal or MySQL bugs...
The last bug not the least, MySQL bugs. This week I've subscribed to the following (already "Verified") interesting bug reports (besides the one mentioned above):
  • Bug #95484 - "EXCHANGE PARTITION works wrong/weird with different ROW_FORMAT.". Jean-François Gagné found out that there is a way to have partitions with different row_format values in the same InnoDB table, at least in MySQL 5.7. So why is this not supported officially? See also his Bug #95478 - "CREATE TABLE LIKE does not honour ROW_FORMAT.". It's a week of ROW_FORMAT studies for me, for sure!
  • Bug #95462 - "Data comparison broke in MySQL 8.0.16". It's common knowledge how much I like regression bugs. MySQL 8.0.16 introduced a new one, reported by
    Raman Haran, probably based on some good and valid intentions. But undocumented changes in behavior in GA versions are hardly acceptable, no matter what are the intentions.
That's all for now. Some more links to MySQL bugs from me are always available on Twitter.