Saturday, November 30, 2019

Fun with Bugs #90 - On MySQL Bug Reports I am Subscribed to, Part XXIV

Previous post in this series was published 3 months ago and the last Bug #96340 from it is already closed as fixed in upcoming MySQL 8.0.19. I've picked up 50+ more bugs to follow since that time, so I think I should send quick status update about interesting public MySQL bug reports that are still active.

As usual I concentrate mostly on InnoDB, replication and optimizer bugs. Here is the list, starting from the oldest:
  • Bug #96374  - "binlog rotation deadlock when innodb concurrency limit setted". This bug was reported by Jia Liu, who used gdb to show threads deadlock details. I admit that recently more bug reporters use gdb and sysbench with custom(ized) Lua scripts to prove the point, and I am happy to see this happening!
  • Bug #96378 - "Subquery with parameter is exponentially slower than hard-coded value". In my primitive test with user variables replaced by constants (on MariaDB 10.3.7) I get the same plan for the query, so I am not 100% sure that the analysis by my dear friend Sinisa Milivojevic was right and it's about optimization (and not comparing values with different collations, for example). But anyway, this problem reported by Jeff Johnson ended up as a verified feature request. Let's see what may happen to it next.
  • Bug #96379 - "First query successful, second - ERROR 1270 (HY000): Illegal mix of collations ". This really funny bug was reported by Владислав Сокол.
  • Bug #96400 - "MTS STOP SLAVE takes over a minute when master crashed during event logging". Nice bug report by Przemyslaw Malkowski from Percona, who used sysbench and dbdeployer to demonstrate the problem. Later Przemysław Skibiński (also from Percona) provided a patch to resolve the problem.
  • Bug #96412 - "Mess usages of latch meta data for InnoDB latches (mutex and rw_lock)". Fungo Wang had to make a detailed code analysis to get this bug verified. I am not sure why it ended up with severity S6 (Debug Builds) though.
  • Bug #96414 - "CREATE TABLE events in wrong order in a binary log.". This bug was reported by Iwo P. His test case to demonstarte the problem included small source code modification, but (unlike with some other bug reports) this had NOT prevented accepting it as a true, verified bug. The bug not affect MySQL 8.0.3+ thanks to WL#6049 "Meta-data locking for FOREIGN KEY tables" implemented there.
  • Bug #96472 - "Memory leak after 'innodb.alter_crash'". Yet another bug affecting only MySQL 4.7 and not MySQL 8.0. It was reported by Yura Sorokin from Percona.
  • Bug #96475 - "ALTER TABLE t IMPORT TABLESPACE blocks SELECT on I_S.tables.".  Clear and simple "How to repeat" instructions (using dbdeployer) by Jean-François Gagné. See also his related Bug #96477 - "FLUSH TABLE t FOR EXPORT or ALTER TABLE t2 IMPORT TABLESPACE broken in 8.0.17" for MySQL 8. The latter is a regression bug (without a regression tag), and I just do not get how the GA releases with such new bugs introduced may happen.
  • Bug #96504 - "Refine atomics and barriers for weak memory order platform". Detailed analysis, with links to code etc from Cai Yibo.
  • Bug #96525 - "Huge malloc when open file limit is high". Looks more like a systemd problem (in versions < 240) to me. Anyway, useful report from Andreas Hasenack.
  • Bug #96615 - "mysql server cannot handle write operations after set system time to the past". A lot of arguments were needed to get this verified, but Shangshang Yu was not going to give up. First time I see gstack used in the bug report to get a stack trace quickly. It's a part of gdb RPM on CentOS 6+. I have to try it vs gdb and pstack one day and decide what is the easiest and most efficient way to get backtraces of all threads in production...
  • Bug #96637 - "Clone fails on just upgraded server from 5.7". I had not used MySQL 8 famous clone plugin yet in practice, but I already know that it has bugs. This bug was reported by Satya Bodapati, who also suggested a patch.
  • Bug #96644 - "Set read_only on a master waiting for semi-sync ACK blocked on global read lock". Yet another problem (documented limitation) report from Przemyslaw Malkowski. Not sure why it was not verified on MySQL 8.0. Without a workaround to set master to read only it is unsafe to use long rpl_semi_sync_master_timeout values, as we may end up with that long downtime.
  • Bug #96677 - ""SELECT ... INTO var_name FOR UPDATE" not working in MySQL 8". This regression bug was reported by Vinodh Krish. Some analysis and patch were later suggested by Zsolt Parragi.
  • Bug #96690 - "sql_require_primary_key should not apply to temporary tables". This bug was also reported by Przemyslaw Malkowski from Percona. It ended up as a verified feature request, but not everyone in community is happy with this. Let me quote:
    "[30 Aug 8:08] Jean-François Gagné
    Could we know what was the original severity of this bug as reported by Przemyslaw ? This is now hidden as it has been reclassified as S4 (Feature Request).

    From my point of view, this is actually a bug, not a feature request and it should be classified as S2. A perfectly working application would break for no reason when a temporary table does not have a Primary Key, so this is actually a big hurdle for using sql_require_primary_key, hence serious bug in the implementation of this otherwise very nice and useful feature.
    "
That's all about bugs I've subscribed to in summer.
Winter is coming, so why not to remember nice warm sunny days and interesting MySQL bugs reported back then.
To summarize:
  1. We still see some strange "games" played during bugs processing and trend to decrease severity of reports. I think this is a waste of time for both Oracle engineers and community bug reporters.
  2. I am still not sure if Oracle's QA does not use ASan or just ignore problems reported for MTR test cases. Anyway, Percona engineers do this for them, and report related bugs :)
  3. dbdeployer and sysbench are really popular among MySQL bug reporters recently!
  4. Importing of InnoDB tablespaces is broken in MySQL 8.0.17+ at least.
  5. There are many interesting MySQL bugs reported during last 3 months, so I epxect more posts in this series soon.

Sunday, November 17, 2019

Dynamic Tracing of MySQL Server with Timestamps Using gdb

Some time ago I wanted a customer to trace some MariaDB function execution and make sure that when it is executed I get both timestamp of execution and some of the arguments printed into some log file. Our InnoDB guru in MariaDB, Marko Mäkelä, suggested to use gdb, set breakpoint on the function and use its command ... end syntax to print whatever we needed, and log the output.

Adding a timestamp to each breakpoint "command" execution was the next step, and for this I've suggested to use plain "shell date" command. Redirecting output to the same file gdb uses to log everything indeed did the trick. Let me document it here for readers and myself. I'll consider the example of logging SQL queries I've used in the previous blog post on dynamic tracing with perf and use recent Percona Server 5.7.28-31 on Ubuntu 16.04 for tests.

Skipping the details already explained in that previous post, let's assume I want to set a breakpoint at the dispatch_command() function and print the value of com_data->com_query.query every time it hit it. This is what I did with gdb after attaching it to the proper mysqld process:
...
[New LWP 6562]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set height 0
(gdb) set log on
Copying output to gdb.txt.
(gdb) b dispatch_command
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell date >> ./gdb.txt
>p com_data->com_query.query
>continue
>end
(gdb) continue
Continuing.

Key detail here is to use command and refer to the breakpoint by number, 1, and then append the output of the date command to the gdb.txt file in the current directory that is used by default to log the gdb output.

Then in another shell I executed the following:
openxs@ao756:~$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28-31-log Percona Server (GPL), Release '31', Revision 'd14ef86'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.04 sec)

mysql> select 2;
+---+
| 2 |
+---+
| 2 |
+---+
1 row in set (0.04 sec)

mysql> shutdown;
Query OK, 0 rows affected (0.04 sec)

mysql> quit
Bye

I had the output in the gdb window, but what's more important is that I've got it in the gdb.txt file as well. The content looked as follows:
openxs@ao756:~$ ls -l gdb.txt
-rw-r--r-- 1 root root 7123 лис 17 19:21 gdb.txt
openxs@ao756:~$ cat gdb.txt
Breakpoint 1 at 0xbe9660: file /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc, line 1254.
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
Continuing.
[New Thread 0x7fc566411700 (LWP 6611)]
[Switching to Thread 0x7fc566411700 (LWP 6611)]

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (thd=thd@entry=
    0x7fc548631000, com_data=com_data@entry=0x7fc566410da0, command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc: No such file or directory.
неділя, 17 листопада 2019 19:20:47 +0200
$1 = 0x7fc54865b021 "show databases"
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:20:47 +0200
$2 = 0x7fc54865b021 "show tables"
...
Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:20:47 +0200
$9 = 0x7fc54865b021 "select @@version_comment limit 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:01 +0200
$10 = 0x7fc54865b021 "select 1"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:08 +0200
$11 = 0x7fc54865b021 "select 2"

Thread 29 "mysqld" hit Breakpoint 1, dispatch_command (
    thd=thd@entry=0x7fc548631000, com_data=com_data@entry=0x7fc566410da0,
    command=COM_QUERY)
    at /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc:1254
1254    in /mnt/workspace/percona-server-5.7-debian-binary-rocks-new/label_exp/min-xenial-x64/test/percona-server-5.7-5.7.28-31/sql/sql_parse.cc
неділя, 17 листопада 2019 19:21:12 +0200
$12 = 0x7fc54865b021 "shutdown"
[Thread 0x7fc566411700 (LWP 6611) exited]

Thread 1 "mysqld" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fc56645b780 (LWP 6525)]
0x00007fc563d8774d in poll () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
Detaching from program: /usr/sbin/mysqld, process 6525
openxs@ao756:~$
That's it. For basic tracing of any function calls in MySQL or MariaDB server (or any binary with debuginfo available) including timestamps and any arguments, variables or expressions printed you do not strictly need anything but gdb. With different format options for date command you can format timestamp any way you need, for example, you can remove day and month names etc and get even nanosecond precision if you prefer:
openxs@ao756:~$ for i in `seq 1 20`; do sleep 0.1; date +'%H:%m:%S.%N'; done
20:11:41.126205172
20:11:41.230087681
...
20:11:42.996521734
20:11:43.100229347
openxs@ao756:~$
* * * 
Sometimes event basic tools allow to get useful results. This photo was made with my dumb Nokia phone while running. Just a stop for a moment. Nice and useful result, IMHO, same as with gdb trick discussed here.
To summarize, for basic dynamic tracing with timestamps and arbitrary information printed you can use a command line debugger, like gdb. In some cases this simple approach is quite useful.

This kind of tracing comes with a cost, both in terms of performance impact (awful) and additional steps to parse the output (we do not really care about all that breakpoint details, we justneed a timestamp and some values printed). eBPF and related dynamic tracing tools (bcc trace and bpftrace) may help with both problems, as I am going to demonstrate in upcoming blog posts. Stay tuned!

Sunday, November 10, 2019

Time in Performance Schema

I've seen questions like this:
"Is there a way to know when (date and time) the last statement captured in ... was actually ran?"
more than once in some discussions (and customer issues) related to Performance Schema. I've seen answers provided by my colleagues and me after some limited testing. I've also noticed statements that it may not be possible.

Indeed, examples of wall clock date and time in the output of queries from the performance_schema are rare (and usually come from tables in the information_schemasys.format_time() function converts time to a nice, human readable format, but it still remains relative - it is not a date and time when something recorded in performance_schema happened.

In this post I'd like to document the answer I've seen and have in mind (and steps to get it) here, to save time for readers and myself faced with similar questions in the future. I'll also show the problem with this answer that I've noticed after testing for more than few minutes.

Let's start with simple setup of testing environment. In my case it is good old MariaDB 10.3.7 running on this netbook under Windows. First, let's check if Performance Schema is enabled:
 MariaDB [test]> select version(), @@performance_schema;
+--------------------+----------------------+
| version()          | @@performance_schema |
+--------------------+----------------------+
| 10.3.7-MariaDB-log |                    1 |
+--------------------+----------------------+
1 row in set (0.233 sec)
Then let's enable recording of time for everything and enable all consumers:
MariaDB [test]> update performance_schema.setup_instruments set enabled='yes', timed='yes';
Query OK, 459 rows affected (0.440 sec)
Rows matched: 707  Changed: 459  Warnings: 0

MariaDB [test]> update performance_schema.setup_consumers set enabled='yes';
Query OK, 8 rows affected (0.027 sec)
Rows matched: 12  Changed: 8  Warnings: 0
Now we can expect recently executed statements to be recorded, like this:
MariaDB [test]> select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current\G*************************** 1. row ***************************
      now(): 2019-11-03 17:42:51
   event_id: 46
timer_start: 22468653162059216
  timer_end: 22468697203533224
   sql_text: select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current
1 row in set (0.045 sec)
Good, but how we can get a real time when the statement was executed (like now() reports)? We all know from the fine manual that timer_start and timer_end values are in "picoseconds". So we can easily convert them into seconds (or whatever units we prefer):
MariaDB [test]> select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current\G
*************************** 1. row ***************************
                    now(): 2019-11-03 17:54:02
                 event_id: 69
timer_start/1000000000000: 23138.8159
                 sql_text: select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current
1 row in set (0.049 sec)
This value is related startup time one might assume, and we indeed can expect that timer in Performance Schema is initialized at some very early stage of startup. But how to get date and time of server startup with SQL statement?

This also seems to be easy, as we have a global status variable called Uptime measured in seconds. Depending on fork and version used we can get the value of Uptime either from the Performance Schema (in MySQL 5.7+) or from the Information Schema (in MariaDB and older MySQL versions). For example:
MariaDB [test]> select variable_value from information_schema.global_status where variable_name = 'Uptime';
+----------------+
| variable_value |
+----------------+
| 23801          |
+----------------+
1 row in set (0.006 sec)
So, server startup time is easy to get with a date_sub() function:
MariaDB [test]> select @start := date_sub(now(), interval (select variable_value from information_schema.global_status where variable_name = 'Uptime') second) as start;
+----------------------------+
| start                      |
+----------------------------+
| 2019-11-03 11:28:18.000000 |
+----------------------------+
1 row in set (0.007 sec)
In the error log of MariaDB server I see:
2019-11-03 11:28:18 0 [Note] mysqld (mysqld 10.3.7-MariaDB-log) starting as process 5636 ...
So, I am sure the result is correct. Now, if we use date_add() to add timer value converted to seconds, for example to the server startup time, we can get the desired answer, date and time when the statement recorded in performance_schema was really executed:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 657
                  ts: 2019-11-03 18:24:00.501654
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:05
timediff(now(), @ts): 00:00:04.498346
1 row in set (0.002 sec)
I was almost ready to publish this blog post a week ago, before paying more attention to the result (that used to be perfectly correct in earlier simple tests) and executing a variation of statement presented above. The problem I noticed is that when Uptime of the server is not just few minutes (as it often happens in quick test environments), but hours or days, timestamp that we get for a recent event from the performance_schema using the approach suggested may differ from current timestamp notably (we see 4.5+ seconds difference highlighted above). Moreover, this difference seem to fluctuate:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 682
                  ts: 2019-11-03 18:24:01.877763
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:07
timediff(now(), @ts): 00:00:05.122237
1 row in set (0.002 sec)
and tend to grow with Uptime. This make the entire idea of converting timer_start and timer_end Performance Schema "counters" in "picoseconds" questionable and unreliable for the precise real timestamp matching and comparing with other timestamp information sources in production.
Same as with this photo of sunset at Brighton taken with my Nokia dumb phone back in June, I do not see a clear picture of time measurement in Performance Schema...

After spending some more time thinking about this I decided to involve MySQL team somehow and created the feature request, Bug #97558 - "Add function (like sys.format_time) to convert TIMER_START to timestamp", that ended up quickly "Verified" (so I have small hope that I had not missed anything really obvious - correct me if I am wrong). I'd be happy to see further comments there and, surely, the function I asked about implemented. But I feel there is some internal problem with this and some new feature at server side may be needed to take the "drift" of time in Performance Schema into account.

There is also a list of currently open questions that I may try to answer in followup posts:
  1. Is the problem of time drift I noticed a MariaDB 10.3.7-specific, or recent MySQL 5..x and 8.0.x are also affected?
  2. Is this difference in time growing monotonically with time or really fluctuating?
  3. When exactly Performance Schema time "counter" starts, where is it in the code?
  4. Are there any other, better or at least more precise and reliable ways to get timestamps of some specific events that happen during MySQL server work? I truly suspect that gdb and especially dynamic tracing on Linux with tools like bpftrace may give us more reliable results...