Saturday, December 9, 2017

Fun with Bugs #58 - Bug of the Day From @mysqlbugs

In 2013 I had a habit of writing about MySQL bugs on Facebook almost every day. Typical post looked like this one, link to the bug and few words of wondering with a bit of sarcasm.
By the way, check last comments in Bug #68892 mentioned there - the problem of LOST_EVENTS in master's binary log and a way to workaround it still valid as of MySQL 5.7.17.
At that time I often got private messages from colleagues that Facebook is a wrong media for this kind of posts, these posts make MySQL look "buggy" etc, and eventually I was shut up (for a year or so) in a more or less official way by combined efforts of my employer at that time and Oracle MySQL officials.

A year later I started to write about MySQL bugs again on Facebook, but these posts were not regular any more, and maybe became a bit less sarcastic. Years later, I agree that Facebook should better be used for sharing photos of cats, or nice places, or family members, or even for hot political discussions, as these things annoy people much less than MySQL bugs. So, the need for different media for short annoying messages about MySQL bugs is clear to me. I do think the right media is Twitter (it is annoying by itself), and I am present there as @mysqlbugs since December 2012 actually. I am not a big fan of it historically and used to open it mostly to share information about yet another my blog post, but recently (after they allowed to write longer messages there) I decided to pay more attention to it (until its gone entirely and replaced by something else). So, I post a link to one MySQL bug there every day for a week or so, with the #bugoftheday tag. I quickly noticed that the tag was used by others to share photos of nice insects/bugs, but I don't mind for my posts to end up among those, and I am ready to share some of my related photos as well.

To summarize, I am going to write short messages about MySQL bugs regularly on Twitter. I try to write about some bug I recently notices just because it is new, improperly handled or was involved in some customer issue that I worked on. Let me share 5 last bugs mentioned there, so you can decide if it makes sense for you to follow me or #bugoftheday:
  • Bug #88827 - "innodb uses too much space in the PK for concurrent inserts into the same table". Interesting finding by Mark Callaghan during his recent benchmarking efforts. Still "Open".
  • Bug #88788 - "log_bin is not considered correctly an makes binary logging unusable!". This report by Oli Sennhauser (quickly declared a duplicate of my older Bug #75507) got a really nice number, and it's also interesting how efforts to name hosts in a nicely structured manner may play against poor DBA...
  • Bug #88776 - "Issues found by PVS-Studio static analyzer". Related post with the detailed analysis of problems found by the tool was mentioned few days ago by somebody from MariaDB on Slack, so I immediately noted that the bug comes from the same author, Sergey Vasiliev.
  • Bug #88765 - "This bug reporting form has a ridiculously short character limit for the bug syn". The bug report about MySQL bugs database itself. I also hit the limit there way more often than I'd like to. Discussion continues, but I feel that this is not going to be fixed...
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". One of our customers hit this problem recently. I can only agree with Sveta Smirnova here, "Output of XA RECOVER should show ID which can be used in XA COMMIT statement."
So, if the list above seems useful or interesting, please, follow me on Twitter. Let's continue to have regular fun with MySQL bugs, now using (hopefully) a more appropriate media!

As a side note, if you are interested in older bugs opened this day years ago and still "Verified", please, check this great page by Hartmut! You may find really great reports like Bug #79581- "Error 1064 on selects from Information Schema if routine name has '\0'", by Sveta Smirnova. This bug is 2 years old today...

Saturday, December 2, 2017

Using strace for MySQL Troubleshooting

I'd say that strace utility is even more useful for MySQL DBAs than lsof. Basically, it is a useful general purpose diagnostic and debugging tool for tracing system calls some process makes and signals it receives. The name of each system call, its arguments and its return value are printed to stderr or to the file specified with the -o option.

In context of MySQL strace is usually used to find what files mysqld process accesses, and to check the details about any I/O errors. For example, if I'd really want to try to verify Bug #88479 - "Unable to start mysqld using a single config file (and avoiding reading defaults)" by Simon Mudd, I'd just run mysqld from some 5.7.x version as a command argument for strace. On my Ubuntu 14.04 netbook I have the following files:
openxs@ao756:~/dbs/5.7$ ls -l /usr/my.cnf
-rw-r--r-- 1 root root 943 лип 19  2013 /usr/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 260 чер 24 20:40 /etc/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/mysql/my.cnf
-rw-r--r-- 1 root root 116 лют 26  2016 /etc/mysql/my.cnf
openxs@ao756:~/dbs/5.7$ bin/mysqld --version
bin/mysqld  Ver 5.7.18 for Linux on x86_64 (MySQL Community Server (GPL))
So, what happens if I try to run mysqld --defaults-file=/etc/mysql/my.cnf, like this:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
It seems we proved that only the file passed as --defaults-file is read (if it exists). By default other locations are also checked in a specific order (note that return codes are mapped to symbolic errors when possible):
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --print-defaults --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/my.cnf", {st_mode=S_IFREG|0644, st_size=260, ...}) = 0
open("/etc/my.cnf", O_RDONLY)           = 3
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
stat("/home/openxs/dbs/5.7/etc/my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
stat("/home/openxs/.my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
If we think that --print-defaults may matter, we can try without it:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf 2>&1 | grep 'my.cnf'stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
Last example also shows how one can terminate tracing with Ctrl-C.

Now, let me illustrate typical use cases with (surprise!) some public MySQL bug reports where strace was important to find or verify the bug:
  • Bug #20748 - "Configuration files should not be read more than once". In this old bug report Domas Mituzas proved the point by running mysqld as a command via strace. He filtered out lines related to opening my.cnf file with egrep and made it obvious that one file may be read more than once. The bug is closed long time ago, but it is not obvious that all possible cases are covered, based on further comments...
  • Bug #49336 - "mysqlbinlog does not accept input from stdin when stdin is a pipe". Here bug reporter shown how run mysqlbinlog as a command under strace and redirect output to the file with -o option.
  • Bug #62224 - "Setting open-files-limit above the hard limit won't be logged in errorlog". This minor bug in all versions before old 5.6.x (that still remains "Verified", probably forgotten) was reported by Daniël van Eeden. strace allowed him to show what limits are really set by setrlimit() calls. The fact that arguments of system calls are also shown matters sometimes.
  • Bug #62578 - "mysql client aborts connection on terminal resize". This bug report by Jervin Real from Percona is one of my all time favorites! I clearly remember how desperate customer tried to to dump data and load them on a remote server with a nice command line, and after waiting for many days (mysqldump was run for a data set of 5TB+, don't ask, not my idea) complained that they got "Lost connection to MySQL server during query" error message and loading failed. Surely, the command was run from the terminal window on his Mac and in the process of moving it here and there he just resized the window  by chance... You can see nice example of strace usage with MySQL Sandbox in the bug report, as well as some outputs for SIGWINCH signal. Note that the bug is NOT fixed by Oracle in MySQL 5.5.x (and never happened in 5.6+), while Percona fixed it since 5.5.31.
  • Bug #65956 - "client and libmysqlclient VIO drops connection if signal received during read()". In this yet another 5.5 only regression bug that was NOT fixed in 5.5.x by Oracle (until in frames of Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not" the patch was contributed for a special case of it by Laurynas Biveinis from Percona, that in somewhat changed form allowed to, hopefully, get some fix in 5.5.52+), the problem was noted while trying to attach strace to running client program (with -p option).
  • Bug #72340 - "my_print_defaults fails to parse include directive if there is no new line". Here bug reporter used strace (with -f option to trace child/forked processes and -v option get verbose output) to show that file name is NOT properly recognized by my_print_defaults. The bug is still "Verified".
  • Bug #72701 - "mysqlbinlog uses localtime() to print events, causes kernel mutex contention". Yet another nice bug report by Domas Mituzas, who had used -e stat options of strace to trace only stat() calls and show how many times they are applied to /etc/localtime.The bug is fixed since 5.5.41, 5.6.22 and 5.7.6 removed related kernel mutex contention.
  • Bug #76627 - "MySQL/InnoDB mix buffered and direct IO". It was demonstrated with strace that InnoDB was opening each .ibd file twice (this is expected as explained by Marko Mäkelä) in different modes (and this was not any good). The bug is fixed in recent renough MySQL server versions.
  • Bug #80020 - "mysqlfrm doesn't work with 5.7". In this bug report Aleksandr Kuzminsky had to use strace to find out why mysqlfrm utility failed mostly silently even with --verbose option.
  • Bug #80319 - ""flush tables" semantics deviate from manual". Here Jörg Brühe proved with strace that close() system call is not used for individual InnoDB table t when FLUSH TABLES t is executed. manual had to be clarified.
  • Bug #80889 - "PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall". By attaching strace to running mysqld process and running the command it was shown that when GTID_MODE=OFF the entire file we purge to was read. No reason to do this, really. Note how return value of open(), file descriptor, was further used to track reads from this specific file.
  • Bug #81443 - "mysqld --initialize-insecure silently fails with --user flag". As MySQL DBA, you should be ready to use strace when requested by support or developers. Here bug reporter was asked to use it, and this revealed a permission problem (that was a result of the bug). No need to guess what may go wrong with permissions - just use strace to find out!
  • Bug #84708 - "mysqld fills up error logs with [ERROR] Error in accept: Bad file descriptor". Here strace was used to find out that "When mysqld is secured with tcp_wrappers it will close a socket from an unauthorized ip address and then immediately start polling that socket"... The bug is fixed in MySQL 5.7.19+ and 8.0.2+, so take care!
  • Bug #86117 - "reconnect is very slow". I do not care about MySQL Shell, but in this bug report Daniël van Eeden used -r option of strace to print a relative timestamp for every system call, and this was it was clear how much time was spent during reconnect, and where it was spent. Very useful!
  • Bug #86462 - "mysql_ugprade: improve handling of upgrade errors". In this case strace allowed Simon Mudd to find out what exact SQL statement generated by mysql_upgrade failed.
The last but not the least, note that you may need root or sudo privileges to use strace. On my Ubuntu 14.04 this kind of messages may appear even if I am the user that owns mysqld process (same with gdb):
openxs@ao756:~/dbs/maria10.2$ strace -p 2083
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
To summarize, strace may help MySQL DBA to find out:

  • what files are accessed by the mysqld process or related utilities, and in what order
  • why some MySQL-related command (silently) fails or hangs
  • why some commands end up with permission denied or other errors
  • what signals MySQL server and tools get
  • what system calls could took a lot of time when something works slow
  • when files are opened and closed, and how much data are read from the files
  • where the error log and other logs are really located (we can look for system calls related to writing to stderr, for example)
  • how MySQL really works with files, ports and sockets
It also helps to find and verify MySQL bugs, and clarify missing details in MySQL manual.

There other similar tools for tracing system calls (maybe among other things) on Linux that I am going to review in this blog some day. Performance impact of running MySQL server under this kind of tracing is also a topic to study.