Blog of (former?) MySQL Entomologist: strace

Showing posts with label strace. Show all posts

Saturday, December 2, 2017

Using strace for MySQL Troubleshooting

I'd say that strace utility is even more useful for MySQL DBAs than lsof. Basically, it is a useful general purpose diagnostic and debugging tool for tracing system calls some process makes and signals it receives. The name of each system call, its arguments and its return value are printed to stderr or to the file specified with the -o option.

In context of MySQL strace is usually used to find what files mysqld process accesses, and to check the details about any I/O errors. For example, if I'd really want to try to verify Bug #88479 - "Unable to start mysqld using a single config file (and avoiding reading defaults)" by Simon Mudd, I'd just run mysqld from some 5.7.x version as a command argument for strace. On my Ubuntu 14.04 netbook I have the following files:

openxs@ao756:~/dbs/5.7$ ls -l /usr/my.cnf
-rw-r--r-- 1 root root 943 лип 19 2013 /usr/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 260 чер 24 20:40 /etc/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/mysql/my.cnf
-rw-r--r-- 1 root root 116 лют 26 2016 /etc/mysql/my.cnf
openxs@ao756:~/dbs/5.7$ bin/mysqld --version
bin/mysqld Ver 5.7.18 for Linux on x86_64 (MySQL Community Server (GPL))

So, what happens if I try to run mysqld --defaults-file=/etc/mysql/my.cnf, like this:

openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY) = 3
openxs@ao756:~/dbs/5.7$

It seems we proved that only the file passed as --defaults-file is read (if it exists). By default other locations are also checked in a specific order (note that return codes are mapped to symbolic errors when possible):

openxs@ao756:~/dbs/5.7$ strace bin/mysqld --print-defaults --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/my.cnf", {st_mode=S_IFREG|0644, st_size=260, ...}) = 0
open("/etc/my.cnf", O_RDONLY) = 3
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY) = 3
stat("/home/openxs/dbs/5.7/etc/my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
stat("/home/openxs/.my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
openxs@ao756:~/dbs/5.7$

If we think that --print-defaults may matter, we can try without it:

openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf 2>&1 | grep 'my.cnf'stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY) = 3
^C
openxs@ao756:~/dbs/5.7$

Last example also shows how one can terminate tracing with Ctrl-C.

Now, let me illustrate typical use cases with (surprise!) some public MySQL bug reports where strace was important to find or verify the bug:

Bug #20748 - "Configuration files should not be read more than once". In this old bug report Domas Mituzas proved the point by running mysqld as a command via strace. He filtered out lines related to opening my.cnf file with egrep and made it obvious that one file may be read more than once. The bug is closed long time ago, but it is not obvious that all possible cases are covered, based on further comments...
Bug #49336 - "mysqlbinlog does not accept input from stdin when stdin is a pipe". Here bug reporter shown how run mysqlbinlog as a command under strace and redirect output to the file with -o option.
Bug #62224 - "Setting open-files-limit above the hard limit won't be logged in errorlog". This minor bug in all versions before old 5.6.x (that still remains "Verified", probably forgotten) was reported by Daniël van Eeden. strace allowed him to show what limits are really set by setrlimit() calls. The fact that arguments of system calls are also shown matters sometimes.
Bug #62578 - "mysql client aborts connection on terminal resize". This bug report by Jervin Real from Percona is one of my all time favorites! I clearly remember how desperate customer tried to to dump data and load them on a remote server with a nice command line, and after waiting for many days (mysqldump was run for a data set of 5TB+, don't ask, not my idea) complained that they got "Lost connection to MySQL server during query" error message and loading failed. Surely, the command was run from the terminal window on his Mac and in the process of moving it here and there he just resized the window by chance... You can see nice example of strace usage with MySQL Sandbox in the bug report, as well as some outputs for SIGWINCH signal. Note that the bug is NOT fixed by Oracle in MySQL 5.5.x (and never happened in 5.6+), while Percona fixed it since 5.5.31.
Bug #65956 - "client and libmysqlclient VIO drops connection if signal received during read()". In this yet another 5.5 only regression bug that was NOT fixed in 5.5.x by Oracle (until in frames of Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not" the patch was contributed for a special case of it by Laurynas Biveinis from Percona, that in somewhat changed form allowed to, hopefully, get some fix in 5.5.52+), the problem was noted while trying to attach strace to running client program (with -p option).
Bug #72340 - "my_print_defaults fails to parse include directive if there is no new line". Here bug reporter used strace (with -f option to trace child/forked processes and -v option get verbose output) to show that file name is NOT properly recognized by my_print_defaults. The bug is still "Verified".
Bug #72701 - "mysqlbinlog uses localtime() to print events, causes kernel mutex contention". Yet another nice bug report by Domas Mituzas, who had used -e stat options of strace to trace only stat() calls and show how many times they are applied to /etc/localtime.The bug is fixed since 5.5.41, 5.6.22 and 5.7.6 removed related kernel mutex contention.
Bug #76627 - "MySQL/InnoDB mix buffered and direct IO". It was demonstrated with strace that InnoDB was opening each .ibd file twice (this is expected as explained by Marko Mäkelä) in different modes (and this was not any good). The bug is fixed in recent renough MySQL server versions.
Bug #80020 - "mysqlfrm doesn't work with 5.7". In this bug report Aleksandr Kuzminsky had to use strace to find out why mysqlfrm utility failed mostly silently even with --verbose option.
Bug #80319 - ""flush tables" semantics deviate from manual". Here Jörg Brühe proved with strace that close() system call is not used for individual InnoDB table t when FLUSH TABLES t is executed. manual had to be clarified.
Bug #80889 - "PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall". By attaching strace to running mysqld process and running the command it was shown that when GTID_MODE=OFF the entire file we purge to was read. No reason to do this, really. Note how return value of open(), file descriptor, was further used to track reads from this specific file.
Bug #81443 - "mysqld --initialize-insecure silently fails with --user flag". As MySQL DBA, you should be ready to use strace when requested by support or developers. Here bug reporter was asked to use it, and this revealed a permission problem (that was a result of the bug). No need to guess what may go wrong with permissions - just use strace to find out!
Bug #84708 - "mysqld fills up error logs with [ERROR] Error in accept: Bad file descriptor". Here strace was used to find out that "When mysqld is secured with tcp_wrappers it will close a socket from an unauthorized ip address and then immediately start polling that socket"... The bug is fixed in MySQL 5.7.19+ and 8.0.2+, so take care!
Bug #86117 - "reconnect is very slow". I do not care about MySQL Shell, but in this bug report Daniël van Eeden used -r option of strace to print a relative timestamp for every system call, and this was it was clear how much time was spent during reconnect, and where it was spent. Very useful!
Bug #86462 - "mysql_ugprade: improve handling of upgrade errors". In this case strace allowed Simon Mudd to find out what exact SQL statement generated by mysql_upgrade failed.

The last but not the least, note that you may need root or sudo privileges to use strace. On my Ubuntu 14.04 this kind of messages may appear even if I am the user that owns mysqld process (same with gdb):

openxs@ao756:~/dbs/maria10.2$ strace -p 2083
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf

To summarize, strace may help MySQL DBA to find out:

what files are accessed by the mysqld process or related utilities, and in what order
why some MySQL-related command (silently) fails or hangs
why some commands end up with permission denied or other errors
what signals MySQL server and tools get
what system calls could took a lot of time when something works slow
when files are opened and closed, and how much data are read from the files
where the error log and other logs are really located (we can look for system calls related to writing to stderr, for example)
how MySQL really works with files, ports and sockets

It also helps to find and verify MySQL bugs, and clarify missing details in MySQL manual.

There other similar tools for tracing system calls (maybe among other things) on Linux that I am going to review in this blog some day. Performance impact of running MySQL server under this kind of tracing is also a topic to study.

Sunday, November 27, 2016

Upstart Basics for Automatic Restarts of MaxScale After Crash

Historically I do not care much about MariaDB's MaxScale, at least since I know how to build it from source when needed. But, as a support engineer who work at MariaDB, sometimes I have to deal with problems related to MaxScale, and this week it happened so that I had to explain what to do to implement automatic restarts of MaxScale "daemon" in case of crashes on RHEL 6.x.

In the process I had found out that two of my most often used Linux distributions, CentOS 6.x and Ubuntu 14.04, actually use Upstart, so good old System V's init tricks and scripts work there only partially and only because somebody cared to integrate them into this "new" approach to starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running. I had to do this years ago, but customer's question finally forced me to check and study some details on how this system actually works.

So, unfortunately, there is no script like mysqld_safe to start and restart MaxScale after installing official RPM from MariaDB (in this case it was maxscale-2.0.1-2.x86_64). My first idea was to write one, but then I asked myself why it is not there yet, decided to check what's already provided and take a closer look at /etc/init.d/maxscale I have on my CentOS 6.8 VM (the closest to what customer used). It's a typical classic shell script to start service and it starts MaxScale like this:

start() {
    echo -n $"Starting MaxScale: "
...
    ulimit -HSn 65535
    daemon --pidfile $MAXSCALE_PIDFILE /usr/bin/maxscale --user=maxscale $MAXSCALE_OPTIONS >& /dev/null

    RETVAL=$?
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
...
    # Return right code
    if [ $RETVAL -ne 0 ]; then
       failure
       RETVAL=$_RETVAL_NOT_RUNNING
    fi

    echo

    return $RETVAL
}

Basically, it runs /usr/bin/maxscale --user=maxscale, the rest are details (like location of PID file the rest of the script then relies on). There is no advanced script to monitor status of this process with this PID (like mysqld_safe) or anything to care about automatic restarts. It is supposed that I just execute chkconfig maxscale on and then service starts when system enters proper runlevel.

Simple test shows that when maxscale process is killed it's gone and is not restarted. At good old times I'd just add something like this:

mx:2345:respawn:/usr/bin/maxscale --user=maxscale

to /etc/inittab (as nice articles like this suggests), but quick check and then further reading proves that it's not going to work on CentOS 6.8, as it uses Upstart.

So, either I had to write something similar to mysqld_safe for MaxScale, or (having in mind size of code and number of bugs we had in the past in that script) I had better to find out what is the supposed way to respawn processes in Upstart. Basic ideas are, again, simple. One has to create /etc/init/service_name.conf file and put something like this there (real code quote from one of Ubuntu files for MySQL):

start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]

respawn
respawn limit 2 5
...
pre-start script
...
end script

exec /usr/sbin/mysqld

post-start script
...
end script

The file is easy to understand even without reading the manual. One has to set when service starts and stops, add respawn clause if we want to restart it in case of unexpected crashes or kills of the process, optionally limit the number of restarts and intervals between restarts etc, and, optionally, do something before and after start.

I quickly created /etc/init/maxscale.conf based on the above and it did a great job in starting it automatically upon system startup. I've just used exec /usr/bin/maxscale --user=maxscale basically and decided to deal with options and other details later if needed. But what was strange for the very beginning is that in /var/log/messages I've seen what looked like repeated attempts to start maxscale process, that failed:

Nov 26 17:44:45 centos maxscale[20229]: MaxScale started with 1 server threads.
Nov 26 17:44:45 centos init: maxscale main process ended, respawning
Nov 26 17:44:45 centos maxscale[20229]: Started MaxScale log flusher.
Nov 26 17:44:45 centos maxscale[20235]: Working directory: /var/log/maxscale
Nov 26 17:44:45 centos maxscale[20235]: MariaDB MaxScale 2.0.1 started
Nov 26 17:44:45 centos maxscale[20235]: MaxScale is running in process 20235
...

Nov 26 17:44:45 centos maxscale[20235]: Loaded module qc_sqlite: V1.0.0 from /usr/lib64/maxscale/libqc_sqlite.so
Nov 26 17:44:45 centos maxscale[20235]: MaxScale is already running. Process id: 20229. Use another location for the PID file to run multiple instances of MaxScale on the same machine.
Nov 26 17:44:45 centos init: maxscale main process (20234) terminated with status 4
Nov 26 17:44:45 centos init: maxscale main process ended, respawning

Moreover, when proper maxscale process was killed, it was NOT respawned as expected.

It was just a proper time to read the manual more carefully, this part about expect stanza (that I noted in some of official Upstart scripts):

"To allow Upstart to determine the final process ID for a job, it needs to know how many times that process will call fork(2). Upstart itself cannot know the answer to this question since once a daemon is running, it could then fork a number of "worker" processes which could themselves fork any number of times. Upstart cannot be expected to know which PID is the "master" in this case, considering it does not know if worker processes will be created at all, let alone how many times, or how many times the process will fork initially. As such, it is necessary to tell Upstart which PID is the "master" or parent PID. This is achieved using the expect stanza.
The syntax is simple, but you do need to know how many times your service forks."

Let's check quickly how many times fork() is called in maxscale (I'd know it better if I ever cared to study the source code in details, but I had not checked most part of it yet). Test based on that cookbook gives unexpected results:

[root@centos ~]# strace -o /tmp/strace.log -fFv maxscale &
[1] 2369
[root@centos ~]# sleep 10
[root@centos ~]# ps aux | grep strace
root      2369 2.6 0.0   4476   888 pts/0    S    20:28   0:00 strace -o /tmp/strace.log -fFv maxscale
root      2382 0.0 0.0 103312   868 pts/0    S+   20:28   0:00 grep strace
[root@centos ~]# pkill -9 strace
[1]+ Killed                  strace -o /tmp/strace.log -fFv maxscale
[root@centos ~]# ps aux | grep maxscale
root      2375 1.3 0.2 276168 3896 ?        Ssl 20:28   0:00 maxscale
root      2385 0.0 0.0 103312   868 pts/0    S+   20:28   0:00 grep maxscale
[root@centos ~]# egrep "\<(fork|clone)\>\(" /tmp/strace.log | wc | awk '{print $1}'
5

How comes we have 5 fork calls? Here they are:

[root@centos ~]# egrep "\<(fork|clone)\>\(" /tmp/strace.log
2374 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb024c08ab0) = 2375
2375 clone(child_stack=0x7fb01f819f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01f81a9d0, tls=0x7fb01f81a700, child_tidptr=0x7fb01f81a9d0) = 2376
2375 clone(child_stack=0x7fb01e118f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01e1199d0, tls=0x7fb01e119700, child_tidptr=0x7fb01e1199d0) = 2377
2375 clone(child_stack=0x7fb01d10af10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb01d10b9d0, tls=0x7fb01d10b700, child_tidptr=0x7fb01d10b9d0) = 2378
2375 clone(child_stack=0x7fb017ffef10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb017fff9d0, tls=0x7fb017fff700, child_tidptr=0x7fb017fff9d0) = 2379

It seems process with PID 2374 actually forked just once, to produce PID 2375, and then several threads were started by that process. We can see them:

[root@centos ~]# ps -T -p `pidof maxscale`
PID SPID TTY          TIME CMD
2375 2375 ?        00:00:00 maxscale
2375 2376 ?        00:00:00 maxscale
2375 2377 ?        00:00:00 maxscale
2375 2378 ?        00:00:00 maxscale
2375 2379 ?        00:00:00 maxscale
[root@centos ~]#

So, it was really one fork() (and I could notify that even from studying /var/log/messages before) and I have to add expect fork stanza to my Upstart configuration file:

[root@centos ~]# cat /etc/init/maxscale.conf
# MaxScale service

description "MaxScale"

start on stopped rc RUNLEVEL=[2345]
stop on starting rc runlevel [!2345]

respawn
respawn limit 2 5

expect fork

exec /usr/bin/maxscale --user=maxscale

This way it works as expected, as one may easily check:

[root@centos ~]# initctl status maxscale
maxscale start/running, process 6600
[root@centos ~]# maxadmin
MaxScale> show servers
Server 0x19e47a0 (server1)
        Server:                              127.0.0.1
        Status:                              Master, Running
        Protocol:                            MySQLBackend
        Port:                                3306
        Server Version:                      5.7.15-9-log
        Node Id:                             1
        Master Id:                           -1
        Slave Ids:
        Repl Depth:                          0
        Number of connections:               0
        Current no. of conns:                0
        Current no. of operations:           0
MaxScale> quit
[root@centos ~]# kill -9 6600
[root@centos ~]# ps aux | grep maxscale
maxscale 6627 2.0 0.2 276168 3884 ?        Ssl 20:41   0:00 /usr/bin/maxscale --user=maxscale
root      6633 0.0 0.0 103312   872 pts/0    S+   20:41   0:00 grep maxscale
[root@centos ~]# initctl status maxscale
maxscale start/running, process 6627

In the /var/log/messages we clearly see that the process is respawned by init:

...
Nov 26 20:38:15 centos maxscale[6600]: Started MaxScale log flusher.
Nov 26 20:40:33 centos maxscale[6600]: Loaded module MaxAdminAuth: V2.0.0 from /usr/lib64/maxscale/libMaxAdminAuth.so
Nov 26 20:41:52 centos init: maxscale main process (6600) killed by KILL signal
Nov 26 20:41:52 centos init: maxscale main process ended, respawning
Nov 26 20:41:52 centos maxscale[6627]: Working directory: /var/log/maxscale
Nov 26 20:42:07 centos maxscale[6627]: Loaded module MaxAdminAuth: V2.0.0 from /usr/lib64/maxscale/libMaxAdminAuth.so
...

After few more checks I asked to implement this officially in packages for the Linux distributions that use Upstart, see MXS-1027.

To summarize, I wish I care more to find out how Upstart works long time ago. Now it's time to study systemd probably :) Anyway, after some reading and testing one can use it efficiently to provide automated service starts and restarts for MySQL server and services used with it.