Saturday, April 15, 2017

How To Find What Thread Had Executed FTWRL

This week one of MariaDB Support customers asked how to find what thread had executed FLUSH TABLES WITH READ LOCK (FTWRL) and thus blocked all changes to data. I decided to list several ways to find this out (as eventually customer wanted to know how to find this out not only in MariaDB 10.1, but also in MySQL 5.6 etc).

Let me start with a quick summary. I know the following ways (other that having all queries logged in general query log, slow log, by some audit plugin or at client side, and checking the log) to find the thread that executed FLUSH TABLES WITH READ LOCK successfully:
  1. In MariaDB starting from 10.0.7 you can use METADATA_LOCK_INFO plugin.
  2. In MySQL starting from 5.7 you can use performance_schema.metadata_locks table.
  3. In MySQL starting from 5.6 (or MariaDB 10.x.y) you can use performance_schema.events_statements_history table.
  4. In all versions of MySQL or MariaDB you can attach gdb and check threads one by one (and I explain one of the ways to do this in the post).
Now, time to present the details. Unfortunately, when FTWRL is executed successfully, it is not visible neither in SHOW PROCESSLIST nor in related INFORMATION_SCHEMA table:
MariaDB [test]> select version();
+-----------------+
| version()       |
+-----------------+
| 10.1.18-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.06 sec)

MariaDB [test]> show full processlist\G
*************************** 1. row ***************************
      Id: 10
    User: root
    Host: localhost
      db: test
 Command: Query
    Time: 0
   State: init
    Info: show full processlist
Progress: 0.000
*************************** 2. row ***************************
      Id: 11
    User: root
    Host: localhost
      db: test
 Command: Query
    Time: 743
   State: Waiting for global read lock
    Info: delete from t0
Progress: 0.000
2 rows in set (0.00 sec)

MariaDB [test]> select * from information_schema.processlist\G*************************** 1. row ***************************
           ID: 11
         USER: root
         HOST: localhost
           DB: test
      COMMAND: Query
         TIME: 954
        STATE: Waiting for global read lock
         INFO: delete from t0
      TIME_MS: 954627.587
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 67464
EXAMINED_ROWS: 0
     QUERY_ID: 1457
  INFO_BINARY: delete from t0
          TID: 8838
*************************** 2. row ***************************
           ID: 10
         USER: root
         HOST: localhost
           DB: test
      COMMAND: Query
         TIME: 0
        STATE: Filling schema table
         INFO: select * from information_schema.processlist
      TIME_MS: 0.805
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 84576
EXAMINED_ROWS: 0
     QUERY_ID: 1461
  INFO_BINARY: select * from information_schema.processlist
          TID: 8424
2 rows in set (0.02 sec)
There is usually nothing useful in the INNODB STATUS also:
MariaDB [test]> show engine innodb status\G
...
------------
TRANSACTIONS
------------
Trx id counter 20439
Purge done for trx's n:o < 20422 undo n:o < 0 state: running but idle
History list length 176
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 11, OS thread handle 0x7f7f5cdb8b00, query id 1457 localhost root Waiting for global read lock
delete from t0
---TRANSACTION 0, not started
MySQL thread id 10, OS thread handle 0x7f7f5ce02b00, query id 1462 localhost root init
show engine innodb status
--------
...
No wonder, global read lock that FTWRL sets is not an InnoDB lock.

In this case we have just 2 user threads connected, so it's easy to find out what thread could set the global read lock we are waiting for. With dozens or hundreds of threads connected one has to guess which one it can be (often those that sleep for a longest time are blamed), unless he knows for sure how to find the exact one. So, let me list some of the ways to find this out.

I'd like to start with MariaDB 10.x.y. It provides a special METADATA_LOCK_INFO plugin. Surely you can not install it when FTWRL is already executed (as plugin ends up recorded in mysql.plugins table when installed):
MariaDB [test]> INSTALL SONAME 'metadata_lock_info';
ERROR 1223 (HY000): Can't execute the query because you have a conflicting read lock
MariaDB [test]> unlock tables;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSTALL SONAME 'metadata_lock_info';
Query OK, 0 rows affected (0.37 sec)

MariaDB [test]> select * from mysql.plugin;
+--------------------+-----------------------+
| name               | dl                    |
+--------------------+-----------------------+
| BLACKHOLE          | ha_blackhole.so       |
| METADATA_LOCK_INFO | metadata_lock_info.so |
+--------------------+-----------------------+
2 rows in set (0.00 sec)
But when plugin is installed, we can get the information we need from the information_schema.metadata_lock_info table:
MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.01 sec)

MariaDB [test]> select * from information_schema.metadata_lock_info;
+-----------+------------+---------------+------------------+--------------+------------+
| THREAD_ID | LOCK_MODE  | LOCK_DURATION | LOCK_TYPE        | TABLE_SCHEMA | TABLE_NAME |
+-----------+------------+---------------+------------------+--------------+------------+
|        10 | MDL_SHARED | NULL          | Global read lock |              |            |
|        10 | MDL_SHARED | NULL          | Commit lock      |              |            |
+-----------+------------+---------------+------------------+--------------+------------+
2 rows in set (0.00 sec)

MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              10 |
+-----------------+
1 row in set (0.01 sec)

MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State                        | Info             | Progress |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| 10 | root | localhost | test | Query   |    0 | init                         | show processlist |    0.000 |
| 12 | root | localhost | test | Query   |   47 | Waiting for global read lock | delete from t0   |    0.000 |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
2 rows in set (0.00 sec)
We are looking for a row where lock_type is "Global read lock":
MariaDB [test]> select thread_id from information_schema.metadata_lock_info where lock_type = 'Global read lock';
+-----------+
| thread_id |
+-----------+
|        10 |
+-----------+
1 row in set (0.00 sec)
Now, let me proceed to MySQL 5.7. It had introduced the performance_schema.metadata_locks table, and we can use it as follows (I've used Percona Server 5.7.17 for this example, but it's not any different from upstream MySQL 5.7 in this regard):
openxs@ao756:~/dbs/maria10.1$ mysql -uroot -proot test
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.17-13-log Percona Server (GPL), Release '13', Revision 'fd33d43'

Copyright (c) 2009-2016 Percona LLC and/or its affiliates
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
Query OK, 0 rows affected (0.18 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.06 sec)

mysql> select * from performance_schema.metadata_locks;
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | OBJECT_INSTANCE_BEGIN | LOCK_TYPE   | LOCK_DURATION | LOCK_STATUS | SOURCE            | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| GLOBAL      | NULL               | NULL           |       139917187957472 | SHARED      | EXPLICIT      | GRANTED     | lock.cc:1128      |              39 |             10 || COMMIT      | NULL               | NULL           |       139917300586224 | SHARED      | EXPLICIT      | GRANTED     | lock.cc:1212      |              39 |             10 |
| TABLE       | performance_schema | metadata_locks |       139917192091536 | SHARED_READ | TRANSACTION   | GRANTED     | sql_parse.cc:6406 |              39 |             11 |
+-------------+--------------------+----------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
3 rows in set (0.01 sec)

mysql> show processlist;
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
| Id | User     | Host               | db   | Command | Time | State    | Info             | Rows_sent | Rows_examined |
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
| 11 | maxscale | 192.168.1.37:55399 | NULL | Sleep   |    4 |          | NULL             |         0 |             0 |
| 12 | root     | localhost          | test | Query   |    0 | starting | show processlist |         0 |             0 |
+----+----------+--------------------+------+---------+------+----------+------------------+-----------+---------------+
2 rows in set (0.02 sec)
In this case we are looking for the value "GLOBAL" in OBJECT_TYPE column and "SHARED" in LOCK_TYPE column. In this row we see OWNER_TREAD_ID, but the value, 39, is not present in the SHOW PROCESSLIST output. We need to search for the proper thread id in the performance_schema.threads table, like this:
mysql> select t.processlist_id from performance_schema.threads t join performance_schema.metadata_locks ml on ml.owner_thread_id = t.thread_id where ml.object_type='GLOBAL' and ml.lock_type='SHARED';
+----------------+
| processlist_id |
+----------------+
|             12 |
+----------------+
1 row in set (0.03 sec)
Now, what if we run older MySQL 5.6.x that does not have metadata_locks table? If we make sure that the history of statements executed by each thread is maintained in performance_schema, we can easily search the history for the FTWRL statement in the performance_schema.events_statements_history table! Steps are also simple:
mysql> update performance_schema.setup_consumers set enabled = 'YES' where NAME = 'events_statements_history';
ERROR 1223 (HY000): Can't execute the query because you have a conflicting read lock
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

mysql> update performance_schema.setup_consumers set enabled = 'YES' where NAME = 'events_statements_history';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.events_statements_history where sql_text like 'flush tables%'\G
*************************** 1. row ***************************
              THREAD_ID: 39
               EVENT_ID: 21
           END_EVENT_ID: 21
             EVENT_NAME: statement/sql/flush
                 SOURCE: socket_connection.cc:95
            TIMER_START: 94449505549959000
              TIMER_END: 94449505807116000
             TIMER_WAIT: 257157000
              LOCK_TIME: 0
               SQL_TEXT: flush tables with read lock
                 DIGEST: 03682cc3e0eaed3d95d665c976628d02
            DIGEST_TEXT: FLUSH TABLES WITH READ LOCK
...
    NESTING_EVENT_LEVEL: 0
1 row in set (0.00 sec)
 Again, if we need thread id for the processlist, we have to join with performance_schema.threads:
mysql> select t.processlist_id from performance_schema.threads t join performance_schema.events_statements_history h on h.thread_id = t.thread_id where h.digest_text like 'FLUSH TABLES%';
+----------------+
| processlist_id |
+----------------+
|             12 |
+----------------+
1 row in set (0.01 sec)
Note that using digest_text while searching in history is a bit more reliable, as the text is already normalized, so you do not need to care about extra spaces etc. On the other hand, it requires statements_digest consumer to be enabled:

mysql> select * from performance_schema.setup_consumers where enabled = 'YES';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_statements_current | YES     |
| events_statements_history | YES     |
| global_instrumentation    | YES     |
| thread_instrumentation    | YES     |
| statements_digest         | YES     |
+---------------------------+---------+
5 rows in set (0.00 sec)
Now, what if you do not have performance_schema enabled (or compiled in), use older MySQL version or it's too late to run any updates or install any plugins? As usual, gdb to the rescue!

We already know that global shared metadata lock is set, so one can try to apply the approach I've already written about in this post. It looks a bit complicated though, so I decided to try something easier and just traverse the linked list of all MySQL threads in gdb in a way similar to what Shane Bester did years ago while looking for SQL statements executed by thread in the core file.

So, in MariaDB 10.1 with performance_schema disabled I, again, executed the following (surely, delete was executed in another session after FTWRL):
openxs@ao756:~/dbs/maria10.1$ bin/mysql -uroot --socket=/tmp/mariadb.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 13
Server version: 10.1.18-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> select @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
|                    0 |
+----------------------+
1 row in set (0.00 sec)

MariaDB [test]> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> show processlist;
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| Id | User | Host      | db   | Command | Time | State                        | Info             | Progress |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
| 13 | root | localhost | test | Query   |    0 | init                         | show processlist |    0.000 |
| 14 | root | localhost | test | Query   |    9 | Waiting for global read lock | delete from t0   |    0.000 |
+----+------+-----------+------+---------+------+------------------------------+------------------+----------+
2 rows in set (0.00 sec)
and then found the PID for MariaDB server and attached gdb to the process:
openxs@ao756:~$ ps aux | grep 'mysqld ' | grep maria
openxs    7444  0.1  5.7 748028 220744 ?       Sl   кві14   1:47 /home/openxs/dbs/maria10.1/bin/mysqld --no-defaults --basedir=/home/openxs/dbs/maria10.1 --datadir=/home/openxs/dbs/maria10.1/data --plugin-dir=/home/openxs/dbs/maria10.1/lib/plugin --log-error=/home/openxs/dbs/maria10.1/data/ao756.err --pid-file=ao756.pid --socket=/tmp/mariadb.sock --port=3307
openxs@ao756:~$ sudo gdb -p 7444
...
Loaded symbols for /home/openxs/dbs/maria10.1/lib/plugin/metadata_lock_info.so
0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p threads
$1 = {<base_ilist> = {first = 0x7f7f3fbf3008, last = {
      _vptr.ilink = 0x7f7f5cea3510 <vtable for ilink+16>,
      prev = 0x7f7f3ffa1018, next = 0x0}}, <No data fields>}
As you can see, we have a global variable threads. threads->first is a pointer to (the structure that describes) the first thread (of THD * type) in the double linked list of all MySQL threads, so we can do the following:
(gdb) set $thd=(THD *)(threads->first)
(gdb) p $thd
$3 = (THD *) 0x7f7f3fbf3008
(gdb) p $thd->thread_id$4 = 14
(gdb) p $thd->global_read_lock
$5 = {m_state = Global_read_lock::GRL_NONE, m_mdl_global_shared_lock = 0x0,
  m_mdl_blocks_commits_lock = 0x0}
I had not checked any version before MySQL 5.6 yet, but I assume that all of them have list of threads, each thread's structure has a thread_id item and I suppose to see global_read_lock item as well, if not as a structure with related MDL information, then with something else (to be studied later).

In the case above, I clearly see that thread with 14 in the processlist had not set global read lock. Now, let me proceed to the next thread:
(gdb) set $thd=(THD *)($thd->next)
(gdb) p $thd
$6 = (THD *) 0x7f7f3ffa1008
(gdb) p $thd->thread_id
$7 = 13
(gdb) p $thd->global_read_lock
$8 = {m_state = Global_read_lock::GRL_ACQUIRED_AND_BLOCKS_COMMIT,
  m_mdl_global_shared_lock = 0x7f7f2e90e080,
  m_mdl_blocks_commits_lock = 0x7f7f2e90e180}
So, it is thread with id 13 that had executed FTWRL and had not yet released the global read lock. I was lucky to find it fast, but I can repeat steps above to move on to the next list item until the next is 0x0. Shane had written a similar code as a Python macro for gdb is his post I mentioned. One day I may grow up and write something similar for this case.

Other approach was invented and described by in his blog post on the same topic of getting SQL statements from core files. One can just create a big enough input file for gdb that will switch to process threads one by one and try to get information from MySQL thread structure there, assuming it was created (and ignoring errors), like this:
(gdb) thread 1[Switching to thread 1 (Thread 0x7f7f5ce6d7c0 (LWP 7444))]
#0  0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p do_command::thd->thread_id
No frame is currently executing in block do_command(THD*).
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7f5ce02b00 (LWP 9232))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
238     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) p do_command::thd->thread_id
$9 = 14
(gdb) p do_command::thd->global_read_lock
$10 = {m_state = Global_read_lock::GRL_NONE, m_mdl_global_shared_lock = 0x0,
  m_mdl_blocks_commits_lock = 0x0}
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7f5cd6eb00 (LWP 9230))]
#0  0x00007f7f5a2f084d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) p do_command::thd->thread_id
$11 = 13
(gdb) p do_command::thd->global_read_lock
$12 = {m_state = Global_read_lock::GRL_ACQUIRED_AND_BLOCKS_COMMIT,
  m_mdl_global_shared_lock = 0x7f7f2e90e080,
  m_mdl_blocks_commits_lock = 0x7f7f2e90e180}

...
And so on, for 1000 threads or more, if we expect that many. With some smart use of shell and tools like grep and awk on the resulting output we can probably eventually get just what we need - thread id of the thread that did FTWRL and is still keeping global read lock.

One day I may work on this kind of scripts, but for now I just wanted to show how gdb can help you to get this information.

1 comment: