Monday, February 29, 2016

Exploring Metadata Locks with gdb - Studying Simple Case on Percona Server 5.7

I had forgotten the topic of studying metadata locks with gdb for quite a some time. Now that I've upgraded to Percona Server 5.7 and enjoy the remaining of my last free day as an independent MySQL Support Engineer doing nothing but some blogging, I think it's time to get back to gdb interactive sessions and try to answer the question I've got from customer back in Percona but had no chance to answer.

In a bit rewritten way, it sounded as follows:
"Do ALTER TABLE ... STATS_AUTO_RECALC=1" and the reverse ALTER
TABLE...STATS_AUTO_RECALC=default block select, update, delete, and insert on the table?
"
There are two ways to answer this question. First, we can just try to set up a simple test with 3 sessions and simple table, like this:
mysql> create table ti(id int primary key, c1 int, key(c1)) engine=InnoDB stats_auto_recalc=default;Query OK, 0 rows affected (0.16 sec)

mysql> insert into ti values (1,1), (2,2);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0
Now, in one session we can SELECT from the table in transaction that is not completed:
mysql> start transaction;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from ti;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
and then in another session run the ALTER TABLE in question:
mysql> alter table ti stats_auto_recalc=1;
to find out in the first session that it actually hangs waiting on some metadata lock:
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                               | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
|  3 | root | localhost | test | Query   |   15 | Waiting for table metadata lock | alter table ti stats_auto_recalc=1 |         0 |             0 ||  4 | root | localhost | test | Query   |    0 | starting                        | show processlist                   |         0 |             0 |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
2 rows in set (0.00 sec)
So, one of metadata locks set by ALTER TABLE even in this very simple case (when we try to modify the way of statistics recalculation for InnoDB table) even in 5.7 is too strict, so that it conflicts with one of metadata locks set by simple SELECT for the duration of the entire transaction. What's more interesting, now in third session we can try to run whatever statements we want to check (those that were "compatible" with SELECT at metadata locks level) and see them also blocked, like this:
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                               | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
|  3 | root | localhost | test | Query   |   52 | Waiting for table metadata lock | alter table ti stats_auto_recalc=1 |         0 |             0 |
|  4 | root | localhost | test | Query   |    0 | starting                        | show processlist                   |         0 |             0 |
|  5 | root | localhost | test | Query   |    4 | Waiting for table metadata lock | select * from ti                   |         0 |             0 |+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
3 rows in set (0.00 sec)
This way we can already conclude that ALTER TABLE of this kind does block even SELECT from the table! Good that it runs fast (there is no need to make a table copy), so will not block for a long time, but if ALTER itself is blocked by some transaction (like in my case) - we are still in trouble!

That simple test would give the answer to customer that he could be satisfied with. But I know that ALTER TABLE sets many metadata locks at different stages and while for this trivial one it may not matter much, I still wanted to find some time to attach gdb and see what locks are requested and when/in what order.

I had not done this with 5.7, so tried the same approach as in the initial post in this series (please, check it carefully for the details from the source code, like types of the locks etc), setting breakpoint on MDL_request::init, but it seems this method does NOT exist in MySQL 5.7. So, I ended up with the following breakpoints set:
(gdb) b MDL_context::acquire_lockBreakpoint 1 at 0xbf9083: file /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc, line 3506.
(gdb) b lock_rec_lock
Breakpoint 2 at 0xf24ffd: file /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/storage/innobase/lock/lock0lock.cc, line 2343.
(gdb) c
Continuing.
With these breakpoints in place, I've executed alter table ti stats_auto_recalc=1; and ended up with the following in gdb:
Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1b2df95b0, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);
(gdb) p mdl_request
$1 = (MDL_request *) 0x7fc1b2df95b0
(gdb) p *mdl_request
$2 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_STATEMENT,
  next_in_list = 0x7fc1e27f7f70, prev_in_list = 0x7fc1b2df9778, ticket = 0x0,
  key = {m_length = 3, m_db_name_length = 0,
    m_ptr = "\000\000\000\000L\274q\324\301...
We see something not that much useful immediately, some metadata lock request at the weakest possible level for the duration of the statement. Just to see how we can see all of requests in the same "batch", let's do the following in gdb:
(gdb) p *(mdl_request->next_in_list)
$5 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0x7fc1e27f7d70, prev_in_list = 0x7fc1b2df95b8, ticket = 0x0,
  key = {m_length = 7, m_db_name_length = 4,
    m_ptr = "\002test", '\000' <repeats 175 times>"\377, \377\377\037\000\000\000\000\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 31 times>, "\a", '\000' <repeats 15 times>, "\004\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\001\001\000\001", '\000' <repeats 44 times>, "\001", '\000' <repeats 69 times>, static m_namespace_to_wait_state_name = {{m_key = 111,
...
So, the next one is also the weakest, but is set for the duration of transaction at the database (named "test") level. We can see the next after it as well:
(gdb) p *(mdl_request->next_in_list->next_in_list)
$6 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.
Here we can see a stronger metadata lock request for the duration of transaction on the test.ti table. The next_in_list pointer is NULL, so we are done with this "batch" of requests. If we continue, we'll see the same locks requested one by one while hitting breakpoint:
Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1e27f7f70, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);

(gdb) p *mdl_request
$7 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0x7fc1e27f7d70, prev_in_list = 0x7fc1b2df95b8, ticket = 0x0,
  key = {m_length = 7, m_db_name_length = 4,
    m_ptr = "\002test", '\000' <repeats 175 times>"\377, \377\377\037\000\000\000\000\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 31 times>, "\a",
...

(gdb) p *(mdl_request->next_in_list)
$8 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.

Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1e27f7d70, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);

(gdb) p *mdl_request
$9 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.
So, we have two ways to study the locks and new one in this post is to check the list of the lock requests in a single mdl_request list of pointers.

Eventually we also get the following breakpoint hit:
(gdb) c
Continuing.

Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1b2df98a0, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);
(gdb) p *mdl_request
$15 = {type = MDL_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0xd044f7, prev_in_list = 0x7fc1b2dfa914, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti\000_3\000\001\000\000\000\003\000\000\000<\240V\001\
...
Now, this is a real problem for any concurrent activity with the table, it's the strongest MDL_EXCLUSIVE lock requested on test.ti table that is hold (when obtained) till the end of ALTER statement...

To summarize this post:
  1. We see that we still can use gdb to study metadata locks in MySQL (or Percona Server) 5.7.x, but the breakpoint should be set on MDL_context::acquire_lock (at least until I'll find time to study changes in the source code for any alternatives).
  2. When breakpoint is hit you can study the content of structure pointed by the mdl_request parameter to find out what kind of lock is requested.
  3. At table level ALTER TABLE ... STATS_AUTO_RECALC=1 in MySQL 5.7 starts with weak enough MDL_SHARED_UPGRADABLE lock, but then eventually we end up with the request for the strongest MDL_EXCLUSIVE lock that is either blocked by any concurrent access to the table until transaction there is completed, and/or blocks everything (including SELECT from the table) until ALTER finally completes.
Next time in this series I plan to check what happens with metadata locks for partitioned tables and how much maintenance of individual partitions may affect other partitions. Other idea would be to check what metadata locks are set when statistics for the InnoDB table is recalculated automatically when 10% of data changes. Stay tuned!

No comments:

Post a Comment