Tuesday, January 15, 2013

How to use PERFORMANCE_SCHEMA to check InnoDB mutex waits

Summary

To set up InnoDB mutex waits monitoring via PERFORMANCE_SCHEMA in general with MySQL 5.6.9 you should do at least the following:
  1. Start MySQL server with all mutex related instruments enabled at startup (performance_schema=ON by default on recent 5.6.x), like this:

    mysqld_safe --performance_schema_instrument='wait/synch/mutex/innodb/%=on' &
  2. Connect to server and set up proper consumers explicitly, like this:

    UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name like 'events_waits%';
  3. Run your problematic load.
  4. Check waits using whatever tables you need, like this:

    mysql>  select event_name, count_star, sum_timer_wait from performance_schema.events_waits_summary_global_by_event_name where event_name like 'wait/synch/mutex/innodb%' and count_star > 0 order by sum_timer_wait desc limit 5;
    +--------------------------------------------+------------+----------------+
    | event_name                                 | count_star | sum_timer_wait |
    +--------------------------------------------+------------+----------------+
    | wait/synch/mutex/innodb/os_mutex           |      50236 |     2018647953 |
    | wait/synch/mutex/innodb/rw_lock_list_mutex |       8299 |      316652455 |
    | wait/synch/mutex/innodb/mutex_list_mutex   |       8378 |      300689235 |
    | wait/synch/mutex/innodb/fil_system_mutex   |       1173 |       56603575 |
    | wait/synch/mutex/innodb/buf_pool_mutex     |        818 |       34652947 |
    +--------------------------------------------+------------+----------------+
    5 rows in set (0.00 sec)
Sounds easy and should be obvious from the manual (but it was not, at least for me). Now, continue reading if you want to know why and how one can find these out...

Whole story


Let's assume you study some InnoDB performance or scalability problem (like Bug #68079) and would like to know what exact mutex waits happened in the process and how much time was spent on them. Surely you can use INNODB STATUS to see some really bad waits (if you are unlucky) and global wait-related counters, or SHOW ENGINE INNODB MUTEX to see waits that were long enough to force OS context switch:

mysql> show engine innodb mutex;
+--------+-------------------------+-----------------+
| Type   | Name                    | Status          |
+--------+-------------------------+-----------------+
| InnoDB | log0log.cc:737          | os_waits=6      |
| InnoDB | buf0buf.cc:1225         | os_waits=3      |
| InnoDB | combined buf0buf.cc:975 | os_waits=117391 |
| InnoDB | log0log.cc:799          | os_waits=124    |
+--------+-------------------------+-----------------+
4 rows in set (0.00 sec)


The output above is enough to understand what mutex waits were long enough to cause OS waits (context switches), but it does not show how many waits happened in total or how much time InnoDB has to wait. You can try to find out is there any way to get more details and manual can give you a hint that debug binaries will show you enough details... but debug binaries may have different scalability properties and, after all, who uses debug binaries in production :)

For this reason or for other (like a lot of buzz everywhere or, again, fine manual) you eventually will figure out that now one should use PERFORMANCE_SCHEMA to study performance-related problems. It is as easy as setting performance_schema option to ON/1 one may conclude based on the manual. Let's try:

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysqld_safe --no-defaults --performance-schema=ON &
[1] 31089


and then run our problematic test case with mutex waits. Now, when it is finished finally:

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysqlslap -uroot --concurrency=9 --create-schema=test --no-drop --number-of-queries=1000 --iterations=10 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 7.230 seconds
        Minimum number of seconds to run all queries: 6.906 seconds
        Maximum number of seconds to run all queries: 7.997 seconds
        Number of clients running queries: 9
        Average number of queries per client: 111


You then try to check the waits, finally, why else you spenht so much time reading about PERFORMANCE_SCHEMA (and improvements to it in 5.6) everywhere:

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 92
Server version: 5.6.9-rc MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>  select event_name, count_star, sum_timer_wait from performance_schema.events_waits_summary_global_by_event_name where event_name like 'wait/synch/mutex/innodb%' and count_star > 0 order by sum_timer_wait desc limit 5;
Empty set (0.00 sec)


Ups, no waits? How is this possible if SHOW ENGINE INNODB MUTEX shows increased counts? "Ah", (after reading the manual about instrumentation) one may say, we need to enable instruments and timing for them! Like this:

mysql>  UPDATE performance_schema.setup_instruments SET enabled = 'YES', timed = 'YES' WHERE name LIKE 'wait/synch/mutex/innodb/%';                             
Query OK, 0 rows affected (0.00 sec)
Rows matched: 45  Changed: 0  Warnings: 0


That's because by default not so many are enabled, none of those related to InnoDB mutex waits:

mysql> select * from performance_schema.setup_instruments where name like 'wait/synch/mutex/innodb%';
+-------------------------------------------------------+---------+-------+
| NAME | ENABLED | TIMED |
+-------------------------------------------------------+---------+-------+
| wait/synch/mutex/innodb/commit_threads_m | NO | NO |
| wait/synch/mutex/innodb/commit_cond_mutex | NO | NO |
| wait/synch/mutex/innodb/innobase_share_mutex | NO | NO |
| wait/synch/mutex/innodb/autoinc_mutex | NO | NO |
| wait/synch/mutex/innodb/buf_pool_mutex | NO | NO |
| wait/synch/mutex/innodb/buf_pool_zip_mutex | NO | NO |
...
+-------------------------------------------------------+---------+-------+
45 rows in set (0.19 sec)


Yet another fine MySQL 5.6 manual page clearly says that we can enable them dynamically:
"Modifications to any of these tables affect monitoring immediately, with the exception of setup_actors. Modifications to setup_actors affect only foreground threads created thereafter."
So, we try our problematic test case again, check waits as explained above and... get zero rows again. After some more reading of the pages already mentioned somebody as ignorant as me starts to write  posts at Facebook, get hints from nice former colleagues like Mark Leith (who actually had written a lot about the topic and influence PERFORMANCE_SCHEMA design and implementation even more) and finally notes that consumers are NOT set (and manual explains they should be):

mysql> select * from performance_schema.setup_consumers;                        +--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_stages_current          | NO      |
| events_stages_history          | NO      |
| events_stages_history_long     | NO      |
| events_statements_current      | YES     |
| events_statements_history      | NO      |
| events_statements_history_long | NO      |
| events_waits_current           | NO      |
| events_waits_history           | NO      |
| events_waits_history_long      | NO      |
| global_instrumentation         | YES     |
| thread_instrumentation         | YES     |
| statements_digest              | YES     |
+--------------------------------+---------+
12 rows in set (0.00 sec)


So, let's fix this and make a quick test if it works at all (before running long one again):

mysql>  UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name like 'events_waits%';
Query OK, 3 rows affected (0.00 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> drop table t1;
Query OK, 0 rows affected (1.08 sec)

mysql>  create table test.t1 (i int) engine = innodb;                           Query OK, 0 rows affected (1.48 sec)

mysql>  insert into test.t1 values (1), (2), (3);                               Query OK, 3 rows affected (0.36 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql>  select event_name, count_star, sum_timer_wait from performance_schema.events_waits_summary_global_by_event_name where event_name like 'wait/synch/mutex/innodb%' and count_star > 0 order by sum_timer_wait desc limit 5;
+----------------------------------------+------------+----------------+
| event_name                             | count_star | sum_timer_wait |
+----------------------------------------+------------+----------------+
| wait/synch/mutex/innodb/trx_mutex      |        268 |       14017381 |
| wait/synch/mutex/innodb/trx_undo_mutex |         21 |        1690647 |
+----------------------------------------+------------+----------------+
2 rows in set (0.00 sec)


Yes, it works. Let's start test again and finally get all the benefits of improved MySQL 5.6 diagnostics:

mysql> exit
Bye
[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysqlslap -uroot --concurrency=12 --create-schema=test --no-drop --number-of-queries=1000 --iterations=10 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 7.330 seconds
        Minimum number of seconds to run all queries: 6.926 seconds
        Maximum number of seconds to run all queries: 7.692 seconds
        Number of clients running queries: 12
        Average number of queries per client: 83

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 306
Server version: 5.6.9-rc MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>  select event_name, count_star, sum_timer_wait from performance_schema.events_waits_summary_global_by_event_name where event_name like 'wait/synch/mutex/innodb%' and count_star > 0 order by sum_timer_wait desc limit 5;
+----------------------------------------+------------+----------------+
| event_name                             | count_star | sum_timer_wait |
+----------------------------------------+------------+----------------+
| wait/synch/mutex/innodb/trx_mutex      |        268 |       14017381 |
| wait/synch/mutex/innodb/trx_undo_mutex |         21 |        1690647 |
+----------------------------------------+------------+----------------+
2 rows in set (0.00 sec)


Now, where are my expected (at least buffer pool related, if somebody had forgotten) mutex waits?

Lucky I am, Mark Leith was reading my Facebook at the moment, so I've got a hint that some instruments can be enabled only at startup (try to find the list of them in the manual one day...) and, since 5.6.4, there are command line options to do that.

So, I have to restart server (imagine you are in production!) to get it working finally the way I've expected:

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysqld_safe --no-defaults --performance_schema=ON --performance_schema_instrument='wait/synch/mutex/innodb/%=on' &
[1] 32432
[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ 130115 11:11:23 mysqld_safe Logging to '/home/valerii.kravchuk/mysql-5.6.9-rc-linux-glibc2.5-x86_64/data/cisco1.office.percona.com.err'.
130115 11:11:23 mysqld_safe Starting mysqld daemon with databases from /home/valerii.kravchuk/mysql-5.6.9-rc-linux-glibc2.5-x86_64/data

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.9-rc MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>  UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name like 'events_waits%';
Query OK, 3 rows affected (0.00 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> exit
Bye
[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysqlslap -uroot --concurrency=12 --create-schema=test --no-drop --number-of-queries=1000 --iterations=10 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 7.640 seconds
        Minimum number of seconds to run all queries: 6.874 seconds
        Maximum number of seconds to run all queries: 7.930 seconds
        Number of clients running queries: 12
        Average number of queries per client: 83

[valerii.kravchuk@cisco1 mysql-5.6.9-rc-linux-glibc2.5-x86_64]$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 123
Server version: 5.6.9-rc MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>  select event_name, count_star, sum_timer_wait from performance_schema.events_waits_summary_global_by_event_name where event_name like 'wait/synch/mutex/innodb%' and count_star > 0 order by sum_timer_wait desc limit 5;
+--------------------------------------------+------------+----------------+
| event_name                                 | count_star | sum_timer_wait |
+--------------------------------------------+------------+----------------+
| wait/synch/mutex/innodb/os_mutex           |     244191 |    32147377962 |
| wait/synch/mutex/innodb/trx_sys_mutex      |      30180 |     4228391496 |
| wait/synch/mutex/innodb/buf_pool_mutex     |       4812 |      757141973 |
| wait/synch/mutex/innodb/mutex_list_mutex   |       8870 |      380601032 |
| wait/synch/mutex/innodb/rw_lock_list_mutex |       8304 |      363065843 |
+--------------------------------------------+------------+----------------+
5 rows in set (0.00 sec)


Then I decided to write this blog post with a quick "howto" summary on top. I hope it is useful for readers. This is the end of story for now.

Conclusions

  1. It's great to have Oracle engineers reading your Facebook posts and replying (especially in a real time). Thank you, Mark Leith!
  2. PERFORMANCE_SCHEMA in MySQL 5.6 can be used to study InnoDB mutex waits
  3. MySQL 5.,6 manual is not clear enough about steps and details about instruments involved to do the study explained above.
I've made some other conclusions as well, but will share them later and maybe elsewhere.

6 comments:

  1. But I wonder, what kind of conclusion
    you can make from
    wait/synch/mutex/innodb/os_mutex
    mutes is the most problematic?

    ReplyDelete
  2. No conclusion at the moment :) Was too busy getting at least something from P_S.

    ReplyDelete
  3. Thx for showing getting started with P_S is a little complicated. Beside showing already some quirks you missed performance_schema is ON by default (changed from 5.5 to 5.6). So you don't have to set it to enable P_S :)
    They wan't us to read a lot of docs ;)

    thx
    erkules

    ReplyDelete
  4. Thank you, Erkules, for this correction. I just remember that we have to set it ON explicitly from MySQL 5.5 where I tried to use it last time. Even more surprise then that some (important) instruments still have to be enabled upon startup.

    ReplyDelete
  5. Vadim, os_mutex is surely something that scan be ignored and made less important with tuning. In any case, for the purpose and topic of this blog post details of benchmark or mistakes in investigation do not matter.

    In any case, http://bugs.mysql.com/bug.php?id=68079 is now verified and, based on comments, accepted by Oracle as a valid and important use case to study.

    ReplyDelete
  6. Here is my bug report (already verified) about missing details on when and how important instruments can be enabled: http://bugs.mysql.com/bug.php?id=68097

    ReplyDelete