Sunday, October 11, 2020

Stored Procedures Instrumentation in MariaDB 10.5

MariaDB 10.5 added a lot of instrumentation around stored procedures, functions and events along the lines of MySQL WL#5766. In this blog post I'll try to check how it works and provide some details that are still missing in the MariaDB Knowledge Base.

So, in frames of porting most of Performance Schema from MySQL 5.7 to MariaDB 10.5 four new types of Performance Schema objects were added:

MariaDB [performance_schema]> select distinct(object_type) from setup_objects; 
+-------------+
| object_type |
+-------------+
| EVENT       |
| FUNCTION    |
| PROCEDURE   |
| TABLE       |
| TRIGGER     |
+-------------+
5 rows in set (0,001 sec)

MariaDB [performance_schema]> select * from setup_objects where object_type != 'TABLE' and object_schema not in ('mysql', 'performance_schema', 'information_schema');
+-------------+---------------+-------------+---------+-------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | ENABLED | TIMED |
+-------------+---------------+-------------+---------+-------+
| EVENT       | %             | %           | YES     | YES   |
| FUNCTION    | %             | %           | YES     | YES   |
| PROCEDURE   | %             | %           | YES     | YES   |
| TRIGGER     | %             | %           | YES     | YES   |
+-------------+---------------+-------------+---------+-------+
4 rows in set (0,001 sec)

MariaDB [performance_schema]> select version();
+----------------+
| version()      |
+----------------+
| 10.5.7-MariaDB |
+----------------+
1 row in set (0,000 sec)

As you can see from the above, these objects are enabled and timed by default in all databases besides system ones. Additional 20 instruments were also added, enabled and timed by default:

MariaDB [performance_schema]> select * from setup_instruments where name like 'statement/sp/%' or name like 'statement/scheduler%';
+---------------------------------+---------+-------+
| NAME                            | ENABLED | TIMED |
+---------------------------------+---------+-------+
| statement/sp/stmt               | YES     | YES   |
| statement/sp/set                | YES     | YES   |
| statement/sp/set_trigger_field  | YES     | YES   |
| statement/sp/jump               | YES     | YES   |
| statement/sp/jump_if_not        | YES     | YES   |
| statement/sp/freturn            | YES     | YES   |
| statement/sp/preturn            | YES     | YES   |
| statement/sp/hpush_jump         | YES     | YES   |
| statement/sp/hpop               | YES     | YES   |
| statement/sp/hreturn            | YES     | YES   |
| statement/sp/cpush              | YES     | YES   |
| statement/sp/cpop               | YES     | YES   |
| statement/sp/copen              | YES     | YES   |
| statement/sp/cclose             | YES     | YES   |
| statement/sp/cfetch             | YES     | YES   |
| statement/sp/agg_cfetch         | YES     | YES   |
| statement/sp/cursor_copy_struct | YES     | YES   |
| statement/sp/error              | YES     | YES   |
| statement/sp/set_case_expr      | YES     | YES   |
| statement/scheduler/event       | YES     | YES   |
+---------------------------------+---------+-------+
20 rows in set (0,002 sec)

In the above I do not see any direct match to most of statements used in stored procedures. You may be wondering what statement/sp/jump_if_not is, for example. These instruments are representing instructions of the low level sp_instr language used to implement the semantics of flow control statements and exception handlers. See more details about them in the MySQL Source Code documentation. As I use a non-debug build, attempt to see these instructions from the procedure using SHOW PROCEDURE CODE statement surely failed:

MariaDB [performance_schema]> show procedure code sbtest.p_sbtest1;
ERROR 1289 (HY000): The 'SHOW PROCEDURE|FUNCTION CODE' feature is disabled; you need MariaDB built with '--with-debug' to have it working

The statement/sp/stmt instrument represents usual DDL or DML statement that is executed "as is".

To check how these new instrument work I deci8ded to add a trigger calling a primitive stored procedure to the table created by sysbench:

MariaDB [sbtest]> show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT 0,
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
1 row in set (0,001 sec)

MariaDB [sbtest]> set sql_mode='ORACLE';
Query OK, 0 rows affected (0,000 sec)

MariaDB [sbtest]> delimiter //
MariaDB [sbtest]> create procedure p_sbtest1(id int) as begin select k into @k from sbtest1 t where t.id = id; set @i := 0; for i in 1..round(@k/1000)+1 loop set @i := @i + 1; end loop; end;//
Query OK, 0 rows affected (0,058 sec)

MariaDB [sbtest]> create trigger tr1 after update on sbtest1 for each row call p_sbtest1(new.id);//
Query OK, 0 rows affected (0,086 sec)

MariaDB [sbtest]> delimiter ;

This is a primitive procedure that runs some loop to add delay. For fun I've used ORACLE mode and for loop inherited from PL/SQL that is really nice.

With all these in place I ran the following simple sysbench oltp_update_index.lua test for 50 seconds:

openxs@ao756:~/dbs/maria10.5$ sysbench --table-size=1000000 --threads=4 --time=50 --mysql-socket=/tmp/mariadb105.sock --mysql-user=openxs --mysql-db=sbtest --report-interval=5 /usr/share/sysbench/oltp_update_index.lua run
sysbench 1.1.0-faaff4f (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Report intermediate results every 5 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 5s ] thds: 4 tps: 130.61 qps: 130.61 (r/w/o: 0.00/130.61/0.00) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00
...

and while it was running checked current stored procedure-related statements multiple times with the following SELECT:

MariaDB [performance_schema]> select * from events_statements_current where event_name like 'statement/sp/%'\G
Empty set (0,001 sec)

In all cases I've got zero rows. That's because related consumer is not enabled by default:

MariaDB [performance_schema]> select * from setup_consumers;
+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | NO      |
| events_statements_history        | NO      |
| events_statements_history_long   | NO      |
| events_transactions_current      | NO      |
| events_transactions_history      | NO      |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | YES     |
| thread_instrumentation           | YES     |
| statements_digest                | YES     |
+----------------------------------+---------+
15 rows in set (0,001 sec)

I've enables all of them with  update setup_consumers set enabled = 'YES'; and retried the test:

MariaDB [performance_schema]> select * from events_statements_current where event_name like 'statement/sp/%'\G
...
*************************** 2. row ***************************
              THREAD_ID: 54
               EVENT_ID: 1375298
           END_EVENT_ID: NULL
             EVENT_NAME: statement/sp/set
                 SOURCE:
            TIMER_START: 1751798753952000
              TIMER_END: 1751800198407000
             TIMER_WAIT: 1444455000
              LOCK_TIME: 0
               SQL_TEXT: NULL
                 DIGEST: NULL
            DIGEST_TEXT: NULL
         CURRENT_SCHEMA: sbtest
            OBJECT_TYPE: PROCEDURE
          OBJECT_SCHEMA: sbtest
            OBJECT_NAME: p_sbtest1
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
...
      NESTING_EVENT_ID: 1373590
     NESTING_EVENT_TYPE: STATEMENT
    NESTING_EVENT_LEVEL: 2
*************************** 3. row ***************************
              THREAD_ID: 56
               EVENT_ID: 1335724
           END_EVENT_ID: NULL
             EVENT_NAME: statement/sp/stmt
                 SOURCE:
            TIMER_START: 1751791595187000
              TIMER_END: 1751800213219000
             TIMER_WAIT: 8618032000
              LOCK_TIME: 0
               SQL_TEXT: call p_sbtest1(new.id)
                 DIGEST: NULL
            DIGEST_TEXT: NULL
         CURRENT_SCHEMA: sbtest
            OBJECT_TYPE: TRIGGER
          OBJECT_SCHEMA: sbtest
            OBJECT_NAME: tr1
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
...
      NESTING_EVENT_ID: 1335719
     NESTING_EVENT_TYPE: STATEMENT
    NESTING_EVENT_LEVEL: 1

*************************** 4. row ***************************
              THREAD_ID: 56
               EVENT_ID: 1337567
           END_EVENT_ID: NULL
             EVENT_NAME: statement/sp/stmt
                 SOURCE:
            TIMER_START: 1751799809228000
              TIMER_END: 1751800221029000
             TIMER_WAIT: 411801000
              LOCK_TIME: 0
               SQL_TEXT: SET @i := @i + 1
                 DIGEST: NULL
            DIGEST_TEXT: NULL
         CURRENT_SCHEMA: sbtest
            OBJECT_TYPE: PROCEDURE
          OBJECT_SCHEMA: sbtest
            OBJECT_NAME: p_sbtest1
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
...
      NESTING_EVENT_ID: 1335724
     NESTING_EVENT_TYPE: STATEMENT
    NESTING_EVENT_LEVEL: 2
...
*************************** 6. row ***************************
              THREAD_ID: 57
               EVENT_ID: 1342865
           END_EVENT_ID: NULL
             EVENT_NAME: statement/sp/jump_if_not
                 SOURCE:
            TIMER_START: 1751800236059000
              TIMER_END: 1751800238311000
             TIMER_WAIT: 2252000
              LOCK_TIME: 0
               SQL_TEXT: NULL
                 DIGEST: NULL
            DIGEST_TEXT: NULL
         CURRENT_SCHEMA: sbtest
            OBJECT_TYPE: PROCEDURE
          OBJECT_SCHEMA: sbtest
            OBJECT_NAME: p_sbtest1
  OBJECT_INSTANCE_BEGIN: NULL
            MYSQL_ERRNO: 0
      RETURNED_SQLSTATE: NULL
           MESSAGE_TEXT: NULL
                 ERRORS: 0
               WARNINGS: 0
          ROWS_AFFECTED: 0
              ROWS_SENT: 0
          ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
     CREATED_TMP_TABLES: 0
       SELECT_FULL_JOIN: 0
 SELECT_FULL_RANGE_JOIN: 0
           SELECT_RANGE: 0
     SELECT_RANGE_CHECK: 0
            SELECT_SCAN: 0
      SORT_MERGE_PASSES: 0
             SORT_RANGE: 0
              SORT_ROWS: 0
              SORT_SCAN: 0
          NO_INDEX_USED: 0
     NO_GOOD_INDEX_USED: 0
       NESTING_EVENT_ID: 1342859
     NESTING_EVENT_TYPE: STATEMENT
    NESTING_EVENT_LEVEL: 2
6 rows in set (0,001 sec)

I highlighted some of the interesting column values above. We see SOURCE is always empty (like in  MDEV-23827 I've reported while working on the previous post). Looks like this is a common problem for all kinds of instrumentation and I suspect it may have something to do with compile time adding of PSI keys discussed in MDEV-22841. We also see statements from both trigger and procedure, with different NESTING_EVENT_LEVEL. So, basically this new instrumentation work, even though on non-debug builds matching the stored procedure statements reported to flow control statements may become non-trivial. Let's hope most of the time spent is actually spent on nested DML and DDL statements and not on flow control itself.

Additionally there is a new summary table, events_statements_summary_by_program:

MariaDB [performance_schema]> select * from events_statements_summary_by_program\G
*************************** 1. row ***************************
                OBJECT_TYPE: TRIGGER
              OBJECT_SCHEMA: sbtest
                OBJECT_NAME: tr1
                 COUNT_STAR: 16858
             SUM_TIMER_WAIT: 171716322309000
             MIN_TIMER_WAIT: 0
             AVG_TIMER_WAIT: 10186043000
             MAX_TIMER_WAIT: 69867594000
           COUNT_STATEMENTS: 9193
        SUM_STATEMENTS_WAIT: 107504946033000
        MIN_STATEMENTS_WAIT: 4378538000
        AVG_STATEMENTS_WAIT: 11694217000
        MAX_STATEMENTS_WAIT: 69849730000
              SUM_LOCK_TIME: 0
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 0
          SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 0
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 0
     SUM_NO_GOOD_INDEX_USED: 0
*************************** 2. row ***************************
                OBJECT_TYPE: PROCEDURE
              OBJECT_SCHEMA: sbtest
                OBJECT_NAME: p_sbtest1
                 COUNT_STAR: 16858
             SUM_TIMER_WAIT: 170865294923000
             MIN_TIMER_WAIT: 0
             AVG_TIMER_WAIT: 10135561000
             MAX_TIMER_WAIT: 69774831000
           COUNT_STATEMENTS: 18368282
        SUM_STATEMENTS_WAIT: 70946124125000
        MIN_STATEMENTS_WAIT: 112000
        AVG_STATEMENTS_WAIT: 3862000
        MAX_STATEMENTS_WAIT: 26757028000
              SUM_LOCK_TIME: 0
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 0
          SUM_ROWS_EXAMINED: 8686
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 0
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 0
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 0
     SUM_NO_GOOD_INDEX_USED: 0
2 rows in set (0,001 sec)

and there we have the entire new section of 4 columns:

        SUM_STATEMENTS_WAIT: 70946124125000
        MIN_STATEMENTS_WAIT: 112000
        AVG_STATEMENTS_WAIT: 3862000
        MAX_STATEMENTS_WAIT: 26757028000

for time spent on executing individual statements within procedure or trigger. We can subtract SUM_STATEMENTS_WAIT from SUM_WAIT to find out how much time was spent on "flow control" overhead inside the stored procedure.

We now have more insights into the internal working of all this machinery...

To summarize, MariaDB 10.5, among other things, added instrumentation for stored procedures, functions, triggers and events. It is enabled by default for all non-system databases and is ready to use if you enable related consumers. I hope lack of official documentation at the moment will not prevent users from checking most time consuming stored program units and details of their internal work via Performance Schema. They may easily find out that they are actually bad for performance...






Sunday, September 27, 2020

Metadata Locks Instrumentation in MariaDB 10.5

There are different ways to study metadata locks in MySQL and MariaDB, as I once described in details. Until recently MariaDB had not provided the performance_schema.metadata_locks table, but it was finally added in 10.5. So, now you can easily get the details in a way that became the easiest and most well known since MySQL 5.7. You just have to make sure performance_schema is enabled at startup:

openxs@ao756:~/dbs/maria10.5$ bin/mysqld_safe --no-defaults --port=3311 --socket=/tmp/mariadb105.sock --performance_schema=1 &
[1] 23626
openxs@ao756:~/dbs/maria10.5$ 200927 18:26:41 mysqld_safe Logging to '/home/openxs/dbs/maria10.5/data/ao756.err'.
200927 18:26:41 mysqld_safe Starting mariadbd daemon with databases from /home/openxs/dbs/maria10.5/data

openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mariadb105.sock
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.5.6-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> select @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0,000 sec)

MariaDB [(none)]> use performance_schema
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [performance_schema]> show tables like '%metadata%';
+-------------------------------------------+
| Tables_in_performance_schema (%metadata%) |
+-------------------------------------------+
| metadata_locks                            |
+-------------------------------------------+
1 row in set (0,001 sec)

MariaDB [performance_schema]> select * from setup_instruments where name like '%metadata%';
+---------------------------------------------------------+---------+-------+
| NAME                                                    | ENABLED | TIMED |
+---------------------------------------------------------+---------+-------+
| wait/io/file/csv/metadata                               | YES     | YES   |
| stage/sql/Waiting for schema metadata lock              | NO      | NO    |
| stage/sql/Waiting for table metadata lock               | NO      | NO    |
| stage/sql/Waiting for stored function metadata lock     | NO      | NO    |
| stage/sql/Waiting for stored procedure metadata lock    | NO      | NO    |
| stage/sql/Waiting for stored package body metadata lock | NO      | NO    |
| stage/sql/Waiting for trigger metadata lock             | NO      | NO    |
| stage/sql/Waiting for event metadata lock               | NO      | NO    |
| memory/performance_schema/metadata_locks                | YES     | NO    |
| wait/lock/metadata/sql/mdl                              | NO      | NO    |
+---------------------------------------------------------+---------+-------+
10 rows in set (0,034 sec)

So, the metadata_locks table is there and we just need to enable wait/lock/metadata/sql/mdl instrument:

MariaDB [performance_schema]> update setup_instruments set enabled = 'YES', timed='YES' where name like 'wait%mdl';
Query OK, 1 row affected (0,030 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [performance_schema]> select * from metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: performance_schema
          OBJECT_NAME: metadata_locks
OBJECT_INSTANCE_BEGIN: 140032981253664
            LOCK_TYPE: SHARED_READ
        LOCK_DURATION: TRANSACTION
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 13
       OWNER_EVENT_ID: 1
1 row in set (0,001 sec)

MariaDB [performance_schema]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|               3 |
+-----------------+
1 row in set (0,000 sec)

MariaDB [performance_schema]> select * from threads where processlist_id = 3\G
*************************** 1. row ***************************
          THREAD_ID: 13
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 3
   PROCESSLIST_USER: openxs
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: performance_schema
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: Sending data
   PROCESSLIST_INFO: select * from threads where processlist_id = 3
   PARENT_THREAD_ID: 1
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 23719
1 row in set (0,018 sec)

The output above shows that one has to join to threads table on thread_id = metadata_locks.owner_thread_id to get the details for thread that set metadata lock: 

MariaDB [performance_schema]> SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, THREAD_ID, PROCESSLIST_ID, PROCESSLIST_INFO FROM performance_schema.metadata_locks INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID WHERE PROCESSLIST_ID <> CONNECTION_ID()\G
*************************** 1. row ***************************
     OBJECT_TYPE: BACKUP
   OBJECT_SCHEMA: NULL
     OBJECT_NAME: NULL
       LOCK_TYPE: BACKUP_DDL
     LOCK_STATUS: PENDING
       THREAD_ID: 50
  PROCESSLIST_ID: 21
PROCESSLIST_INFO: drop database test
1 row in set (0,001 sec)

In the cases above we see the lock, SHARED_READ, on the table we access, performance_schema.metadata_locks, set by thread that reads from the table. Then we see how to avoid seeing locks set by current connection and how to get current statement (or other SHOW PROCESSLIST details) for the thread associated with the lock.

I noted that SOURCE column in the metadata_locks table is empty. It seems to be the case for other locks too, like those caused by sysbench test running:

MariaDB [performance_schema]> select * from metadata_locks limit 5\G
*************************** 1. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: sbtest
          OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140032787904800
            LOCK_TYPE: SHARED_WRITE
        LOCK_DURATION: TRANSACTION
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 34
       OWNER_EVENT_ID: 1
*************************** 2. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140032778203712
            LOCK_TYPE: BACKUP_TRANS_DML
        LOCK_DURATION: STATEMENT

          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 34
       OWNER_EVENT_ID: 1
*************************** 3. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: sbtest
          OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140033249782768
            LOCK_TYPE: SHARED_WRITE
        LOCK_DURATION: TRANSACTION
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 33
       OWNER_EVENT_ID: 1
*************************** 4. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140033248057584
            LOCK_TYPE: BACKUP_TRANS_DML
        LOCK_DURATION: STATEMENT
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 33
       OWNER_EVENT_ID: 1
*************************** 5. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: sbtest
          OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140033449398688
            LOCK_TYPE: SHARED_WRITE
        LOCK_DURATION: TRANSACTION
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 32
       OWNER_EVENT_ID: 1
5 rows in set (0,001 sec)

MariaDB [performance_schema]>

I do not think that it is normal, as in MySQL 8.0.21, for example, I still see the reference to the line of source code where the lock is set:

openxs@ao756:~/dbs/8.0$ bin/mysql --socket=/tmp/mysql8.sock -uroot performance_schema
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.21 Source distribution

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: GLOBAL
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
          COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139972651847648
            LOCK_TYPE: SHARED
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE: lock.cc:1033
      OWNER_THREAD_ID: 49
       OWNER_EVENT_ID: 112
*************************** 2. row ***************************
          OBJECT_TYPE: COMMIT
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
          COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139972651711440
            LOCK_TYPE: SHARED
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE: lock.cc:1108
      OWNER_THREAD_ID: 49
       OWNER_EVENT_ID: 112
...

I reported this immediately as a bug, MDEV-23827 - "performance_schema.metadata_locks.source column is always empty".

The details in this table that are not covered by the MySQL manual require some investigation. You probably noted BACKUP_TRANS_DML as a lock type above. Another example (that you may get while mariabackup a.k.a mariadb-backup in 10.5 is running):

MariaDB [performance_schema]> select * from metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140032980286816
            LOCK_TYPE: BACKUP_START
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 44
       OWNER_EVENT_ID: 1
...

I've investigated similar details in the past for MySQL and Percona Server, by code review and some gdb tests. I'd expect to see these values documented somewhere in the mdl.h file in the source code. Here is what we have:

openxs@ao756:~/git/server$ grep -rn BACKUP_TRANS_DML *
mysql-test/main/mdl.result:13:MDL_BACKUP_TRANS_DML      Backup lock
mysql-test/main/mdl.result:19:MDL_BACKUP_TRANS_DML      Backup lock
sql/mdl.h:295:#define MDL_BACKUP_TRANS_DML enum_mdl_type(8)
sql/mdl.cc:134:  { C_STRING_WITH_LEN("MDL_BACKUP_TRANS_DML") },
sql/mdl.cc:490:               MDL_BIT(MDL_BACKUP_TRANS_DML)));
sql/mdl.cc:1633:  MDL_BIT(MDL_BACKUP_DML) | MDL_BIT(MDL_BACKUP_TRANS_DML) | MDL_BIT(MDL_BACKUP_SYS_DML) | MDL_BIT(MDL_BACKUP_DDL) | MDL_BIT(MDL_BACKUP_ALTER_COPY),
sql/mdl.cc:1634:  MDL_BIT(MDL_BACKUP_DML) | MDL_BIT(MDL_BACKUP_TRANS_DML) | MDL_BIT(MDL_BACKUP_SYS_DML) | MDL_BIT(MDL_BACKUP_DDL) | MDL_BIT(MDL_BACKUP_ALTER_COPY) | MDL_BIT(MDL_BACKUP_COMMIT),
sql/sql_base.cc:2048:      mdl_type= MDL_BACKUP_TRANS_DML;
storage/perfschema/table_helper.cc:636:    case MDL_BACKUP_TRANS_DML:
storage/perfschema/table_helper.cc:637:      PFS_engine_table::set_field_varchar_utf8(f, STRING_WITH_LEN("BACKUP_TRANS_DML"));
openxs@ao756:~/git/server$

This is what we have in sql/mdl.h in MariaDB 10.5:

    258 /** Backup locks */
    259
    260 /**
    261   Block concurrent backup
    262 */
    263 #define MDL_BACKUP_START enum_mdl_type(0)
    264 /**
    265    Block new write requests to non transactional tables
    266 */
    267 #define MDL_BACKUP_FLUSH enum_mdl_type(1)
    268 /**
    269    In addition to previous locks, blocks running requests to non trans tables
    270    Used to wait until all DML usage of on trans tables are finished
    271 */
    272 #define MDL_BACKUP_WAIT_FLUSH enum_mdl_type(2)
    273 /**
    274    In addition to previous locks, blocks new DDL's from starting
    275 */
    276 #define MDL_BACKUP_WAIT_DDL enum_mdl_type(3)
    277 /**
    278    In addition to previous locks, blocks commits
    279 */
    280 #define MDL_BACKUP_WAIT_COMMIT enum_mdl_type(4)
    281
    282 /**
    283   Blocks (or is blocked by) statements that intend to modify data. Acquired
    284   before commit lock by FLUSH TABLES WITH READ LOCK.
    285 */
    286 #define MDL_BACKUP_FTWRL1 enum_mdl_type(5)
    287
    288 /**
    289   Blocks (or is blocked by) commits. Acquired after global read lock by
    290   FLUSH TABLES WITH READ LOCK.
    291 */
    292 #define MDL_BACKUP_FTWRL2 enum_mdl_type(6)
    293
    294 #define MDL_BACKUP_DML enum_mdl_type(7)
    295 #define MDL_BACKUP_TRANS_DML enum_mdl_type(8)
    296 #define MDL_BACKUP_SYS_DML enum_mdl_type(9)
...

I see creative re-use of the values from enum_mdl_type.

The last bug not the least, this is how the global read lock and the metadata lock wait look like:

MariaDB [performance_schema]> flush tables with read lock;
Query OK, 0 rows affected (0,090 sec)

MariaDB [performance_schema]> select * from metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140033115590992
            LOCK_TYPE: BACKUP_FTWRL1
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 56
       OWNER_EVENT_ID: 1
...
MariaDB [performance_schema]> select * from metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140033115590992
            LOCK_TYPE: BACKUP_FTWRL1
        LOCK_DURATION: EXPLICIT
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 56
       OWNER_EVENT_ID: 1
*************************** 2. row ***************************
          OBJECT_TYPE: BACKUP
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140032979653216
            LOCK_TYPE: BACKUP_DDL
        LOCK_DURATION: STATEMENT
          LOCK_STATUS: PENDING
               SOURCE:
      OWNER_THREAD_ID: 50
       OWNER_EVENT_ID: 1

*************************** 3. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: performance_schema
          OBJECT_NAME: metadata_locks
OBJECT_INSTANCE_BEGIN: 140033113928032
            LOCK_TYPE: SHARED_READ
        LOCK_DURATION: TRANSACTION
          LOCK_STATUS: GRANTED
               SOURCE:
      OWNER_THREAD_ID: 56
       OWNER_EVENT_ID: 1
3 rows in set (0,000 sec)

MariaDB [performance_schema]> select * from information_schema.metadata_lock_info\G
*************************** 1. row ***************************
    THREAD_ID: 23
    LOCK_MODE: MDL_BACKUP_FTWRL2
LOCK_DURATION: NULL
    LOCK_TYPE: Backup lock
 TABLE_SCHEMA:
   TABLE_NAME:
1 row in set (0,001 sec)

The last statement above shows that the old way to check metadata locks, via metadata_lock_info plugin, had provided much less details. We could not even distinguish lock from lock wait (pending lock) with it!

Are there any locks on that bridge? Go figure...

Finally, even with metadata_locks table in place, finding the details about the exact blocking lock may be not trivial. See this blog post for some idea on how to do this based on assumption that blocking lock is surely set earlier than pending one.

Sunday, September 20, 2020

Dynamic Tracing of pthread_mutex_lock in MariaDB - First Steps

 I've got several comments and questions during my session on dynamic tracing at MariaDB Server Fest 2020. One of them was from a MariaDB developer at Slack and sounded as follows:

"Can perf profile contention on pthread_mutex_lock?"

I surely stated that it can (it can profile anything), but tried to clarify what exactly is needed whenever we hit the probe. The replies where the following:

"ideally, collect stack traces of mariadbd, and sort them in descending order by time spent between pthread_mutex_lock and next pthread_mutex_unlock."

and:

"less ideally, just collect stacktraces and sort in descending order by frequency"

I am not sure about the former (I am yet to try to trace more than one probe at a time with perf and find out how to aggregate the result properly). It is surely doable (in a more efficient way) with bcc tools if one writes proper program, and probably even with bpftrace. I am going to show how to do this in the next posts one day... But the question was about perf.

Fortunately I can easily show how to do the latter. For the primitive test I've started MariaDB 10.5 built from GitHub source on my Ubuntu 16.04 netbook with the following options:

openxs@ao756:~/dbs/maria10.5$ bin/mysqld_safe --no-defaults --port=3311 --socket=/tmp/mariadb105.sock --performance_schema=1 --performance-schema-instrument='memory/%=ON' --thread-handling='pool-of-threads' &

 and used the following sysbench test as a load that I expected to cause some mutex waits:

openxs@ao756:~/dbs/maria10.5$ sysbench --table-size=1000000 --threads=34 --time=300 --report-interval=5 --mysql-socket=/tmp/mariadb105.sock --mysql-user=openxs --mysql-db=sbtest /usr/share/sysbench/oltp_read_only.lua run

First step is to find out what exact pthread library is used:

openxs@ao756:~/dbs/maria10.5$ ldd bin/mariadbd
        linux-vdso.so.1 =>  (0x00007ffc03437000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f585d0c6000)
        liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f585ceae000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f585cc8c000)
        libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f585ca7c000)
        libsnappy.so.1 => /usr/lib/x86_64-linux-gnu/libsnappy.so.1 (0x00007f585c874000)
        libaio.so.1 => /lib/x86_64-linux-gnu/libaio.so.1 (0x00007f585c672000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f585c467000)
        libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f585c1ff000)
        libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f585bdba000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f585bb9d000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f585b999000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f585b617000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f585b30e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f585b0f8000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f585ad2e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f585f4e6000)

Adding the probe after that is trivial:

openxs@ao756:~/dbs/maria10.5$ sudo perf probe -x  /lib/x86_64-linux-gnu/libpthread.so.0 pthread_mutex_lock
Added new event:
  probe_libpthread:pthread_mutex_lock (on pthread_mutex_lock in /lib/x86_64-linux-gnu/libpthread-2.23.so)

You can now use it in all perf tools, such as:

        perf record -e probe_libpthread:pthread_mutex_lock -aR sleep 1

I had not checked yet what are the arguments and not tried to record mutex address or anything besides the fact of the call to this function. Then I started sysbench and while it worked:

openxs@ao756:~/dbs/maria10.5$ sysbench --table-size=1000000 --threads=34 --time=300 --report-interval=5 --mysql-socket=/tmp/mariadb105.sock --mysql-user=openxs --mysql-db=sbtest /usr/share/sysbench/oltp_read_only.lua run
sysbench 1.1.0-faaff4f (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 34
Report intermediate results every 5 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 5s ] thds: 34 tps: 564.25 qps: 9079.14 (r/w/o: 7944.45/0.00/1134.69) lat (ms,95%): 99.33 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 34 tps: 570.68 qps: 9136.92 (r/w/o: 7995.56/0.00/1141.37) lat (ms,95%): 104.84 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 34 tps: 585.57 qps: 9345.87 (r/w/o: 8175.33/0.00/1170.53) lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 34 tps: 588.03 qps: 9409.23 (r/w/o: 8232.98/0.00/1176.25) lat (ms,95%): 97.55 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 34 tps: 587.39 qps: 9399.31 (r/w/o: 8224.92/0.00/1174.39) lat (ms,95%): 97.55 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 34 tps: 584.80 qps: 9360.32 (r/w/o: 8190.33/0.00/1169.99) lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 34 tps: 573.02 qps: 9173.87 (r/w/o: 8028.24/0.00/1145.63) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 34 tps: 572.60 qps: 9160.00 (r/w/o: 8014.00/0.00/1146.00) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00
[ 45s ] thds: 34 tps: 529.54 qps: 8468.10 (r/w/o: 7409.02/0.00/1059.09) lat (ms,95%): 101.13 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 34 tps: 393.04 qps: 6276.27 (r/w/o: 5491.38/0.00/784.88) lat (ms,95%): 121.08 err/s: 0.00 reconn/s: 0.00
[ 55s ] thds: 34 tps: 382.60 qps: 6134.80 (r/w/o: 5369.20/0.00/765.60) lat (ms,95%): 110.66 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 34 tps: 386.36 qps: 6210.71 (r/w/o: 5436.79/0.00/773.91) lat (ms,95%): 134.90 err/s: 0.00 reconn/s: 0.00
[ 65s ] thds: 34 tps: 397.01 qps: 6320.60 (r/w/o: 5528.37/0.00/792.23) lat (ms,95%): 153.02 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 34 tps: 419.01 qps: 6699.60 (r/w/o: 5861.98/0.00/837.63) lat (ms,95%): 248.83 err/s: 0.00 reconn/s: 0.00
...

I've tried to run perf record for the mariadbd process as follows, with -g option:

openxs@ao756:~/dbs/maria10.5$ sudo perf record -e probe_libpthread:pthread_mutex_lock -g -p`pidof mariadbd` sleep 30
^C[ perf record: Woken up 735 times to write data ]
[ perf record: Captured and wrote 183.516 MB perf.data (1150509 samples) ]

 I had noty set the frequency and with the default one we see notable drop of QPS from sysbench and huge amount of data collected in less than 30 seconds (highlighted). So we can not speak about really small impact with such an approach. In production case I'd have to play with sampling frequency and time to run for sure.

Raw results can be checked with perf script:

openxs@ao756:~/dbs/maria10.5$ sudo perf script | more
mariadbd  9621 [000] 30901.858515: probe_libpthread:pthread_mutex_lock: (7f28dd88fd40)
                    9d40 pthread_mutex_lock (/lib/x86_64-linux-gnu/libpthread-2.23.so)
                  6a9233 close_thread_tables (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  70b87c mysql_execute_command (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  72a0ae Prepared_statement::execute (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  72a546 Prepared_statement::execute_loop (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  72af58 mysql_stmt_execute_common (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  72b0b5 mysqld_stmt_execute (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  7090d8 dispatch_command (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  70777c do_command (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  887c40 tp_callback (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  a76650 worker_main (/home/openxs/dbs/maria10.5/bin/mariadbd)
                  b31911 pfs_spawn_thread (/home/openxs/dbs/maria10.5/bin/mariadbd)
                    76ba start_thread (/lib/x86_64-linux-gnu/libpthread-2.23.so)
--More--

Now we can process the results collected. The easiest way is to use perf report:

openxs@ao756:~/dbs/maria10.5$ sudo perf report -g >/tmp/perf_mutex.txt

In the resulting file we can see the following:

openxs@ao756:~/dbs/maria10.5$ cat /tmp/perf_mutex.txt | head -100
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'probe_libpthread:pthread_mutex_lock'
# Event count (approx.): 1150509
#
# Children      Self  Command   Shared Object        Symbol                     
# ........  ........  ........  ...................  ......................................................
#
   100.00%   100.00%  mariadbd  libpthread-2.23.so   [.] pthread_mutex_lock     
            |
            |--99.98%-- start_thread
            |          |
            |          |--99.97%-- pfs_spawn_thread
            |          |          |
            |          |          |--99.95%-- worker_main
            |          |          |          |
            |          |          |          |--86.17%-- tp_callback
            |          |          |          |          do_command
            |          |          |          |          dispatch_command
            |          |          |          |          |
            |          |          |          |          |--46.56%-- mysqld_stmt_execute
            |          |          |          |          |          mysql_stmt_execute_common
            |          |          |          |          |          Prepared_statement::execute_loop
            |          |          |          |          |          Prepared_statement::execute
            |          |          |          |          |          |
            |          |          |          |          |          |--27.70%-- mysql_execute_command
            |          |          |          |          |          |          |
            |          |          |          |          |          |          |--17.60%-- close_thread_tables
            |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |          |--8.80%-- close_thread_table
            |          |          |          |          |          |          |          |          pthread_mutex_lock
            |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |           --8.80%-- pthread_mutex_lock
            |          |          |          |          |          |          |
            |          |          |          |          |          |          |--9.47%-- execute_sqlcom_select
            |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |          |--9.46%-- open_and_lock_tables
            |          |          |          |          |          |          |          |          open_tables
            |          |          |          |          |          |          |          |          open_table
            |          |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |          |          |--8.80%-- pthread_mutex_lock
            |          |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |          |          |--0.63%-- open_table_get_mdl_lock
            |          |          |          |          |          |          |          |          |          MDL_context::acquire_lock
            |          |          |          |          |          |          |          |          |          MDL_context::try_acquire_lock_impl
            |          |          |          |          |          |          |          |          |          MDL_map::find_or_insert
            |          |          |          |          |          |          |          |          |          pthread_mutex_lock
            |          |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |          |           --0.03%-- tdc_acquire_share
            |          |          |          |          |          |          |          |                     pthread_mutex_lock
            |          |          |          |          |          |          |          |
            |          |          |          |          |          |          |           --0.01%-- handle_select
            |          |          |          |          |          |          |                     mysql_select
            |          |          |          |          |          |          |                     JOIN::optimize
            |          |          |          |          |          |          |                     JOIN::optimize_inner
            |          |          |          |          |          |          |                     |
            |          |          |          |          |          |          |                     |--0.01%-- join_read_const_table
            |          |          |          |          |          |          |                     |          join_read_const
            |          |          |          |          |          |          |                     |          handler::ha_index_read_idx_map
            |          |          |          |          |          |          |                     |          handler::index_read_idx_map
            |          |          |          |          |          |          |                     |          ha_innobase::index_read
            |          |          |          |          |          |          |                     |          row_search_mvcc
            |          |          |          |          |          |          |                     |          |
            |          |          |          |          |          |          |                     |          |--0.01%-- btr_cur_search_to_nth_level_func
            |          |          |          |          |          |          |                     |          |          buf_page_get_gen
            |          |          |          |          |          |          |                     |          |          buf_page_get_low
            |          |          |          |          |          |          |                     |          |          buf_read_page
            |          |          |          |          |          |          |                     |          |          buf_read_page_low
            |          |          |          |          |          |          |                     |          |          pthread_mutex_lock
            |          |          |          |          |          |          |                     |          |
            |          |          |          |          |          |          |                     |           --0.00%-- btr_pcur_move_to_next_page
            |          |          |          |          |          |          |                     |                     buf_page_get_gen
            |          |          |          |          |          |          |                     |                     buf_page_get_low
            |          |          |          |          |          |          |                     |                     buf_read_page
            |          |          |          |          |          |          |                     |                     buf_read_page_low
            |          |          |          |          |          |          |                     |                     pthread_mutex_lock
            |          |          |          |          |          |          |                     |
            |          |          |          |          |          |          |                      --0.01%-- SQL_SELECT::test_quick_select
            |          |          |          |          |          |          |                                get_key_scans_params
            |          |          |          |          |          |          |                                DsMrr_impl::dsmrr_info_const
            |          |          |          |          |          |          |                                handler::multi_range_read_info_const
            |          |          |          |          |          |          |                                ha_innobase::records_in_range
            |          |          |          |          |          |          |                                btr_estimate_n_rows_in_range_low
            |          |          |          |          |          |          |                                |
            |          |          |          |          |          |          |                                |--0.01%-- btr_cur_search_to_nth_level_func
            |          |          |          |          |          |          |                                |          buf_page_get_gen
            |          |          |          |          |          |          |                                |          buf_page_get_low
            |          |          |          |          |          |          |                                |          buf_read_page
            |          |          |          |          |          |          |                                |          buf_read_page_low
            |          |          |          |          |          |          |                                |          pthread_mutex_lock
            |          |          |          |          |          |          |                                |
            |          |          |          |          |          |          |                                 --0.00%-- buf_page_get_gen
            |          |          |          |          |          |          |                                           buf_page_get_low
            |          |          |          |          |          |          |                                           buf_read_page
            |          |          |          |          |          |          |                                           buf_read_page_low
            |          |          |          |          |          |          |                                           pthread_mutex_lock
            |          |          |          |          |          |          |
...

but the result is so large that you can not clearly see the full picture even with the smallest font set. One would need another way to visualize the data, for example, as a flame graph:

openxs@ao756:~/dbs/maria10.5$ sudo perf script | ~/git/FlameGraph/stackcollapse-perf.pl > /tmp/perf-folded.txt
openxs@ao756:~/dbs/maria10.5$ ~/git/FlameGraph/flamegraph.pl /tmp/perf-folded.txt >/tmp/mutex.svg

The resulting flame graph looks as follows:

Flame Graph for the pthread_mutex_lock calls in MariaDB server.

It makes it easier to see were most of waits originate from.

We can also check the /tmp/perf-folded.txt file created in the process:

openxs@ao756:~/dbs/maria10.5$ head -10 /tmp/perf-folded.txt                    
mariadbd;[libstdc++.so.6.0.21];tpool::thread_pool_generic::worker_main;pthread_mutex_lock 82
mariadbd;[libstdc++.so.6.0.21];tpool::thread_pool_generic::worker_main;tpool::task::execute;pthread_mutex_lock 41
mariadbd;[libstdc++.so.6.0.21];tpool::thread_pool_generic::worker_main;tpool::task::execute;tpool::thread_pool_generic::timer_generic::execute;pthread_mutex_lock 82
mariadbd;[libstdc++.so.6.0.21];tpool::thread_pool_generic::worker_main;tpool::task::execute;tpool::thread_pool_generic::timer_generic::execute;srv_error_monitor_task;pthread_mutex_lock 19
mariadbd;[libstdc++.so.6.0.21];tpool::thread_pool_generic::worker_main;tpool::task::execute;tpool::thread_pool_generic::timer_generic::execute;srv_master_callback;pthread_mutex_lock 18
mariadbd;start_thread;buf_flush_page_cleaner;buf_flush_lists;buf_flush_do_batch;pthread_mutex_lock 46
mariadbd;start_thread;buf_flush_page_cleaner;pthread_mutex_lock 64
mariadbd;start_thread;pfs_spawn_thread;ma_checkpoint_background;pthread_mutex_lock 1
mariadbd;start_thread;pfs_spawn_thread;ma_checkpoint_background;translog_get_horizon;pthread_mutex_lock 1
mariadbd;start_thread;pfs_spawn_thread;pthread_mutex_lock 112
openxs@ao756:~/dbs/maria10.5$ cat /tmp/perf-folded.txt | sort -r -n -k 1,1 | more
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;THD::reset_killed;pthread_mutex_lock 231467
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;pthread_mutex_lock 115734
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;mysqld_stmt_execute;mysql_stmt_execute_common;Prepared_statement::exe
cute_loop;Prepared_statement::execute;THD::set_statement;pthread_mutex_lock 2169
99
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;mysqld_stmt_execute;mysql_stmt_execute_common;Prepared_statement::exe
cute_loop;Prepared_statement::execute;mysql_execute_command;MDL_context::release
_locks_stored_before;MDL_context::release_lock;MDL_lock::remove_ticket;pthread_m
utex_lock 7232
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;mysqld_stmt_execute;mysql_stmt_execute_common;Prepared_statement::exe
cute_loop;Prepared_statement::execute;mysql_execute_command;execute_sqlcom_selec
t;open_and_lock_tables;open_tables;open_table;tdc_acquire_share;pthread_mutex_lo
ck 301
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat
ch_command;mysqld_stmt_execute;mysql_stmt_execute_common;Prepared_statement::exe
cute_loop;Prepared_statement::execute;mysql_execute_command;execute_sqlcom_selec
t;open_and_lock_tables;open_tables;open_table;pthread_mutex_lock 101268

mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispat

to find out that it contains unique stack traces as the first field and number of such stack traces as the second. So, and answer the real question from a developer I have sort the result properly:

openxs@ao756:~/dbs/maria10.5$ cat /tmp/perf-folded.txt | sort -r -n -k 2 | head -5
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispatch_command;THD::reset_killed;pthread_mutex_lock 231467
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispatch_command;mysqld_stmt_execute;mysql_stmt_execute_common;Prepared_statement::execute_loop;Prepared_statement::execute;THD::set_statement;pthread_mutex_lock 216999
mariadbd;start_thread;pfs_spawn_thread;worker_main;pthread_mutex_lock 158530
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispatch_command;pthread_mutex_lock 115734
mariadbd;start_thread;pfs_spawn_thread;worker_main;tp_callback;do_command;dispatch_command;delete_explain_query;pthread_mutex_lock 101269
openxs@ao756:~/dbs/maria10.5$

To summarize, perf (augmented with some post processing tools) does allow to collect stacktraces leading to some probe and sort in descending order by frequency.

Finally, this is my presentation that inspired the question:


You can watch the session recorded if you missed it..