Saturday, December 22, 2018

How to Get Details About MyRocks Deadlocks in MariaDB and Percona Server

In my previous post on ERROR 1213 I noted that Percona Server does not support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement to get deadlock details in "text" form. I've got some clarifications in my related feature request, PS-5114. So I decided to write this followup post and show what is the way to get deadlock details for the ROCKSDB tables in current versions of MariaDB and Percona Server.

First of all, I'd like to check MariaDB's implementation of MyRocks. For this I'll re-create deadlock scenario from that my post with MariaDB 10.3.12 I have at hand. We should start with installing ROCKSDB plugin according to this KB article:
openxs@ao756:~/dbs/maria10.3$ bin/mysql --no-defaults --socket=/tmp/mariadb.sock -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.3.12-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


MariaDB [test]> install soname 'ha_rocksdb';
Query OK, 0 rows affected (36,338 sec)

MariaDB [test]> show engines;
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                           | YES          | YES  | YES        |
...
| CONNECT            | YES     | Management of External Data (SQL/NOSQL/MED), including many file formats         | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                               | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                   | YES          | NO   | YES        |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
10 rows in set (8,753 sec)

MariaDB [test]> show plugins;
+-------------------------------+----------+--------------------+---------------+---------+
| Name                          | Status   | Type               | Library       | License |
+-------------------------------+----------+--------------------+---------------+---------+
| binlog                        | ACTIVE   | STORAGE ENGINE     | NULL          | GPL     |
| mysql_native_password         | ACTIVE   | AUTHENTICATION     | NULL          | GPL     |
| mysql_old_password            | ACTIVE   | AUTHENTICATION     | NULL          | GPL     |
| wsrep                         | ACTIVE   | STORAGE ENGINE     | NULL          | GPL     |
...
| CONNECT                       | ACTIVE   | STORAGE ENGINE     | ha_connect.so | GPL     |
| ROCKSDB                       | ACTIVE   | STORAGE ENGINE     | ha_rocksdb.so | GPL     |
...
| ROCKSDB_LOCKS                 | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_TRX                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DEADLOCK              | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
+-------------------------------+----------+--------------------+---------------+---------+
68 rows in set (2,451 sec)
Note that in MariaDB just one simple INSTALL SONAME ... statement is enough to get ROCKSDB with all related plugins loaded. Do not mind time to execute statements - I am running them on a netbook that is busy compiling Percona Server 8.0.13 from GitHub concurrently, to post something about it later :)

Now, let me re-create the same deadlock scenario:

MariaDB [test]> create table t1(id int, c1 int, primary key(id)) engine=rocksdb;
Query OK, 0 rows affected (4,163 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0,641 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> set global rocksdb_lock_wait_timeout=50;
Query OK, 0 rows affected (0,644 sec)

MariaDB [test]> set global rocksdb_deadlock_detect=ON;
Query OK, 0 rows affected (0,037 sec)

MariaDB [test]> show global variables like 'rocksdb%deadlock%';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| rocksdb_deadlock_detect       | ON    |
| rocksdb_deadlock_detect_depth | 50    |
| rocksdb_max_latest_deadlocks  | 5     |
+-------------------------------+-------+
3 rows in set (0,022 sec)
We need two sessions. In the first one:
MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              11 |
+-----------------+
1 row in set (0,117 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0,000 sec)

MariaDB [test]> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0,081 sec)

In the second:
MariaDB [test]> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
|              12 |
+-----------------+
1 row in set (0,000 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0,000 sec)

MariaDB [test]> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0,001 sec)
Back in the first:

MariaDB [test]> select * from t1 where id=2 for update;
It hangs waiting for incompatible lock. In the second:

MariaDB [test]> select * from t1 where id=1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Now, can we get any details about the deadlock that just happened using upstream SHOW ENGINE statement? Let's try:
MariaDB [test]> SHOW ENGINE ROCKSDB TRANSACTION STATUS\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'TRANSACTION STATUS' at line 1
Does not work, same as in Percona Server 5.7.x. Here is a related MariaDB task: MDEV-13859 - "Add SHOW ENGINE ROCKSDB TRANSACTION STATUS or its equivalent?". Still "Open" and without any target version set, not even 10.4.

The idea behind NOT supporting this statement by Percona, according to comments I've got in  PS-5114, is to rely on tables in the information_schema. That's what we have in the related tables, rocksdb_locks, rocksdb_trx and rocksdb_deadlock after deadlock above was detected:
MariaDB [test]> select * from information_schema.rocksdb_deadlock;
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
| DEADLOCK_ID | TIMESTAMP  | TRANSACTION_ID | CF_NAME | WAITING_KEY      | LOCK_TYPE | INDEX_NAME | TABLE_NAME | ROLLED_BACK |
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
|           0 | 1545481878 |              6 | default | 0000010080000002 | EXCLUSIVE | PRIMARY    | test.t1    |           0 |
|           0 | 1545481878 |              7 | default | 0000010080000001 | EXCLUSIVE | PRIMARY    | test.t1    |           1 |
+-------------+------------+----------------+---------+------------------+-----------+------------+------------+-------------+
2 rows in set (0,078 sec)

MariaDB [test]> select * from information_schema.rocksdb_trx;
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
| TRANSACTION_ID | STATE   | NAME | WRITE_COUNT | LOCK_COUNT | TIMEOUT_SEC | WAITING_KEY | WAITING_COLUMN_FAMILY_ID | IS_REPLICATION | SKIP_TRX_API | READ_ONLY | HAS_DEADLOCK_DETECTION | NUM_ONGOING_BULKLOAD | THREAD_ID | QUERY |
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
|              6 | STARTED |      |           0 |          2 |          50 |             |                        0 |              0 |            0 |         0 |                      1 |                    0 |        11 |       |
+----------------+---------+------+-------------+------------+-------------+-------------+--------------------------+----------------+--------------+-----------+------------------------+----------------------+-----------+-------+
1 row in set (0,001 sec)

MariaDB [test]> select * from information_schema.rocksdb_locks;
+------------------+----------------+------------------+------+
| COLUMN_FAMILY_ID | TRANSACTION_ID | KEY              | MODE |
+------------------+----------------+------------------+------+
|                0 |              6 | 0000010080000002 | X    |
|                0 |              6 | 0000010080000001 | X    |
+------------------+----------------+------------------+------+
2 rows in set (0,025 sec)
I was not able to find any really good documentation about these tables (I checked here, there and more), especially rocksdb_deadlock that is totally undocumented, so let me try to speculate and explain my ideas on how they are supposed to work and be used together. Information about up to rocksdb_max_latest_deadlocks is stored in the rocksdb_deadlock table, each deadlock is identified by deadlock_id and in case of MariaDB you can find out when it happened using the timestamp column that is a UNIX timestamp:
MariaDB [test]> select distinct deadlock_id, from_unixtime(timestamp) from information_schema.rocksdb_deadlock;+-------------+--------------------------+
| deadlock_id | from_unixtime(timestamp) |
+-------------+--------------------------+
|           0 | 2018-12-22 14:31:18      |
+-------------+--------------------------+
1 row in set (0,137 sec)

For each deadlock you have a row per lock wait for each transaction involved, identified by transaction_id. You can see for what key value (waited_key) in that index (index_name) of what table (table_name) the transaction was waited. Victim (transaction that was rolled back to prevent deadlock) of deadlock detection is identified by the value 1 in the rolled_back column. This is all the information that persists for any notable time, and I don't like it that much, as we can not see what lock(s) transactions had at the moment. We can guess conflicting lock based on what was the waiting_key, but I'd prefer InnoDB way of showing this clearly with all the details.

If you hurry up and query the rocksdb_trx table fast enough, you can get more details about those transaction(s) involved in deadlock that are NOT rolled back (and not yet committed). Join by the transaction_id column, obviously, to get the details up to current running query and processlist connection id (rocksdb_trx.thread_id column) involved. 

If you hurry up to query rocksdb_locks table also by the transaction_id of still active transaction, you can get a list of locks it holds and then guess which one was a blocking lock. If you are not fast enough and transaction is gone you have just to assume there was some blocking lock. One day gap locks may be added, and some lock would become not good enough guess.

I miss these rocks at Beaulieu-sur-Mer. MyRocks in all implementations but Facebook's, misses one useful way to get the details about deadlock(s) that happened in the past.

To summarize, while storing information about configurable number of last deadlocks in the table seems to be a good idea, in case of ROCKSDB in both Percona and MariaDB servers (as soon as all transactions involved in deadlock are completed one way or the other) we miss some details (like thread ids for sessions involved, exact locks that each transaction held etc) comparing to the text output provided by the original upstream statement (and SHOW ENGINE INNODB STATUS\G, surely). Even if we are lucky to query all tables in time, we still probably miss lock waits table (like innodb_lock_waits) and any built in way to store information in the error log about deadlocks that happened (and all locks involved). 

Note also lack of consistency in naming (rocksdb_locks, plural vs  rocksdb_deadlock, singular, in case of MariaDB), rocksdb_deadlock.lock_type with value EXCLUSIVE vs rocksdb_locks.mode with value X etc, and and very limited documentation available. In my opinion current state is unacceptable if we hope to see wide use of MyRocks by community users and DBAs outside of Facebook.

2 comments:

  1. With respect to current state, it takes time to grow an ecosystem. Experts like you writing about MyRocks move us in the right direction.

    ReplyDelete
  2. If you care, this is what one may get in Facebook's MySQL 5.6 for the same deadlock scenario:

    mysql> select * from t1 where id=2 for update;
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
    mysql> show engine rocksdb transaction status\G
    *************************** 1. row ***************************
    Type: rocksdb
    Name:
    Status:
    ============================================================
    2018-12-24 20:30:36 ROCKSDB TRANSACTION MONITOR OUTPUT
    ============================================================
    ---------
    SNAPSHOTS
    ---------
    LIST OF SNAPSHOTS FOR EACH SESSION:
    ---SNAPSHOT, ACTIVE 66 sec
    MySQL thread id 6, OS thread handle 0x7f8a48157700, query id 42 localhost root
    lock count 2, write count 0
    insert count 0, update count 0, delete count 0
    ----------LATEST DETECTED DEADLOCKS----------

    *** DEADLOCK PATH
    =========================================
    TIMESTAMP: 1545676192
    TRANSACTION ID: 7
    COLUMN FAMILY NAME: default
    WAITING KEY: 0000010180000001
    LOCK TYPE: EXCLUSIVE
    INDEX NAME: PRIMARY
    TABLE NAME: test.t1
    ---------------WAITING FOR---------------
    TIMESTAMP: 1545676192
    TRANSACTION ID: 8
    COLUMN FAMILY NAME: default
    WAITING KEY: 0000010180000002
    LOCK TYPE: EXCLUSIVE
    INDEX NAME: PRIMARY
    TABLE NAME: test.t1

    --------TRANSACTION ID: 8 GOT DEADLOCK---------
    -----------------------------------------
    END OF ROCKSDB TRANSACTION MONITOR OUTPUT
    =========================================

    1 row in set (0.00 sec)

    mysql> show create table information_schema.rocksdb_deadlock\G
    *************************** 1. row ***************************
    Table: ROCKSDB_DEADLOCK
    Create Table: CREATE TEMPORARY TABLE `ROCKSDB_DEADLOCK` (
    `DEADLOCK_ID` bigint(8) NOT NULL DEFAULT '0',
    `TIMESTAMP` bigint(8) NOT NULL DEFAULT '0',
    `TRANSACTION_ID` bigint(8) NOT NULL DEFAULT '0',
    `CF_NAME` varchar(193) NOT NULL DEFAULT '',
    `WAITING_KEY` varchar(513) NOT NULL DEFAULT '',
    `LOCK_TYPE` varchar(193) NOT NULL DEFAULT '',
    `INDEX_NAME` varchar(193) NOT NULL DEFAULT '',
    `TABLE_NAME` varchar(193) NOT NULL DEFAULT '',
    `ROLLED_BACK` bigint(8) NOT NULL DEFAULT '0'
    ) ENGINE=MEMORY DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)

    mysql> select * from information_schema.rocksdb_deadlock\G
    *************************** 1. row ***************************
    DEADLOCK_ID: 0
    TIMESTAMP: 1545676192
    TRANSACTION_ID: 7
    CF_NAME: default
    WAITING_KEY: 0000010180000001
    LOCK_TYPE: EXCLUSIVE
    INDEX_NAME: PRIMARY
    TABLE_NAME: test.t1
    ROLLED_BACK: 0
    *************************** 2. row ***************************
    DEADLOCK_ID: 0
    TIMESTAMP: 1545676192
    TRANSACTION_ID: 8
    CF_NAME: default
    WAITING_KEY: 0000010180000002
    LOCK_TYPE: EXCLUSIVE
    INDEX_NAME: PRIMARY
    TABLE_NAME: test.t1
    ROLLED_BACK: 1
    2 rows in set (0.00 sec)

    So, both ways work, and there is TIMESTAMP column in the table (unlike with Percona Server, even 8.0.13 that I checked).

    In text output it is absolutely clear what transaction got deadlock error and was rolled back.

    ReplyDelete