Saturday, December 8, 2018

What May Cause MySQL ERROR 1213

Probably all of us MySQL users, DBAs and developers have seen error 1213 more than once, in one context or another:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
The first thing that comes to mind in this case is: "OK, we have an InnoDB deadlock, let's check the details", followed by a SHOW ENGINE INNODB STATUS check, like this:
mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2018-12-08 17:41:11 0x7f2f8b8db700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 12 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 59 srv_active, 0 srv_shutdown, 14824 srv_idle
srv_master_thread log flush and writes: 14882
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 326
OS WAIT ARRAY INFO: signal count 200
RW-shared spins 0, rounds 396, OS waits 195
RW-excl spins 0, rounds 120, OS waits 4
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 396.00 RW-shared, 120.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 14960
Purge done for trx's n:o < 14954 undo n:o < 0 state: running but idle
History list length 28
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421316960193880, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421316960192752, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
...
Now, what if you get output like the one above, without any LATEST DETECTED DEADLOCK section? I've seen people wondering how it is even possible and trying to find some suspicious bug somewhere...

Do not be in a hurry - it's time to recall that there are actually at least 4 quite common reasons to get error 1213 in modern (5.5+) versions of MySQL, MariaDB and friends:
  1. InnoDB deadlock happened
  2. Metadata deadlock happened
  3. If you are lucky enough to use Galera cluster, Galera conflict happened
  4. Deadlock happened in some other storage engine (for example, MyRocks)
I am not lucky enough to use MySQL's group replication yet, but I know that conflicts there are also possible. I am just not sure if error 1213 is also reported in that case. Feel free to check with a test case similar to the one I've used for Galera below.
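Just in case, here is a minimal sketch of such a check for a multi-primary group replication setup (untested by me; node1> and node2> stand for sessions on two different members, and the error reported on the second COMMIT may well differ from 1213):

-- in a session on the first member:
node1> start transaction;
node1> update test.t1 set c1 = 5 where id = 1;

-- in a session on the second member:
node2> start transaction;
node2> update test.t1 set c1 = 6 where id = 1;

-- now commit on both; the first one should win:
node1> commit;
node2> commit;    -- check what error, if any, is reported here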

I also suspect deadlocks with other engines are possible. As a bonus, I'll demonstrate a deadlock with MyRocks as well.

Let's reproduce the first 3 cases one by one and check how to get more information on them. In all cases it's enough to have at most 2 InnoDB tables with just two rows each:
mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
We'll need two sessions, surely.
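For reference, the tables and data above can be created with statements as simple as these (matching the SHOW CREATE TABLE output):

mysql> create table t1(id int not null, c1 int, primary key(id)) engine=InnoDB;
mysql> create table t2(id int not null, c1 int, primary key(id)) engine=InnoDB;
mysql> insert into t1 values (1,1), (2,2);
mysql> insert into t2 values (1,1), (2,2);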

InnoDB Deadlock

With InnoDB and the tables above it's really easy to end up with a deadlock. In the first session execute the following:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
In the second session execute:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.02 sec)
Now in the first session try to access the row with id=2, asking for an incompatible lock:
mysql> select * from t1 where id = 2 for update;
This statement hangs waiting for a lock (for up to innodb_lock_wait_timeout seconds). Try to access the row with id=1, asking for an incompatible lock, in the second session, and you'll get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
At this moment the SELECT in the first transaction returns data:
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (5.84 sec)
It's that simple: one table with two rows is enough. We can get the details in the output of SHOW ENGINE INNODB STATUS:
...
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-08 18:32:59 0x7f2f8b8db700
*** (1) TRANSACTION:
TRANSACTION 15002, ACTIVE 202 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 8, OS thread handle 139842181244672, query id 8545 localhost root statistics
select * from t1 where id = 2 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15002 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 15003, ACTIVE 143 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 9, OS thread handle 139842181510912, query id 8546 localhost root statistics
select * from t1 where id = 1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 94 page no 3 n bits 72 index PRIMARY of table `test`.`t1` trx id 15003 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
...
In the case above I've used Percona Server 5.7.24-26 (why not). Details of the output may vary depending on the version (and the bugs it has :). If you use MariaDB 5.5+, in case of an InnoDB deadlock the special innodb_deadlocks status variable is also incremented.
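Keep in mind that only the latest detected deadlock is kept in this output. If you want every InnoDB deadlock printed to the error log as it happens, and want to check the MariaDB counter mentioned above, a quick sketch (assuming a recent enough 5.6+/10.x version where these are supported):

mysql> set global innodb_print_all_deadlocks = ON;
mysql> show global status like 'Innodb_deadlocks'; -- MariaDB 5.5+ only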

Metadata Deadlock

Unlike with InnoDB deadlocks, chances are low that you've seen deadlocks with metadata locks involved. One may spend notable time trying to reproduce such a deadlock, but (as usual) a quick check of the MySQL bugs database helps to find an easy-to-reproduce case. I mean Bug #65890 - "Deadlock that is not a deadlock with transaction and lock tables".

So, let's try the following scenario with two sessions and our InnoDB tables, t1 and t2. In one session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t2 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
In another session:
mysql> lock tables t1 write, t2 write;
It hangs, waiting for up to lock_wait_timeout seconds. We can check what happens with metadata locks using the performance_schema.metadata_locks table (as we use MySQL or Percona Server 5.7+; more on the setup and on alternatives for MariaDB here and there). In the first session:
mysql> select * from performance_schema.metadata_locks;
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | OBJECT_INSTANCE_BEGIN | LOCK_TYPE            | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
| TABLE       | test               | t2             |       139841686765904 | SHARED_WRITE         | TRANSACTION   | GRANTED     |        |              45 |           2850 |
| GLOBAL      | NULL               | NULL           |       139841688088672 | INTENTION_EXCLUSIVE  | STATEMENT     | GRANTED     |        |              46 |            205 |
| SCHEMA      | test               | NULL           |       139841688088912 | INTENTION_EXCLUSIVE  | TRANSACTION   | GRANTED     |        |              46 |            205 |
| TABLE       | test               | t1             |       139841688088992 | SHARED_NO_READ_WRITE | TRANSACTION   | GRANTED     |        |              46 |            207 |
| TABLE       | test               | t2             |       139841688089072 | SHARED_NO_READ_WRITE | TRANSACTION   | PENDING     |        |              46 |            208 |
| TABLE       | performance_schema | metadata_locks |       139841686219040 | SHARED_READ          | TRANSACTION   | GRANTED     |        |              45 |           3003 |
+-------------+--------------------+----------------+-----------------------+----------------------+---------------+-------------+--------+-----------------+----------------+
6 rows in set (0.00 sec)
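Note that if this table comes up empty for you, the metadata lock instrumentation may simply not be enabled (it is off by default in 5.7); a minimal way to switch it on at runtime looks like this:

mysql> update performance_schema.setup_instruments
    ->    set enabled = 'YES', timed = 'YES'
    ->  where name = 'wait/lock/metadata/sql/mdl';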
As soon as we try this in the first session:
mysql> select * from t1;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
we get the same deadlock error 1213, and the LOCK TABLES in the second session completes. We can find nothing about this deadlock in the output of SHOW ENGINE INNODB STATUS (as shared at the beginning of this post). I am also not aware of any status variable to count metadata deadlocks.
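By the way, to see who is waiting for and who holds a conflicting metadata lock before the deadlock is even reported, the sys schema shipped with MySQL and Percona Server 5.7 has a view for that; a quick check may look like this (assuming sys is installed):

mysql> select * from sys.schema_table_lock_waits\G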

You can find some useful information about metadata deadlocks in the manual.

Galera Conflict

For simplicity I'll use MariaDB 10.1.x and a simple two-node setup on the same box, as I described here. I'll start the first node as a new cluster and create the tables for this test:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
[1] 13022
openxs@ao756:~/dbs/maria10.1$ 181208 20:40:52 mysqld_safe Logging to '/tmp/mysql-node1.err'.
181208 20:40:52 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node1

openxs@ao756:~/dbs/maria10.1$ bin/mysql  --socket=/tmp/mysql-node1.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> drop table t1, t2;
ERROR 1051 (42S02): Unknown table 'test.t2'
MariaDB [test]> create table t1(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.29 sec)

MariaDB [test]> create table t2(id int, c1 int, primary key(id));
Query OK, 0 rows affected (0.22 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0.07 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t2 values (1,1), (2,2);
Query OK, 2 rows affected (0.18 sec)
Records: 2  Duplicates: 0  Warnings: 0
Then I'll start the second node and make sure it has joined the cluster and has the same data:
openxs@ao756:~/dbs/maria10.1$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
[2] 15110
openxs@ao756:~/dbs/maria10.1$ 181208 20:46:11 mysqld_safe Logging to '/tmp/mysql-node2.err'.
181208 20:46:11 mysqld_safe Starting mysqld daemon with databases from /home/openxs/galera/node2

openxs@ao756:~/dbs/maria10.1$ bin/mysql --socket=/tmp/mysql-node2.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.34-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 4                                    |
| wsrep_cluster_size       | 2                                    |
| wsrep_cluster_state_uuid | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.04 sec)

MariaDB [test]> select * from t2;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.02 sec)
Now we are ready to try to provoke a Galera conflict. For this we have to update the same data in transactions on two different nodes. In one session connected to node1:
MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node1             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 5 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


In another session connected to the other node:
MariaDB [test]> select @@wsrep_node_name;
+-------------------+
| @@wsrep_node_name |
+-------------------+
| node2             |
+-------------------+
1 row in set (0.00 sec)

MariaDB [test]> start transaction;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> update test.t1 set c1 = 6 where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
Now in the first session we can COMMIT successfully:
MariaDB [test]> commit;
Query OK, 0 rows affected (0.12 sec)
But if we try to COMMIT in the second:
MariaDB [test]> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
We get the same error 1213 about a deadlock. Surely you'll see nothing about this deadlock in the INNODB STATUS output, as it was NOT an InnoDB deadlock but a Galera conflict. Check these status variables on node2:
MariaDB [test]> show status like 'wsrep_local%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_bf_aborts      | 1                                    |
| wsrep_local_cached_downto  | 75                                   |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_commits        | 0                                    |
| wsrep_local_index          | 0                                    |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.000000                             |
| wsrep_local_recv_queue_max | 1                                    |
| wsrep_local_recv_queue_min | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_send_queue_max | 1                                    |
| wsrep_local_send_queue_min | 0                                    |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced                               |
| wsrep_local_state_uuid     | b1d227b1-0211-11e6-8ce0-3644ad2b03dc |
+----------------------------+--------------------------------------+
17 rows in set (0.01 sec)
If wsrep_local_bf_aborts > 0, you had conflicts and the local transaction was rolled back (brute-force aborted) to resolve them. We can see that the remote one wins; on node2:
MariaDB [test]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    5 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
To summarize, in Galera the "first commit wins", and the local transaction involved in a conflict is always the loser. You can get a lot of information about conflicts in the error log if you enable conflict logging via the wsrep_log_conflicts server variable and the cert.log_conflicts provider option. See this fine manual for details.
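Here is a minimal sketch of enabling both of them at runtime (assuming a recent enough Galera provider; they can also be set in the configuration file):

MariaDB [test]> set global wsrep_log_conflicts = ON;
MariaDB [test]> set global wsrep_provider_options = 'cert.log_conflicts=YES';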

MyRocks Deadlock

We can easily check how deadlocks are processed by MyRocks by just loading the plugin for the engine, converting the tables to MyRocks and trying the same InnoDB scenario with the same Percona Server we used initially. But first, if you use Percona binaries, you have to install a separate package:
openxs@ao756:~$ dpkg -l | grep rocksdb
openxs@ao756:~$ sudo apt-get install percona-server-rocksdb-5.7
[sudo] password for openxs:
Reading package lists... Done
Building dependency tree
...
Unpacking percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...
Setting up percona-server-rocksdb-5.7 (5.7.24-26-1.trusty) ...


 * This release of Percona Server is distributed with RocksDB storage engine.
 * Run the following script to enable the RocksDB storage engine in Percona Server:

        ps-admin --enable-rocksdb -u <mysql_admin_user> -p[mysql_admin_pass] [-S <socket>] [-h <host> -P <port>]
Percona's manual has a lot more details and relies on the separate ps-admin script, but basically you have to INSTALL the plugins like this (check the script's code):
mysql> INSTALL PLUGIN ROCKSDB SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.86 sec)

mysql> INSTALL PLUGIN ROCKSDB_CFSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_DBSTATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.08 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_PERF_CONTEXT_GLOBAL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_CF_OPTIONS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_GLOBAL_INFO SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_COMPACTION_STATS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DDL SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)

mysql> INSTALL PLUGIN ROCKSDB_INDEX_FILE_MAP SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_LOCKS SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_TRX SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.05 sec)

mysql> INSTALL PLUGIN ROCKSDB_DEADLOCK SONAME 'ha_rocksdb.so';
Query OK, 0 rows affected (0.06 sec)
Then check that the engine is there and convert tables:
mysql> show engines;
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                    | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------+--------------+------+------------+
| ROCKSDB            | YES     | RocksDB storage engine                                                     | YES          | YES  | YES        |
...

mysql> alter table t1 engine=rocksdb;
Query OK, 2 rows affected (0.64 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> alter table t2 engine=rocksdb;
Query OK, 2 rows affected (0.58 sec)
Records: 2  Duplicates: 0  Warnings: 0
Now we are ready to try the same InnoDB scenario. Just note that the lock wait timeout for MyRocks is defined by rocksdb_lock_wait_timeout, which is small by default (1 second), so you may have to increase it first. You also have to set rocksdb_deadlock_detect to ON (as it's OFF by default):
mysql> set global rocksdb_lock_wait_timeout=50;
Query OK, 0 rows affected (0.00 sec)

mysql> set global rocksdb_deadlock_detect=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> \r
Connection id:    14
Current database: test

mysql> start transaction;
Query OK, 0 rows affected (0.02 sec)

mysql> select * from t1 where id = 1 for update;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
+----+------+
1 row in set (0.00 sec)
Then in the second session:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where id = 2 for update;
+----+------+
| id | c1   |
+----+------+
|  2 |    2 |
+----+------+
1 row in set (0.00 sec)
In the first:
mysql> select * from t1 where id = 2 for update;
and in the second session we get the deadlock error:
mysql> select * from t1 where id = 1 for update;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> show global status like '%deadlock%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| rocksdb_row_lock_deadlocks | 1     |
+----------------------------+-------+
1 row in set (0.00 sec)
Note that MyRocks has a status variable to count deadlocks. Percona Server, however, still does NOT seem to support the SHOW ENGINE ROCKSDB TRANSACTION STATUS statement available upstream:
mysql> show engine rocksdb transaction status\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'transaction status' at line 1
I was not able to find a bug about this (sorry if I missed it), and just reported a new task to Percona's JIRA: PS-5114 - "Add support for SHOW ENGINE ROCKSDB TRANSACTION STATUS".
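In the meantime, as the ROCKSDB_DEADLOCK plugin was installed above, details of recent MyRocks deadlocks may be available from the corresponding INFORMATION_SCHEMA table (a quick check; the exact columns and content depend on the MyRocks version used):

mysql> select * from information_schema.rocksdb_deadlock\G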

That's probably more than enough for a single blog post (that is mostly NOT about bugs). One day I'll refresh my knowledge of MyRocks etc. and maybe write more about deadlock troubleshooting there.

Do not be surprised if you cannot find anything in INNODB STATUS when you get error 1213 - just proceed with the further steps. There are other reasons to explore. Venice hides a lot early in the morning...

To summarize, do not be surprised if, after you get MySQL error 1213, you see no information about a recent InnoDB deadlock - there are at least 3 more reasons for this error to be reported, as explained above. You should know your configuration and use several other commands and sources of information to pinpoint what exactly happened and why.
