Today I want to check if this topic is explained well in the manual and existing blog posts and understood properly. So, it's an exercise for my dear readers and those who like to report bugs as much as I do.
Let's consider a very simple example. In session #1 with default transaction isolation level execute the following:
CREATE TABLE `tt` (
`id` int(11) NOT NULL,
`c1` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `c1` (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into tt values(1,1); -- insert a row there
select * from tt; -- check that we have row (1,1)
begin work;
select * from tt where c1=1 for update; -- we see a row (1,1) and now it's locked for us
update tt set id=id+1 where c1=1; -- and we safely increment the id
So, just a table with one row and two columns. One of columns is a PRIMARY KEY, other is UNIQUE. Why one may need a table like this? Some people (not me) may think this is a good way to create sequences and get globally unique ids generated for a set of rows, with sequence identifier being stored in column I've named c1. This happens, that's why...
Now, in session #2 execute the following:
select * from tt; -- we have row (1,1) there
select * from tt where c1=1 for update; -- let's try to use the sequence...
At this moment session #2 hangs waiting for a lock. It's expected - we actually updated the same row in an yet uncommitted transaction in session #1. This is clear.
Now, in session #1, try to do the following:
select * from tt where c1=1 for update; -- you'll see a row (2,1), as you've already incremented counter
But interesting thing happens in session #2 in the meantime:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
If you do not trust me, just try this with the following isolation level for both sessions:
mysql> select @@tx_isolation;
+-----------------+
| @@tx_isolation |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)
At this moment you can take a break in reading, and check the output of SHOW ENGINE INNODB STATUS from session #2. On MySQL 5.6.16 here I see the following (details of report may depend on server version and settings):
------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-02-15 18:09:20 16a8
*** (1) TRANSACTION:
TRANSACTION 36657, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 3, OS thread handle 0x1884, query id 55 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36657 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000001; asc ;;
*** (2) TRANSACTION:
TRANSACTION 36656, ACTIVE 10 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1184, 6 row lock(s), undo log entries 2
MySQL thread id 5, OS thread handle 0x16a8, query id 56 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000001; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000001; asc ;;
1: len 4; hex 80000001; asc ;;
*** WE ROLL BACK TRANSACTION (1)
This is your problem statement: can you explain the deadlock above based on the output of INNODB STATUS or some other sources of information? Do you know what exact locks are really set by each session?
Take your time to write down your exact explanation, in a comment here or in separate blog post. I wonder if your explanation mentions "intention" locks of any kind, "gap" locks, "next key" locks and so on. I've read very creative texts on similar topic, and would not mind to read some more in the comments... I also wonder to know how you are going to prove your points.
We see one record lock that session holds and two lock waits in the above. All of them are obviously related to poor record with values (1,1) in the unique index named c1... You surely know that primary key value is a part of any secondary key in InnoDB, so this is all clear. But why session that already got X lock on this record has to wait to get X lock on the same(?) record? Note also 6 row locks for one of transactions in the report above...
You can surely try to rely on the manual in explaining this. But there I fail to see even clear explanation of what exact locks are set by SELECT ... FOR UPDATE in cases like this. All it says is the following:
"Forand later:SELECT ... FOR UPDATE
orSELECT ... LOCK IN SHARE MODE
, locks are acquired for scanned rows, and expected to be released for rows that do not qualify for inclusion in the result set (for example, if they do not meet the criteria given in theWHERE
clause). However, in some cases, rows might not be unlocked immediately because the relationship between a result row and its original source is lost during query execution."
"For index records the search encounters,SELECT ... FROM ... FOR UPDATE
blocks other sessions from doingSELECT ... FROM ... LOCK IN SHARE MODE
or from reading in certain transaction isolation levels."
I've already reported some documentation requests while trying to explain this deadlock based on the manual. I'll list them all eventually. I hope you'll report several more bugs while trying to solve and discussing this exercise.
Now, why should you care? Because I am going to buy a beer (or any drink you prefer) for everybody who will post good (even if not entirely correct) explanation of this deadlock and make me aware of it, and/or report new valid bug based on this test case! I may be able to do this during PLMCE 2014 (still not sure if I go there, waiting for the decisions on 2 BOF sessions I've submitted) or during some conference later.
So, let's fun begin! You can expect summary post about this exercise (with a list of winners, my own explanation of this case, if still needed, and some ways to study the details) before April 1, 2014 :)