Saturday, February 15, 2014

Magic deadlock: what locks are really set by InnoDB?

Megabytes of text have already been written on InnoDB locking and deadlocks. Still, even very simple deadlocks involving a table with only one row sometimes make people wonder what happened and why.

Today I want to check whether this topic is explained well in the manual and existing blog posts, and whether it is properly understood. So, here is an exercise for my dear readers and those who like to report bugs as much as I do.

Let's consider a very simple example. In session #1, with the default transaction isolation level, execute the following:

create table tt(
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c1` (`c1`)
) engine=InnoDB; -- create an InnoDB table with a primary key and a unique secondary index
insert into tt values(1,1); -- insert a row there
select * from tt; -- check that we have row (1,1)
begin work;
select * from tt where c1=1 for update; -- we see a row (1,1) and now it's locked for us
update tt set id=id+1 where c1=1; -- and we safely increment the id

So, it's just a table with one row and two columns. One of the columns is a PRIMARY KEY, the other is UNIQUE. Why might one need a table like this? Some people (not me) may think this is a good way to create sequences and get globally unique ids generated for a set of rows, with the sequence identifier stored in the column I've named c1. This happens, that's why...
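For completeness, here is a minimal sketch of how such a "sequence" is typically consumed (this is my assumption about the intended usage pattern, not part of the exercise itself - note the commit, while in the exercise session #1 deliberately keeps its transaction open):

begin work;
select id from tt where c1=1 for update; -- lock the counter row of "sequence" 1
update tt set id=id+1 where c1=1; -- reserve the next value
commit; -- release the locks; the id selected above is "ours"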

Now, in session #2 execute the following:

select * from tt; -- we have row (1,1) there
select * from tt where c1=1 for update; -- let's try to use the sequence...

At this moment session #2 hangs waiting for a lock. That's expected - we actually updated the same row in a yet uncommitted transaction in session #1. This is clear.

Now, in session #1, try to do the following:

select * from tt where c1=1 for update; -- you'll see a row (2,1), as you've already incremented the counter

But an interesting thing happens in session #2 in the meantime:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

If you do not trust me, just try this with the following isolation level for both sessions:

mysql> select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)

At this moment you can take a break from reading and check the output of SHOW ENGINE INNODB STATUS from session #2. On MySQL 5.6.16 here I see the following (the details of the report may depend on server version and settings):

------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-02-15 18:09:20 16a8
*** (1) TRANSACTION:
TRANSACTION 36657, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 3, OS thread handle 0x1884, query id 55 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36657 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 36656, ACTIVE 10 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1184, 6 row lock(s), undo log entries 2
MySQL thread id 5, OS thread handle 0x16a8, query id 56 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (1)

This is your problem statement: can you explain the deadlock above based on the output of SHOW ENGINE INNODB STATUS or some other sources of information? Do you know what exact locks are really set by each session?

Take your time and write down your exact explanation, in a comment here or in a separate blog post. I wonder if your explanation mentions "intention" locks of any kind, "gap" locks, "next-key" locks and so on. I've read very creative texts on similar topics, and would not mind reading some more in the comments... I also wonder how you are going to prove your points.
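One way to prove them (just a suggestion of mine, surely not the only approach) is to query the INFORMATION_SCHEMA tables from a third session while session #2 is still waiting, before the deadlock is detected:

select trx_id, trx_state, trx_requested_lock_id from information_schema.innodb_trx\G
select * from information_schema.innodb_locks\G
select * from information_schema.innodb_lock_waits\G

Keep in mind that INNODB_LOCKS shows only the locks that are requested but not granted, plus the locks blocking those requests - not every lock set - so this alone will not tell the whole story.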

We see one record lock that a session holds and two lock waits in the above. All of them are obviously related to the poor record with values (1,1) in the unique index named c1 (the hex value 80000001 in the dumps is just the integer 1 in InnoDB's internal storage format). You surely know that the primary key value is a part of any secondary key in InnoDB, so this is all clear. But why does the session that already got an X lock on this record have to wait to get an X lock on the same(?) record? Note also the 6 row locks for one of the transactions in the report above...

You can surely try to rely on the manual to explain this. But there I fail to see even a clear explanation of what exact locks are set by SELECT ... FOR UPDATE in cases like this. All it says is the following:
"For SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE, locks are acquired for scanned rows, and expected to be released for rows that do not qualify for inclusion in the result set (for example, if they do not meet the criteria given in the WHERE clause). However, in some cases, rows might not be unlocked immediately because the relationship between a result row and its original source is lost during query execution."
and later:
"For index records the search encounters, SELECT ... FROM ... FOR UPDATE blocks other sessions from doing SELECT ... FROM ... LOCK IN SHARE MODE or from reading in certain transaction isolation levels."

I've already reported some documentation requests while trying to explain this deadlock based on the manual. I'll list them all eventually. I hope you'll report several more bugs while solving and discussing this exercise.

Now, why should you care? Because I am going to buy a beer (or any drink you prefer) for everybody who posts a good (even if not entirely correct) explanation of this deadlock and makes me aware of it, and/or reports a new valid bug based on this test case! I may be able to do this during PLMCE 2014 (still not sure if I'll go there - I am waiting for the decisions on 2 BOF sessions I've submitted) or during some conference later.

So, let the fun begin! You can expect a summary post about this exercise (with a list of winners, my own explanation of this case if still needed, and some ways to study the details) before April 1, 2014 :)

Sunday, February 9, 2014

On responsible bug reporting

Let me start with questions related to responsible MySQL bug reporting that I'd like to see discussed, and then present the history behind them.

Assuming that you, my dear reader from the MySQL Community, noted or found some simple sequence of SQL statements that, when executed by an authenticated MySQL user explicitly having all the privileges needed to execute these statements, crashes some version of your favorite MySQL fork, please answer the following questions:
  1. Do you consider this kind of bug a "security vulnerability"?
  2. Should you share a complete test case on any public site (MySQL bugs database, Facebook, your personal blog, anywhere)?
  3. Should you share just a description of the possible "attack vector", as Oracle does when they publish security bug fixes?
  4. Should you share just a stack trace or failed assertion information, without any details on how to get it?
  5. Should you officially inform the vendor of your MySQL fork about it in a private email (secalert_us@oracle.com in the case of Oracle)?
  6. Should you try to have it added to the CVE database and, if you should, what is the best way to do this?
  7. Should you try to use any informal, but still private, ways to share this information with MySQL vendors and other interested parties?
  8. What is the best way to do "responsible bugs reporting" in cases like this?

Now, the history behind these questions. Usually, when I have to deal with any MySQL issue not obviously caused by a user mistake or misconfiguration, I suspect that the problem may be related to some (known or yet unknown, fixed or still active) MySQL bug. I search the MySQL bugs database for error messages and codes, keywords, stack trace entries or assertions the user noted, at the earliest stage of investigation. This approach has worked well enough for me for more than 8 years, as even if the problem is not related to any bug, similar reports may give explanations, workarounds or other hints on possible reasons.

To be able to search bugs faster and have more things readily available "off the top of my head", I try to read new public bug reports whenever they appear. Sometimes I send comments or hints. Often enough I see simple test cases there to check, and I do try to run these test cases in the hope of seeing the problem more clearly, or of finding out that it is not repeatable for me (and then trying to understand why later). I do this during my free time and while I work; I do this for fun, and when colleagues ask me for some assistance or, specifically, about any bug that may be related to their problem at hand.

I used to share my findings immediately on Facebook by posting URLs to bugs with my comments, but I have stopped this activity at least till PLMCE 2014, in the hope of finding some better way to cooperate on bug fixing and as part of one simple experiment for a talk I proposed for that conference (which was not selected by the committee anyway). So, since the end of November 2013, I (rarely) share only some questions or test cases from time to time, and ask colleagues to try them on their favorite MySQL, MariaDB or Percona Server versions, if they are also curious.

This week, during my usual morning review of new bug reports, I noted an "Open" bug with a simple test case - just a couple of UPDATEs for some InnoDB table having a single row of data - that, according to the bug reporter, crashed the recently released MySQL 5.6.16 (that I blogged about just a day before) and 5.5.36. Surely I immediately tried this test case, and it crashed my test instances because of an assertion failure. I waited for some time and saw that the bug was still open, informed a few of my colleagues hanging around in IRC chat about the new crashing bug found, waited some more and... posted the test case in a status update on Facebook. This post got some comments suspecting a crash as the result, but not that many. I have maybe 200 friends there, and most of them are actually current or former colleagues, MySQL engineers or software developers who work on MySQL and other databases in Oracle, HP, SkySQL, Percona and a few other companies. My idea was to make them all immediately aware of the potential problem. I noted that very soon after this post the bug report "disappeared" - it was probably marked as Private and, I hope, verified and given proper attention by Oracle engineers.

The next morning, while quickly checking recent emails, I noted a message from a former colleague, who now works in Oracle, asking me to remove that Facebook post, as explicit publishing of a way to crash MySQL is not "responsible reporting" - I had to give Oracle a chance and time to fix this potential "security vulnerability".

I immediately deleted the post, based just on respect for the person who asked, but internally I had doubts about what was actually the proper thing to do in this case, and I even felt insulted for a while by this (yet another) attempt to shut me up and stop me from discussing MySQL bugs in public (not that this really happens, but I have feelings like that from time to time). So, I tried to discuss this with my colleagues and found out that some of them do think the bug is actually a potential security vulnerability and thus had to be private. Others (like me) do not believe in any "security through obscurity" practice, especially for open source software, where any interested person can just read the code, find an assertion there and then think about when and why it can be hit, based on code analysis.

The formal Oracle policy in this case is well known: "If you are not a customer or partner, please email secalert_us@oracle.com with your discovery". Customers should just open a Support Request.

I was explicitly against applying this policy to MySQL (I suggested creating public bug reports in my New Year 2013 wishes to Oracle customers, back in 2012), among other reasons because "security bugs" and "security issues" have more chances to be forgotten internally in Oracle, as they get less visibility (even while they get higher priority) and very few people in Oracle can access, escalate or otherwise monitor them.

Surely, this policy cannot be enforced on community members and individuals like me. In the worst case, if somebody discloses details of a security vulnerability, they will not get credit, as: "In order to receive credit, security researchers must follow responsible disclosure practices".

That's the whole story so far. In my case it was not me who found the bug, so why should I even care about credit? My action caused immediate attention from the Oracle side, and thus I can be sure this specific bug is not forgotten and will be fixed as soon as possible (by the way, MySQL 5.5.35 is also affected by the same assertion failure, so this is NOT a recent regression, and those who use older versions in production are potentially affected). I am sure my post was also noted by "security officers" in the MariaDB Foundation and Percona, so in case these forks are also affected, we now have chances to get fixes there as well, no matter when and how Oracle fixes the problem.

Still, this case and related private discussions led me to the questions at the beginning of this blog post. I'd like to get your opinions on them.

Sunday, February 2, 2014

Fun with Bugs #29 - important bug fixes in MySQL 5.6.16

As you should already know, Oracle released MySQL 5.6.16 officially on January 31, 2014. You can find all the details in the official release notes. In this post I decided to concentrate on important fixes for bugs reported by the community in the 4 most important categories: InnoDB, replication, partitioning and optimizer.

Let's start with Bug #70768, "Persistent optimizer statistics often causes LOCK_open stalls". It's well known that persistent statistics for InnoDB tables, stored in two (again, InnoDB) tables in the mysql database, may cause various problems, not only bad plans based on outdated statistics. One of these problems seems to be solved in 5.6.16, but note that the bug report itself was closed without any specific comment on what exactly was fixed.

You are hardly affected by bugs like Bug #70087, "InnoDB can not use the doublewrite buffer properly", every day. But it's great that yet another case of corruption where recovery is possible is now covered. Another problem, with InnoDB recovery on a server where crash-safe replication (with information stored in tables) is configured, was reported as Bug #69907. Here, instead of a real fix (if one is at all possible), we just got a clearer error message explaining the inability to store replication-related information in tables while innodb_force_recovery > 0.

Having ALTER TABLE on InnoDB work "in place", without making a copy of the entire table, is a great feature. Unfortunately, in some cases it did not perform the change expected. One of these cases, resetting AUTO_INCREMENT with ALTER TABLE, now works again without an explicitly mentioned ALGORITHM, thanks to the fix for Bug #69882 (see the sketch below). It is good to see this kind of regression compared to older MySQL versions fixed. Another regression bug fix, for Bug #69424, finally allows raw devices to be used for the InnoDB shared tablespace in MySQL 5.6. Not that this is common these days, but it still could cause problems during upgrade.
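For illustration, here is a minimal sketch of the case fixed by Bug #69882 (the table name t is hypothetical, not taken from the bug report):

create table t (id int not null auto_increment primary key) engine=InnoDB;
insert into t values (10);
delete from t;
alter table t auto_increment = 1; -- the reset now takes effect again without an explicit ALGORITHM clause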

Yet another bug in ALTER TABLE, Bug #69325, noted and reported by my colleague Justin Swanhart, could badly affect users with big partitioned InnoDB tables. Attempts to add indexes to such a table caused huge and unexpected memory usage. Now, it seems, the amount of memory used is somewhat reduced.

Last but not least, Oracle fixed my Bug #68079! A year of work, detailed analysis and suggested workarounds, and now a real fix with a totally new implementation! Well done, really! I should test the fix tomorrow. It's not the first time I write about this bug (I've created an entire presentation around it), and not the last...

For partitioning itself (besides the InnoDB-related cases mentioned above), MySQL 5.6.16 did not add many fixes. Still, the fix for Bug #70588 should be mentioned, as now we have one less case of wrong results from a partitioned table. I am badly surprised that, for example, the fix for my Bug #71095 (which has been public since December 6, 2013, and really helped to solve a problem) is NOT included.

In the replication category, Oracle engineers mostly concentrated on fixing bugs reported internally. I see only one public semi-sync replication bug (looking minor enough) fixed: Bug #70410.

I had not expected many fixes in the optimizer, but there are fixes to mention. Check Bug #69005, for example, if you order by columns with the utf8_bin collation. Now this regression bug is fixed and the order should be correct. Another case of ORDER BY failing, this time with views, is fixed in the scope of Bug #69678.

Some other optimizer bug fixes to check are the following:
  • Bug #70025, "Complex updates of Performance Schema tables involving joins or subqueries failed to update every row."
  • Bug #71028, "COUNT(DISTINCT) sometimes produced an incorrect result when the last read row contained a NULL value."
  • Bug #70608, "Incorrect result could be returned for a query with an IF() predicate in the WHERE clause combined with OUTER JOIN in a subquery that is transformed to a semi-join." It was a regression compared to 5.5, so more good news for those who plan to upgrade.
There are some other fixes I cannot comment on right now, as the bug reports still remain private. So, we can only trust the release notes in cases like this:

"A full-text search combined with derived tables (subqueries in the FROM clause) caused a server exit.

Now if a full-text operation depends on a derived table, the server produces an error indicating that a full-text search cannot be done on a materialized table. (Bug #68751, Bug #16539903)"

To summarize, 5.6.16 seems a useful release, with many fixes in InnoDB and the optimizer, including fixes for regression bugs and performance improvements. I have not tested it in production yet, but at least the upgrade from 5.6.15 on Windows was seamless, and I have it up and running for further checks. Stay tuned!