Saturday, May 31, 2014

Fun with Bugs #33 - bugs fixed in MySQL 5.6.19

It seems Oracle released MySQL 5.6.19 yesterday. So, it's time to check what community bug reports are fixed there.

Let's start with InnoDB. We have the following bugs fixed:
  • Bug #72079, "Upgrade from 5.6.10 to 5.6.16 crashes and leaves unusable DB". Honestly I had not noted this bug, even though I upgrade several instances on Windows to each and every release since 5.6.8. Probably because I had no need to use FTS indexes till recently...
  • Bug #71014, about two extra memset calls that are now removed. I hope the patch bug report provided was just used.
  • Bug #69641, "Crash when enabling innodb_table_monitor (assertion: ib_table->stat_initialized)". Good to see one more crash fixed.
  • Bug #62534, "off by one error in innodb_max_dirty_pages_pct logic". Reported by Domas and fixed by Facebook long time ago. But it seems current fix from Oracle is somewhat more complex than just replacing ">" with ">="... Maybe my comment about Informix having decimal (instead of integer) as a data type for similar variable helped :)
The following replication bugs are fixed in 5.6.19:
  • Bug #71070, "kill of truncate table will lead to binary log written while rows remains". I am really happy to this it fixed finally!
  • Bug #71376, "restart of slave instance fail in GTID replication if we use replicate-ignore-db". Also great to see this fixed. Note that to add more confusion, Bug #71326 is mentioned instead as replication bug that is fixed, but this is actually a MySQL Workbench bug. I tried to guess what other community bug it could really be, and ended up with this one. Let's home release notes will be fixed soon accordingly.
  • Bug #71179, "CREATE TABLE .. SELECT produces invalid structure, breaks RBR". Replication is just one of the areas affected though. Let me quote details of the fix (added by Laurynas Biveinis, he does this for each and every community bug fixed. I wonder why Oracle does NOT care to do that themselves?):

      This fix changes the DIV operation implementation to correctly
      evaluate the display width of its result. We check if DIV's
      results estimated width crosses maximum width for integer
      value (21) and if yes set it to this maximum value.
  • Bug #70891 is also listed as fixed, but it is still private, so we can only guess what exactly it was about or just trust release notes... 
 On top of that, few old bugs from other categories were also fixed:
  • Bug #66803, "Allow compilation using Clang in C++11 mode". Clang is becoming popular. Still not sure why it took so much time to fix the problem when original bug report had a patch.
  • Bug #71089, "CMake warning when generating Makefile". No more warning for new CMake versions.
  • Bug #51840, "infinite hang with 100% cpu usage with load data local and import errors". Reported by Shane more than 4 years ago, it was fixed only in 5.6.19. This fix may be the only really good reason to upgrade to 5.6.19, by the way!
  • Bug #69683, "Optimize after a delete returns error 0 can't get stat of MYD file". Actually, it was a problem for MyISAM tables with .MYD files larger than 4GB on Windows. If you care about Windows and MyISAM (really?), then you have one more good reason to upgrade.
  • Bug #69684, "The update and select behave differently under the same conditions". Probably the only fix from the Optimizer team in this release.
That's all, folks! Just 13 community bugs fixed over 2 months, according to the official Release Notes (that has a bug...). I am not impressed and I see a reason to upgrade only if you care about top INSERT performance for InnoDB, or if you use GTID and/or row-based replication setup, or may get LOAD DATA LOCAL statements for files with uncontrolled content. Other fixed are for some rare/corner cases or compilation problems, and I'd happily wait for 5.6.20 to fix some more bugs I so much care about...

Wednesday, April 2, 2014

Fun with Bugs #32 - some bugs I've reported in March

Comparing to the previous month I was not really productive bug reporter in March 2014 (partially because I spent few days at a nice FLOSS UK conference where I tried to give a session on PERFORMANCE_SCHEMA). Just 12 reports, of them 5 documentation requests are already closed. There are some interesting reports among other 7 to write about though.

But let me start with good (or not entirely good) news about my older report, Bug #71858 (easy way to crash MySQL with single SELECT statement, at least on Windows and maybe everywhere). The bug seems to be fixed by Oracle, but the fix will appear in 5.5.38, 5.6.18, and 5.7.5, not in recent official releases.

Off all the documentation requests I'd like to highlight this one, Bug #71916, "Manual does NOT explain locks set for UPDATE ... WHERE PK='const' properly". I was so much surprised to see even basic details about InnoDB locks ad monitoring them not entirely or correctly documented thatI even decided to submit a talk about this of next Oracle's MySQL Connect event (and I tried once, but then probably clicked at some wrong place and my submission disappeared).... Do you think this makes any sense, or should we better just wait for MySQL Documentation team to explain it all eventually?

Another important cases of the "missing manual"are presented by my Bug #72047, "Manual does not explain names for indexes generated for FOREIGN keys properly", that I've reported while trying to study what happens when MySQL creates index for the foreign key implicitly and then InnoDB "switches" to a new suitable index that is explicitly created later (more on this to be written one day...), and by the latest my report, Bug #72133, "Manual does not provide enough details about InnoDB FTS implementation".

While working on a (usual) customer issue about ibdata1 growing even with innodb_file_per_table=1 in MySQL 5.6 I've noted more serious missing detail about FTS implementation represented by a sever bug report, Bug #72132, "Auxiliary tables for InnoDB FTS indexes are always created in shared tablespace". Had you even noted tables like these?

mysql> select name, space from information_schema.innodb_sys_tables where name like 'test/FTS%';
+----------------------------------------------------+-------+
| name                                               | space |
+----------------------------------------------------+-------+
| test/FTS_000000000000002e_0000000000000059_INDEX_1 |     0 |
| test/FTS_000000000000002e_0000000000000059_INDEX_2 |     0 |
| test/FTS_000000000000002e_0000000000000059_INDEX_3 |     0 |
| test/FTS_000000000000002e_0000000000000059_INDEX_4 |     0 |
| test/FTS_000000000000002e_0000000000000059_INDEX_5 |     0 |
| test/FTS_000000000000002e_0000000000000059_INDEX_6 |     0 |
| test/FTS_000000000000002e_BEING_DELETED            |    35 |
| test/FTS_000000000000002e_BEING_DELETED_CACHE      |    36 |
| test/FTS_000000000000002e_CONFIG                   |    37 |
| test/FTS_000000000000002e_DELETED                  |    33 |
| test/FTS_000000000000002e_DELETED_CACHE            |    34 |
... 

Not only they are not documented anywhere outside this old article, but those *_INDEX_* ones are always created in shared tablespace! I consider this a server bug to fix, not just something to document.

Probably I am not the first one, but I consider this mysql client problem, Bug #72108, related to libedit now used by Oracle mysql client instead of readline really annoying... Some users even claim that other versions/forks of MySQl are already good enough just because they still use readline :)

Finally, one of my reports ended up as "Not a bug": Bug #71978, "Server silently allows to set PERFORMANCE_SCHEMA as default_storage_engine". I've got a kind and polite offer like this:
"Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php"
that I probably has to be insulted by, as a person who had not only reported many dozens of good "Verified" valid bugs, but also the one who had a chance to use the same boilerplate standard reply for 7+ years and hardly used it more than maybe 10 times over these years... What's even more interesting, for a bug that was "Verified" by Oracle engineer (and my former manager in bugs verification team) originally I've got explanations (that I am thankful for) and kind offer:
"So your request is at best a feature request (and feel free to re-open as such). But I see little value in implementing this, specially since you're getting a meaningful error message."
So, I did that. I still wonder why the decision to make feature request out of it was not made without original reporter involved...

Sunday, March 30, 2014

Fun with Bugs #31 - what's new in MySQL 5.6.17

MySQL 5.6.17 will probably be announced loudly at or immediately before Percona Live MySQL Conference & Expo next week. But official release announcement via email was made on March 28, release notes and binaries to download are already available, so why not to check them carefully to find out what to expect from this 8th minor release of MySQL 5.6 GA...

First of all, it seems Oracle still does not hesitate to introduce new features and behavior in the process. Just check these major changes:
  • Starting with 5.6.17, MySQL now supports rebuilding regular and partitioned InnoDB tables using online DDL (ALGORITHM=INPLACE) for OPTIMIZE TABLES, ALTER TABLE ... FORCE and ALTER TABLE ... ENGINE=InnoDB. Rebuilding tables (to defragment them and free some disk space mostly) is probably as popular kind of ALTER as adding new columns or changing indexes, so it's great to see a built-in way to do these ALTERs without blocking DML (well, assuming the table does not have foreign keys at least). Previously one had to rely on third party tools like pt-online-schema-change for these cases.
  • "The AES_ENCRYPT() and AES_DECRYPT() functions now permit control of the block encryption mode and take an optional initialization vector argument ... These changes make statements that use AES_ENCRYPT() or AES_DECRYPT() unsafe for statement-based replication and they cannot be stored in the query cache." So, we should expect more warnings now for STATEMENT-based replication setups, but has more secure implementation now. Note that having the details of call to these functions visible in the logs make them less secure, as pointed out in recent Bug #72158.
Oracle also continues to push deprecation of features actively:
  • "The deprecated ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE SQL modes now do nothing. Instead, their previous effects are included in the effects of strict SQL mode (STRICT_ALL_TABLES or STRICT_TRANS_TABLES)." - it seems it's time to check all the code and remove these modes, as they may just disappear in 5.6 even before MySQL 5.7 GA with this kind of activity.
  • "The IGNORE clause for ALTER TABLE is now deprecated and will be removed in a future version of MySQL." - as this deprecation happened in 5.6.x minor release probably the clause will be removed well before 5.8. So it's not like your code is safe for next 2+ years.
Speaking about fixes for public bugs (these I mostly care about), I consider the following as most important:
  • Bug #71197. "mysql slave sql thread memory leak". I've pointed out this bug this week (and not for the first time) as a possible explanation for the OOM conditions in big replication cluster. It's really nice to see it fixed!
  • Bug #70669. "Slave can't continue replication after master's crash recovery", even with sync_binlog=1. This is now fixed.
  • One more fix in replication: Bug #68429 - "When running the slave with --slave-parallel-workers at 1 or greater, setting --slave-skip-errors=all caused the error log to be filled with with instances of the warning Slave SQL: Could not execute Query event. Detailed error: ;, Error_code: 0."
  • Bug #69680. "Auto_inc value not properly generated with RBR and auto_inc column only on slave". It was reported by my colleague Ovais Tariq long time ago and is now fixed. 
  • Bug #71436. "Compressed Tables Regression from 5.6.13". Finally this performance regression when large InnoDB compressed tables are used is fixed. It was reported by my colleague Jervin Real.
  • Bug #70430. "Race condition between purge coordinator and InnoDB monitor crash on shutdown". It was reported by Laurynas Biveinis from Percona.
  • Bug #70063. "Create table is slower in 5.6". It was reported by my former colleague Arnaud Adant from Oracle and the problem was tracked down to inserts into the mysql.innodb_index_stats table (one of two to store InnoDB persistent statistics). Maybe this fix will help to run CREATE or ALTER for partitioned tables faster as well - the problem reported recently as a regression comparing to MySQL 5.5 as Bug #72115 - "Very slow create/alter table with partitions and InnoDB".
  • Bug #65225. "InnoDB miscalculates auto-increment after changing auto_increment_increment" - this one was reported by Elena Stepanova almost 2 years ago and was considered a regression comparing to... MySQL 5.0. Nice to see that we have one less case when old MySQL 5.0 can be considered "better" that recent greatest MySQL 5.6 :)
There are more important fixed in replication and InnoDB categories. Fixes in optimizer are less visible in release notes, but these are really important optimizer bugs that are fixed finally:
  • Bug #70657. "SELECT DISTINCT...GROUP BY returns wrong results in some cases".
  • Bug #69969. "Failing assertion: prebuilt->trx->conc_state == 1 from subselect". It was reported by my colleague Raghavendra Prabhu.
  • Bug #71244. "Wrong result computation using ALL() and GROUP BY".
The last but not the least, for a second release in a row Oracle engineers had fixed something I've reported (even though it's minor enough problem):
  • Bug #71554. Release notes says: " innodb_ft_result_cache_limit now has a hardcoded maximum value of 4294967295 bytes or (2**32 -1). The maximum value was previously defined as the maximum value of ulong." One of those cases when fixing the server code happens as a result of a simple documentation request.
To summarize, MySQL 5.6.17 seems to be a very good release for those who use new InnoDB and replication features in MySQL 5.6. It fixed several regressions and (assuming no new regression bugs introduced) should be implemented everywhere as soon as possible.

Sunday, March 9, 2014

What's wrong with MySQL Manual

I think that MySQL Manual is one of the reasons why MySQL became and still remains popular. I find a lot of useful information (at least references to share with customers, if not real insights) there even after 9+ years of working with MySQL every day and with all numerous articles and blog posts on most important topics available now. I still have MySQL manual page open at every browser instance on every laptop I use on a regular basis.

It's simply great, well indexed by Google and has meaningful human-readable URLs, so one can even guess them for the topics he need. I have http://dev.mysql.com/doc/refman/5.6/en/merge-storage-engine.html open right now and looking at it I clearly understand without any search that if I need a reference for SELECT syntax in 5.5 I have just replace "5.6" with "5.5" and "merge-storage-engine" with "select" to get there. It works that way and many users even had created some nice marcos or shortcuts  for their browser to end up at the proper manual page after typing just the topic they are interested it.

These being said, today I'd like to write about problems with MySQL manual that I care about. Some of them are not new and are hard to fix/avoid, others I consider recent enough. Today I'd like to discuss 4 of them:
  1. MySQL Manual does NOT have enough careful readers who are ready to report problems or missing details.
  2. Old documentation bug reports seem to be ignored without any obvious good reason.
  3. Some documentation pages are useless with their current content.
  4. "How to" kind of official MySQL documentation is missing, and this seems intentional.
The most important problem for MySQL Manual is the fact that it seems it has not enough careful readers who are interested in making it better. Great MySQL Documentation team works hard on improving documentation, but even now, in Oracle, with all the funds theoretically available and almost unlimited resources in general, it seems the team is hardly can keep up with documenting new features and releases in time, so they do not have either time or free pairs of eyes to do real "QA" for the manual other than proofreading maybe.

Also, it's easy to understand that authors who had written the text are not the best people to read it later with a hope to find something useful or some mistake there. They deal with their "ideas" and may have too many things assumed in mind comparing to average user/reader of the manual. They are also very experienced MySQL users and even developers in the past. I had written a book myself long time ago and translated dozens of technical books into Russian, so I know the feeling - for you, as a writer, your text is clear, useful and may be just great. While it is NOT, especially for readers with different technical or cultural background.

As a result we have manual pages that are less than clear and has a lot of useful things missing or improperly documented. When I started to read the manual carefully some time ago (after I stopped writing about MySQL bugs at Facebook I needed some other thing to do to keep my mind in peace) I had found out that I can easily report a bug (if not 5 or more) for any page I open. I see missing details or wrong statements everywhere. I had even "invented" separate "missing manual" tag to classify some of the documentation bugs I've reported. It's so easy to find these problems and report them as bugs in "Server: Documentation" category...

So, MySQL manual has bugs, but not so many it seems (I see 86 active ones at the moment). This is NOT because it's ideal, but more because too few people care to read the manual carefully and report anything they consider wrong or missing. Actually, besides MySQL Support engineers and few other Oracle employees, few of my colleagues at Percona and myself, I know only 3 Community members who report documentation bugs recently:
  • Daniel van Eeden. He is a well known Community bug reporter and used to be great Oracle customer (at least this is how I remember him from the days when I worked in Oracle MySQL Support). Out of his 78 active bug reports I see 11 for MySQL Server manual, but he also reports bugs in documentation for Oracle's "commercial" MySQL software, like MySQL Enterprise Backup.
  • Peter Laursen. He is also a well known and productive MySQL bugs reporter for a decade probably. Out of his 83 active bug reports 5 are related to MySQL Manual. They are not very recent though.
  • Federico Razzoli. He had started to report himself only recently because of privacy-related issues with information requested by Oracle to obtain SSO account for bugs reporting. Before that he asked me to report bugs sometimes, and I did. So, now of his 5 active bugs 2 are related to MySQL Manual.
So, just few reporters who care and, at the same time, problems and deficiencies easy to find everywhere. This is a main problem, IMHO, and the only way to get a hope to solve it is to start reading MySQL Manual and reporting bugs/problems noted! Surely, it's just the first step, somebody has to write/provide real missing content or fixes. If you can suggest some better text, please, do it. If you do not want to wait there are probably alternative ways: you have to document MySQL related things elsewhere. For example, this is what Federico did when he noted that no way to deal with materialized views is documented in usual sources of MySQL-related information. I plan to discuss alternatives to MySQL Manual later. Let's concentrate on what to do to make it better for now.

Other important problem of the manual related to bugs is the fact that some of the bugs reported are NOT fixed in time. Let me share my favorite example here, my Bug #68097. It was reported almost 14 months ago and asked to actually properly document PERFORMANCE_SCHEMA behavior that was intended and explained long time ago - the only way to enable instrumentation for (InnoDB) rwlocks and mutexes is upon server startup. If these instruments where not enabled at startup they are just not available, otherwise you can disable them and enable them back dynamically. I fail to understand what's so hard to write exactly this, as we can find statements about this from Oracle engineers in many places.

As a result, MySQL Manual stays misleading for years! By the way, the oldest documentation bug I see in "Verified" status, Bug #30538, remains in this state without further comments since October 19, 2010. Based on recent comments it may be even somehow fixed, but nobody cares to change status then. And this is for a bug reported back in 2007 by famous customer (probably Google at that time), Mark Callaghan. Obviously we see not enough care about older documentation requests. My recent ones are usually fixed fast though.

Some pages of MySQL Manual exist for years, but their content is almost useless without looking into the code or checking with developers. Great examples can be found in documents devoted to NDB Cluster internals, for example, this page (it's one of many): https://dev.mysql.com/doc/ndbapi/en/ndb-internals-dump-command-2401.html. Nobody even tried to explain the output here, interpretation of the fields, use cases for this DUMP command, nothing. Just few words and example of output that I can probably get myself. How this page is useful to anybody?

Unfortunately, the problem is NOT limited to NDB Cluster internals (that I am happy to honestly not care about at all at the moment). Same happens even to InnoDB. Check http://dev.mysql.com/doc/refman/5.6/en/innodb-monitors.html and tell me where it explains the details about any lock waits related information you can find out in TRANSACTIONS section of INNODB STATUS output. I see only this text:
"If this section reports lock waits, your applications might have lock contention. The output can also help to trace the reasons for transaction deadlocks."
That's all! Really. It does not even explain how "next-key lock" looks in the output vs "record lock"... As a result, one can not explain the way to troubleshoot even simple deadlock based on the manual or, if she tries, she can NOT prove the points with anything but experiments and quotes from the source code. And all that happens when InnoDB lock model does NOT change for decade(s) and source code seems commented in details.

The fourth problem, the lack of official "How to" or "FAQ" documents, is closely related to the above. Not only MySQL Manual sometimes misses a lot of details, you can not get them from http://dev.mysql.com/doc/ (the place to look for official MySQL guides) at all! How to read the output of SHOW ENGINE INNODB STATUS? Now to use PERFORMANCE_SCHEMA to troubleshoot different types of performance problems? You can find answers (of different quality, level of details and age) in great blog posts, books and presentations, maybe in Oracle's knowledge base articles (if you are lucky to be their customer and good in search there) but NOT in official MySQL documentation.

I do not consider this normal or acceptable, especially for the open source software. Check FreeBSD Documentation Project, for example, or Linux Documentation Project to get some ideas of how other open source software approaches the problem.

Surely, adding a collection of "How to" guides probably requires input from Community, but why at least not to try to collect links to all the good existing documents of this kind somewhere at http://dev.mysql.com? Having third party documents of this kind with "officially approved" status after a review by Oracle's MySQL Documentation and/or Support teams would help a lot, I think.

That's all I can complain about at the moment. One day I'll try to continue...

Sunday, March 2, 2014

Fun with Bugs #30 - quick review of my reports in February, 2014

I've got only one comment to my previous post about deadlock, and it was more like a hint based on a different use case, not a real explanation. So far there is nobody who wants to get free beer... Maybe this is even good, as I do not go to the conference and BOF I've submitted will be supervised by my colleague Przemysław Malkowski. But you still have entire month till the conference to get a chance for a beer from him (we'll arrange this somehow).

In the meantime I'd like to review bug reports for MySQL server (few) and fine manual (many) that I've submitted in February, 2014. 22 in total, one was just plain wrong and I closed it as "Not a bug" almost immediately. So, 21 to consider.

Let's start with serious problem. Based on our previous discussion on responsible bug reporting I had not shared the test case with the entire world immediately, but you should know that you probably can crash MySQL 5.5, 5.6 and 5.7 (at least on Windows, but I can not exclude older 5.5 on Linux also) with a single SELECT statement. This was reported as Bug #71858 (security bug from the very beginning, so you will not see it directly).

While working on a real customer problem I had to report a bug related to replication, Bug #71732, "Garbage value in output when MASTER_LOG_FILE='' is set". Having this kind of messages in the error log does not really help to find out fast what was the reason of the problem.





I've considered Bug #71818, "Type column in SHOW PERFORMANCE_SCHEMA STATUS is redundant" annoying at least, but it ended up as "Not a bug" for a reason that SHOW ENGINE INNODB STATUS also has "Type" column for its single row of results, so it's by design, "The column might not be very useful, but it works as intended".

I've spent some time trying to build recent MySQL 5.5+ on Linux using Solaris Studio 12.3 compiler, and mostly failed. In the meantime Bug #71847 was reported, "Predefined compiler flags for SunPro/Solaris Studio 12.x are missing in cmake". Using section from Solaris OS part helped a bit in the process.

I was annoying enough to complain about warnings two times. See  Bug #71640, "Somewhat misleading deprecation warning for innodb_lock_monitor" and Bug #71817, "Warnings text is somewhat misleading (mentions max_open_files and table_cache)" for the details.

Surely I don't forget PERFORMANCE_SCHEMA and added a feature request, Bug #71755, "Provide per partition summary information in PERFORMANCE_SCHEMA".

The rest were actually documentation requests. I plan to write a long post about MySQL documentation in general eventually, so let me point out only bug reports I've created while working on explanation for that deadlock:
  • Bug #71637 - "Manual mentions IS_GAP and IX_GAP locks, but never explains them"
  • Bug #71638 - "Manual does NOT explain "insert intention" lock mode properly"
  • Bug #71735 - "Manual does not explain locks set by SELECT ... FOR UPDATE properly"
  • Bug #71736 - "Manual does not explain locks set by UPDATE properly"
They are all "Verified" for now, so it seems my concerns are valid.

February was a really hot and sad month here in Ukraine, so I was not productive enough with MySQL bugs reporting. Sorry.

Saturday, February 15, 2014

Magic deadlock: what locks are really set by InnoDB?

Megabytes of text had been written already on InnoDB locking and deadlocks. Still, even very simple cases of deadlocks while working with a table having only one row sometimes make people wonder what happened and why.

Today I want to check if this topic is explained well in the manual and existing blog posts and understood properly. So, it's an exercise for my dear readers and those who like to report bugs as much as I do.

Let's consider a very simple example. In session #1 with default transaction isolation level execute the following:

CREATE TABLE `tt` (
  `id` int(11) NOT NULL,
  `c1` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c1` (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into tt values(1,1); -- insert a row there
select * from tt; -- check that we have row (1,1)
begin work;
select * from tt where c1=1 for update; -- we see a row (1,1) and now it's locked for us
update tt set id=id+1 where c1=1; -- and we safely increment the id


So, just a table with one row and two columns. One of columns is a PRIMARY KEY, other is UNIQUE. Why one may need a table like this? Some people (not me) may think this is a good way to create sequences and get globally unique ids generated for a set of rows, with sequence identifier being stored in column I've named c1. This happens, that's why...

Now, in session #2 execute the following:

select * from tt; -- we have row (1,1) there
select * from tt where c1=1 for update; -- let's try to use the sequence...

At this moment session #2 hangs waiting for a lock. It's expected - we actually updated the same row in an yet uncommitted transaction in session #1. This is clear.

Now, in session #1, try to do the following:

select * from tt where c1=1 for update; -- you'll see a row (2,1), as you've already incremented counter

But interesting thing happens in session #2 in the meantime:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

If you do not trust me, just try this with the following isolation level for both sessions:

mysql> select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)


At this moment you can take a break in reading, and check the output of SHOW ENGINE INNODB STATUS from session #2. On MySQL 5.6.16 here I see the following (details of report may depend on server version and settings):

------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-02-15 18:09:20 16a8
*** (1) TRANSACTION:
TRANSACTION 36657, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 3, OS thread handle 0x1884, query id 55 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36657 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32

 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 36656, ACTIVE 10 sec starting index read, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1184, 6 row lock(s), undo log entries 2
MySQL thread id 5, OS thread handle 0x16a8, query id 56 localhost ::1 root statistics
select * from tt where c1=1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32

 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 242 page no 4 n bits 72 index `c1` of table `test`.`tt` trx id 36656 lock_mode X waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 32

 0: len 4; hex 80000001; asc     ;;
 1: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (1)



This is your problem statement: can you explain the deadlock above based on the output of INNODB STATUS or some other sources of information? Do you know what exact locks are really set by each session?

Take your time to write down your exact explanation, in a comment here or in separate blog post. I wonder if your explanation mentions "intention" locks of any kind, "gap" locks, "next key" locks and so on. I've read very creative texts on similar topic, and would not mind to read some more in the comments... I also wonder to know how you are going to prove your points.

We see one record lock that session holds and two lock waits in the above. All of them are obviously related to poor record with values (1,1) in the unique index named c1... You surely know that primary key value is a part of any secondary key in InnoDB, so this is all clear. But why session that already got X lock on this record has to wait to get X lock on the same(?) record? Note also 6 row locks for one of transactions in the report above...

You can surely try to rely on the manual in explaining this. But there I fail to see even clear explanation of what exact locks are set by SELECT ... FOR UPDATE in cases like this. All it says is the following:
"For SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE, locks are acquired for scanned rows, and expected to be released for rows that do not qualify for inclusion in the result set (for example, if they do not meet the criteria given in the WHERE clause). However, in some cases, rows might not be unlocked immediately because the relationship between a result row and its original source is lost during query execution."
and later:
"For index records the search encounters, SELECT ... FROM ... FOR UPDATE blocks other sessions from doing SELECT ... FROM ... LOCK IN SHARE MODE or from reading in certain transaction isolation levels."

I've already reported some documentation requests while trying to explain this deadlock based on the manual. I'll list them all eventually. I hope you'll report several more bugs while trying to solve and discussing this exercise.

Now, why should you care? Because I am going to buy a beer (or any drink you prefer) for everybody who will post good (even if not entirely correct) explanation of this deadlock and make me aware of it, and/or report new valid bug based on this test case! I may be able to do this during PLMCE 2014 (still not sure if I go there, waiting for the decisions on 2 BOF sessions I've submitted) or during some conference later.

So, let's fun begin! You can expect summary post about this exercise (with a list of winners, my own explanation of this case, if still needed, and some ways to study the details) before April 1, 2014 :)

Sunday, February 9, 2014

On responsible bugs reporting

Let me start with questions related to responsible MySQL bugs reporting that I'd like to be discussed and then present a history behind them.

Assuming that you, my dear reader from MySQL Community, noted or found some simple sequence of SQL statements that, when executed by authenticated MySQL user explicitly having all the privileges needed to execute these statements, crashes some version of your favorite MySQL fork, please, answer the following questions:
  1. Do you consider this kind of a bug a "security vulnerability"?
  2. Should you share complete test case at any public site (MySQL bugs database, Facebook, your personal blog, any)?
  3. Should you share just a description of possible "attack vector", as Oracle does when they publish security bug fixes?
  4. Should you share just a stack trace or failed assertion information, without any details on how to get it?
  5. Should you inform vendor of your MySQL fork about it in a private email officially (secalert_us@oracle.com in case of Oracle)?
  6. Should you try to have it added to CVE database and, if you should, what is the best way to do this?
  7. Should you try to use any informal, but still private ways share this information with MySQL vendors and other interested parties?
  8. What is the best way to do "responsible bugs reporting" in cases like this?



Now, the history behind these questions. Usually when I have to deal with any MySQL issue not obviously caused by a user mistake or misconfiguration, I suspect that the problem may be related to some (known or yet unknown, fixed or still active) MySQL bug. I search MySQL bugs database for error messages and codes, keywords, stack trace entries or assertions user noted, at the earliest stage of investigation. This approach works well enough for me for more than 8 years, as even if the problem was not related to any bug, similar reports may give explantions, workarounds or other hints on possible reasons.

To be able to search bugs faster and have more things readily available "from the top of my head" I try to read new public bug reports whenever they appear. Sometimes I send comments or hints. Often enough I see simple test cases there to check. And I do try to run these test cases in a hope to see the problem more clearly or find out that it is not repeatable for me (to try to understand why is that so later). I do this during my free time and while I work, I do this this for fun and when colleagues ask me for some assistance or, specifically, for any bug that may be related to their problem at hand.

I used to share my finding immediately on Facebook by posting URLs to bugs with my comments, but this activity is now stopped at least till PLMCE 2014 in a hope to find some better way of cooperation on bug fixing and as a part of one simple experiment for my talk proposed for that conference (that was not selected by the committee anyway). So, since the end of November, 2013 I share (rarely) only some questions or test cases from time to time, and ask colleagues to try them on their favorite MySQL, MariaDB or Percona Server versions, if they are also curious.

This week during my usual review of new bug reports in the morning I've noted an "Open" bug with a simple test case, just a couple of UPDATEs for some InnoDB table having a single row of data, that, according to the bug reporter, crashed recently released MySQL 5.6.16 (that I blogged about just a day before) and 5.5.36. Surely I've immediately tried this test case, that crashed my test instances because of assertion failure. I've waited for some time to see that the bug is still open, informed few of my colleagues hanging around at IRC chat about new crashing bug found, waited for some more time and... posted test case in a status update on Facebook. This post got some comments that suspected crash as a result, but not so many. I have maybe 200 friends there and most of them are actually current or former colleagues, MySQL engineers or software developers who work on MySQL and other databases in Oracle, HP, SkySQL, Percona and few other companies. My idea was to make them all immediately aware about potential problem. I've noted that very soon after this post the bug report had "disappeared" - it was marked as Private probably and, I hope, verified and got proper attention from Oracle engineers.

Next morning while checking recent emails quickly I've noted message from my former colleague, who now works in Oracle, asking to remove that Facebook post as explicit publishing of a way to crash MySQL is not a "responsible reporting" - I had to give Oracle a chance and time to fix this, as a potential "security vulnerability".

I've immediately deleted the post based just on respect to the person who asked, but internally I had doubts on what was actually a proper thing to do in this case, and I even felt insulted for a while by this (yet another) attempt to shut me up and stop me from discussing MySQL bugs in public (not that this really happens, but I have feelings like that from time to time). So, I tried to discuss this with my colleagues and found out that some of them do think the bug is actually a potential security vulnerability and thus had to be private. Others (like me) do not believe in any "security through obscurity" practice, especially for open source software, where any interested person can just read the code, find assertion there and then just think when and why it can be hit, based on code analysis.


Formal Oracle policy in this case is well known: "If you are not a customer or partner, please email secalert_us@oracle.com with your discovery". Customers should just open a Support Request.

I was explicitly against applying this policy to MySQL (I've suggested to create public bug reports in my New Year 2013 wishes to Oracle customers back in 2012) among other reasons because "security bugs" and "security issues" have more chances internally in Oracle to become forgotten as they get less visibility (at the same time as they get higher priority) and very few people in Oracle can access or escalate or otherwise monitor them.

Surely, this policy can not be enforced on community members and individuals like me. In the worst case if somebody discloses details of the security vulnerability, she will not get a credit, as: "In order to receive credit, security researchers must follow responsible disclosure practices".


That's the whole story so far. In my case it was not me who found the bug, so why should I care even about credit. My action caused immediate attention from Oracle side and thus I can be sure this specific bug is not forgotten and fixed as soon as possible (by the way, MySQL 5.5.35 is also affected by the same assertion failure, so this is NOT a recent regression and those who use older versions in production are potentially affected). I am sure my post was also noted by "security officers" in MariaDB Foundation and Percona, so in case these forks are also affected we now have chances to get fixes there as well, no matter when and how Oracle fixes the problem.

Still, this case and related private discussions led me to the questions at the beginning of this blog post. I'd like to get your opinions on them.