Tuesday, December 31, 2013

Fun with Bugs #28 - regression bugs in MySQL 5.6

2013 was a great year for the MySQL Community. The new MySQL 5.6 GA release, with its increased throughput, scalability and new features, as well as more interaction and cooperation with the MySQL Community from Oracle's side, brought us a lot of new perspectives and good feelings over the year.

Unfortunately, the new MySQL 5.6 GA release also reminded us about an old and well known problem with new MySQL versions: they all introduce new regression bugs. MySQL 5.6 was no exception.

Note that according to a good old tradition (that I hope will be followed in 2014), bugs that demonstrate a regression (make some feature that previously worked stop functioning as intended in a new release) are marked with the "regression" tag at http://bugs.mysql.com. So, it's easy to find them, and here is the list of regression bugs that affect MySQL 5.6 (sometimes a regression bug affects several major versions, as fixes also happen in several versions).

You'll see just 31 bugs in the list of active ones affecting 5.6. Do not become too optimistic because of this. Unfortunately, the good tradition of using the "regression" tag is not followed by some Oracle engineers, so some bugs clearly demonstrating a regression are not tagged properly. Check the well known Bug #69623, "since 5.5.32 & 5.6.12, innodb cant start with own multi-file tablespace", for example. The fact that it's a regression is clear even from the synopsis quoted above, but still I do not see a "regression" tag.

So, regressions still happen (and sometimes it's hard to notice/find them before you hit them), even though we all know that "Automated testing and well-written test cases can reduce the likelihood of a regression." It seems we have reasons to suspect that either test case development, or proper automated testing (or both) is still far from perfect at Oracle (and maybe only a bit better in other companies that provide MySQL builds or forks).

To give you some idea of what kind of regression bugs were noted in MySQL 5.6 in 2013, let me list 10 recent verified regression bugs:
  • Bug #71244, "Wrong result computation using ALL() and GROUP BY". Older versions, like 5.6.11, do not have this problem. It can be easily worked around by NOT using the alias of the aggregated column in the HAVING clause, but still it's sad to see that this simple enough kind of query is not covered by regression tests.
  • Bug #71220, "pow() function returns an error for bad values". It's a representative of an interesting class of "regression"-like bugs. New versions change something in good old MySQL behavior to make it "better", closer to the SQL Standard, or to "improve" it. But this improvement affects users who upgrade, and the manual does not help them to be better prepared for this kind of "improvement", as the change is not listed as incompatible in any prominent place. It's just a change for the better that ends up as a regression from the user's point of view.
  • Bug #71095, "Wrong results with PARTITION BY LIST COLUMNS()". I reported it based on a customer issue after we found a workaround (a different way to partition the table). Still, it was a very bad surprise (as most regression bugs are) and it took notable effort to reduce the problem to a simple test case. The bug got immediate attention and a simple patch from Mattias Jonsson (the patch solved the problem according to my test), so I hope to see the fix in 5.6.16.
  • Bug #71055, "Using IF EXISTS(SELECT * ...) acquires a lock when using read uncommitted". The READ UNCOMMITTED isolation level has never been widely used and thus one should not expect it to be well tested, but on the other hand a locking SELECT is even less expected at this level. It seems READ UNCOMMITTED is not covered by regression tests properly, otherwise Oracle could not have missed this regression/change of behavior compared to MySQL 5.1.x.
  • Bug #70819, "SHOW ENGINE INNODB MUTEX does NOT work with timed_mutex properly". I noticed this undocumented change in behavior of 5.6 vs 5.5 while working on my PERFORMANCE_SCHEMA-related presentation. It's clear that in MySQL 5.7+ PERFORMANCE_SCHEMA will probably replace not only SHOW PROFILE, but also some other SHOW statements. But while the functionality is officially there (according to the manual), it should not just disappear, whatever good intentions one may have or whatever corner case we speak about.
  • Bug #70617, "Default persistent stats can cause unexpected long query times". This is a good example of a visible performance regression related to a new feature and a lack of details on how it works or how to use it properly. For poor users it just looks like MySQL 5.6 started to work really slowly on some queries in some cases. Then it takes many hours (if not days) for several people to really find out what's wrong, fix the real life case with a workaround and then reduce it to something that can be accepted as a verified bug. I had an entire post about this specific case. It has a lot of details about the process of forcing Oracle to consider this case seriously...
  • Bug #70598, "Premature expression evaluation prevents short-circuited conditionals". One more case where code that just worked in MySQL 5.1 stopped working after an upgrade to 5.5, and none of the newer versions has helped. A workaround exists, but if a user has to change the code just because of an upgrade, and had no way to find out that it was going to be needed before hitting the bug, it's already bad.
  • Bug #70491, "SELECT DISTINCT may return a wrong result if a join buffer is involved". One more optimizer-related regression, and one that could have been found by proper QA. It was reported by Igor Babaev from MariaDB. As a side note, too many bugs in 5.6 are related to all cases and kinds of statements involving aggregation. Something to think about.
  • Bug #70466, "No results when filesorting with a correlation in subquery's HAVING clause". Again HAVING and a regression in MySQL 5.6. Oracle started to change the optimizer in MySQL 5.6, based on old MySQL 6.0 ideas and some new ones, but it's clear that new developments in this area are NOT properly supported by regression tests.
  • Bug #70351, "ALTER TABLE ADD CONSTRAINT xxx FOREIGN KEY adds two constraints". Sometimes fixes/changes happen in all major MySQL versions. In cases like these one has to take great care to test properly, as the change or a related regression may affect a wide user base. This bug is just an example of a change that led to a regression in all versions after just a minor upgrade.
I should stop at this stage, as it's time for New Year wishes. I wish all Oracle MySQL engineers to care about regression testing properly!

And to Oracle MySQL managers my wish is to recognize the efforts of their colleagues who process bugs from the community, and to reward those who really work and care. As a hint, note who worked on the bugs mentioned above, who verified most of them (5 out of 10), who added more test cases and who contributed patches promptly. These are your most valuable assets, and I hope you will take proper care of them!

Saturday, December 28, 2013

Fun with Bugs #27 - bug reports from my teammates at Percona

Surely, I am not the only one at Percona who reports MySQL bugs. In my old post, "17 Famous MySQL Bug Reporters", I've already mentioned Roel Van de Paar, Alexey Kopytov and Peter Zaitsev. They have contributed a lot over the years.

In this post I'd like to concentrate on bug reports from my Support colleagues at Percona. Many of their contributions are notably more important than anything I've ever reported. Many bugs they reported are fixed. Oracle recently started to publicly recognize Community contributions in the form of bug reports, so you may have had a chance to see some of the names mentioned below, with explicit thanks to them. Still, I think it makes sense to repeat them.

So, here is a list of my colleagues who worked in the Percona Support team in 2013 and reported MySQL bugs, with the total number of bug reports from each of them as of today and 1-2 reports highlighted as the most important ones. You can click on the name to see the list of all bugs reported by each person, starting from the most recent ones:
  • Jaime Sicam reported only one MySQL Workbench bug this year (it was fixed in 6.0.1). But his older bug report, Bug #64922, "Foreign Key Error on CREATE TABLE after ALTER TABLE and DROP TABLE statements.", that I verified in 2012 while working at Oracle, is still waiting for some attention (even though it may already be fixed in 5.6.6+).
  • Jervin Real has reported 11 bugs in total, 8 of them in 2013. Check his Bug #70404, "Bit Value Not being Dumped Properly by mysqldump". The BIT data type causes many small but annoying problems in MySQL, and the inability to dump values properly is one of them. Use TINYINT or CHAR(1) instead, really...
  • Justin Swanhart is already famous as a bug reporter based on his contributions to MySQL 5.6. He reported 21 bugs in 2013. The oldest of his reports, Bug #68607, "REPLACE statement not properly logged in binary log in RBR", is still just "Verified" though.
  • Miguel Angel Nieto was also recognized as a valuable contributor by Oracle, because of Bug #69861 that he reported this year (fixed in 5.6.15). His old documentation request, Bug #63128, "explanation of the behavior of innodb_autoinc_lock_mode = 1 with INSERT IGNORE", is still waiting for some attention.
  • Muhammad Irfan reported Bug #70537, "No users created under MySQL system database for RPM based installation", which ended up as a verified documentation request.
  • Ovais Tariq has reported 11 bugs in total, 5 of them in 2013. Most of his recent reports are still waiting for a fix, and they are pretty serious. Check Bug #69680, "Auto_inc value not properly generated with RBR and auto_inc column only on slave", for example.
  • Paul Namuag reported Bug #71188, "Strange beheavior ON DUPLICATE KEY UPDATE when auto_increment reaches MAXINT", recently. It was declared "Not a bug", but I think there is still something to fix. Hence my Bug #71232, "Wrong behaviour for auto_increment unsigned bigint column approaching max value".
  • Przemyslaw Malkowski reported 6 bugs this year, and 4 of them are still waiting for fixes. Check his recent finding, Bug #71211, "ARCHIVE engine does not guarantee UNIQUE and PRIMARY KEY constraints".
  • Raghavendra Prabhu reported Bug #69969, "Failing assertion: prebuilt->trx->conc_state == 1 from subselect", this year. It is still waiting for a fix. Note that this bug was found as a result of his ongoing QA efforts using RQG. It seems Raghu has more than one account in the bugs database, so here is the list of 5 more bugs he has reported (3 of them in 2013).
Even though he does not formally work in Support, Laurynas Biveinis must be mentioned in any good list of MySQL contributors. With his 54 bug reports in total (many of them with patches), 27 of them reported in 2013, he has recently been one of the key Community contributors. Still, some of his reports, like Bug #68725, "UNIV_MEM_DEBUG needlessly slow, especially with UNIV_ZIP_DEBUG", are waiting for formal processing, not even for a fix...

Anyway, as you can see, Percona employees contribute not only to users of Percona software, but also to users of upstream MySQL. We report bugs, so we care!

Fun with Bugs #26 - MySQL bugs Oracle had not fixed for me (yet)

In the previous post in this series I listed 15 MySQL bug reports, documentation requests and feature requests I made in 2013 that got fixes or some other kind of solution. Now it's time to check what happened to the rest and try to think why.

First of all, no MySQL bug reporter is perfect (except maybe Domas), so some bug reports may be false alarms ("Not a bug"), too hard to fix in the foreseeable future ("To be fixed later") or asking for something that Oracle does not plan to provide at all ("Won't fix" or "Unsupported"). Some of my bug reports this year fell into these categories:
  • Bug #71205 - "Queries to P_S seems to pass extra stages related to query cache". This is "Not a bug", as it seems access to the query cache happens even before SELECT is parsed enough to find out that it accesses table(s) in PERFORMANCE_SCHEMA. So, if you want to avoid the extra overhead in the general case, make sure you start all your queries to PERFORMANCE_SCHEMA with SELECT SQL_NO_CACHE ... I'd prefer to see this explicitly mentioned in the manual, but maybe it's only me.
  • Bug #71170 - "Please, make MySQL RPMs relocatable". This one was set to "Unsupported", so Oracle does not plan to provide relocatable packages and users who want to test/use multiple versions of MySQL server should either rely on chroot environments, .tar.gz packages and sandboxes or wait for other MySQL providers to make their RPMs relocatable (if ever). Fair enough.
  • Bug #71041 - "Please, document every instrument in P_S.setup_instruments in details". Here I see "To be fixed later", so it seems the documentation team has more important things to do than to document all the details of PERFORMANCE_SCHEMA instrumentation. This is sad, as it's hard to use the instrumentation now without knowing all the ins and outs of the code. I am also not satisfied with the official reasons, listed in the manual, for leaving this undocumented.
  • Bug #69399 - "Inconsistency in crash report". It's "Won't fix" and for a really good reason explained by Shane. A signal handler must use approved safe functions or risk causing a crash itself. time() is safe but returns UTC. So, you should expect that for crashes, assertion failures and outputs caused by other signals you get timestamps in UTC, no matter what your timezone is. So, this was my fault.
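To line up a UTC timestamp from a crash report with error log lines written in local time, the conversion can be sketched as below. This is only an illustration, not something MySQL provides: the function name crash_time_to_local, the 'YYYY-MM-DD HH:MM:SS' input format and the sample values are all assumptions, and a fixed offset ignores daylight saving time changes.

```python
from datetime import datetime, timedelta, timezone


def crash_time_to_local(utc_stamp: str, local_offset_hours: float) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC timestamp to local time.

    Hypothetical helper: the input format is an assumption and may not match
    what your server version prints. The local zone is given as a fixed
    offset from UTC in hours, so daylight saving time is not handled.
    """
    # Parse the naive timestamp and mark it explicitly as UTC.
    utc_time = datetime.strptime(utc_stamp, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    # Shift to the local zone described by the fixed offset.
    local_zone = timezone(timedelta(hours=local_offset_hours))
    return utc_time.astimezone(local_zone).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Made-up crash time, viewed from a server that runs at UTC+2.
    print(crash_time_to_local("2013-12-31 22:30:00", 2))
```

With these made-up values the crash time lands on the next local day, which is exactly the kind of mismatch that can look like an inconsistency when reading the log.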
Some bug reports are still not processed ("Open" or "Analyzing"). That's expected for complex reports, those without a clearly repeatable test case (I try NOT to send reports like this) or those related to not clearly documented features of MySQL server. But I have one feature request that was probably just missed. It's Bug #70196 - "DISCARD/IMPORT tablespace is not supported for partitioned InnoDB tables". There is no way to IMPORT just one partition, or, for that matter, to back up or restore from backup only one partition, even when it is stored in an individual .ibd file and you use advanced tools like MySQL Enterprise Backup for backups. It is unfortunate that nobody explicitly cares about this.

The rest of my reports sent this year were accepted as valid and have "Verified" status. One can hardly expect 20+ valid MySQL server or documentation bug reports to be fixed within a year, so this situation is not a problem. But I would still like to remind you about a couple of reports that were verified a long time ago and still have not got any visible attention or fixes:
  • Bug #68097 - "Manual does not explain that some P_S instruments must be enabled at startup". It was reported on January 16, 2013, and it already seems clear which instruments (mutexes, actually) must be enabled upon startup (and then can be disabled and enabled back dynamically at runtime), by design. Slide 35 of my presentation for Percona Live London 2013 quotes the explanation given by the famous Oracle performance expert, Dimitri Kravtchuk, in his blog. But the documentation request is still just "Verified".
  • Bug #69574 - "Slave crashes when applying row-based binlog entries in cascading replication...". It affects MySQL 5.6, was verified 6 months ago, sounds serious enough and still has not got a single comment after verification.
For other bug reports I agree to wait patiently... Just note that while I like to play with PERFORMANCE_SCHEMA or find some minor problems in the manual just for fun, most of my bug reports in the InnoDB, Optimizer, Partitioning and Replication categories are based on real life issues noted by Percona customers. They were not really fun to hit in production and caused notable problems.

Friday, December 27, 2013

Fun with Bugs #25 - MySQL bugs Oracle fixed for me this year

I've checked recently and noted that I've sent 50 reports about MySQL bugs, features I'd like to see and unclear/missing manual pages this year. It all started with the famous Bug #68079 (reported on January 14, 2013), which got a lot of attention and a valuable workaround from Oracle, and caused a lot of work that is going to improve MySQL scalability substantially in the future.

Oracle has also implemented a feature request of mine (and not only mine!), Bug #69527, and in MySQL 5.7.3 PERFORMANCE_SCHEMA finally exposes metadata locks information. This is a great and long-awaited step forward in instrumentation.

Besides that, 12 of my documentation requests were satisfied:
  • Bug #68089 - "Manual refers to wrong (old?) column names for some P_S tables"
  • Bug #68181 - "Release notes for 5.6.11 reference wrong bug number"
  • Bug #68223 - "Manual for ALTER TABLE is wrong about mixing partitioning and other changes"
  • Bug #69697 - "Manual has not enough details on how to use transportable tablespaces"
  • Bug #69701 - "ALTER TABLE ... IMPORT TABLESPACE does not check foreign keys"
  • Bug #69717 - "DML statements replicated via RBR are not logged in the general query log"
  • Bug #69865 - "Wrong default MESSAGE_TEXT values for SIGNALs are listed in the manual"
  • Bug #70682 - "The description for --version-check in the summary table is wrong"
  • Bug #70683 - "The description is wrong for --server-public-key-path=file_name"
  • Bug #70741 - "InnoDB background stats thread is not properly documented"
  • Bug #70991 - "Manual seems to recommend IDEMPOTENT mode for all cases of master-master"
  • Bug #71103 - "Wrong syntax is used as example in the manual"
Thank you, Jonathan Stephens and Paul Dubois, for your hard work to improve the MySQL manual! I know, you had hard times with me more than once...

Also, just yesterday one of my feature requests for http://bugs.mysql.com was finally implemented: Bug #70631, "8K limit for "How to repeat" filed is too restrictive"! I hit it several times during this year and it seems now we have a new limit, 32K. Good news!

Anything else? Actually, no. So, 15 out of 50 or so (I still have time to report a few more). Of them, one new feature in 5.7.3 (a very important one) and one performance problem studied, with more work to do... Why is that so? Maybe the rest of my reports were irrelevant, wrong or useless? You can check yourself or wait for the next issues of "Fun with Bugs", where I'll try to check other bugs I've reported in 2013.