Saturday, October 24, 2015

Fun with Bugs #39 - Known Bugs in MySQL 5.7.9 GA

These days everybody is excited with recent announcement of MySQL 5.7.9 GA release. If you are not aware of this event yet (I've noted it from numerous posts even during my short vacation), wait for the Oracle Open World 2015 to begin tomorrow to announce it even wider and louder!

I already have 5.7.9 built from source, up and running, so it's time to check what else we can expect from this new GA release besides new great features (this is a topic for a separate post or two) and usual excitement. Yes, I mean known, verified bugs in MySQL 5.7.9.

Let me start with a quick summary and then present the details. So, even though MySQL Community tried hard to check 5.7.x at early stages and report bugs to Oracle, MySQL 5.7.9 GA has a number of known installation and upgrade problems, code is not always clean, not all community suggested patches are included. As usual with new releases, there are some performance issues (one should expect single thread to run slower than on older versions in many cases), including those on replication slaves. Some new features are not completely documented, some implementation details are missing here and there, and transparent page compression, for example, is hardly can be widely used at the moment. All these are expected for users who had seen previous GA releases of MySQL. What's a bit worse, I suspect there are known crashing bugs (sequence of SQL statements that, when executed by a user with enough privileges, crashes non-debug binaries) in MySQL 5.7.9. Also, MySQL 5.7.9 inherits some regression bugs from MySQL 5.6.x.

Quick search for active bugs in version 5.7 reported over last 6 months gives me a list of 157 bugs that probably affect MySQL 5.7.9. I've excluded those that are not yet verified formally, those affecting older versions (5.6.x and 5.5.x), most of documentation and test problems and ended up with the following list that I want to present to those who plan to upgrade to 5.7 GA really soon:
  • Bug #78936 - "Installing MySQL on Linux Using Debian Packages missing dependency". This is mostly a documentation request. Please, check comments there for the correct order of .deb packages installation, if you plan to use dpkg.
  • Bug #78591 - "mysqld --initialize fails on empty data directory". Some more installation problems on Ubuntu...
  • Bug #78895 - "Lack information about generated columns in mysql_fetch_fields function". Now, this is typical, unfortunately, for new MySQL features. They are rarely completely implemented in first few GA releases. Oracle still has to rely on MySQL Community to find these small missing implementation details (or design problems) here and there.
  • Bug #78848 - "Docs still claim Performance Schema does not allocate memory after startup". While formally it's just a documentation bug, it is about a notable change in behavior. In 5.7 additional memory may be allocated after startup to support performance_schema. This had never been the case before. My report that is based on real customer case, Bug #78808, "Manual is wrong when explaining the default UMASK value", is of a similar kind - some clear changes in behavior are NOT documented properly before making GA announcement. MySQL 5.7 follows the same bad tradition of 5.0, 5.1, 5.5, and 5.6 GA releases here...
  • Bug #78778 - "mysqlclient does not return correct mysql_insert_id via database handle". It seems to be a regression bug (even though it is not tagged so) and as a result your DBD::mysql Perl module may not work with 5.7 the way you expect.
  • Bug #78774 - "old-password subcommand is still in mysqladmin --help in 5.7". Minor problem, but I wonder why it could not be fixed in the code before GA release. It seems the build/QA/release process still takes a lot of time in Oracle, so even those bugs noted by Community in RC releases were reported "too late" to influence the GA release. This is unfortunate.
  • Bug #78758 - "Inserting (invalid) utf8 data into 5.6.25 master breaks 5.7 slave". There is a workaround documented, but this bug report shows that upgrade from 5.6 may not be seamless and simple (are you surprised?) and that upgrade was NOT really tested really well by the internal QA process. I am not surprised, it had always been the case.
  • Bug #78672 - "assert fails in fil_io during linkbench with transparent innodb compression". Facebook engineers are trying hard, but transparent page compression mostly does NOT work well (if only on NVMFS, see Bug #78277 also). Just remember that for now, before you became too excited with the feature.
  • Bug #78495 - "Table 'mysql.gtid_executed' cannot be opened." Don't be scared with this warning, it's "Ok"...
  • Bug #78374 - "CREATE USER IF NOT EXISTS" reports an error". New feature that just does not work as documented. Well done.
  • Bug #78352 - "Slow startup of 5.7.x slave with relay_log_recovery = ON and many relay logs". This my report is also based on a real life customer case. Take care when upgrading, really.
  • Bug #78254 - "After running mysql_upgrade proxies_priv user columns are not updated to 32". mysql_upgrade is not able to fix all incompatibilities for further seamless use of MySQL 5.7.
  • Bug #78251 - "handle_fatal_signal (sig=11) in mach_double_read". No further comments for now.
  • Bug #78236 - "InnoDB: Failing assertion: holder != requestor". Again, no comments.
  • Bug #78219 - "compression can be defined on a compressed table if innodb_strict_mode is OFF"
  • Bug #78068 - "replication_* tables miss information about binary log position and other". I should not comment on performance_schema any more probably.
  • Bug #77740 - "silent failure to start if mysql.gtids_executed gets HA_ERR_LOCK_WAIT_TIMEOUT". You can end up with failures and error log does not help to find out why...
  • This bug was actually reported back in February, so it was not in the list of 157 recent reports, and there is nothing unexpected here, but still. Note that single thread performance of MySQL 5.7.x GA may be worse than for older versions, see Bug #75981, "MySQL 5.7.5 30% slower than 5.6.23 for 1 thread write", for example.
Some potential performance issues, mistakes in the code and debug-only assertions (that may show huge internal corruptions) are never noted until somebody starts to test a lot or read the code carefully:
  • Bug #78894 - ''buf_pool_resize can lock less in checking whether AHI is on or off".
  • Bug #78637 - "Incorrect for loop condition in mysql_crawler.cc"
  • Bug #78732 - "InnoDB: Failing assertion: *mbmaxlen < 5 in file ha_innodb.cc line 1803".
  • Bug #78728 - "InnoDB: Failing assertion: 0 in file ha_innopart.cc line 3526" 
  • ... check the list yourself, there are many more assertion failures and code review problems reported  ...
  • Bug #77094 - "Reduce log_sys->mutex contention by allowing concurrent mtr commit and log write". This is just one of many patches contributed by Zhai Weixiang that remain just "Verified"
I expect many more reports of this kind from Roel, Laurynas and my other colleagues now, when Percona is actively working on its own Percona Server 5.7 GA release.

The last but not the least, if you are upgrading to 5.7 GA from 5.5.x or older versions, check known regressions in MySQL 5.6. Many of them are affecting 5.7.9 as well, unfortunately.

Saturday, October 10, 2015

Fun with Bugs #38 - Regression Bugs in MySQL 5.6

I often have to reply to questions related to upgrade from, say, MySQL 5.5.x to latest MySQL (or Percona) Server 5.6.x (5.6.27 for MySQL and 5.6.26 for Percona at the moment). One of them is sometimes about "any known bugs" that may affect a user after upgrade.

One may assume that now, 2 years and 8 months after the first GA release of MySQL 5.6.10, there should be very few bugs (we call them regression bugs) of a kind that is interesting for the user planning upgrade from 5.5.x. I checked yesterday and found out it's not the case actually. It took me half an hour to end up with the list of more than 20 regression bugs that are still "Verified", so may affect even the latest recent release, 5.6.27. I had not checked them one by one on 5.6.27 yet (it will take more than half an hour probably), but still would like to present them, classified into separate categories.

First of all, there are performance regressions of various kinds and impact. Mostly they are about single thread performance (that is well known to be worse in 5.6 and may influence replication slaves badly) or related to negative effects of new optimizer features that are enabled by default in 5.6, but sometimes reasons are different or not clear:
  • Bug #68825 - "performance regressions for single-threaded workloads". It was not the first result in my Google search, but this is a classical bug report by Mark Callaghan. There are many test results and observations, regressions continue in 5.7.8. Useful reading. With this bug in mind we all should assume by default that performance of single thread processing is expected to go down with each major MySQL release...
  • Bug #76933 - "Performance Regression in 5.6: Update does not use index". As simple as that, some update is fast in 5.5.x (and 5.7.x), but is slow in 5.6 as full table scan is used. There are no comments about the reason. Pure regression clearly marked with a "regression" tag.
  • Bug #69350 - "performance regression between 5.5 and 5.6 for simple statement in routine loop". Your stored procedures may run slower in 5.6 comparing to 5.5. Note that this bug report does NOT have a usual "regression" tag, so search by tag is not enough in general.
  • Bug #68979 - "performance regression of MySQL 5.6.10?". If you use derived tables, they are no longer materialized at optimization stage (due to "Delayed materialization of derived tables" feature of 5.6), so optimizer is not able to make some choices early.
  • Bug #69801 - "performance regression between 5.5 and 5.6 for str_to_date function". 5.7 is also affected. No "regression" tag.
  • Bug #76247 - ''Regression case with semijoin - query with many INs stuck on statistics state". This is the only "Not a bug" in my list. We can continue arguing is it a bug or not, but you should expect the problem in 5.6 and 5.7 with default settings (some queries may just "hang" forever or for a long time, those that were executed fast enough in 5.5 or older versions) and check the list of workarounds, from disabling semijoin optimization to query rewrite or setting optimizer_prune_level to 1.
  • Bug #68919 - "Performance regression when DS-MRR is used for query with small limit". The workaround is to set mrr=off for this specific kind of queries. No "regression" tag.
  • Bug #69219 - "CREATE TEMPORARY TABLE ... SELECT is slower on 5.6 than on 5.1". Bug title mentions 5.1, but actually regression was noted comparing to 5.5. Nobody cared to work on this any further, so the reason is not clear, but note that this happens, to different extents, to both InnoDB and MyISAM storage engines used for these explicitly created temporary tables.

What I consider the worst kind of regressions, is a wrong results bugs, just (usually undocumented in any clear way) change in behavior out of nothing that leads to different data (or metadata) returned by some queries. The fact that such bugs exist tells a lot about the processes of development, documenting and QA:
  • Bug #76355 - "Function "addtime" returns wrong column type (regression)." No "regression" tag, 5.7 is also affected.
  • Bug #70491 - "SELECT DISTINCT may return a wrong result if a join buffer is involved". I remember more bugs for SELECT DISTINCT, with wrong results, but this one remains "Verified" 2 years after it was reported. Nobody cares.
  • Bug #75447 - "INSERT into a view sometimes ignores default values". 5.7 is also affected, unlike 5.5. It seems noted by a developer actually, and we have a clear "regression" tag.
  • Bug #70091 - ''tinyint(1) column type returned by jdbc connector becomes Integer with ORDER BY". Regression happened in 5.6.13, so obviously some tests are missing when this happens during GA bug fixing cycle.
  • Bug #68972 - "Can't find temporary table". You can call some procedures that create temporary tables, among other things, just once. Second call fails, and this was not the case in 5.5. Several users claim to be affected since it was reported.
  • Bug #75668 - "Use of Distinct, Group_concat, and Group by together produces 1 NULL row result.". Classical wrong results case (unexpected results) in default installation. Affects 5.6 and 5.7 it seems.
This kind of bugs is bad in general as one can hit the bug any time, unless all code paths in application are covered by proper test cases that are checked on every upgrade. No 3rd party benchmarks or reading the manual carefully may help, only code fully covered with tests (somethings that MySQL vendors seem not to have!).

There are other regressions/unexpected changes in behavior that affect old familiar environments or procedures:
  • Bug #74908 - "Unable to detect network timeout in 5.6 when using SSL (regression from 5.5)". In case of network problem in between slave and master, slave connected via SSL no longer restarts after slave_net_timeout seconds passed. The slave status will continue to state "waiting for master to send event". The slave will remain in this state until the slave is stopped and restarted explicitly. 
  • Bug #77227 - "mysqld_multi stop does not work". Regression happened in 5.6.25 it seems. So, while upgrading from some old 5.5.x to latest and greatest 5.6.x usually allows to hit fewer bugs in 5.6, some 5.6-specific regressions also happen sometimes, and they may affect you badly during upgrade. 
  • Bug #72108 - "Hard to read history file". See also Bug #69991 and Bug #68925.

Other regressions in InnoDB were just unexpected:
  • Bug #72851 - "Fix for bug 16345265 in 5.6.11 breaks backward compatibility for InnoDB recovery". Read it entirely, but basically remember to do a clean shutdown in older versions before upgrading to 5.6 (or 5.7) in place.
  • Bug #77128 - "Regression in dealing with filesystem limits". InnoDB just works differently now on filesystems like ext3 (or fat32) when file size limit is hit by autoextending file from shared tablespace. In 5.5 there was an easy way to proceed, now you end up with error messages and no way to start. Note that at least 5.7.7 was also affected. There is no "regression" tag.
  • Bug #74609 - "Unable to update Foreign Key created as NOT NULL to table named with a Dollar $". 5.7 is also affected. 
  • Bug #77654 - "After 5.6.15 innodb_force_recovery greater or equal than 4 is useless". What else to add, it just does not start...
Some regressions are probably just formal and irrelevant to any real life use cases, but still weird to see them happening:
  • Bug #71477 - "transaction_allow_batching is available in non-Cluster MySQL Server 5.6". Nobody cares, I know, but it was not like that in 5.5...
To summarize, one has to search carefully for known regression bugs in 5.6.x before upgrading. Performance regressions may happen for almost any load and should be expected. But problem with stored procedures, built-in functions, replication and even command line utilities are also possible, and most of them are NOT explained in any section of the manual.

Same applies to 5.7 as well (some of the bugs above are regressions in 5.7 comparing to 5.5 or older versions). I plan to write another blog post about regressions in 5.7 comparing to 5.6, as well as problems one should expect during upgrade, documented or not.

Take care.

Saturday, October 3, 2015

Fun with Bugs #37 - Bugs fixed in MySQL 5.6.27

MySQL 5.6.27 was released on September 30 formally. Source code is also available on GitHub, and I have it compiled (some users are less lucky) and running for a couple of days already. In this post I'll comment on some bugs reported by MySQL Community that are fixed there.

I'd like to start with a couple of bugs where patches were also contributed. First of all, the fix suggested by Stewart Smith in Bug #72811, "Set NUMA mempolicy for optimum mysqld performance", helps to allocate memory in a more reasonable way on NUMA-enabled systems. Previously it was like all interleaved or nothing, now there is a way to apply this only to the InnoDB buffer pool. The bug was formally verified by Umesh.

Alexey Kopytov had contributed two important fixes, for Bug #76927, "Duplicate UK values in READ-COMMITTED (again)" that was verified by Shane Bester, and for Bug #74891 (reported by Stewart Smith and verified by Umesh).

As a side note, I like to see fixes for other platforms than x86_64 and other OSes than Linux. Historically I started to work on MySQL by checking compilation or build bugs for unusual platforms, compilers and environments... There is one more bug of this kind fixed in 5.6.27: Bug #76135 , "data corruption on arm64", reported by Daniel Frazier.

Other important InnoDB bug fixes include:
  • Bug #77743, "Auto-increment sequence gets reset", reported by my colleague Marcos Albe and verified by Umesh
  • Bug #75185, "lower_case_table_names=0 on windows leads to problems", reported by Shane Bester and verified by Umesh

A lot of replication bugs were fixed in this release:
  • Bug #78389, "5.6.24: Lost data during master restart if partial transact has been downloaded", reported by Simon Mudd and verified by Shane Bester.
  • Bug #74607, "slave io_thread may get stuck when using GTID and low slave_net_timeouts", reported by Santosh Praneeth Banda and verified by Umesh
  • Bug #76959, "Gaps in Retrieved_Gtid_Set while no gaps in Executed_Gtid_Set", reported by my colleague Sveta Smirnova and verified by Umesh
  • Bug #76727, "Slave assertion in unpack_row with ROLLBACK TO SAVEPOINT in error handler", reported by Philip Stoev and verified by Umesh. In Percona we found out hard way that while one can create a trigger that does ROLLBACK TO SAVEPOINT in the code (see related documentation Bug #77163) and it even seems to work as expected, this leads to problems on slave replicating such a transaction. This fix in 5.6.27 is just a start of the long way to proper fixes in all places, and those interested in that's wrong can check great analysis by Vlad Lesin in Percona Server bug #1483251
  • Bug #76618, "SHOW BINLOG EVENTS completely locks down writes to binlog, hence transactions", reported by Shlomi Noach and verified by Shane Bester
  • Bug #76493, "Binlog statement is not ignored", reported by Dan Lukes and verified by Umesh
  • Bug #76379, "binlog_error_action doesn't handle some failures during binlog rotation", reported by Santosh Praneeth Banda and verified by Umesh
  • Bug #74950, that is still private at the moment (had I written before that I hate when this happens?), is described as follows in release notes:
    "Modifying the master_info_repository or relay_log_info_repository inside a transaction and later rolling back that transaction left the repository in an unusable state. We fix this by preventing any modification of these repositories inside a transaction."
    This is not the last remaining bug related to "crash-safe" replication in 5.6, implemented by storing information in explicit InnoDB tables in mysql database. Changes there are a part of transaction and when changes of transaction are rolled back, we often end up with replication broken in some way or in some otherwise bad state. It seems we are doomed to see these problems until MySQL implements some "micro transactions" for its "data dictionary", that are handled differently than normal transactions to usual InnoDB tables... One day I'll write a separate blog post on all the related bugs already identified
  • Bug #74089, "Report Relay_Log_File and Relay_Log_Pos on relay-log-recovery", is a nice feature request by Jean-François Gagné that was verified by Luis Soares himself, and implemented in 5.6.27. Well done!
  • Bug #73806, still private, and Bug #76746, "Broken replication on SQL thread restart if gtid_mode is enabled", reported by Davi Arnaut who also contributed a patch, and verified by Umesh
  • Bug #68525, "Error "When GTID_NEXT is set to a GTID" ROW based replication", reported by Nogueira Jesus and verified by Sveta Smirnova
I find it important to note that most replication bugs fixed in 5.6.27 were found by MySQL Community users, NOT internally by Oracle. I am also surprised with so many replication bugs still existing in 5.6.x now, 2+ years after GA release. I wonder what's going to happen when new replication features of 5.7 will be tested by MySQL users...

Several optimizer bugs were also fixed:
  • Bug #77135, "Update on varchar and text columns produce incorrect results", reported by Chris Sims and formally verified by Sinisa Milivojevic
  • Bug #76349, "memory leak in add_derived_key()", reported by Vlad Lesin and verified by Shane Bester
  • Bug #75248, "OR conditions on multi-column index may not use all index columns to filter rows", reported by Yoshinori Matsunobu and verified by Umesh

Performance Schema also had got some fixes:
  • Bug #77577, "setup_timers initialization assumes CYCLE timer is always available", reported by Alexey Kopytov who also contributed a patch, and verified by Umesh
  • Bug #74614, "events_statements_history has errors=0 when there are errors", reported by DaniĆ«l van Eeden and verified by Umesh 
I'd also want to mention Bug #76480, "mysqlimport --use-threads doesn't use multiple threads", reported by my colleague Miguel Angel Nieto that got a detailed analysis from yours truly and then was verified by Umesh. That's all I contributed to the release :)

The last but not the least, if you use memcached to work with InnoDB tables, check Bug #75199, "MySQL crashed because of flush_all", reported by Zhai Weixiang and verified by Umesh (as most of the community bug reports for a couple of last years).

MySQL 5.6.27 seems to be a really useful and important release for those who rely on InnoDB and replication (that is, for all of us probably). Note that most of the bugs mentioned above also affected 5.7.x and are fixed in upcoming MySQL 5.7.9 and 5.8.0 releases.