Monday, August 12, 2013

MySQL Bugs Verification - Is It Really Simple?

While Sveta and others have already explained what it really means to "Verify" (or "Confirm", in Launchpad/Percona terms) a bug in MySQL software, and why this step in a bug's life cycle is important, we still often read complaints about too much time taken to verify a bug even with a clearly repeatable test case that can be just copy/pasted, like Bug #69985 or the notably more serious Bug #69990. Moreover, I often make comments of this kind myself...

So, it seems there is still a need for a clear explanation of all the steps that may be involved in verification of a MySQL or Percona Server bug (let's take the server as the most complicated case). First of all, both Oracle and Percona require engineers who process bugs to check them on the latest releases and/or source code builds of all versions/branches that are currently GA and fully supported, and on all development versions. For an Oracle MySQL bug this means checking recent builds from the current source code of 5.1, 5.5, 5.6 and 5.7 and, for many kinds of bugs like packaging, also the official binaries of 5.1.71, 5.5.33, 5.6.13 and 5.7.1 provided by Oracle for the platform. Some bugs may be clearly OS-specific even based on the description, but others may require checks on Linux, Windows or even Solaris and FreeBSD (if they are NOT repeatable on Linux). Note that there are also different kinds of Windows and different Linux distributions, both 32-bit and 64-bit, and sometimes all these details matter. So, if an engineer did not bother to check everywhere before setting the bug to "Verified", the bug may come back to him with a request from development for some more checks...
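To get a feeling for how quickly these checks multiply, here is a minimal sketch that enumerates the combinations. The branch and binary versions are the ones named above; the platform labels and the helper names (verification_matrix, missing_checks) are my own illustrative assumptions, not part of any real Oracle or Percona tooling.

```python
from itertools import product

# Source branches and official binaries as named in the text above.
SOURCE_BRANCHES = ["5.1", "5.5", "5.6", "5.7"]
OFFICIAL_BINARIES = ["5.1.71", "5.5.33", "5.6.13", "5.7.1"]
# Hypothetical platform labels, chosen only for illustration.
PLATFORMS = ["linux-x86_64", "linux-i686", "windows-x86_64"]


def verification_matrix(versions, platforms):
    """Return every (version, platform) pair a full check would cover."""
    return list(product(versions, platforms))


def missing_checks(matrix, done):
    """Return the pairs from the matrix that were not checked yet, in order."""
    done = set(done)
    return [pair for pair in matrix if pair not in done]


matrix = verification_matrix(SOURCE_BRANCHES + OFFICIAL_BINARIES, PLATFORMS)
print(len(matrix), "checks for one bug report")
```

Comparing the list of checks actually performed against such a matrix is one simple way to notice that a version or platform was silently skipped.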

On top of that, from time to time MySQL developers and QA start to care about regression bugs more than usual and, as a result, ask engineers who process bugs to try to pinpoint the exact release where the supposed regression bug appeared. Sometimes this means checking the 3 previous official releases, and sometimes it means a separate detailed study that only really brave people like Shane (who probably has all releases since 3.23.5x installed and ready to start anyway) do... Note that all this is for a bug with a clear test case, while many bug reports are not that obvious.
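Pinpointing the first affected release is, in essence, a scan over releases from oldest to newest. Below is a rough sketch of that idea. The is_affected callback is a placeholder for whatever actually runs the test case against a given release, and the release list and result in the usage example are made up for illustration only.

```python
def version_key(version):
    """Turn '5.5.9' into (5, 5, 9) so that 5.5.9 sorts before 5.5.31."""
    return tuple(int(part) for part in version.split("."))


def first_affected(releases, is_affected):
    """Check releases from oldest to newest and return the first affected one.

    Returns None when no release in the list is affected.
    """
    for release in sorted(releases, key=version_key):
        if is_affected(release):
            return release
    return None


# Illustration only: pretend the test case starts to fail in 5.6.12.
releases = ["5.6.13", "5.6.10", "5.6.12", "5.6.11"]
print(first_affected(releases, lambda r: version_key(r) >= (5, 6, 12)))
```

A plain linear scan is used here instead of bisection on purpose: a bug may have been fixed and then reintroduced, so "affected" is not guaranteed to change only once across releases.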

Surely, a lot of checks can be automated (as Sveta explained) using smart setups, MTR, MySQL sandboxes and some shell scripting. But then even a small bug/problem in scripting may lead to bugs NOT being checked on some important version (as happened with MySQL 5.6 at the pre-GA stage, just because it was no longer mysql-trunk but mysql-5.6, while the scripts remained the same). And then you know what happens - people like me note this problem and Pandora's box is opened...

On top of that, every "Verified" MySQL bug in Oracle should be copied to the internal bugs database. At this step one has to run a script (that may also have bugs) from a Web form, providing the MySQL bug number, then check the copied bug in the internal bugs database, set the proper status for it, and make sure it got the proper category (as not every category at http://bugs.mysql.com is supported in Oracle's internal bugs database), got the proper priority and ended up assigned to the development lead who can really care about it. Some bugs should obviously be escalated immediately, and there was a separate procedure for this... At least this was the case a year ago.

So, even if I was sometimes able to "Verify" a bug within 15 minutes of it being reported, proper verification of even the simplest server bug usually takes more than that, even if it was immediately noticed by an engineer who had nothing more important to do at the moment. By the way, bug processing is hardly a 24x7 service in any company, so we should NOT expect some engineer to monitor all incoming bugs in real time on Sunday.

Sounds like a really over-complicated procedure, doesn't it? Is it any different in Percona? Yes, it's easier here as there is no need to copy from one bugs database to the other - everything is in one place. We also do not release binaries for Windows, FreeBSD or Solaris and thus usually do not care much about bugs on these platforms unless they are repeatable on Linux. But on the other hand Percona provides repositories of RPM and .deb packages, and thus some bugs have to be checked on all recent major releases of RHEL/CentOS, Debian and Ubuntu (that are officially fully supported platforms here). I also have to check on all major versions of Percona Server: 5.1, 5.5 and 5.6 at the moment. On top of that, if a bug is not clearly related to a Percona-specific feature, we have to check the upstream MySQL version and, if it is affected, we have to report an upstream bug and link it to the Percona Server bug. So, again, we in Percona often end up working with two bugs databases, a lot of copy/pasting and following other annoying procedures... So, do not expect a Percona Server bug to be "Confirmed" in a matter of minutes or days, even if it comes with a simple test case repeatable on a recent version. So it goes.

The summary is simple: please respect the hard and often boring work of engineers who process bugs, and give them some time before complaining (this is a reminder for myself as well).

If you care a lot about the bug, you should probably just open a support request with the vendor (I hope you have a support subscription, if you really care that much?) and then use your power as a customer to make things happen. If it does not work this way - tell me and let me open Pandora's box (or can of worms, if you prefer) at Facebook...

Friday, August 9, 2013

Fun with Bugs #22 - Some Bug Reports You Should Not Miss

Yet another user installed MySQL 5.5.32 yesterday and got a system that cannot start... It's really easy to help in this case - just downgrade back to 5.5.31 or upgrade to 5.5.33 if you can. Why did the problem happen during the upgrade? Because of regression Bug #69623.

This case, easily solved during a quick chat, reminded me of the problem of bugs in production. Nobody expects any sane DBA to review every new bug report, but some of them should not be missed, at least when upgrading to any newer version. Regression bugs (I see 15 here reported for MySQL 5.6 GA versions and still "Verified", and that was a search for the "regression" tag, which may not always be used...) are in this category, same as bugs in new features that may be enabled or just always there by default. Let me list a few more for 5.6.13:
  • Bug #69325 - "MySQL uses significantly more memory for ALTER TABLE than expected". Imagine you are trying to use more partitions than usual because MySQL 5.6 allows it, and plan to enjoy fast ALTER, maybe while adding some indexes... just to end up swapping like crazy with everything hanging. Surprise...
  • Imagine you use replication in MySQL 5.6, with status stored in tables:

    mysql> show variables like '%info%';
    +---------------------------+----------------+
    | Variable_name             | Value          |
    +---------------------------+----------------+
    | master_info_repository    | TABLE          |
    | relay_log_info_file       | relay-log.info |
    | relay_log_info_repository | TABLE          |
    | sync_master_info          | 10000          |
    | sync_relay_log_info       | 10000          |
    +---------------------------+----------------+
    5 rows in set (0.02 sec)

    and just run CHANGE MASTER from time to time (or some tool may do this even if you do not know about it) and restart your server. You know what? You may easily end up with:
  • Bug #69825 - "InnoDB: Assertion failure in thread ... in file lock0wait.cc line 297", or
  • Bug #69898 - "change_master() invokes ha_innobase::truncate() in a DML transaction" and same assertion as above actually, and then upon restart...
  • Bug #69907 - "Error(1030): Got error -1 from storage engine" and no way to start up even with innodb_force_recovery maybe...
Why is that so? Probably at least partially because you blindly trusted the GA status of Oracle MySQL 5.6 and did not care to monitor bug reports... I'll speculate about possible reasons in some other post.
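As a small illustration, the configuration described above can be detected from the SHOW VARIABLES output. The sketch below assumes the variables were already fetched into a plain dictionary by some client code not shown here; the function name is hypothetical, and a hit only means the server stores replication status in tables, not that it is hit by any of the bugs listed.

```python
def table_repositories(variables):
    """Return the replication info repositories that are set to TABLE."""
    names = ("master_info_repository", "relay_log_info_repository")
    return [name for name in names
            if variables.get(name, "").upper() == "TABLE"]


# Values copied from the SHOW VARIABLES output shown above.
variables = {
    "master_info_repository": "TABLE",
    "relay_log_info_file": "relay-log.info",
    "relay_log_info_repository": "TABLE",
    "sync_master_info": "10000",
    "sync_relay_log_info": "10000",
}
print(table_repositories(variables))
```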

Is there any way to prevent this kind of trouble? Nobody can guarantee bug-free releases for you, unfortunately, but monitoring the bugs database for new bugs, or at least following some other sources that do monitor it, like this blog or my Facebook page, gives you a notably better chance of preventing unexpected trouble. So, take care...

Sunday, August 4, 2013

Fun with Bugs #21 - recently verified bugs in MySQL 5.6.13

The notable contribution of the MySQL Community to MySQL 5.6.13 was explicitly recognized recently. But users and contributors still continue their efforts, as do Oracle engineers. Even though MySQL 5.6.13 has been generally available for just a few days, we already have several new bug reports and updates to known bugs at http://bugs.mysql.com. Let me present a short list with some comments.

  • Bug #69915 is a great example of a "new thinking" inside Oracle. Todd Farmer not only writes about new ways to use PERFORMANCE_SCHEMA in MySQL 5.6 in his blog, but also reports bugs found in the process to the public bugs database. This is what every responsible MySQL engineer in Oracle should do, if you ask me. Reporting bugs not related to customer confidential data or clear security issues to Oracle's internal bugs database only is a waste of additional time and effort for all interested parties. This particular bug report is about the statement/com/Query counter not being incremented, but in this case I care more about the approach used (report the bug in public) than the problem itself (even though it shows that extra QA efforts are still needed for MySQL 5.6.x).
  • Bug #69895 - "mysql 5.6.13 i386 ships with 64bit libraries" on Solaris. Not that many people still care about Solaris these days, but for Oracle as a vendor of both it would make sense to care more about proper packaging on this platform. It's also a regression bug that again calls into question even basic QA testing of the releases...
  • Bug #69892 - "innodb stats interferes with innodb force recovery and drop/create tables". Shane is the most famous and productive bug reporter of the last 7 years, and he keeps reporting bugs to the public bugs database. This one is serious enough, but I am more concerned about another recent bug report affecting MySQL 5.6.x that is still open: Bug #69907. It seems that not only InnoDB statistics stored in tables, but also master and slave information stored in InnoDB tables may prevent any practical use of innodb_force_recovery. This is pretty serious, and now I wonder whether anybody even tried to think about forced recovery while adding these great new features and more InnoDB tables to the "data dictionary"...
  • Bug #69887 - EXPLAIN for UPDATE of a single row by PRIMARY KEY shows access type as "range". This may not even be new and is just weird, but 5.6.13 is still affected, as it was recently verified:

    mysql> explain select * from tbl_sample where id = 1\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: tbl_sample
             type: const
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 2
              ref: const
             rows: 1
            Extra: NULL
    1 row in set (0.00 sec)

    mysql> explain update tbl_sample set cnt = 1 where id = 1\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: tbl_sample
             type: range
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 2
              ref: const
             rows: 1
            Extra: Using where
    1 row in set (0.01 sec)
Does this small list mean that there are no more bugs in MySQL 5.6.13 besides the ones above? No, many old bugs are still not fixed, and some bugs were verified by Oracle engineers on 5.6.13 many days before the official release. You may get information about some of them by following the links in my previous post.