Sunday, August 12, 2018

On Oracle's QA for MySQL

In my recent blog posts I presented lists of bugs, fixed and not yet fixed, as usual. Working on these lists sidetracked me from the main topic of this summer: problems in Oracle's way of handling MySQL. Time to get back on track!

Among the things Oracle could do better for MySQL, I mentioned QA:
"Oracle's internal QA efforts still seem to be somewhat limited.
We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public)."
I have to explain these points in detail, as it has been a common perception for years that Oracle has improved MySQL QA a lot and invests enormously in it, and famous MySQL experts were impressed as early as 5 years ago:
"Lets take a number we did get the QA team now has 400 person-years of experience on it. Lets say the QA team was 10 people before, and now it is tripled to 30 people. That means the average QA person has over 13 years experience in QA, which is about a year longer than my entire post-college IT career."
I was in the conference hall during that famous keynote, and the QA-related statements in it sounded mostly funny to me. Now, 5 years later, let me try to explain why just adding people and person-years of experience may not work that well. I'll try to present some examples and lists of bugs, as usual, to prove my points.
Emirates Air Line in London lets you see nice views of London, and it cost a lot, but it is hardly the most efficient public transport link between the North Greenwich Peninsula and Royal Victoria Dock one could imagine.
  1. We still get all kinds of regression bugs reported by the MySQL Community for every release, even MySQL 8.0.12. Here is a short list of random recent examples:
    • Bug #90209 - "Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)".
    • Bug #91878 - "Wrong results with optimizer_switch='derived_merge=OFF';".
    • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM".
    • Bug #90100 - "Year type column have index, query value more than 2156 , the result is wrong".
    • Bug #91927 - "8.0.12 no longer builds with Mac brew-installed ICU".
    • Bug #91975 - "InnoDB: Assertion failure: dict0dd.cc:1071:!fail".
    This means that Oracle's MySQL QA may NOT do enough (or proper) regression testing. Sometimes we cannot say this for sure, as Oracle hides some test cases. So we, MySQL users, may simply not know what the intention of a recent change was (tests should show it even if the fine manual is not clear enough; that's a topic for my next post).
  2. We still get valid test failure bugs found by MySQL Community members. Some recent examples follow:
    • Bug #90633 - "innodb_fts.ngram_1 test fails (runs too long probably)".
    • Bug #90608 - "rpl_gtid.rpl_perfschema_applier_status_by_worker_gtid_skipped_transaction fails".
    • Bug #90631 - "perfschema.statement_digest_query_sample test fails sporadically".
    • Bug #89431 - "innodb_undo.truncate_recover MTR test failing with a server error log warning". It's fixed, but only in MySQL 8.0.13.
    • Bug #91175 - "rpl_semi_sync_group_commit_deadlock.test is not simulating flush error ".
    • Bug #91022 - "audit_null.audit_plugin_bugs test always failing".
    • Bug #86110 - "A number of MTR test cases fail when run on a server with no PERFSCHEMA".
    To me this means that Oracle's MySQL QA either does not bother to run the regression test suite properly, on enough combinations of platforms, options and build types, or does not analyze the failures it gets properly (and releases when needed, not when all tests pass on all platforms). This is somewhat scary.
  3. We still get crashing bugs in GA releases. They are hard to notice, as they get hidden quickly, often as soon as they attract public attention, but they do exist; the latest example, Bug #91928, is discussed here.
  4. It seems that some tools that help to discover code problems may not be used properly or regularly at Oracle. I had a separate post, "On Bugs Detected by ASan", where you can find some examples. We are lucky that Percona engineers have been testing ASan builds of MySQL 5.7 and 8.0 regularly, for years, and contributing public bug reports back.
  5. Recently, Oracle's MySQL QA engineers have not written much about their work in public. I can find some posts here and there from 2013 and 2014, but very few in recent years. One may say that's because QA engineers are working hard and have no time for blogging (unlike lazy, annoying individuals like me), but that's not entirely true. There is at least one Oracle engineer who does a lot of QA and makes a lot of information about his work public, Shane Bester, who is brave enough and cares enough to report MySQL bugs in public. Ironically, I doubt he has any formal relation to any of the QA teams at Oracle!
  6. A lot of real MySQL QA is still done by the MySQL Community, while these efforts are not acknowledged that much recently. You usually get your name mentioned in the official release notes if you submitted a patch, but the fact that you helped Oracle by finding a real bug their QA missed has NOT been advertised since the last of Morgan's "Community Release Notes" was published 2 years ago. Moreover, it is only the MySQL Community that tries to make QA work popular and to educate users about proper tools and approaches (Percona, and Roel Van de Paar personally, are famous for this).
To summarize, it seems to me that real MySQL QA is still largely performed by the MySQL Community, in public, while the impact of Oracle's hidden and maybe huge investments in QA is much less clear and visible. To me, Oracle's MySQL QA investments look like the investment in the Emirates Air Line cable car in London: the result is nice to have, but it's the most expensive cable system ever built, with limited efficiency for the community as public transport.

5 comments:

  1. While Bug #91878 - "Wrong results with optimizer_switch='derived_merge=OFF'" is technically a regression, it has nothing to do with 8.0. This is a wrong implementation of derived_merge, introduced in 5.7. 5.6 does not have the derived_merge optimization, therefore the query in that version always returns correct results.

    1. I did not state that the list presents only 8.0-specific regressions; I stated that they still happen, even in 8.0.12. Technically, your bug is a regression vs 5.6 for a specific corner case, and it shows a lack of proper tests. That's why it proves my points.

    2. Well, if you search the bugs database for derived_merge bugs, you will find many. Only one has been fixed in 8.0 so far. Together they are not a corner case, but they are not 8.0-specific regressions either.

  2. I am sure there will be problems. I don't watch this as closely as you do, and I don't support as wide a range of features as you do, but I still think that quality during the Oracle years has been dramatically better than before those years.

    Too bad we don't have this level of visibility into quality for closed-source products. While their "primary" features tend to be solid, experience tells me that some of their other features are not.

    Regardless, I hope that Oracle/MySQL continues to engage with the community over QA and other issues, because we can help them.

    1. Good thing we don't care about the quality of closed-source MySQL Enterprise Backup, for example :) There must be dragons...

      The quality of MySQL surely improved under Oracle's ownership and brand, in the same sense as that cable car line under the Emirates brand improved connectivity between North Greenwich and the Royal Docks area (I know it, as I walked all that way, starting from https://en.wikipedia.org/wiki/Greenwich_foot_tunnel...)

      My point is costs vs benefits. I am sure that by investing a little bit of acknowledgement and appreciation into community-provided QA, or a bit more into Percona's QA efforts, or even nothing at all, just by sharing all the test cases developed and following proper bug processing policies, Oracle could easily improve the quality of MySQL even more.
