Sunday, August 12, 2018

On Oracle's QA for MySQL

In my recent blog posts I presented lists of bugs, fixed and not yet fixed, as usual. Working on these lists sidetracked me from the main topic of this summer: problems in Oracle's way of handling MySQL. Time to get back on track!

Among the things Oracle could do better for MySQL, I mentioned QA:
"Oracle's internal QA efforts still seem to be somewhat limited.
We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public)."
I have to explain this in detail, as for years it has been a common perception that Oracle has improved MySQL QA a lot and invests enormously in it, and famous MySQL experts were impressed by this even 5 years ago:
"Lets take a number we did get the QA team now has 400 person-years of experience on it. Lets say the QA team was 10 people before, and now it is tripled to 30 people. That means the average QA person has over 13 years experience in QA, which is about a year longer than my entire post-college IT career."
I was in the conference hall during that famous keynote, and the QA-related statements in it sounded mostly funny to me. Now, 5 years later, let me try to explain why just adding people and person-years of experience may not work that well. As usual, I'll present some examples and lists of bugs to prove my points.
The Emirates Air Line in London offers nice views of the city, and it cost a lot, but it is hardly the most efficient public transport link between the North Greenwich Peninsula and the Royal Victoria Dock that one could imagine.
  1. We still get all kinds of regression bugs reported by the MySQL Community for every release, even MySQL 8.0.12. Here is a short list of random recent examples:
    • Bug #90209 - "Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)".
    • Bug #91878 - "Wrong results with optimizer_switch='derived_merge=OFF';".
    • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM".
    • Bug #90100 - "Year type column have index, query value more than 2156 , the result is wrong".
    • Bug #91927 - "8.0.12 no longer builds with Mac brew-installed ICU".
    • Bug #91975 - "InnoDB: Assertion failure: dict0dd.cc:1071:!fail".
    This means that Oracle's MySQL QA may NOT be doing enough (or proper) regression testing. Sometimes we cannot say this for sure, as Oracle hides some test cases. So we, the users of MySQL, may simply not know what the intention of some recent change was (tests should show it even when the fine manual is not clear enough, a topic for my next post).
  2. We still get valid test failure bugs found by MySQL Community members. Some recent examples follow:
    • Bug #90633 - "innodb_fts.ngram_1 test fails (runs too long probably)".
    • Bug #90608 - "rpl_gtid.rpl_perfschema_applier_status_by_worker_gtid_skipped_transaction fails".
    • Bug #90631 - "perfschema.statement_digest_query_sample test fails sporadically".
    • Bug #89431 - "innodb_undo.truncate_recover MTR test failing with a server error log warning". It's fixed, but only in MySQL 8.0.13.
    • Bug #91175 - "rpl_semi_sync_group_commit_deadlock.test is not simulating flush error ".
    • Bug #91022 - "audit_null.audit_plugin_bugs test always failing".
    • Bug #86110 - "A number of MTR test cases fail when run on a server with no PERFSCHEMA".
    To me this means that Oracle's MySQL QA either does not care to run the regression test suite properly, in enough combinations of platforms, options and build types, or does not properly analyze the failures it gets (and releases when needed, not when all tests pass on all platforms). This is somewhat scary.
  3. We still get crashing bugs in GA releases. They are hard to notice, as they get hidden fast, or as soon as they attract public attention, but they do exist; the latest example, Bug #91928, is discussed here.
  4. It seems that some tools that help to discover code problems may not be used properly or regularly at Oracle. I had a separate post, "On Bugs Detected by ASan", where you can find some examples. We are lucky that Percona engineers have been testing ASan builds of MySQL 5.7 and 8.0 regularly for years and contribute public bug reports back.
  5. Oracle's MySQL QA engineers have not written much about their work in public recently. I can find some posts here and there from 2013 and 2014, but very few in recent years. One may say that's because QA engineers are working hard and have no time for blogging (unlike lazy, annoying individuals like me), but that's not entirely true. There is at least one Oracle engineer who does a lot of QA and makes a lot of information about his work public: Shane Bester, who is brave enough and cares enough to report MySQL bugs in public. Ironically, I doubt he has any formal relation to any of the QA teams at Oracle!
  6. A lot of real MySQL QA is still done by the MySQL Community, while these efforts have not been acknowledged much recently (you usually get your name mentioned in the official release notes if you submitted a patch, but the fact that you helped Oracle by finding a real bug their QA missed has NOT been advertised any more since the last of Morgan's "Community Release Notes" was published 2 years ago). Moreover, only the MySQL Community tries to make the QA job popular and to educate users about proper tools and approaches (Percona, and Roel Van de Paar personally, are famous for this).
To summarize, it seems to me that real MySQL QA is still largely performed by the MySQL Community, in public, while the impact of Oracle's hidden and maybe huge investments in QA is much less clear and visible. Oracle's MySQL QA investments look to me like those in the Emirates Air Line cable car in London: the result is nice to have, but it is the most expensive cable car system ever built, with limited efficiency for the community as public transport.

Saturday, August 4, 2018

Fun with Bugs #70 - On MySQL Bug Reports I am Subscribed to, Part VIII

More than 2 months have passed since my previous review of active MySQL bug reports I am subscribed to, so it's time to describe what I was interested in this summer.

Let's start with a few bug reports that really surprised me:
  • Bug #91893 - "LOAD DATA INFILE throws error with NOT NULL column defined via SET". The bug was reported yesterday and seems to be about a regression in MySQL 8.0.12 vs older versions. At least I have no problem using this way of generating columns for LOAD DATA with MariaDB 10.3.7.
  • Bug #91847 - "Assertion `thread_ids.empty()' failed.". As usual, Roel Van de Paar finds funny corner cases and assertion failures of all kinds. This time it's in MySQL 8.0.12.
  • Bug #91822 - "incorrect datatype in RBR event when column is NULL and not explicit in query". Ernie Souhrada found out that the missing column is given the datatype of the column immediately preceding it, at least according to mysqlbinlog output.
  • Bug #91803 - "mysqladmin shutdown does not wait for MySQL to shut down anymore". My life will never be the same after this. How can we trust anything when even the shutdown command no longer works as expected? I hope this bug is not confirmed after all; it's still "Open".
  • Bug #91769 - "mysql_upgrade return code is '0' even after errors". Good luck to script writers! The bug is still "Open".
  • Bug #91647 - "wrong result while alter an event". Events may just disappear when you alter them. Take care!
  • Bug #91610 - "5.7 or later returns an error on strict mode when del/update with error func". Here Meiji Kimura noted a case where the behavior of strict sql_mode differs in MySQL 5.6 vs newer versions.
  • Bug #91585 - "“dead” code inside the stored proc or function can significantly slow it down". This was proved by Alexander Rubin from Percona.
  • Bug #91577 - "INFORMATION_SCHEMA.INNODB_FOREIGN does not return a correct TYPE". This is a really weird bug in MySQL 8.
  • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM". It seems Oracle tries really hard to get rid of MyISAM by all means in MySQL 8 :)
  • Bug #91203 - "For partitions table, deal with NULL with is mismatch with reference guide". All versions are affected. Maybe the manual is wrong, but then we see weird results in information_schema as well. So, let's agree for now that it's a "Verified" bug in partitioning...
As usual, I am interested in InnoDB-related bugs:
  • Bug #91861 - "The buf_LRU_free_page function may leak some memory in a particular scenario". This is a very interesting bug report about a memory leak that happens when tables are compressed. It shows how to use memory instrumentation in performance_schema to pinpoint the leak. This bug report is still "Open".
  • Bug #91630 - "stack-use-after-scope in innobase_convert_identifier() detected by ASan". Yura Sorokin from Percona has not only reported this problem, but also contributed a patch.
  • Bug #91120 - "MySQL8.0.11: ibdata1 file size can't be more than 4G". Why is nobody trying to do anything about this "Verified" bug reported 2 months ago?
  • Bug #91048 - "ut_basename_noext now dead code". This was reported by Laurynas Biveinis.
Replication problems are also important to know about:
  • Bug #91744 - "START SLAVE UNTIL going further than it should." This scary bug in a cyclic replication setup was reported by Jean-François Gagné 2 weeks ago and is still "Open" at the moment.
  • Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". On top of all the other problems with XA transactions we have in MySQL, it seems that replication may break when a row update is executed immediately after a forced transaction rollback caused by a deadlock detected while in an XA transaction.
Some optimizer bugs also caught my attention:
  • Bug #91486 - "Wrong column type , view , binary". We have a "Verified" regression bug here without a "regression" tag or exact versions checked. Well done, Sinisa Milivojevic!
  • Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another case where a well known bug reporter, Monty Solomon, had to apply enormous efforts to get it "Verified" as a feature request.
  • Bug #91139 - "use index dives less often". A "Verified" feature request from Mark Callaghan.
Last but not least, documentation bugs. We have one today (assuming I do not care that much about group replication):
  • Bug #91074 - "INSTANT add column undocumented". It was reported by my former colleague in Percona, Jaime Crespo. The bug is still "Verified" as of now, but since the MySQL 8.0.12 release I think it is no longer valid. I see a lot of related details here, for example. But nobody cares to close this bug properly and provide the links to the manual pages that were previously missing.
That's all for today, folks! Stay tuned.