Saturday, August 18, 2018

On Fine MySQL Manual

Today I am going to provide some details on the last item in my list of problems with Oracle's way of MySQL server development, maintenance of MySQL Manual. I stated that:
"MySQL Manual still have many details missing and is not fixed fast enough.
Moreover, it is not open source...
"
Let me explain the above:
  1. MySQL Reference Manual is not open source. It used to be built from DocBook XML sources. Probably that's still the case. But you can not find the source code in open repositories (please, correct me if I am wrong, I tried to search...) That's because it is NOT open source. It says this clearly in Preface:
    "This documentation is NOT distributed under a GPL license. Use of this documentation is subject to the following terms:
    ...
    Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.
    "
    It was NOT Oracle who closed the source (as far as I remember, the manual was not GPL even in 2004, when I started to care about MySQL in general). But Oracle had a chance to change the license and set up some better contribution process for MySQL Community users, with many good writers among them. They decided not do this, so creators of forks and derived software have to duplicate efforts and rewrite everything themselves, and the only real way to help make the manual better is to report bugs.
  2. Quality of new documentation is not improved much. We, MySQL Community users, have to report bugs in the manual, as it still has some details missing or documented wrongly. Let me illustrate this with recent 10 or so documentation requests users made (I skipped reports for features I do not care about for now, like group replication):
    • Bug #91955 - "8.0 broke plugin API for C". According to the comment from Oracle developer in this bug reported by Laurynas Biveinis, writing plugins in C is not supported any more... But this change is NOT properly documented.
    • Bug #91781 - "Add version matrix in "Chapter 2 Keywords and Reserved Words". Good idea in the manual is not consistently implemented.
    • Bug #91743 - "performance_schema.data_locks not working correctly". Nice feature was added to Performance Schema (now we can check and study InnoDB locks with SQL!), but it is still not completely and properly documented.
    • Bug #91654 - "Handling an Unexpected Halt of a Replication Slave" Documentation is uncertain". This is also an example of improper bug handling - when I worked in Oracle newer bugs were usually declared duplicates of older ones. Here the opposite decision was made, even though both reports were from the same user, Uday Varagani, who explicitly asked to change the decision. Obviously, documentation requests do not get any extra care comparing to other kinds of bugs, quite to the contrary...
    • Bug #91648 - "Numeric literals must be a number between 0 and 9999". Surely ports with numbers larger than 9999 can be used.
    • Bug #91642 - "Document how protocol version 4.0 and 4.1 differ on how strings are terminated". As noted by Rene' Cannao', comments in the code are still sometimes more useful than documentation.
    • Bug #91549 - "MDL lock queue order seems to depend on table names". Make sure to read last comment in this nice request by Shane Bester. Dmitry Lenev provides some details on priorities of MDL locks in it. There are still cases when bugs and documentation requests document some details better than fine MySQL Manual!
    • Bug #90997 - "Handling an Unexpected Halt of a Replication Slave" manual page is wrong". Sveta Smirnova highlighted a totally misleading statement in the manual.
    • Bug #90935 - "Modify relay_log_recovery Documentation". Simple documentation request stays "Open" for 3 months. Definitely processing documentation requests is not a highest priority for Oracle engineers.
    • Bug #90680 - "dragnet logging: document how to use / find error_symbol codes". Even if request comes from a customer or otherwise well known bug reporter, like Simon Mudd, and it's related to some totally new cool feature of MySQL 8, it can wait for the fix for months...
    You can make your own conclusions from the above. But I do not see any good trends in the way new documentation is created or documentation requests are processed recently. Same problems as 4 years ago (see more on that in a side note below).
  3. Older documentation requests get even less attention than recent ones sometimes, even though they may highlight problems with software itself, not the MySQL Manual. Let me illustrate this with a few bugs I reported:
    • Bug #83640 - "Locks set by DELETE statement on already deleted record". I explicitly asked to
      "...document in the manual what locks this kind of DELETE sets when it encountered a record already marked as deleted, and why"
      This still had NOT happened. Moreover, eventually both MySQL and MariaDB engineers decided that current implementation of locking for this case is wrong and can be improved, so this report ended up as InnoDB bug. Check related  MDEV-11215 for more details.
    • Bug #73413 - "Manual does not explain MTS implementation in details". This is one of my favorite documentation requests. I've got a suggestion to explain what I want to see documented, in details. I tried my best, but if you want to get more details, read this blog.
    • Bug #72368 - "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". This is a bug in the way persistent statistics (the feature I truly hate) in InnoDB is re-estimated "automatically". But until the bug is fixed, I asked to document current implementation (and problems with it). So, where current implementation is properly documented? If only in the bug reports...
    • Bug #71294 - "Manual page for P_S.FILE_INSTANCES table does not explain '~' in FILE_NAME".  They pretend they do not get the point:
      "Per Marc Alff, the Performance Schema is simply reporting what it gets from runtime. If runtime is reporting nonexistent filenames, that's a server issue.

      Recategorizing this as a server bug.
      "
    • Bug #71293 - "Manual page for P_S.FILE_INSTANCES table does not explain EVENT_NAME values". Now they are waiting for the new DOCUMENTATION column in the setup_instruments table to be filled in with something by server developers... The code is the documentation? OK, bus as we know from the experience (see slides 44-46 here) chances to get a bug in Performance Schema fixed fast are even less than to see it properly and completely documented...
There are more problems with MySQL documentation (not only reference manual), but at the moment I consider 3 highlighted and illustrated above the most serious.

Regent's Canal is nice. If I only knew how to operate the locks there... MySQL Manual also misses some information about locks.
As a side note, it's not the first time I write about MySQL Manual. You can find some useful details in the following posts:
  • "What's wrong with MySQL Manual". In 2014, after spending few weeks reporting up to 5 documentation bugs per day, I thought that, essentially, there is nothing much wrong with it - it's fine, well indexed by Google and has meaningful human-readable URLs. Few problems listed were lack of careful readers (I tried my best to fix that), limited attention to old documentation requests, some pages with not so much useful content and lack of "How to" documents. The later I also tried to fix to some small extent in this blog, see howto tagged posts. The real fix came mostly from Percona's blog, though.
  • I have a special "missing manual" tag for blog posts that mention at least one bug in the manual.
  • I tried to use "missing_manual" tag consistently for my own documentation requests. Last year I shared a detailed enough review of the bugs with that tag that were still active.
As a yet another side note, I tried once to create a kind of community driven "missing manual" project, but failed. Writing manual from scratch is a (hard) full time job, while my full time job is providing support to users of MySQL, MariaDB and Percona software...

That said, I also wanted to join MySQL's documentation team in the past, but this was not possible at least because I am not a native speaker. If anything changed in this regard, I am still open to a job offer of this kind. My conditions for an acceptable offer from Oracle are known to all interested parties and they include (but are not limited to) at least 4 times the salary I had before I quit (it was almost 6 years ago) and working for Oracle's office in Paris (because in Oracle office you ere employed by formally matters a lot for your happiness and success as employee).

In case of no offer in upcoming week, I'll just continue to write my annoying but hopefully somewhat useful blog posts (until MySQL Reference Manual becomes true open source project ;)

Sunday, August 12, 2018

On Oracle's QA for MySQL

In my recent blog posts I presented lists of bugs, fixed and not yet fixed, as usual. Working on these lists side tracked me from the main topic of this summer - problems in Oracle's way of handling MySQL. Time to get back on track!

Among things Oracle could do better for MySQL I mentioned QA:
"Oracle's internal QA efforts still seem to be somewhat limited.
We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public)."
I have to explain these in details, as it's common perception for years already that Oracle improved MySQL QA a lot and invests enormously in it, and famous MySQL experts were impressed even 5 years ago:
"Lets take a number we did get the QA team now has 400 person-years of experience on it. Lets say the QA team was 10 people before, and now it is tripled to 30 people. That means the average QA person has over 13 years experience in QA, which is about a year longer than my entire post-college IT career."
I was in the conference hall during that famous keynote, and QA related statements in it sounded mostly funny for me. Now, 5 years later, let me try to explain why just adding people and person-years of experience may not work that well. I'll try to present some examples and lists of bugs, as usual, to prove my points.
Emirates Air Line in London lets you see nice views of London, and it costed a lot, but hardly it's the most efficient public transport system between the North Greenwich Peninsula and Royal Victoria Dock one could imagine.
  1. We still get all kinds of regression bugs reported by MySQL Community for every release, even MySQL 8.0.12. Here is the short list of random recent examples:
    • Bug #90209 - "Performance regression with > 15K tables in MySQL 8.0 (with general tablespaces)".
    • Bug #91878 - "Wrong results with optimizer_switch='derived_merge=OFF';".
    • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM".
    • Bug #90100 - "Year type column have index, query value more than 2156 , the result is wrong".
    • Bug #91927 - "8.0.12 no longer builds with Mac brew-installed ICU".
    • Bug #91975 - "InnoDB: Assertion failure: dict0dd.cc:1071:!fail".
    It means that Oracle's MySQL QA may NOT do enough/proper regression testing. We sometimes can not say this for sure, as Oracle hides some test cases. So, we, users of MySQL, just may not know what was the intention of some recent change (tests should show it even if the fine manual may not be clear enough - a topic for my next post).
  2. We still get valid test failure bugs found by MySQL Community members. Some recent examples follows:
    • Bug #90633 - "innodb_fts.ngram_1 test fails (runs too long probably)".
    • Bug #90608 - "rpl_gtid.rpl_perfschema_applier_status_by_worker_gtid_skipped_transaction fails".
    • Bug #90631 - "perfschema.statement_digest_query_sample test fails sporadically".
    • Bug #89431 - "innodb_undo.truncate_recover MTR test failing with a server error log warning". It's fixed, but only in MySQL 8.0.13.
    • Bug #91175 - "rpl_semi_sync_group_commit_deadlock.test is not simulating flush error ".
    • Bug #91022 - "audit_null.audit_plugin_bugs test always failing".
    • Bug #86110 - "A number of MTR test cases fail when run on a server with no PERFSCHEMA".
    For me it means that Oracle's MySQL QA either do not care to run regression tests suite properly, in enough combination of platforms, options and build types, or they do not analyze the failures they get properly (and release when needed, not when all tests pass on all platforms). This is somewhat scary.
  3. We still get crashing bugs in GA releases. It's hard to notice them as they are got hidden fast or as soon as they get public attention, but they do exist, and the last example, Bug #91928, is discussed here.
  4. It seems some tools that helps to discover code problems may not be used properly/regularly in Oracle. I had a separate post "On Bugs Detected by ASan", where you can find some examples. Lucky we are that Percona engineers test ASan builds of MySQL 5.7 and 8.0 regularly, for years, and contribute back public bug reports.
  5. Oracle's MySQL QA engineers do not write much about their work in public recently. I can find some posts here and there from 2013 and 2014, but very few in recent years. One may say that's because QA engineers are working hard and have no time for blogging (unlike lazy annoying individual like me), but that's not entirely true. There is at least one Oracle engineer who does a lot of QA and makes a lot of information about his work public - Shane Bester - who is brave enough and cares enough to report MySQL bugs in public. Ironically, I doubt he has any formal relation to any of QA teams in Oracle!
  6. A lot of real MySQL QA is still done by MySQL Community, while these efforts are not that much acknowledged recently (you usually get your name mentioned in the official release notes if you submitted a patch, but the fact that you helped Oracle by finding a real bug their QA missed is NOT advertised any more since last Morgan's "Community Release Notes" published 2 years ago). Moreover, only MySQL Community tries to make QA job popular and educate users about proper tools and approaches (Percona and Roel Van de Paar personally are famous for this).
To summarize, for me it seems that real MySQL QA is largely still performed by MySQL Community and in public, while the impact of hidden and maybe huge Oracle's investments in QA is way less clear and visible. Oracle's MySQL QA investments look like those into the Emirates Air Line cable car in London to me - the result is nice to have, but it's the most expensive cable system ever built with a limited efficiency for community as a public transport.

Saturday, August 4, 2018

Fun with Bugs #70 - On MySQL Bug Reports I am Subscribed to, Part VIII

More than 2 months passed since my previous review of active MySQL bug reports I am subscribed to, so it's time to describe what I was interested in this summer.

Let's start with few bug reports that really surprised me:
  • Bug #91893 - "LOAD DATA INFILE throws error with NOT NULL column defined via SET". The bug was reported yesterday and seem to be about a regression in MySQL 8.0.12 vs older versions. At least I have no problem to use such a way to generate columns for LOAD DATA with MariaDB 10.3.7.
  • Bug #91847 - "Assertion `thread_ids.empty()' failed.". As usual, Roel Van de Paar finds funny corner cases and assertion failures of all kinds. This time in MySQL 8.0.12.
  • Bug #91822 - "incorrect datatype in RBR event when column is NULL and not explicit in query". Ernie Souhrada found out that the missing column is given the datatype of the column immediately preceding it, at least according to mysqlbinlog output.
  • Bug #91803 - "mysqladmin shutdown does not wait for MySQL to shut down anymore". My life will never be the same as before after this. How can we trust anything when even shutdown command is no longer works as expected? I hope this bug is not confirmed after all, it's still "Open".
  • Bug #91769 - "mysql_upgrade return code is '0' even after errors". Good luck to script writers! The bug is still "Open".
  • Bug #91647 - "wrong result while alter an event". Events may just disappear when you alter them. Take care!
  • Bug #91610 - "5.7 or later returns an error on strict mode when del/update with error func". Here Meiji Kimura noted a case when the behavior of strict sql_mode differs in MySQL 5.6 vs never versions.
  • Bug #91585 - "“dead” code inside the stored proc or function can significantly slow it down". This was proved by Alexander Rubin from Percona.
  • Bug #91577 - "INFORMATION_SCHEMA.INNODB_FOREIGN does not return a correct TYPE". This is a really weird bug in MySQL 8.
  • Bug #91377 - "Can't Initialize MySQl if internal_tmp_disk_storage_engine is set to MYISAM". It seems Oracle tries really hard to get rid of MyISAM by all means in MySQL 8 :)
  • Bug #91203 - "For partitions table, deal with NULL with is mismatch with reference guide". All version affected. maybe manual is wrong, but then we see weird results in information_schema as well. So, let's agree for now that it's a "Verified" bug in partitioning...
As usual, I am interested in InnoDB-related bugs:
  • Bug #91861 - "The buf_LRU_free_page function may leak some memory in a particular scenario". This is a very interesting bug report about the memory leak that happens when tables are compressed. It shows how to use memory instrumentation in performance_schema to pinpoint the leak. This bug report is still "Open".
  • Bug #91630 - "stack-use-after-scope in innobase_convert_identifier() detected by ASan". Yura Sorokin from Percona had not only reported this problem, but also contributed a patch.
  • Bug #91120 - "MySQL8.0.11: ibdata1 file size can't be more than 4G". Why nobody tries to do anything about this "Verified" bug reported 2 months ago?
  • Bug #91048 - "ut_basename_noext now dead code". This was reported by Laurynas Biveinis.
Replication problems are also important to know about:
  • Bug #91744 - "START SLAVE UNTIL going further than it should." This scary bug in cyclic replication setup was reported by  Jean-François Gagné 2 weeks ago and is still "Open" at the moment.
  • Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". On top of all other problems with XA transactions we have in MySQL, it seems that replication may break upon executing a row update immediately after a forced transaction rollback due to a deadlock being detected while in an XA transaction.
Some optimizer bugs also caught my attention:
  • Bug #91486 - "Wrong column type , view , binary". We have a "Verified" regression bug here without a "regression" tag or exact versions checked. Well done, Sinisa Milivojevic!
  • Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another case where a well known bug reporter, Monty Solomon, had  to apply enormous efforts to get it "Verified" as a feature request.
  • Bug #91139 - "use index dives less often". A "Verified" feature request from Mark Callaghan.
The last but not the least, documentation bugs. We have one today (assuming I do not care that much about group replication):
  • Bug #91074 - "INSTANT add column undocumented". It was reported by my former colleague in Percona, Jaime Crespo. The bug is still "Verified" as of now, but since MySQL 8.0.12 release I think it's no longer valid. I see a lot of related details here, for example. But nobody cares to close this bug properly and provide the links to manual that were previously missing.
That's all for today, folks! Stay tuned.