Blog of (former?) MySQL Entomologist: July 2018

Saturday, July 28, 2018

Fun with Bugs #69 - On Some Public Bugs Fixed in MySQL 5.7.23

Several MySQL releases happened yesterday, but of them all I am mostly interested in MySQL 5.7.23, as MySQL 5.7 (either directly or indirectly, via forks and upstream fixes they merge) is probably the most widely used MySQL GA release at the moment.

In this post (in a typical manner for this "Fun with Bugs" series) I'd like to describe several bugs reported by MySQL Community users and fixed in MySQL 5.7.23. As usual, I'll try to concentrate mostly on InnoDB, replication, partitioning and optimizer-related bugs (if any).

Checking MySQL release notes is like swimming with dolphins - it's pure fun

I'd like to start with InnoDB and partitioning-related fixes:

The most important fix (that will make downgrades from 5.7.23 more problematic probably) is for Bug #86926 - "The field table_name (varchar(64)) from mysql.innodb_table_stats can overflow.", reported by Jean-François Gagné a year ago. The length of the table_name column in mysql.innodb_table_stats and mysql.innodb_index_stats tables has been increased from 64 to 199 characters (not 100% sure where this new value comes from), and one has to run mysql_upgrade while upgrading to 5.7.23.
Bug #90296 - "Hard error should be report when fsync() return EIO". Now this recent request from Allen Lai is implemented.
Bug #88739 - "Mysql scalability improvement patch: Optimized away excessive condition". The patch that removes unnecessary check for read-only transactions was suggested by Sandeep Sethia.
Bug #86370 - "crash on startup, divide by zero, with inappropriate buffer pool configuration". This funny bug was reported by Shane Bester. I wonder how many more bombs of this kind (use of uint or ulint data types) still remain in the code...
Bug #87253 - "innodb partition table has unexpected row lock". This bug was reported by Yx Jiang and a patch was later suggested by Zhenghu Wen. If the comment at the end is correct, the fix is NOT included in MySQL 8.0.12.

Now let's check some replication bugs fixed:

Bug #89272 - "Binlog and Engine become inconsistent when binlog cache file gets out of space". This bug was reported by Yoshinori Matsunobu. Good question in the last comment there, by Artem Danilov:
"I wonder why does only ENOSPC end up with flush errors and clears the binary log cached. What about any other disk error? Can this fix be extended for all disk errors?"
Bug #88891 - "Filtered replication leaves GTID holes with create database if not exists". This serious bug was reported by Simon Mudd.
Bug #89938 - "Rejoin old primary node may duplicate key when recovery", was reported by Zhenghu Wen, who cares about group replication a lot (unlike me, I have Galera for that).
Yes another bug report from Jean-François Gagné, this time in group replication - Bug #89146 - "Binlog name and Pos are wrong in group_replication_applier channel's error msgs".
Bug #81500 - "dead branch in search_key_in_table()". This bug was reported 2 years ago by Andrei Elkin, now my colleague (again) in MariaDB. Index is properly used on slave to find rows after this fix.

Surely in MySQL 5.7.23 there had to be some community bug fixed that is kept private. This time it's a bug in mysqldump:

"mysqldump exited abnormally for large --where option values. (Bug #26171967, Bug #86496, Bug #27510150)"

and this is a complete nonsense to me!

The last but not the least, we all should be thankful to Daniël van Eeden for these bug reports and patches contributed:

Bug #89001 - "MySQL aborts without proper error message on startup if grant tables are corrupt".
Bug #89578 - "Contribution: Use native host check from OpenSSL"

Had you noted any fixes to numerous public bug reports for InnoDB FULLTEXT indexes, "online" DDL or InnoDB compression, or optimizer maybe? I had not, and no wonder - the way these features are used by Community is obviously not something Oracle cares much about.

Anyway, I'd suggest to consider upgrade to MySQL 5.7.23 to anyone who cares about security and uses OpenSSL, replication of any kind or InnoDB partitioned tables in production.

Tuesday, July 24, 2018

On Some Problematic Oracle MySQL Server Features

In one of my previous posts I stated that in Oracle's MySQL server some old enough features remain half-backed, not well tested, not properly integrated with each other, and not documented properly. It's time to prove this statement.

I should highlight from the very beginning that most of the features I am going to list are not that much improved by other vendors. But they at least have an option of providing other, fully supported storage engines that may overcome the problems in these features, while Oracle's trend to get rid of most engines but InnoDB makes MySQL users more seriously affected by any problems related to InnoDB.

The Royal Pavilion in Brighton looks nice from the outside and is based on some great engineering decisions, but the decorations had never been completed, some interiors were ruined and never restored, and the building was used for too many different purposes over years.

The list of problematic MySQL server features includes (but is not limited to) the following:

InnoDB's data compression

Classical InnoDB compression (row_format=compressed) has limited efficiency and does not get any attention from developers recently. Transparent page compression for InnoDB seems to be originally more like a proof of concept in MySQL that may not work well in production on commodity hardware and filesystems, and was not integrated with backup tools.
Partitioning

Bugs reported for this feature by MySQL Community do not get proper attention. DDL against partitioned tables and partition pruning do not work the way DBAs may expect. We still miss parallel processing for partitioned tables (even though proof of concept for parallel DDL and some kinds of SELECTs was ready and working 10 years ago). Lack of careful testing of partitioning integration with other features is also visible.
InnoDB's FULLTEXT indexes
This feature appeared in MySQL 5.6, but 5 years later there are still all kinds of serious bugs in it, from wrong results to hangs, debug assertions and crashes. There are performance regressions and missing features comparing to MyISAM FULLTEXT indexes, and this makes the idea to use InnoDB for everything even more problematic. Current implementation is not designed to work with really large tables and result sets. DBAs should expect problems during routine maintenance activities, like ALTERing tables or dumps and restores when any table with InnoDB FULLTEXT index is involved.
InnoDB's "online" DDL implementation
It is not really "online" in too many important practical cases and senses. Replication ignores LOCK=NONE and slave starts to apply "concurrent" DML only after commit, and this may lead to a huge replication lag. The entire table is often rebuilt (data are (re-)written) to often, in place or by creating a copy. One recent improvement in MySQL 8, "instant ADD COLUMN", was actually contributed by Community. The size of the "online log" (that is kept in memory and in temporary file) created per table altered or index created, depends on concurrent DML workload and is hard to predict. For most practical purposes good old pt-online-schema-change or gh-ost tool work better.
InnoDB's persistent optimizer statistics

Automatic statistics recalculation does not work as expected, and to get proper statistics explicit ANALYZE TABLE calls are still needed. The implementation is complicated and introduced separate implicit transactions (in dirty reads mode) against statistics tables. Bugs in the implementation do not seem to get proper priority and are not fixed.

I listed only those features I recently studied in some details in my previous blog posts. I've included main problems with each feature according to my older posts. Click on the links in the list above to find the details.

The Royal Pavilion of InnoDB in MySQL is beautiful from the outside (and somewhere inside), but is far from being completed, and some historical design decisions do not seem to be improved over years. We are lucky that it is still used and works nice for many current purposes, but there are too many dark corners and background threads there where even Oracle engineers rarely look and even less are improving them...

Sunday, July 22, 2018

Problems with Oracle's Way of MySQL Bugs Database Maintenance

In one of my previous posts I stated that Oracle does not care enough to maintain public MySQL bugs database properly. I think it's time to explain this statement in details.

The fact that https://bugs.mysql.com/ still exists and community bug reports there are still processed on a regular basis by my former colleagues, Miguel Solorzano, Sinisa Milivojevic, Umesh Shastry, Bogdan Kecman and others, is awesome. Some probably had not expected this to still be the case for 8+ years since Oracle took over the software and procedures around it. My former bugs verification team still seems to exist and even get some new members. Moreover, today we have less "Open" bugs than 6 years ago, when I was preparing to leave Oracle to join (and build) the best MySQL support team in the industry elsewhere...

That's all good and beyond the best expectation of many. Now, what's wrong then with the way Oracle engineers process community bug reports these days?

On the photo above that I made last autumn we see the West Pier in Brighton, England. It keeps collapsing after major damages it got in 2002 and 2003 from storms and fires, and this spring I've seen even more ruins happened. Same happens to MySQL public bugs database - we see it is used less than before, and severe damage to the Community is made by some usual and even relatively simple actions and practices. Let me summarize them.

"Security" bugs are handled in such a way that they never becomes public back, even after the problem is fixed in all affected versions and this fix is documented.

Moreover, often it takes bug reporter to check "security" flag by mistake or for whatever reason for nobody else ever to be able to find out what exactly the bug was about. This is done even in cases when all other vendors keep the information public or open it after the fix is published. Even worse, sometimes when somebody "escalates" some public bug forgotten for a long time (or wrongly handled) to Oracle engineers, in public, the bug immediately becomes private, even though nobody cared about it for months before. I have complained about this many times everywhere.
I would so much like to publish all the details of mishandling of Bug #91118, for example, but the bug was made private, so you'd have to just trust my words and quotes, while I prefer to provide arguments everyone can verify... Remember the bug number though and this (top) part of the stack trace:
...
/home/openxs/dbs/8.0/bin/
mysqld(row_search_mvcc(unsigned char*,
page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x2b76)
[0x1dc78a6]
/home/openxs/dbs/8.0/bin/mysqld(ha_innobase::general_fetch(unsigned
char*, unsigned int, unsigned int)+0x15f) [0x1c98c1f]
/home/openxs/dbs/8.0/bin/mysqld(handler::ha_index_next_same(unsigned
char*, unsigned char const*, unsigned int)+0x1e4) [0xe909a4]
/home/openxs/dbs/8.0/bin/mysqld() [0xc5636a]
...

The practice of hiding bugs once and forever was advocated by some MySQL engineers well before Oracle's acquisition, but in Oracle it became a rule that only bravest engineers sometimes can afford NOT to follow.
Some older bug reports are never processed or never revisited back by Oracle engineers.

Consider this simple example, Bug #70237 - "the mysqladmin shutdown hangs". It was reported almost 5 years ago by a famous Community member, Zhai Weixiang, who even suggested a patch. This bug had not got a single public comment from anyone in Oracle.
The oldest open server bug today is Bug #44411 - "some Unicode text garbled in LOAD DATA INFILE with user variables". It was reported more than 9 years ago. Moreover, it was once "Verified", but then silently became "Not a bug" and then was reopened by the bug reporter. Nobody cares, even though it's clear that many Oracle engineers should get notification emails whenever any change to public bug report happens. This is also fundamentally wrong, no matter what happened to assignee or engineer who worked on the bug in the past.

This specific problem with bugs handling is not new, we always had a backlog of bugs to verify and some bugs were re-checked maybe once in many years, but Oracle now has all kinds of resources to fix this problem, and not just reduce the number of open reports by closing them by all fair and not that fair means... See the next item also.
Recently some bug reports are handled wrongly, with a trend of wasting bug reporter time on irrelevant clarifications or closing the bug too early when there is a problem to reproduce it.

If you report some MySQL problem to Oracle, be ready to see your report closed soon with a suggestion to contact Support. Check this recent example, Bug #90375 - "Significantly improve performance of simple functions". OK, this is a known limitation, but user suggested several workarounds that are valid feature requests for optimizer, that can be smart enough to inline the stored function if it just returns some simple expression. Why not to verify this as a valid feature request?
Another great example is Bug #80919 - "MySQL Crashes when Droping Indexes - Long semaphore wait". It was closed (after wasting more than a year on waiting) as a "Duplicate" with such a nice comment:
"[16 Jan 15:53] Sinisa Milivojevic
Hi!

This is a duplicate bug, because it is very similar to an internal-only bug, that is not present in the public bugs database.

I will provide the internal bug number in the hidden comment, but will update this page with public comment, once when that bug is fixed."

No reference to the internal bug number, just nothing. If you are wondering what is the real problem, check MDEV-14637.
Yet another case of "Not a bug", where the decision is questionable and further statements from the bug reporter are ignored is Bug #89065 - "sync_binlog=1 on a busy server and slow binary log filesystem stalls slaves".
The bug may be "Verified" finally, but after some time wasted on discussions and clarifications when the problem is obvious and bug report contains everything needed to understand this. Nice example is Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another example is Bug #91010 - "WolfSSL build broken due to cmake typo". Knowing parties involved in that discussions in person, I wish they spent their time on something more useful than arguing on the obvious problems.

This practice discourage users from reporting bugs to Oracle. Not that bug handling mistakes never happened before Oracle (I did many myself), but recently I see more and more wrongly handled bugs. This trend is scary!
Oracle mostly fixes bugs reported internally.

Just check any recent Release Notes and make your own conclusion. One may say it means that internal QA and developers find bugs even before Community notices them, but the presence of all kinds of test failures, regression bugs etc in the same recent versions still tells me that Community QA is essential for MySQL quality. But it does not set the agenda for the bug fixing process, for many years.
One may also say that bug fixing agenda is defined by Oracle customers mostly. I let Oracle customers to comment here if they are happy with what they get. Some of the were not so happy with the speed of resolution even for their real security related bugs.

I can continue with more items, but let's stop for now.

If we want MySQL public bugs database to never reach the state of the West Pier (declared to be beyond repair in 2004), we should force Oracle to do something to fix the problems above. Otherwise we'll see further decline of bugs database and Community activity switched elsewhere (to Percona's and MariaDB's public bugs databases, or somewhere else). It would be sad to not have any central public location for all problem reports about core MySQL server functionality...

Friday, July 6, 2018

Problems of Oracle's MySQL as an Open Source Product

In my previous summary blog post I listed 5 problems I see with the way Oracle handles MySQL server development. The first of them was that "Oracle does not develop MySQL server in a true open source way" and this is actually what I started my draft of that entire blog post with. Now it's time to get into details, as so far there was mostly fun around this and statements that MariaDB also could do better in the related Twitter discussion I had.

So, let me explain what forces me to think that Oracle is treating MySQL somewhat wrong for the open source product.

Nice pathway on this photo, but it's not straight and it's not clear where it goes. Same with MySQL development...

We get MySQL source code updated at GitHub only when (or, as it often happened in the past, some time after) the official release of new version happens. You can see, for example, that MySQL 8.0 source code at GitHub was actually last time updated on April 3, 2018, while MySQL 8.0.11 GA was released officially on April 19, 2018 (and that's when new code became really available in public repository). We do not see any code changes later than April 3, while it's clear that there are bug fixes already implemented for MySQL 8.0.12 (see Bug #90523 - "[MySQL 8.0 GA Release Build] InnoDB Assertion: (capacity & (capacity - 1)) == 0", for example. There is an easy way to crash official MySQL 8.0.11 binaries upon startup, fixed back before April 30, with some description of the fix even, but no source code of the fix is published) and 8.0.13 even (see Bug #90999 - "Bad usage of ppoll in libmysql"). With Oracle's approach to sharing the source code, we can not see the fixes that are already made long time ago, apply them, test them or comment on them. This is fundamentally wrong, IMHO, for any open source software.

In other projects we usually can see the code as soon as it is pushed to the branch (check MariaDB if you care, last change few hours ago at the moment). Main branches may have more strict rules for updating, but in general we see fixes as they happen, not only when new official release happens.

Side note: if you see that Bug #90523 became private after I mentioned it here, that's another wrong thing they often do. More on the in the next post, on community bug reports handling by Oracle...

Interesting enough, when the fix comes from community we can usually see the patch. This happened to the Bug #90999 mentioned above - we have a fix provided by Facebook and one can see the patch in Bug #91067 - "Contribution by Facebook: Do not use sigmask in ppoll for client libraries". When somebody makes pull request, patch source is visible. But one can never be sure if it's the final patch and had it passed all the usual QA tests and reviews, or what happens to pull requests closed because developer had not signed the agreement...

If the fix is developed by Oracle you'll see the code changed only with/after the official release. Moreover, it would be on you to identify the exact commit(s) that introduced the fix. For a long time Laurynas Biveinis from Percona cared to add comments about the exact commit that fixed the bug to public bug reports (see Bug #77689 - "mysql_execute_command SQLCOM_UNLOCK_TABLES redundant trans_check_state check?" as one of examples). Community members have to work hard to "reverse engineer" Oracle's fixes and link them back to details of real problems (community bug reports) they were intended to resolve!

Compare this to a typical changelog of MariaDB that leads you directly to commits and code changes.

What's even worse, Oracle started a practice to publish only part of their changes made for the release. Some tests, those for "security" bugs, are NOT published even if we assume they exist or even can be 100% sure they exist.

My recent enough favorite example is the "The CREATE TABLE of death" bug reported by Jean-François Gagné. If you follow his blog post and links in it you can find out all the details, including the test case that is public in MariaDB. With this public information you can go and crash any affected older MySQL versions. Bug reporter did everything to inform affected vendors properly, and responsible vendors disclosed the test (after they fixed the problem)!

Now, try to find similar test in public GitHub tree of Oracle MySQL. I tried to find it literally, try to find references to somewhat related public bug numbers etc, but failed. If you know better and can identify the related public test at GitHub, please, add a comment and correct me!

To summarize, this is what I am mostly concerned about:

Public source code is updated only with the releases. There are no feature-specific code branches, development branches, just nothing public until the official release.
Oracle does not provide any details about commits and their relations to bugs fixed in the release notes or anywhere else outside GitHub. One has to go study the source code to make his own conclusions.
Oracle does not share some of test cases in their commits. So, some test cases remain non-public and we can only guess (based on code analysis) what was the real intention of the fix. This applies to security bugs and who knows to what else.

I would not go into other potential problems (I've heard about some others from developers, for example, related to code refactoring Oracle does) or more details. The above is enough for me to state that Oracle do wrong things with the way they publish source code and threat MySQL as open source product.

All the problems mentioned above were introduced by Oracle, these never happened in MySQL AB or Sun. MariaDB and Percona servers may have their own problems, but the above do NOT apply to them, so I state that other vendors develop MySQL forks and related projects differently, and still are in business and doing well!

Sunday, July 1, 2018

What's Right and What's Wrong With Oracle's Way of MySQL Server Development

Recently it's quite common to state that "Oracle's Acquisition Was Actually the Best Thing to Happen to MySQL". I am not going to argue with that - Oracle proved over years that they are committed to continue active development of this great open source RDBMS, and they have invested a lot into making it better and implementing features that were missed or became important recently. Unlike Sun Microsystems, they seem to clearly know what to do with this software to make it more popular and make money on it.

Among the right things Oracle does for MySQL server development I'd like to highlight the following:

MySQL server development continues, with new features added, most popular OSes supported, regular releases happened and source code still published at GitHub under GPL license.
Oracle continues to maintain public MySQL bugs database and fix bugs reported there.
Oracle accepts external contributions to MySQL server under clear conditions. They acknowledge contributions in public. The release notes, in particular, mention authors of each community-provided patch.
Oracle works hard on improving performance and scalability of MySQL server.
Oracle tries to provide good background for their new MySQL designs (check new InnoDB redo logging, for example).
Oracle cooperates with MySQL Community. They organize their own community events and participate in numerous related conferences, including (but not limited to) the biggest Percona Live ones. Oracle engineers speak and write about their work in progress. Oracle seems to actively support some open source tools that work with MySQL server, like ProxySQL.
Oracle still keeps and maintains pluggable storage engine architecture and plugin APIs, even though their own development is recently mostly related to InnoDB storage engine.
Oracle still maintains and improves public MySQL Manual.

So, Oracle is doing good with MySQL, and dozens of their customers, community members and MySQL experts keep stating this all the time. But, as a former and current "MySQL Entomologist" (somebody who worked on processing MySQL bug reports from community and reported MySQL bugs for 13 years), I clearly see problems with the way Oracle handles MySQL server development. I write and speak about these problems in public since the end of 2012 or so, and would like to summarize them in this post.

MySQL's future is bright, but there are some clouds

Here is the list of problems I see:

Oracle does not develop MySQL server in a true open source way.
Oracle does not care enough to maintain public bugs database properly.
Some older MySQL features remain half-backed, not well tested, not properly integrated with each other and new features, and not documented properly, for years.
In general, Oracle's focus seem to be more on new developments and cool features for MySQL (with some of them got ignored and going nowhere with time).
Oracle's internal QA efforts still seem to be somewhat limited.
We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public).
MySQL Manual still have many details missing and is not fixed fast enough.
Moreover, it is not open source, so there is no other way for community to fix or improve it other than add comments or report documentation bugs, and wait.

In the upcoming weeks I am going to explain each of these items in a separate post, with some links to my older blog posts, MySQL server bug reports and other sources that should illustrate my points. In the meantime I am open for comments from those who disagree with the theses presented above.