Sunday, July 22, 2018

Problems with Oracle's Way of MySQL Bugs Database Maintenance

In one of my previous posts I stated that Oracle does not care enough to maintain the public MySQL bugs database properly. I think it's time to explain this statement in detail.

The fact that the public MySQL bugs database still exists, and that community bug reports there are still processed on a regular basis by my former colleagues, Miguel Solorzano, Sinisa Milivojevic, Umesh Shastry, Bogdan Kecman and others, is awesome. Some probably had not expected this to still be the case 8+ years after Oracle took over the software and the procedures around it. My former bugs verification team still seems to exist and even gets some new members. Moreover, today we have fewer "Open" bugs than 6 years ago, when I was preparing to leave Oracle to join (and build) the best MySQL support team in the industry elsewhere...

That's all good and beyond the best expectations of many. So what is wrong with the way Oracle engineers process community bug reports these days?

In the photo above, which I took last autumn, we see the West Pier in Brighton, England. It keeps collapsing after the major damage it suffered in 2002 and 2003 from storms and fires, and this spring I saw that even more of it had fallen into ruin. The same is happening to the MySQL public bugs database: we see it used less than before, and severe damage to the Community is done by some usual and even relatively simple actions and practices. Let me summarize them.
  1. "Security" bugs are handled in such a way that they never become public again, even after the problem is fixed in all affected versions and the fix is documented.

    Moreover, often all it takes is for the bug reporter to check the "security" flag, by mistake or for whatever reason, and nobody else will ever be able to find out what exactly the bug was about. This is done even in cases when all other vendors keep the information public or open it after the fix is published. Even worse, sometimes when somebody publicly "escalates" a public bug that was forgotten for a long time (or wrongly handled) to Oracle engineers, the bug immediately becomes private, even though nobody cared about it for months before. I have complained about this many times everywhere.
    I would so much like to publish all the details of the mishandling of Bug #91118, for example, but the bug was made private, so you'll just have to trust my words and quotes, while I prefer to provide arguments everyone can verify... Remember the bug number though and this (top) part of the stack trace:
    mysqld(row_search_mvcc(unsigned char*,
    page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x2b76)
    char*, unsigned int, unsigned int)+0x15f) [0x1c98c1f]
    char*, unsigned char const*, unsigned int)+0x1e4) [0xe909a4]
    mysqld() [0xc5636a]
    The practice of hiding bugs once and forever was advocated by some MySQL engineers well before Oracle's acquisition, but at Oracle it became a rule that only the bravest engineers can sometimes afford NOT to follow.

  2. Some older bug reports are never processed or never revisited by Oracle engineers.

    Consider this simple example, Bug #70237 - "the mysqladmin shutdown hangs". It was reported almost 5 years ago by a famous Community member, Zhai Weixiang, who even suggested a patch. This bug has not received a single public comment from anyone at Oracle.
    The oldest open server bug today is Bug #44411 - "some Unicode text garbled in LOAD DATA INFILE with user variables". It was reported more than 9 years ago. Moreover, it was once "Verified", but then silently became "Not a bug" and then was reopened by the bug reporter. Nobody cares, even though it's clear that many Oracle engineers should get notification emails whenever any change to a public bug report happens. This is also fundamentally wrong, no matter what happened to the assignee or the engineer who worked on the bug in the past.

    This specific problem with bug handling is not new. We always had a backlog of bugs to verify, and some bugs were re-checked maybe once in many years, but Oracle now has all kinds of resources to fix this problem, and not just reduce the number of open reports by closing them by fair and not so fair means... See also the next item.

  3. Recently some bug reports have been handled wrongly, with a trend of wasting the bug reporter's time on irrelevant clarifications or closing the bug too early when it is hard to reproduce.

    If you report some MySQL problem to Oracle, be ready to see your report closed soon with a suggestion to contact Support. Check this recent example, Bug #90375 - "Significantly improve performance of simple functions". OK, this is a known limitation, but the user suggested several workarounds that are valid feature requests for the optimizer, which could be smart enough to inline a stored function if it just returns some simple expression. Why not verify this as a valid feature request?
    Another great example is Bug #80919 - "MySQL Crashes when Droping Indexes - Long semaphore wait". It was closed (after more than a year wasted on waiting) as a "Duplicate" with this nice comment:
    "[16 Jan 15:53] Sinisa Milivojevic

    This is a duplicate bug, because it is very similar to an internal-only bug, that is not present in the public bugs database.

    I will provide the internal bug number in the hidden comment, but will update this page with public comment, once when that bug is fixed."
    No reference to the internal bug number, just nothing. If you are wondering what the real problem is, check MDEV-14637.
    Yet another case of "Not a bug", where the decision is questionable and further statements from the bug reporter are ignored, is Bug #89065 - "sync_binlog=1 on a busy server and slow binary log filesystem stalls slaves".
    A bug may be "Verified" eventually, but only after time is wasted on discussions and clarifications when the problem is obvious and the bug report contains everything needed to understand it. A nice example is Bug #91386 - "Index for group-by is not used with primary key for SELECT COUNT(DISTINCT a)". Yet another example is Bug #91010 - "WolfSSL build broken due to cmake typo". Knowing the parties involved in those discussions in person, I wish they had spent their time on something more useful than arguing about obvious problems.

    These practices discourage users from reporting bugs to Oracle. Not that bug handling mistakes never happened before Oracle (I made many myself), but recently I see more and more wrongly handled bugs. This trend is scary!

  4. Oracle mostly fixes bugs reported internally.

    Just check any recent Release Notes and draw your own conclusion. One may say it means that internal QA and developers find bugs even before the Community notices them, but the presence of all kinds of test failures, regression bugs, etc. in the same recent versions still tells me that Community QA is essential for MySQL quality. Yet it has not set the agenda for the bug fixing process for many years.
    One may also say that the bug fixing agenda is defined mostly by Oracle customers. I'll let Oracle customers comment here on whether they are happy with what they get. Some of them were not so happy with the speed of resolution even for their real security-related bugs.
I can continue with more items, but let's stop for now.

If we want the MySQL public bugs database to never reach the state of the West Pier (declared to be beyond repair in 2004), we should force Oracle to do something to fix the problems above. Otherwise we'll see a further decline of the bugs database, with Community activity switching elsewhere (to Percona's and MariaDB's public bugs databases, or somewhere else). It would be sad not to have any central public location for all problem reports about core MySQL server functionality...

1 comment:

  1. Bug #20786 got cake for its 7th birthday:

    We missed its 10th birthday in 2016 :(

    It will be a teenager next year.