Saturday, November 24, 2018

Fun with Bugs #73 - On MySQL Bug Reports I am Subscribed to, Part X

It's time to continue my review of MySQL bug reports that I considered interesting for some reason recently. I had not got any notable reaction from Oracle engineers to my previous post about recent regression bugs in MySQL 8.0.13, so probably this topic is not really that hot. In this boring post I'll just review some bugs I've subscribed to since August that are still not closed, starting from the oldest.

Let me start with a couple of bug reports that remain "Open":
  • Bug #91959 - "UBSAN: signed integer overflow in lock_update_trx_age". It's really unusual to see bug reported by Shane Bester himself just "Open" for months. As he noted, it's really strange to see age defined as 32-bit value here when age_updated two lines later is 64-bit. On the other hand, from comments in the code it seems the age value is supposed to be small enough and if it grows too much this is a problem.
  • Bug #92964 - "Slave performance degrades over time". As description says:
    "When mysql slave is running gtid based asynchronous multi threaded replication, performance degrades over time in significant way when session_track_gtids is set to OWN_GTID; "
    The bug is assigned, but there are no public comments, while the way to reproduce seem to be defined clearly.
Same as the weather at Seven Sisters Cliffs back on September 14, 2018, some recent MySQL bug reports do not add confidence to those few visitors who explore them...

Now let me continue with some "Verified" bugs:
  • Bug #91981 - "Inconsistent user@host definitions for definer/grantor/grantee columns". It seems consistency is not a high priority these days.
  • Bug #92020 - "Introduce new SQL mode rejecting queries with results depending on query plan". I already mentioned this report while reviewing recent MySQL optimizer bugs. I have nothing to add to my conclusion there:
    "So, current MySQL considers different results when different execution plans are used normal!"
  • Bug #92032 - "Restrict usage of session foreign_key_checks". It's a great request from Federico Razzoli and I am happy to see it verified as a bug, fast.
  • Bug #92093 - "Replication crash safety needs relay_log_recovery even with GTID." Recently Jean-François Gagné pays special attention to MySQL replication crash safety in his bug reports, talks and blog posts. This specific report ended up as a request for a more clear documentation that should not provoke (false) safety feelings. See also his Bug #92109 - "Please make replication crash safe with GTID and less durable setting (bis)." for the request to improve MySQL in this regard one day.
  • Bug #92131 - "ASan: Direct leak of 272 byte(s) in main.mysqlpump_partial_bkp MTR test case". This kind of reports make me nervous. They make me think that still there is no regular testing of ASan-enabled builds in Oracle. Lucky we are, Percona engineers (like Yura Sorokin) do this for Oracle.
  • Bug #92133 - "DICT_SYS mutex contention causes complete stall when running with 40 mill tables". Proper data dictionary in MySQL 8.0 was supposed to solve some performance problems for instances with a lot of tables. As this perfect bug report from Alexander Rubin shows, this is not yet the case. We can still push MySQL instance over the limits and single instance for thousands of different small databases is hardly usable.
  • Bug #92209 - "AVG(YEAR(datetime_field)) makes an error result because of overflow". Perfect analysis from Pin Lin. All recent MySQL (and MariaDB) versions are affected.
  • Bug #92252 - "Mysql generates gtid gaps because of option slave-skip-errors". Yet another bug report from Pin Lin
  • Bug #92364 - "events_transactions_summary_global_by_event_name not working as expected". This bug was reported by Przemyslaw Malkowski from Percona. We all know Performance Schema is near perfect, but it turned out that "...performance_schema.events_transactions_summary_global_by_event_name behavior for instrumenting transactions seems completely broken.".
  • Bug #92398 - "point in time recovery using mysqlbinlog broken with temporary table -> errors". Let me quote Shane Bester:
    "Running a multi-threaded workload involving temporary and non-temporary tables leads to binary log playback giving errors."
  • Bug #92421 - "Queries with views and operations over local variables don't use indexes". This is a kind of regression in MySQL 5.7. MySQL 8 allows to workaround the problem properly using ROW_NUMBER() function.
  • Bug #92540 - "Comparison between DATE and DECIMAL does not round nanoseconds". This bug was reported by Alexander Barkov from MariaDB. MariaDB 10.3.x is not affected based on my tests.
To summarize:
  1. It seems ASan-enabled builds are not tested by Oracle engineers on a regular basis.
  2. Percona engineers help a lot with regular MySQL QA and testing for various extreme use cases.
  3. There is a lot to do to make GTID-based replication crash-safe and working fast in common use cases.
 More bug review are coming soon, so stay tuned.



Sunday, November 18, 2018

Fun with Bugs #72 - On MySQL Bug Reports I am Subscribed to, Part IX

I've subscribed to more than 60 new bug reports since my previous post in this series. It means that I'd need 4-5 posts to cover all new subscriptions and reasons behind them. I still plan to write about most of the bug reports I was interested in recently, but for this post I decided to pick up only MySQL 8.0 regression bugs and pay special attention to those that could be handled better or faster by Oracle engineers, as well as those handled perfectly.

The initial reason for this special attention originally was Bug #93085 - "Stall when concurrently execute create/alter user with flush privilege", that caused a lot of interesting Twitter discussions. It took some time, comments (in the bug report and in social media) and pressure from MySQL Community (including yours truly) to get it accepted as a real (regression!) bug to work on, and got "Verified". Unfortunately too often recently I see more time spent on arguing that something is not a bug or can not be reproduced, or is an example of improper usage of some MySQL feature etc instead of simply checking how things worked before MySQL 8.0 and how this changed, to worse.

Another example of "interesting" approach to bugs in MySQL 8.0 is Bug #93102 - "I_S queries don't return column names in the same case as requested.". It's indeed a duplicate of old and well known Bug #84456 - "column names in metadata appearing as uppercase when selecting from I_S" reported at early 8.0 development stage by Shane Bester from Oracle and community user (see Bug #85947). Still, it was decided NOT to fix it and tell users to rely on workaround, while this breaks application compatibility and is a regression.

Take a look at Bug #92998 - "After finishing the backup with mysqldump, mysql crashes and restarts" also. It ended up in "Unsupported" status, with some statements that "Dumping and restoring data between different 8.0 releases is not supported". This can be classified as a regression by itself. What I miss is a link to the manual saying it's not supported (was not able to find it in 5 minutes) any explanation of crash and restart - supported or not, running mysqldump should NOT cause server restarts in a general case. I think this bug report could end up in many statuses, but of them all "Unsupported" is hardly correct.

This my photo is far from ideal and can be criticized from different points of view, but there is no point to argue with the fact that it shows clouds in the sky. I wish the fact that MySQL 8.0 GA releases still have regression bugs is accepted with less arguing and more attention.

Now let me continue with a list of recently reported regression bugs in MySQL 8.0 that were handled mostly properly:

  • Bug #93215 - "Case statement use in conditional with sub_part index regression in 8.0". MySQL of versions < 8.0 (and MariaDB 10.3.7) work as expected also. The bug was verified fast, but it still misses explicit "regression" tag.
  • Bug #93214 - "LIMIT is applied before HAVING when you have a subquery". The bug was "Verified" quickly, but I still miss the exact 8.0.x version(s) affected and the results of checking with older versions. I strongly suspect it's a regression, as MariaDB 10.3.7 provides expected result:
MariaDB [test]> CREATE TABLE test (id INT PRIMARY KEY, value INT);
Query OK, 0 rows affected (0.510 sec)
MariaDB [test]> INSERT INTO test VALUES (1, 99), (2,98), (3, 97);
Query OK, 3 rows affected (0.057 sec)
Records: 3  Duplicates: 0  Warnings: 0
MariaDB [test]> SELECT t1.id, (SELECT t2.value FROM test t2 WHERE t1.id = t2.id) AS sub_value FROM test t1 HAVING sub_value = 99 ORDER BY value LIMIT 1;
+----+-----------+
| id | sub_value |
+----+-----------+
|  1 |        99 |
+----+-----------+
1 row in set (0.116 sec)
            • Bug #93170 - "undo truncation in 8.0.13 is not crash safe". The bug was quickly verified (after all, it's a failure of existing innodb_undo.truncate_recover MTR test case), but had not got "regression" tag. I am still not sure how it was missed during regular testing and ended up in the MySQL 8.0.13 release.
            • Bug #93147 - "Upgrade to 8.0.13 from 8.0.11 fails". In pre-8.0 releases there was no strict need to update to every intermediate minor version, so it's also a regression of a kind for any production DBA.
            • Bug #92979 - "MySQL 8.0 performance degradation on INSERT with foreign_key_checks=0". This is a verified performance regression comparing to MySQL 5.7, but "regression" tag is still missing. 

            To summarize, there are some regressions noted by community users recently in MySQL 8.0 GA releases. Some of them were demonstrated with simple test cases, so it's strange they were not noted by Oracle's QA. What's worse, it seems some of Oracle engineers are not ready to accept the fact that the best ever MySQL 8.0 GA release they worked on may get some things done incorrectly and worse than before, so they seem to waste time on useless discussions that everything is OK, work as expected and nothing can be done differently.  I also see some processed and verified bug reports without detailed check for regressions presented to users or even with "regression" tag NOT added when needed.

            I hope this is not going to become a new trend. I wish all community bug reports and features of MySQL get as much attention and detailed study from Oracle employees as (far from perfect) JSON support in MariaDB...

            Sunday, November 4, 2018

            On New Severity Levels for MySQL Bugs

            Four weeks ago while working on a blog post about half baked XA transactions feature of MySQL server I've noted that there are new severity levels added by Oracle for MySQL bug reports. Previously we had 5 levels:

            • S1 (Critical) - mostly for all kinds of crashes, DoS attack vectors, data corruptions etc
            • S2 (Serious) - mostly for wrong results bugs, broken replication etc
            • S3 (Non-critical) - all kinds of minor but annoying bugs, from unexpected results in some corner cases to misleading or wrong error messages, inefficient or unclear code etc
            • S4 (Feature requests) - anything that should work or be implemented based on common sense, but is not documented in the manual and was not required by the original specification or implementation of some feature.
            • S5 (Performance) - everything works as expected and documented, but the resulting performance is bad or less than expected. Something does not scale well, doesn't return results fast enough in some cases, or could be made faster or some specific platform using some different code or library. This severity level was also probably added at Oracle times, at least it was not there in 2005 when I started to work on MySQL bugs.

            Informal descriptions above are mine and may be incorrect or different from definitions Oracle engineers currently use. I tried to search for Oracle definitions that apply to MySQL, but was not able to find anything immediately useful (any help with public URL is appreciated). 

            In general, severity is defined as the degree of impact a bug has on the operation or use of some software, so less severity assumes less impact on common MySQL operations. One may also expect that bugs with higher severity are fixed first (have higher internal priority). It may not be that simple (and was not during my days in MySQL, when many more inputs were taken into account while setting priority for the bug fix), but it's a valid assumption for any community member.

            By default when searching for bugs you got all bugs of severity levels S1, S2, S3 and S5. You had to specifically care to get feature requests in search results while using bugs database search interface.

            If you try to search bugs today, you'll see two more severity levels added, S6 (Debug Builds) and S7 (Test Cases):

            Now we have 7 Severity levels for MySQL bug reports
            S6 severity level seems to be used for assertion failures and other bugs that affect only debug builds and can not be reproduced literally with non-debug binaries. S7 severity level is probably used for bug reports about failing MTR test cases, assuming that failure does NOT show a regression in MySQL software, but rather some non-determinism, platform dependencies, timing assumptions or other defects of the test case itself.

            By default bug reports with these severity levels are NOT included in search (they are not considered "Production Bugs"). So, one has to care to see them. This, as well as normal common sense based assumption that lower severity eventually means to lower priority for the fix, caused some concerns. It would be great for somebody from Oracle to explain the intended use and reasons for introduction of these severity levels with some more text than a single tweet, to clarify possible FUD people may have. If applied formally, these new severity values may lead to low priority for quite important problems. Most debug assertions are in the code for really good reason, as many weird things (up to crashes and data corruption) may happen in non-debug binaries somewhere later in cases when debug-only assertion fails.

            I was surprised to find out that at the moment we have 67 active S6 bug reports, and 32 active S7 bug reports. The latter list obviously includes reports that should not be S7, like Bug #92274 - "Question of mysql 8.0 sysbench oltp_point_select test" that is obviously about a performance regression noted in MySQL 8 (vs MySQL 5.7) by the bug reporter.

            Any comments from Oracle colleagues on the reasons to introduce new severity levels, their formal definitions and impact on community bug reports processing are greatly appreciated.