Saturday, February 23, 2019

Fun with Bugs #80 - On MySQL Bug Reports I am Subscribed to, Part XVI

Today I'd like to continue my review of public MySQL bug reports with a list of some bugs I've subscribed to over last 3 weeks. It's already long enough and includes nice cases to check and share. Note that I usually subscribe to a bug either because it directly affects me or customers I work with, or I consider it technically interesting (so I mostly care about InnoDB, replication, partitioning and optimizer bugs), or it's a "metabug" - a problem in the way public bug report is handled by Oracle engineers. These are my interests related to MySQL bugs.

As usual, I start with the oldest bugs and try to mention bug reporters by name with links to their other reports whenever this may give something useful to a reader. I try to check if MariaDB is also affected in some cases. Check also my summary comments at the end of this blog post.
  • Bug #94148 - "Unnecessary Shared lock on parent table During UPDATE on a child table". In this bug report Uday Varagani reasonably pointed out that formally there is no need to lock parent row when column NOT included in the foreign key gets updated. This happens though when this column is included into the index used to support foreign key constraint. IMHO it's a reasonable feature request and both Trey Raymond and Sveta Smirnova tried their best to  highlight this, but this report now has a "Need Feedback" status with a request to explain new algorithm suggested. It's simple - "Stop it", check that column changed is NOT the one foreign key is defined on, even if it's in the same index...I see no reason NOT to verify this as a reasonable feature request. Is it a new policy that every feature request should come with details on how to implement it? I truly doubt.
  • Bug #94224 - "[5.6] Optimizer reconsiders index based on index definition order, not value". Domas Mituzas found yet another case (see also Bug #36817 - "Non optimal index choice, depending on index creation order" from Jocelyn Fournier, the bug I verified more than 10 years ago) when in MySQL order of index definition matters more for optimizer than anything else.  My quick check shows that MariaDB 10.3.7 is not affected:
    MariaDB [test]> explain select distinct b from t1 where c not in (0) and d > 0;+------+-------------+-------+-------+---------------+--------------------+---------+------+------+-------------+| id   | select_type | table | type  | possible_keys | key            | key_len
    | ref  | rows | Extra                    |
    +------+-------------+-------+-------+---------------+--------------------+--------+------+------+-------------+
    |    1 | SIMPLE      | t1    | index | NULL          | non_covering_index | 9    | NULL |    1 | Using where |
    +------+-------------+-------+-------+---------------+--------------------+---------+------+------+-------------+
    1 row in set (0.002 sec)

    MariaDB [test]> alter table t1 add index covering_index (b, c, d);
    Query OK, 0 rows affected (0.149 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    MariaDB [test]> explain select distinct b from t1 where c not in (0) and d > 0;
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    | id   | select_type | table | type  | possible_keys | key            | key_len
    | ref  | rows | Extra                    |
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    |    1 | SIMPLE      | t1    | index | NULL          | covering_index | 14
    | NULL |    1 | Using where; Using index |
    +------+-------------+-------+-------+---------------+----------------+---------+------+------+--------------------------+
    1 row in set (0.025 sec)
    Fortunately MySQL 8 is no longer affected. Unfortunately we do not see a public comment showing the results of testing on MySQL 5.7 (or any version, for that matter), from engineer who verified the bug. I already pointed out that this "metabug" becomes popular in my previous blog post.
  • Bug #94243 - "WL#9508 introduced non-idiomatic potentially-broken C macros". Laurynas Biveinis from Percona found new code that in ideal world wound not pass any serious code review.
  • Bug #94251 - "Aggregate function result is dependent by window is defined directly or as named". This bug was reported by Владислав Сокол. From what I see:
    MariaDB [test]> WITH RECURSIVE cte AS (
        -> SELECT 1 num
        -> UNION ALL
        -> SELECT num+1 FROM cte WHERE num < 5
        -> )
        -> SELECT num, COUNT(*) OVER (frame) cnt_named, COUNT(*) OVER (ORDER BY num
    DESC) cnt_direct
        -> FROM cte
        -> WINDOW frame AS (ORDER BY num DESC);
    +------+-----------+------------+
    | num  | cnt_named | cnt_direct |
    +------+-----------+------------+
    |    1 |         5 |          5 |
    |    2 |         4 |          4 |
    |    3 |         3 |          3 |
    |    4 |         2 |          2 |
    |    5 |         1 |          1 |
    +------+-----------+------------+
    5 rows in set (0.117 sec)

    MariaDB [test]> WITH RECURSIVE cte AS (
        -> SELECT 1 num
        -> UNION ALL
        -> SELECT num+1 FROM cte WHERE num < 5
        -> )
        -> SELECT num, COUNT(*) OVER (frame) cnt_named, COUNT(*) OVER (ORDER BY num
    DESC) cnt_direct
        -> FROM cte
        -> WINDOW frame AS (ORDER BY num DESC)
        -> ORDER BY num desc;
    +------+-----------+------------+
    | num  | cnt_named | cnt_direct |
    +------+-----------+------------+
    |    5 |         1 |          1 |
    |    4 |         2 |          2 |
    |    3 |         3 |          3 |
    |    2 |         4 |          4 |
    |    1 |         5 |          5 |
    +------+-----------+------------+
    5 rows in set (0.003 sec)
    MariaDB 10.3.7 is NOT affected.
  • Bug #94283 - "MySQL 8.0.15 is slower than MySQL 5.7.25". Percona's CTO Vadim Tkachenko reported that MySQL 8.0.15 is notably slower than 5.7.25 on a simple oltp_read_write sysbench test. He had recently written a separate blog post about this, with more details.There is one detail to clarify based on today's comment from Peter Zaitsev (was the same default character set used), but as my dear friend Sinisa Milivojevic verified the bug without any questions, requests or his own test outputs shared, we can assume that Oracle officially accepted this performance regression (even though "regression" tag was not set).

    Check also later Bug #94387 - "MySQL 8.0.15 is slower than MySQL 5.7.25 in read only workloads", yet another performance regression report from Vadim, where he found that on read only (sysbench oltp_point_select) all in memory workloads MySQL 8.0.15 may also be slower than MySQL 5.7.25.
  • Bug #94302 - "reset master could not break dump thread in some cases". This bug was reported by Ashe Sun. This is definitely a corner case, as it happens only master is still writing to the very first binary log. We can not find out from public comments in the bug report if any other versions besides 5.7.x are affected. This is yet another "metabug" - during my days in Oracle's MySQL bugs verification team we had to check on all versions still supported and present the results explicitly.
  • Bug #94319 - "Format_description_log_event::write can cause segfaults". Nice bug report by Manuel Ung from Facebook.
  • Bug #94330 - "Test for possible compressed failures before upgrade?". Change of zlib version starting from MySQL 5.7.24 means that some operations for InnoDB tables with ROW_FORMAT=COMPRESSED that previously worked may start to fail. In this report Monty Solomon asks for some way to determine if there will be a problem with existing compressed tables before upgrading to 5.7.24. The bug is still "Open".
  • Bug #94338 - "Dirty read-like behavior in READ COMMITTED transaction". Bug reporter, Masaki Oguro, stated that MySQL 8 is not affected (only 5.6 and 5.7) and the bug is verified on these versions, so we should assume it's really the case. But I miss public comment showing the result of testing on recent MySQL 8.0.15.
  • Bug #94340 - "backwards incompatible changes in 8.0: Error number: 3747". Simon Mudd complains about incompatible change in 8.0.13 that does not allow slave to easily switch from SBR to RBR without restart (and was not clearly documented as a change in behavior). Make sure to read all comments.
  • Bug #94370 - "Performance regression of btr_cur_prefetch_siblings". Nice bug report with a patch from Zhai Weixiang.
  • Bug #94383 - "simple ALTER cause unnecessary InnoDB index rebuilds, 5.7.23 or later 5.7 rlses". In this bug report Mikhail Izioumtchenko presented the detailed analysis and suggested diagnostics patches to show what really happens and why. This bug is also a regression of a kind, so while testing results are presented, I still think that it could be processed better according to the good old rules I have in mind.
  • Bug #94394 - "Absence of mysql.user leads to auto-apply of --skip-grant-tables". Great finding by Ceri Williams from Percona. Sveta Smirnova provided a separate MTR test case and clarified the impact of the bug. Surely this is also a regression comparing to MySQL 5.7, as there you can not start MySQL if mysql.user table is missing. I leave it to a reader to decide if there is any security-related impact of this bug...
  • Bug #94396 - "Error message too broad: The used command is not allowed with this MySQL version". This bug was reported by my former colleague in Percona Support, famous Bill Karwin. Informative error messages matter for good user experience.
We rely on MySQL in a same way as that guys on top of dolphins pyramid on this strange monument in some court somewhere at the Lanes. Reliable foundation matters, so regressions should better be avoided.
To summarize:
  1. Looks like it's time for Oracle to spend some efforts to make MySQL 8 great again, by fixing some of the bugs mentioned above, especially performance regressions vs MySQL 5.7 found recently by Vadim Tkachenko from Percona.
  2. Oracle continues to introduce backward-incompatible changes in behavior in minor MySQL 8.0.x releases at GA stage. This is not really good for any production environment.
  3. Asking bug reporters to provide "the basics of such a new algorithm" when they complain that current one is wrong or not optimal is a new word in bugs processing!
  4. When I joined MySQL bugs verification team in 2005 we've set up a culture of bugs processing that included, among other things, presenting in a public comment any successful or unsuccessful attempt to verify the bug, by copy-pasting all commands and statements used along with the outputs, whenever possible and with enough context to show what was really checked. I've studied this approach from Oracle's Tom Kyte over the previous 10 years when I followed him closely. I used to think it's standard for more than a decade already, a kind of my (and not only my) "heritage". It's sad to see this approach is no longer followed by many Oracle engineers who process bugs, in too many cases.
  5. Oracle engineers still do not use "regression" tag when setting "Verified" status for obviously regression bugs. I think bug reporters should care then to always set it when they report regression of any kind.

Saturday, February 16, 2019

Fun with Bugs #79 - On MySQL Bug Reports I am Subscribed to, Part XV

More than 3 weeks passed since my previous review of public MySQL bug reports I am subscribed to, so it's time to present some of the bugs I've considered interesting in January, 2019.

As usual, I'll review them starting from the oldest and try to summarize my feelings about these bugs at the end of this post. Here they are:
  • Bug #93806 - "Document error about ON DUPLICATE KEY UPDATE". Years pass, but fine MySQL manual still does not explain some cases of InnoDB locking properly. Xiaobin Lin found yet another case that it does not explain properly. Or, maybe, the manual is correct and the problem in the implementation? MariaDB 10.3.7 shows the same behavior.
  • Bug #93827 - "dict_index_has_desc() is not efficient". Yet another bug report from Zhai Weixiang. I see 50 still active bug reports from him! Maybe Oracle should send some nice T-shirts to top N most productive bug reporters?
  • Bug #93845 - "Optimizer choose wrong index, sorting index instead of filtering index". yet another bug report of a known class, this time from Daniele Renda. It's good example of optimizer trace usage to make a point. Note also that using ANALYZE ... UPDATE HISTOGRAMS does not help. As a side note, implementation of optimizer trace for MariaDB is finally in progress and should be done for upcoming 10.4. See MDEV-6111 for the details if you care.
  • Bug #93875 - "mysqldump per-table dump is slow since 5.7 on instances with many tables". This performance regression bug (that was "verified" without adding the regression tag) was reported by Nikolai Ikhalainen from Percona. This bug report is a nice example of using Docker to create easily repeatable test cases for bug reports.
  • Bug #93878 - "innodb_status_output fails to restore to old value". This great bug report from Yuhui Wang  not only describes 3 cases when InnoDB status is printed to the error log automatically, but also shows that in one of these cases, when we can not found free block in the buffer pool in 20 loops, this printing is not stopped after the problem is resolved, and provides a patch that resolves the problem. See also his nice Bug #94065 - "MySQL fails to startup when setting persist variable" with detailed analysis of the problem.
  • Bug #93917 - "Wrong binlog entry for BLOB on a blackhole intermediary master". Nice corner case was found by Sveta Smirnova from Percona. With her 52 "Verified" bug reports at the moment she also deserves a T-shirt from Oracle as one of top bug reporters!
  • Bug #93922 - "UNION ALL very slow with SUM(0)". This weird bug was found and reported by Sergio Paternoster. He had to spend notable efforts to see this bug "Verified"...
  • Bug #93948 - "XID inconsistency on master-slave with CTAS". Krunal Bauskar from Percona noted this inconsistency in XID generation on slave vs master. Let's wait and check if it ends up as "Not a bug".
  • Bug #93957 - "slave_compressed_protocol doesn't work with semi-sync replication in MySQL-5.7". This bug report from Pavel Katiushyn also looks like a regression, as similar bug was fixed in older 5.7.x release. But I do not see any public comment with verification attempt neither in recent 5.7, nor in recent 8.0 (where older bug also had to be fixed). So, the bug is "verified", but the real impact and versions affected are not clear.
  • Bug #93963 - "Slow query log doesn't log a slow CREATE INDEX with admin statements enabled". This clear and properly tagged regression vs MySQL 5.7 was reported by Jeremy Smyth.
  • Bug #93986 - "Transactions in serializable mode are not actually serializable". I've subscribed to this bug report mostly for (expected) fun of reading further comments. It's still "Need feedback", but single comment so far is worth reading.
  • Bug #94121 - "Enable hardware CRC32 under Valgrind". Laurynas Biveinis from Percona also provided a patch for this 8 years old problem.
  • Bug #94130 - "XA COMMIT may lead replication broken". Yet another proof that XA transactions implementation is broken in MySQL. This time from Phoenix Zhang and in semi-sync replication case.
This photo reminds me current state of MySQL bugs processing in Oracle - it seems there is no clear and straightforward way to follow. Everything is fuzzy these days...

There are few more bugs reported in January, 2019 that I am watching, but their status is not yet clearly defined, so I decided to skip them in this review.

To summarize:
  1.  Oracle engineers who process bugs still do not add regression tag to many regression bugs. This is a shame, really. If I were their boss I'd make this a policy and one of important KPI values to monitor.
  2. In some cases bugs get verified immediately without any demonstrated attempt to show how the check was performed, while in other cases poor bug reporters have to fight hard to re-make their point and get a real check done. It seems these days good old approaches to bugs verification are not followed strictly by some Oracle engineers.

Saturday, February 9, 2019

On my Favorite FOSDEM 2019 MySQL, MariaDB and Friends Devroom Talks

This year I had not only spoken about MySQL bugs reporting at FOSDEM, but spent almost the entire day listening at MySQL, MariaDB and Friends Devroom. I missed only one talk, on ProxySQL, (to get some water, drink a bottle of famous Belgian beer and chat with my former colleague in MySQL support team, Geert, whom I had not seen for a decade). So, for the first time out of my 4 FOSDEM visits I've got a first hand impression about the entire set of talks in the devroom that I want to share today, while I still remember my feelings.

Most of the talks have both slides and videos already uploaded on site, so you can check them and make your own conclusions, but my top 5 favorite talks (that have both videos and slides already available to community) were the following:

  • "Un-split brain (aka Move Back in Time) MySQL", by Shlomi Noach. You can find slides at SlideShare.

    This was a replacement talk that was really interesting and had proper style for FOSDEM. It was mostly a nice background story of creation of the gh-mysql-rewind tool, a shell script that uses MariaDB's mysqlbinlog --flashback option and MySQL GTIDs and allows to "rewind" row-based binary log to roll back transactions to some previous point in time. The tool should become available to community soon, maybe as a part of orchestrator. I was impressed how one can successfully use 49 slides for 20 minutes talk. That's far beyond my current presentation skills...
  • "Test complex database systems in a laptop with dbdeployer", by Giuseppe Maxia. You can find slides at SlideShare.

    I've already built and used dbdeployer, as described in my blog post, so I was really interested in the talk. Giuseppe was able not only to show 45 slides over 20 minutes and explain all the reasons behind re-implementing MySQL-Sandbox in Go, but also run a live demo where dozens of sandbox instances were created and used. Very impressive!
  • "MySQL and the CAP theorem: relevance & misconceptions", second great talk and show by Shlomi Noach. You can find slides at SlideShare.

    The "CAP theorem" says is a concept that a distributed database system (like any kind of MySQL replication setup) can only have 2 of the 3 features: (atomic) Consistency, (high) Availability and Partition Tolerance. This can be proved mathematically, but Shlomi had not only defined terms and conditions to present the formal proof, but also explained that they are far from real production objectives of any engineer or DBA (like 99.95% of Availability). He had shown typical MySQL setups (from simple async master-slave replication to Galera, group replication and even Vitess) and proved that formally they all are neither consistent nor available from that formal CAP theorem point of view, while, as we all know, they are practically useful and work (and with some efforts, proxies on top etc can be made both highly available and highly consistent for practical purposes). So, CAP theorem is neither representing real production systems, nor meeting their real requirements. We've also got some kind of explanation of why async master-master or circular replication are still popular... All that in 48 slides, with links, and presented in 20 minutes! Greatest short MySQL-related talk I've ever attended.
  • "TiDB: Distributed, horizontally scalable, MySQL compatible", by Morgan Tocker. You can find slides at SlideShare.

    It was probably the first time when I listened to Morgan, even though we worked together for a long time. I liked his way of explaining the architecture of this yet another database system speaking MySQL protocol and reasons to create it. If you are interested in performance of this system, check this blog post.
  • "MySQL 8.0 Document Store: How to Mix NoSQL & SQL in MySQL 8.0", by Frédéric Descamps. You can find slides (70!) at SlideShare.

    LeFred managed to get me somewhat interested in MySQL Shell and new JSON functions in MySQL, way more than ever before. It's even more surprising that hist talk was the last one and we already spent 8+ hours listening before he started. Simple step by step explanation of how one may get the best of both SQL, ACID and NoSQL (JSON, "MongoDB") worlds, if needed, in a single database management syste, was impressive. Also this talk probably caused the longest discussion and the largest number of questions from those remaining attendees.

    He was also one of two "hosts" and "managers" of the devroom, so I am really thankful him for hist efforts year after year to make MySQL devroom at FOSDEM great!
There were more good talks, but I had to pick up few that already have slides shared and those of a kind that I personally prefer to listen to at FOSDEM. This year I also missed few people whom I like to see and talk to at FOSDEM, namely Mark Callaghan and Jean-François Gagné.

The only photo I made with my Nokia dumb phone this year in Brussels, on my way to FOSDEM on February 2. We've got snow and rain that morning, nice for anyone who had to walk 5 kilometers to the ULB campus.
Overall, based on my experience this year, it still makes a lot of sense to visit FOSDEM for anyone interested in MySQL. You can hardly find so many good, different MySQL-related talks per just one single day on any other conference.