Sunday, March 22, 2020

Post #200 in the Blog of Former MySQL Entomologist!

I noted that since August 2012 when I started this blog (while planning to leave my job as a MySQL Entomologist in Oracle to see some real MySQL life besides bugs) I've already published 199 posts. In this post #200 I'd like to look back and pick up one post per each year of blogging so far that I consider really good, important or useful.

I try my best to pick up posts that are mostly not about specific MySQL bugs. These posts either caused some interesting discussions or defined my interests for a log time, or just opened new directions in my public blogging efforts. I think many of them also got less public attention than they deserve (so most popular posts are not in this list). I also try to summarize my work related memories and sentiments of the year.

Here is the list:
  • 2012. "New Year Wishes for Customers of Oracle's MySQL Technical Support Services".

    That year I considered my job roles in Oracle too boring and decided to quit and join Percona for real fun, practical experience and really good money! I also created few social media accounts and this blog, as I decided to finally share the details of my past and current work in public. I just started and published only 4 posts during that year and the one I picked up was probably created during my shift on December 31 (I still like to work on weekends and public holidays, but do this less often). Basically I wanted to encourage Oracle customers (and employees) to report MySQL bugs they found in public MySQL bugs database. Expected for the MySQL Entomologist to wish this, isn't it? I still wish the same today, even more! Unfortunately I see public MySQL bugs database is used less these days and this makes me sad.

  • 2013. "How to Drop the Trigger".

    I had a lot of work, bugs and problems to deal with in 2013, including hard to forget snow storm in March. This year I first visited UK to speak about Performance Schema at Percona Live (it was probably my worst accepted talk ever) and enjoy London and Brighton. Who could imagine the outcomes of that very first visit... I had written a lot in public that year, 42 blog posts and so many rumblings about bugs at Facebook that I annoyed many of my former colleagues. The blog post I picked up was probably my first ever "HowTo" style post in this blog and that's why it is important. It described the real life case when due to missing .TRG file DROP TRIGGER statement just failed. One had to create the file manually to proceed, and it's easy as it's a plain text file. Surely I listed some bugs I checked in the process, including those that could explain why the file was missing.

  • 2014. "On responsible bugs reporting".

    I was less active blogger in 2014, as I was kindly asked by both Oracle and Percona officials and colleagues to reduce my extraordinary bugs-related social media activity. This made me sad, but allowed to concentrate better on support and other job duties in Percona. I kept myself mostly silent after the blog post mentioned above, where I discussed what a responsible engineer should do when she finds some simple sequence of SQL statements that, when executed by authenticated user explicitly having all the privileges needed to execute these statements, crashes some version of her favorite MySQL fork. I was (and still is) against hiding this kind of crashes by Oracle or any other open source software vendor, and suggested to report them in public as I always did. Many MySQL Community members disagree with this approach even today, probably. In 2014 I've got my (first and only) talk (about MySQL bugs reporting) accepted to Oracle Open World 2014 somehow. I keep submitting multiple talks for this event every year, to no result since that time...

  • 2015. "Using gdb to understand what locks (and when) are really set by InnoDB. Part I.".

    In 2015 I started to study MySQL internals with gdb on a regular basis. The blog post I picked up for this year was the first in a series of such studies that continue until today. Initially it was as simple as setting few breakpoints and checking several lines of source code here and there, but I had not stopped at that stage. As a result of that studies, in 2015 I've presented interesting and really well accepted talks at FOSDEM and another one about InnoDB locks and deadlocks at Percona Live. Who could imagine that next time I speak at any Percona conference it will be 2019! The rest of the year was hard and sad at work, where I spent most of my time fighting for my Support team values and colleagues against the upcoming changes, new leaders and new company approaches to services. Not so many blog posts as a result, just 12.

  • 2016. "I'm Winston Wolf, I solve problems.".

    In January, 2016, my internal fights in Percona influenced content and highlights of my blog posts. One of them, my all times favorite, was aimed at explaining how Support should work and why, based on real life story with one of customer issues I resolved instantly working the way I prefer! I have to note that my feature request (Bug #76030 - "Add a way to disable rowid-ordered retrieval to optimizer_switch") that led me to the idea about the root cause is still "Open" today. Even more interesting that the issue was about MariaDB. Later I lost in all my fights for Support in Percona (fights, but not the battle, as Percona today mostly operates the way I defended back then, not the way they planned or tried to enforce) and had to quit, to end up working in MariaDB Corporation where I stay till today, and happy! In general, 2016 was my most successful year as a blogger, with up to 1000 page views per day during some months. Many of 29 blog posts published that year are worth re-reading and quoting, and became really popular, but I'd like to remind mostly about this series about MySQL Support people... Since 2016 I am less active at conferences due to my (now resolved) conflict with Percona on this topic as well, so FOSDEM was my only MySQL-related public appearance in 2016.

  • 2017. "perf Basics for MySQL Profiling".

    By 2017 I found yet another type of tools of enormous value for my daily work in Support - profilers (not only pt-pmp, but also real ones), specifically perf in case of any modern Linux. I had found out (hard way) that proper instrumentation of source code, while useful as MySQL Performance Schema (and Oracle's wait events long before it) shown, is never 100% complete and ready to use in practice, unlike proper OS level profiling tools that Linux got since kernel 2.6.x. Profilers helped me to solve real life performance problems and my findings were later summarized in this FOSDEM 2017 talk, the best one I ever made IMHO. It was a fruitful year for blogging and it was hard to pick up one post out of 32. I continued my gdb studies of various MySQL features and ended up with a long workshop on this topic presented in Sofia to few attendees.

  • 2018. "On Some Problematic Oracle MySQL Server Features".

    In 2018 I had crazy plans to go to Oracle Open World and speak there about problems with MySQL and Oracle's way of developing it. So, many of my numerous (37 in total) blog posts that year were explaining my views on these in details. I picked up the one above as it summarized the most problematic features of MySQL server itself (namely implementations of InnoDB data compression, FULLTEXT indexes, so called "online" DDL, automatic recalculation of InnoDB "persistent" statistics, and partitioning, as well as some details on how all these (does not) work well together). This was surely NOT a topic to be accepted for any MySQL conference! But posts from this series are still useful to review even today, as some of these features got zero improvements even in MySQL 8.0 (and just few are planned or done by other vendors, with MyRocks providing better data compression and with some ongoing changes to InnoDB that are implemented by MariaDB Corporation). I had to cancel my FOSDEM talk that year due to family issues, and had not made any public presentations.

  • 2019. "Dynamic Tracing of MySQL Server With perf probe - Basic Example".


    The year of 2019 was really successful for me. In May I've got "MySQL Community Contributor of The Year" award "for bug identification and submission", that is, for activity that is the main topic of this blog. Obviously, many (of 26 in total) blog posts were about bugs, bugs reporting and bugs processing, same as my FOSDEM talk in 2019. But later I stopped exploiting "MySQL bugs" topic and switched to my current interests (dynamic tracing and profiling with perf, BPF tools and bpftrace). So, I picked up one of the key blog posts that provided additional details for my Percona Live Europe 2019 talk on the topic.

    I had a lot of plans for related studies and talks for 2020, but with COVID-19 breaking everything and conferences cancelled it may happen so that this blog would be the only public platform for all my MySQL-related activities this year.
I also checked many old photos made with Nokia dumb phone while working on this post. Who known when (and if) I ever see the West Pier in Brighton again and swim nearby...
I do not plan to stop blogging any time soon. So, stay tuned for the next 200 post :)

Sunday, March 15, 2020

Fun with Bugs #95 - On MySQL Bug Reports I am Subscribed to, Part XXIX

With conferences cancelled or postponed and people forced to stay at home due to COVID-19 wide spreading, what can be better than to give my readers a new list of MySQL bugs to check? Useful reading should help! So today I continue my previous post with a quick review of bugs I've subscribed to in February, 2020, while things were still going as usual for most of us...

Here is the list of InnoDB, replication, optimizer and some other bugs to note while working with recent MySQL 5.7.x and 8.0.19 releases:
  • Bug #98473 - "group replication will be block after lock table". This problem report by phoenix Zhang was declared not a bug recently. Looks like for group_replication_consistency= BEFORE_AND_AFTER it is expected to get nodes blocked if one of them executed LOCK TABLE ... WRITE and another tried to insert some rows into that table. Check last comment by Nuno Carvalho for more details. Having multiple nodes that change or block data in clusters is always fun. See also Bug #98643 - "group replication will be block primary node shutdown" from the same bug reporter. Analysis is still in progress for it
  • Bug #98498 - "P_S view status_by_thread does not reflect correct results". This bug was reported by Oli Sennhauser (my former colleague in MySQL, founder of FromDual). Really weird results in the output.
  • Bug #98501 - "Exchanging partition with tables does not update SDI correctly". Having a data dictionary is cool and useful, but the information there should be consistent in all places/forms where it is stored. Fungo Wang found and reported cases when it is wrong in the individual .ibd files for partitions or imported tablespaces. After some arguing about backups tools supported, the bug was verified.
  • Bug #98511 - "OPTIMIZE TABLE on myisam can increase table size (~2x) and reduce performance".  This funny bug looks like a regression in MySQL 8.0.x comparing to 5.7. I doubt Oracle is going to fix anything for MyISAM, but maybe the regression still matters for them. As it often happens, this bug reported by Pete Dishman was verified without adding a regression tag.
  • Bug #98520 - "Suboptimal implementations for some load/store functions for little-endian arch". Alexey Kopytov identified some remaining performance problems in low level functions like int6store() and uint6korr() in MySQL 8.0 for platforms like x86_64 or ARM64. He request to optimize those functions for little-endian architectures by providing specialized implementations, as it was done in MySQL 8.0.x for many other similar and more widely used functions.
  • Bug #98530 - "crash when inplace encryption resumes makes tablespace unusable". This bug (with nice MTR test case) was reported by Satya Bodapati from Percona. See also his another bug report, Bug #98537 - "inplace encryption resume thread doesn't update DD properly".
  • Bug #98546 - "Transient indexes statistics are updated in foreground causing performance issue". As noted by Charly Batista, if persistent statistics is not enabled InnoDB checks if 1/16 rows of the table have been changed and if this was the case, it calls the dict_stats_update function in the foreground. Moreover, it does not only degrade performance while recalculating stats in the foreground thread, but also sets RW_X_LATCH for the table in the process to serialize all access. So, use persistent statistics, but do not let it be automatically recalculated (and here I explained why).
  • Bug #98616 - "XA PREPARE/XA COMMIT/XA ROLLBACK lost if mysql crash just after binlog flush". Yet another problem with XA transactions support in MySQL. Dennis Gao provided a patch and kindly explained that currently:
    "When mysql start-up recover, the innodb engine will only recover the transaction in prepared state, which means the undo->state of the recovered transaction must be TRX_UNDO_PREPARED (check trx_rollback_resurrected in trx0roll.cc). So if a "xa prepare" transaction only flush binlog, it will just be rollback during start-up recover and lost."
  • Bug #98624 - "Can't connect to MySQL after establishing around 65536 connections". Yet another great finding by Fungo Wang. Basically, this is a bug in the MDL subsystem, the scalability of MDL is limited to 65536 users, due to the pins number limitation (LF_PINBOX_MAX_PINS = 65536) of the LF_HASH it employs.
  • Bug #98639 - "Redundant row-level locking for secondary index". I think this simple case pointed out by Sergei Muraviev is a yet another case of excessive locking where improvement is possible. I do not agree with simple "Not a bug" status.
  • Bug #98642 - "CONSISTENT SNAPSHOT CAN BE CORRUPTED BY OTHER TRANSACTIONS". This is a really serious bug found by Erwan MAS. It took me some efforts to make sure this bug report was properly processed and verified (my comment in the bug report is hidden, but that does not matter as long as it was really treated seriously). Looks like MySQL 8.0.x may be not affected, but recent 5.7.x and 5.6.x versions are affected for sure. Take care!
  • Bug #98651 - "Inserting into temporary table from function breaks replication in MIXED mode". This regression bug in MySQL 8 (vs 5.7.29) was reported by Alexey Gavrilov, who had create a separate GitHub repository for the test case.
  • Bug #98665 - "replication broken on blackhole node if binlog_rows_query_log_events on master". This bug was reported by Zhenghu Wen. Take care if you use BLACKHOLE tables on slaves. Both MySQL 8.0.19 and 5.7.29 are affected.
  • Bug #98673 - "Allow hints to reference query block by system name". I like optimizer hints implementation in MySQL. This feature request to make them even more useful and less confusing in case of complex queries. As Kaiwang CHen put it:
    "Note that the query blocks are internally identified with a number
    (SELECT_LEX::select_number), with which system names are defined. That
    system name could be explored to refer to any query block in the query."
    I do not see a patch in the bug report, but there was a plan to provide it.
I have few more bugs in my list for February, but let's continue with them next time. We have many boring weeks ahead it seems...

Frida is always searching for something... I do the same with MySQL bugs.
To summarize:
  1. Group replication still has a lot of problems to resolve before becoming a really mature solution. Ask Galera developers :) 
  2. There are regressions in MySQL 8.0.19.
  3. Percona and Alibaba engineers still help to make MySQL better.
  4. I still do not see a consistent use of "regression" tag for verified regression bugs. This is unfortunate.
  5. InnoDB locking still needs more attention.
  6. There is too much Docker usage in the industry for my linking...