Sunday, May 19, 2019

MySQL Support Engineer's Chronicles, Issue #9

My previous post from this series was published more than 1.5 years ago. I had never planned to stop writing about my everyday work on a regular basis, but sometimes it's not easy to pick up something really interesting for wider MySQL audience and when in doubts I always prefer to write about MySQL bugs...

In any case, any long way starts from the first step, so I decided to write one post in this series per week and try to summarize in it whatever findings, questions, discussions, bugs and links I've collected over the week. My work experience differs week after week, so some of these posts may be boring or less useful, but I still want to try to create them on a regular basis.

I was working on (upcoming) blog post (inspired by one customer issue) on the impact of innodb_default_row_format setting for importing tablespaces (and related checking of the row format really used in both .frm and .ibd files) and found FSP Header description in this old post by Jeremy Cole useful for further checks in the InnoDB source code. MySQL manual is not very informative (and MariaDB KB page is just wrong/incomplete) when describing flags for the table or tablespace, unfortunately, so I've reported MDEV-19523 to get this improved.

If you ever wonder what MariaDB plans to do with InnoDB in the future, please, check MDEV-11633 among other sources.

This week we in Support got customer (on MySQL 8.0.x) complaining that they can not start server any more on Windows 10 after moving datadir to other drive. Check this blog post by my colleague Nil on the reason, explanations and way to fix/prevent this from happening. One of those cases when MySQL Forums give useful hint.

If you build MariaDB (and MySQL) from source on a regular basis (as I do), you may wonder at times how to disable some storage engine plugin at build time (for example, NOT to be affected by some temporary bugs in it when you do not really need it for testing or production use). Save this as hint:
This is what you have to add to cmake command line to prevent building Mroonga, for example. Same approach applies to TokuDB etc. See this KB page also for more details.

I never noted before that "Explain Analyzer" service exists at, but it seems some customers use it and even prefer to share its output instead of plain text EXPLAIN. Just copy/paste any EXPLAIN ...\G there and decide if the result is useful. For Support purposes and queries accessing less than 10 tables or so I'd prefer usual text output.

Yet another public service at I noted this week by pure chance is "MariaDB CI" page with buildbot status and ways to check what is building now, what failed etc. MariaDB Foundation works in a true open manner at all levels.

If you ever cares to find out what exact versions of MariaDB (or MySQL) contain specific commit you can find out using git tag --contains commit_hash command.

I still do not care about Kubernetes at all, but it seems customers start to use it in production, so here is the hint for myself on how run specific command in a running container:
kubectl exec -it <pod> --container <container> -- vi grastate.dat
I may have to write or speak about some details of MySQL and MariaDB architecture soon, so I was looking for related pictures and texts. I found useful details in the following places:
If you are interested in different storage engines and efficiency of indexing, check this blog post by Mark Callaghan

The last but not the least, I've nominated the following bugs:
  • Bug #95269 - "binlog_row_image=minimal causes assertion failure". I really wonder why this combination was missed in any regular testing of debug builds (that I hope Oracle does).
  • Bug #90681 - "MySQL 8.0 fails to install and start from Oracle .debs on debian 9 x86_64". It seems proper documentation is missing for users to know what conflicting packages to remove, what paths to clean up (if any) etc. Maybe this is no longer a concern (I do no use Oracle .deb packages, so I don't know), but in any case having this bug just "Open" helps nobody.
  • Bug #87312 - "Test main.events_time_zone is fundamentally unstable". It's even more strange to see this bug report about unstable test case "Open" for more than 2 years. Is it really hard to run MTR many times or check the code and improve, or just agree to disable it?
  • Bug #95411 - "LATERAL produces wrong results (values instead of NULLs) on 8.0.16". This regression bug in optimizer of 8.0.16 (vs 8.0.14) leads to wrong results, but so far nobody cared to verify it (even though it has simple and clear "How to repeat" instructions). This is sad.
for bug of the day on Twitter this week. I've also participated in a discussion there. As a result I ended up reading some recent MEB 8.0 manual pages (and more here). MySQL Enterprise Backup really provides a lot of potentially useful options that mariabackup may benefit from one day...

I spent first two weeks of May properly last year, on vacation in UK. Battersea Park here.
That's more or less all I had written down for further review this week that I am ready top share. Stay tuned for what may come up next week!

Wednesday, May 1, 2019

Fun with Bugs #85 - On MySQL Bug Reports I am Subscribed to, Part XX

We have a public holiday here today and it's raining outside for a third day in a row already, so I hardly have anything better to do than writing yet another review of public MySQL bug reports that I've subscribed to recently.

Not sure if these reviews are really considered useful by anyone but few of my readers, but I am still going to try in a hope to end up with some useful conclusions. Last time I've stopped on Bug #94903, so let me continue with the next bug in my list:
  • Bug #94912 - "O_DIRECT_NO_FSYNC possible write hole". In this bug report Janet Campbell shared some concerns related to the way O_DIRECT_NO_FSYNC (and O_DIRECT) settings for innodb_flush_method work. Check comments, including those by Sunny Bains, where he agrees that "...this will cause problems where the redo and data are on separate devices.". Useful reading for anyone interested in InnoDB internals or using  innodb_dedicated_server setting in MySQL 8.0.14+.
  • Bug #94971 - "Incorrect key file error during log apply table stage in online DDL". Monty Solomon reported yet another case when "online' ALTER for InnoDB table fails in a weird way. The bug is still "Open" and there is no clear test case to just copy/paste, but both the problem and potential solutions (make sure you have "big enough" innodb_online_alter_log_max_size or better use pt-online-schema-change or gh-ost tools) were already discussed here.
  • Bug #94973 - "Wrong result with subquery in where clause and order by". Yet another wrong results bug with subquery on MySQL 5.7.25 was reported by Andreas Kohlbecker. We can only guess if MySQL 8 is also affected (MariaDB 10.3.7 is not, based on my test results shared below) as Oracle engineer who verified the bug had NOT card to check or share the results of this check. What can be easier than running this (a bit modified) test case on every MySQL major version and copy pasting the results:
    MariaDB [test]> CREATE TABLE `ReferenceB` (
        ->   `id` int(11) NOT NULL,
        ->   `bitField` bit(1) NOT NULL,
        ->   `refType` varchar(255) NOT NULL,
        ->   `externalLink` longtext,
        ->   PRIMARY KEY (`id`)
    Query OK, 0 rows affected (0.170 sec)

    MariaDB [test]> INSERT INTO ReferenceB (id, bitField, refType, externalLink) VALUES(1, 0, 'JOU', NULL);
    Query OK, 1 row affected (0.027 sec)

    MariaDB [test]> INSERT INTO ReferenceB (id, bitField, refType, externalLink) VALUES(2, 0, 'JOU', NULL);
    Query OK, 1 row affected (0.002 sec)

    MariaDB [test]> SELECT hex(bitField) from ReferenceB  where id in (select id as
    y0_ from ReferenceB  where refType='JOU') order by externalLink asc;
    | hex(bitField) |
    | 0             |
    | 0             |
    2 rows in set (0.028 sec)
    But we do not see anything like that in the bug report... This is sad.
  • Bug #94994 - "Memory leak detect on temptable storage engine". Yet another memory leak (found with ASan) reported by Zhao Jianwei, who had also suggested a patch.
  • Bug #95008 - "applying binary log doesn't work with blackhole engine tables". This bug was reported by Thomas Benkert. It seems there is a problem to apply row-based events to BLACKHOLE table and this prevents some nice recovery tricks from working.
  • Bug #95020 - "select no rows return but check profile process Creating sort index". Interesting finding from cui jacky. I can reproduce this with MariaDB as well. It seems we either have to define some new stage or define "Creating sort index" better than in the current manual. This:
    The thread is processing a SELECT that is resolved using an internal temporary table.
    is plain wrong in the case shown in the bug report IMHO.
  • Bug #95040 - "Duplicately remove locks from lock_sys->prdt_page_hash in btr_compress". One of those rare cases when Zhai Weixiang does not provide the patch, just suggests the fix based on code review :)
  • Bug #95045 - "Data Truncation error occurred on a write of column 0Data was 0 bytes long and". This really weird regression bug in MySQL 8.0.14+ was reported by Adarshdeep Cheema. MariaDB 10.3 is surely not affected.
  • Bug #95049 - "Modified rows are not locked after rolling back to savepoint". Bug reporter, John Lin, found that fine MySQL manual does not describe the real current implementation. Surprise!
  • Bug #95058 - "Index not used for column with IS TRUE or IS FALSE operators". Take extra care when using BOOLEAN columns in MySQL. As it was noted by Monty Solomon, proper index is NOT used when you try to check BOOLEAN values as manual suggests, using IS TRUE or IS FALSE conditions. Roy Lyseng explained how such queries are threated internally, but surely there is a better way. MariaDB 10.3.7 is also affected, unfortunately.
  • Bug #95064 - "slave server may has gaps in Executed_Gtid_Set when a special case happen ". Nice bug report from yoga yoga, who had also contributed a patch. Parallel slave can easily get out of sync with master in case of lock wait timeout and failed retries. Again, we do NOT see any check if MySQL 8 is affected, unfortunately.
  • Bug #95065 - "Strange memory management when using full-text indexes". We all know that InnoDB FULLTEXT indexes implementation is far from perfect. Now, thanks to Yura Sorokin, we know also about a verified memory leak bug there that may lead to OOM killing of MySQL server.
  • Bug #95070 - "INSERT .. VALUES ( .., (SELECT ..), ..) takes shared lock with READ-COMMITTED". Seunguck Lee found yet another case of InnoDB locking behavior that MySQL manual does not explain. The bug is still "Open" for some reason.
  • Bug #95115 - "mysqld deadlock of all client threads originating from 3-way deadlock". It took some efforts for bug reporter, Sandeep Dube, and other community users (mostly Jacek Cencek) to attract proper attention to this bug from proper Oracle developer, Dmitry Lenev, until it ended up "Verified" based on code review. We still can not be sure if MySQL 8 is also affected.
That's all for now. I have few more new bug reports that I monitor, but I do not plan to continue with this kind of reviews in upcoming few months in this blog. I hope I'll get a reason soon to write different kind of posts, with more in depth study of various topics...

In any case you may follow me on Twitter for anything related to recent interesting or wrongly handled MySQL bug reports.

This view of Chelsea from our apartment at Chelsea Cloisters reminds me that last year I spent spring holiday season properly - no time was devoted to MySQL bugs :)
To summarize:
  1. Do not use O_DIRECT_NO_FSYNC value for innodb_flush_method if your redo logs are located on different device than your data files. Just don't.
  2. Some Oracle engineers who process bugs still do not care to check if all supported major versions are affected and/or share the results of such checks in public.
  3. There are still many details of InnoDB locking to study, document properly and maybe fix.
  4. I am really concerned with the state of MySQL optimizer. We see all kinds of weird bugs (including regressions) and very few fixes in each maintenance release.

Saturday, April 27, 2019

Fun with Bugs #84 - On Some Public Bugs Fixed in MySQL 5.7.26

Oracle released minor MySQL Server versions in all supported branches on April 25, 2019. MySQL 5.7.26 is just one of them, but recently I prefer to ignore MySQL 8 releases (after checking that I can build them from source code at least somewhere, even if it takes 18G+ of disk space and that they work in basic tests), as there are more chances for MySQL 5.7 bug fixes to affect me (and customers I care about) directly.

So, in this yet another boring blog post (that would never be a reason for any award) I plan to concentrate on bugs reported in public MySQL bugs database and fixed in MySQL 5.7.26. As usual I name bug reporters explicitly and give links to their remaining currently active bug reports, if any. This time the list is short enough, so I do not even split it by categories:
  • Bug #93164 - "Memory leak in innochecksum utility detected by ASan". This bug was reported by Yura Sorokin from Percona, who also had contributed a patch (for some reason this is not mentioned in the official release notes).
  • Bug #90402 - "innodb async io error handling in io_event". Wei Zhao found yet another case when wrong data type was used in the code and I/O error was not handled, and this could lead even to crashes. He had submitted a patch.
  • Bug #89126 - "create table panic on innobase_parse_hint_from_comment". Nice bug report with a patch from Yan Huang. Note also detailed analysis and test case provided by Marcelo Altmann in the comment. It's a great example of cooperation of all sides: Oracle MySQL developers, bugs verification team, bug reporter and other community users.
  • Bug #92241 - "alter partitioned table add auto_increment diff result depending on algorithm". Yet another great finding from Shane Bester himself!
  • Bug #94247 - "Contribution: Fix fractional timeout values used with WAIT_FOR_EXECUTED_GTI ...". This bug report was created based on pull request from Dirkjan Bussink, who had suggested a patch to fix the problem. Note the comment from Shlomi Noach that refers to Bug #94311 (still private).
  • Bug #85158 - "heartbeats/fakerotate cause a forced sync_master_info". Note MTR test case contributed by Sveta Smirnova and code analysis in a comment from Vlad Lesin (both from Percona at that time) in this bug report from Trey Raymond.
  • Bug #92690 - "Group Replication split brain with faulty network". I do not care about group replication (I have enough Galera in my life instead), but I could not skip this report by Przemyslaw Malkowski from Percona, with detailed steps on how to reproduce. Note comments from other community members. Yet another case to show that good bug reports attract community feedback and are fixed relatively fast.
  • Bug #93750 - "Escaping of column names for GRANT statements does not persist in binary logs". Clear and simple bug report from Andrii Ustymenko. I wonder why it was not found by internal testing/QA. Quick test shows that MariaDB 10.3.7, for example, is not affected:
    c:\Program Files\MariaDB 10.3\bin>mysql -uroot -proot -P3316 test
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 9
    Server version: 10.3.7-MariaDB-log binary distribution

    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    MariaDB [test]> create table t_from(id int primary key, `from` int, c1 int);
    Query OK, 0 rows affected (0.582 sec)

    MariaDB [test]> create user 'user01'@'%' identified by 'user01';
    Query OK, 0 rows affected (0.003 sec)

    MariaDB [test]> grant select (`id`,`from`) on `test`.`t_from` to 'user01'@'%';
    Query OK, 0 rows affected (0.054 sec)

    MariaDB [test]> show master status;
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
    | pc-PC-bin.000007 |      852 |              |                  |
    1 row in set (0.030 sec)

    MariaDB [test]> show binlog events in 'pc-PC-bin.000007';
    | Log_name         | Pos | Event_type        | Server_id | End_log_pos | Info
    | pc-PC-bin.000007 |   4 | Format_desc       |         1 |         256 | Server
    ver: 10.3.7-MariaDB-log, Binlog ver: 4                             |
    | pc-PC-bin.000007 | 256 | Gtid_list         |         1 |         299 | [0-1-42
    ]                                                                  |
    | pc-PC-bin.000007 | 299 | Binlog_checkpoint |         1 |         342 | pc-PC-b
    in.000007                                                          |
    | pc-PC-bin.000007 | 708 | Query             |         1 |         852 | use `te
    st`; grant select (`id`,`from`) on `test`.`t_from` to 'user01'@'%' |

    9 rows in set (0.123 sec)
  • Bug #73936 - "If the storage engine supports RBR, unsafe SQL statementes end up in binlog". Nice bug report with MTR test case by Santosh Praneeth Banda. Note that last comment about the fix mentions only MySQL 8.0.15, not a single work about the fix in MySQL 5.7.26 (or anything about MySQL 5.6.x while the bug was reported for 5.6).
  • Bug #93341 - "Check for tirpc needs improvement". The need for improvement of CMake check was noted by Terje Røsten.
  • Bug #91803 - "mysqladmin shutdown does not wait for MySQL to shut down anymore". This regression bug (without a "regression" tag) was reported by Christian Roser.
  • Bug #91541 - ""Flush status" statement adds twice to global values ". Yura Sorokin contributed a detailed anlysis, MTR test case and a patch in this bug reported by Carlos Tutte.
  • Bug #90351 - "GLOBAL STATUS variables drift after rollback". Zsolt Parragi contibuted a patch to this bug found and reported by Iwo P. For some reason this contribution is not highlighted in the release notes.
  • Bug #81441 - "Warning about localhost when using skip-name-resolve". One of many bug reports from Monty Solomon in which he (and other community members like Jean-François Gagné) had to spend a lot of efforts and fight with a member of bugs verification team to get the bug accepted as a real code bug and then get it fixed in all versions affected.
  • Bug #90902 - "Select Query With Complex Joins Leaks File Handles". This bug was reported by James Wilson. I still wonder if MySQL 5.6 was affected. Bug reports says nothing about this (while I expect all supported GA versions to be checked when the bug is verified, and the results of such check clearly documented).
The future looks bright for MySQL 5.7
To summarize:
  1. Consider upgrade to 5.7.26 if you use complex joins, partitioned tables with auto_increment columns or rely on InnoDB or replication a lot.
  2. It's good to see crashing bugs that do not end up as hidden/"security", maybe because they are reported with patches...
  3. It's good to see examples of cooperation of several community users contributing to the same bug report!
  4. Percona engineers contribute a lot to MySQL, both in form of bug reports, patches and by helping other community users to make their point and get their bugs fixed fast.
  5. There are still things to improve in a way Oracle egnineers handle bugs verification, IMHO.
  6.  It's also a bit strange to see only one optimizer-related fix in this release. It means that either MySQL optimizer is already near perfect and there are no bugs to fix (check yourself, but I see 123 bugs here), or that nobody cares that much about MySQL optimizer in 5.7 these days.
  7. It seems for some bugs fixed in previous MySQL 8.0.x minor release there is no extra check/updates in public comments about the versions with the fix when it is released in MySQL 5.6 or 5.7.

Sunday, April 14, 2019

Fun with Bugs #83 - On MySQL Bug Reports I am Subscribed to, Part XIX

I have not much yet to say on a popular topic of upgrading everything to MySQL 8, so let me just continue reviewing public MySQL bug reports that I've subscribed to recently. After my previous post at least one bug, Bug #94747, got enough comments and clarifications (up to specific commit that introduced this regression pointed out by Daniel Black!) to have it re-classified and verified as InnoDB code bug. So, I see good reasons to continue attracting wide public attention to selected MySQL bugs - this helps to make MySQL better eventually.

As usual, I start from the oldest bug reports:
  • Bug #94758 - "record with REC_INFO_MIN_REC_FLAG is not the min record on non-leaf page". It was reported by a well known person, Zhai Weixiang, who contributed a lot to MySQL code and quality. This time he added a function to the code to prove his point and show that data may be stored in an unexpected order on the root node of InnoDB table. For this very reason (Oracle's code modified to show the problem) this report was marked as "Not a Bug". This is weird, one may prove the point by checking memory with gdb if needed (or maybe by checking data pages on disk as well), without any code modifications.
  • Bug #94775 - "Innodb_row_lock_current_waits status variable incorrect values on idle server". If you read this bug report by Uday Sitaram you can find out a statement that some status variables, like Innodb_row_lock_current_waits, are designed to be "fuzzy", so no matter what value you may see it's probably not a bug. Very enlightening!
  • Bug #94777 - "Question about the redo log write_ahead_buffer". One may argue that public bugs database is not a proper place to ask questions, but in this case Chen Zongzhi actually proved that MySQL 8.0 works better and started up some discussion that reveal probably a real bug (see comments starting from this one, "[5 Apr 15:59] Inaam Rana "). So, even if "Not a Bug" status is correct for the original finding, it seems there is something to study and we have a hope this study happens elsewhere (although I'd prefer to see this or new public bug report for this "Verified").
  • Bug #94797 - "Auto_increment values may decrease when adding a generated column". I can not reproduce this problem reported by Fengchun Hua with MariaDB 10.1.x. My related comments in the bug remain hidden and I've already agreed not to make any such comments in the bugs database. So, for now we have a "Verified" bug in MySQL 5.7.
  • Bug #94800 - "Lost connection (for Debug version) or wrong result (for release version)". According to my tests, MariaDB 10.3.7 is not affected by this bug reported by Weidong Yu, who had also suggested a fix. See also his Bug #94802 - "The behavior between insert stmt and "prepare stmt and execute stmt" different ". (MariaDB 10.3.7 is also not affected).
  • Bug #94803 - "rpl sql_thread may broken due to XAER_RMFAIL error for unfinished xa transaction". This bug reported by Dennis Gao is verified based on code review, but we still do not know if any major version besides 5.7 is affected.
  • Bug #94814 - "slave replication lock wait timeout because of wrong trx order in binlog file". Yet another case when XA transactions may break replication was found by Zhenghu Wen. The bug is still "Open" and I am really interested to see it properly processed soon.
  • Bug #94816 - "Alter table results in foreign key error that appears to drop referenced table". From reading this report I conclude that MySQL 5.7.25 (and Percona Server 5.7.25-28, for that matter) is affected (src table disappears) and this was verified, but still the bug ends up as "Can't repeat" (?) with a statement that there is a fix in MySQL 8.0 that can not be back ported. This is rally weird, as we have plenty of bugs NOT affecting 8.0 but verified as valid 5.7.x bugs. Moreover, I've verified that in case of MySQL 8.0.x ref table just can not be created:
    mysql> create table ref (
        -> a_id int unsigned not null,
        -> b_id int unsigned not null,
        -> constraint FK_ref_a_b foreign key (b_id,a_id) references src (b_id,a_id)
        -> ) engine=InnoDB;
    ERROR 1822 (HY000): Failed to add the foreign key constraint. Missing index for constraint 'FK_ref_a_b' in the referenced table 'src'
    But it means the test case does not apply to 8.0 "as is", that MySQL 8.0 is not affected, but from the above it's not obvious if there is a fix to back port at all. As a next step I tried essentially the same test case on MariaDB 10.3 and ended up with a crash that I've reported as MDEV-19250. So, this bug report that was not even accepted by Oracle MySQL team ended up as a source of a useful check and bug report for MariaDB.
  • Bug #94835 - "debug-assert while restarting server post install component". This is a classical Percona style bug report from Krunal Bauskar. Percona engineers carefully work on debug builds and find many unique new bugs that way.
  • Bug #94850 - "Not able to import partitioned tablespace older than 8.0.14". This regression bug (for cases when lower_case_table_names=1) was reported by Sean Ren.
  • Bug #94858 - "Deletion count incorrect when rows deleted through multi-hop foreign keys". I've checked that MariaDB 10.3 is also affected by this bug reported by Sawyer Knoblich.
  • Bug #94862 - "MySQL optimizer scan full index for max() on indexed column." Nice bug report from Seunguck Lee. As one can easily check MariaDB is not affected:
    MariaDB [test]> explain select max(fd2) from test;
    | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
    |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
    1 row in set (0,001 sec)

    MariaDB [test]> explain select get_timestamp(max(fd2)) from test;
    | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
    |    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
    1 row in set (0,001 sec)

    MariaDB [test]> select version();
    | version()       |
    | 10.3.14-MariaDB |
    1 row in set (0,000 sec)
  • Bug #94881 - "slave replication lock wait timeout because of supremum record". I fail to understand why this bug report from Zhenghu Wen ended up as "Closed". There is a detailed enough code analsys, but no test case to just copy/paste. The problem happens only with XA transactions and it's not clear if recent MySQL 5.7.25 is also affected. It means the bug can be in "Need Feedback" or even "Can't Repeat", but I see zero reasons to close it at the moment. Looks very wrong to me.
  • Bug #94903 - "optimizer chooses inefficient plan for order by + limit in subquery". It seems recently a lot of efforts from both bug reporter (Василий Лукьянчиков in this case) and even Oracle developer (Guilhem Bichot in this case) may be needed to force proper processing of the real bug.
It may take more than one dram of a good single malt to keep up with recent style of MySQL bugs processing...
* * *
To summarize:
  1. Attracting public attention of MySQL community users (via blog posts in this series or by any other means) to some MySQL bugs still helps to get them processed properly.
  2. Oracle MySQL engineers who work on bugs continue to refuse further processing of some valid bug reports based on formal and not entirely correct assumptions. In some cases I clearly miss checks for possible regressions vs older versions.
  3. As I already stated, Oracle does not seem to care much about bugs in XA transactions and possible replication problems they may cause.
  4. I encourage community users to share their findings and concerns in public MySQL bugs database. Even if they end up as "Not a Bug", they may still start useful discussions and fixes.
  5. By the way, my comment about the related discussion in MariaDB MDEV-15641 is still private in Bug #94610. This is unfortunate.

Thursday, April 4, 2019

Fun with Bugs #82 - On MySQL Bug Reports I am Subscribed to, Part XVIII

I've got few comments to my post on references to MariaDB in MySQL bug reports (not in the blog, but via social media and in personal messages), and all but one comments from current and former colleagues whose opinion I value a lot confirmed that this really looks like a kind of attempt to advertise MariaDB. So, from now on I'll try to keep my findings on how tests shared by MySQL bug reporters work in MariaDB for myself, MariaDB JIRA and this blog (where I can and will advertise whatever makes sense to me), and avoid adding them to MySQL bug reports.

That said, I still think that it's normal to share links to MariaDB bug reports that add something useful (like patches, explanations or better test cases), and I keep insisting that this kind of feedback should not be hidden. Yes, I want to mention Bug #94610 (and related MDEV-15641) again, as a clear example of censorship that is not reasonable and should not be tolerated.

In the meantime, since my previous post in this series I've subscribed to 30 or so new MySQL bug reports. Some of them are listed below, started from the oldest. This time I am not going to exclude "inactive" reports that were not accepted by Oracle MySQL engineers as valid:
  • Bug #94629 - "no variable can skip a single channel error in mysql replication". This is a request to add support for per-channel options to skip N transactions or specific errors. It is not accepted ("Not a Bug") just because one can stop replication on all channels and start on one to skip transaction(s) there, then resume replication for all channels. Do you really think this is a right and only way to process such a report?
  • Bug #94647 - "Memory leak in MEMORY table by glibc". This is also not a bug because one ca use something like malloc-lib=jemalloc with mysqld_safe or Environment="LD_PRELOAD=/path/to/jemalloc" with systemd services. There might be some cost related to that in older versions... Note that similar MDEV-14050 is still open.
  • Bug #94655 - "Some GIS function do not use spatial index anymore". yet another regression vs MySQL 5.7 reported by Cedric Tabin. It ended up verified as feature request without a regression tag...
  • Bug #94664 - "Binlog related deadlock leads to all incoming connection choked.". This report from Yanmin Qiao ended up as a duplicate of  Bug #92108 - "Deadlock by concurrent show binlogs, pfs session_variables table & binlog purge" (fixed in MySQL 5.7.25+, thanks Sveta Smirnova for the hint). See also Bug #91941.
  • Bug #94665 - "enabling undo-tablespace encryption doesn't mark tablespace encryption flag". Nice finding by Krunal Bauskar from Percona.
  • Bug #94699 - "Mysql deadlock and bugcheck on aarch64 under stress test". Bug report with a patch contributed by Cai Yibo. The fix is included in upcoming MySQL 8.0.17 and the bug is already closed.
  • Bug #94709 - "Regression behavior for full text index". This regression was reported by Carlos Tutte and properly verified (with regression tag added and all versions checked) by Umesh Shastry. See also detailed analysis of possible reason in the comment from Nikolai Ikhalainen.
  • Bug #94723 - "Incorrect simple query result with func result and FROM table column in where". Michal Vrabel found this interesting case when MySQL 8.0.215 returns wrong results. I've checked the test case on MariaDB 10.3.7 and it is not affected. Feel free to consider this check and statement my lame attempt to advertise MariaDB. I don't mind.
  • Bug #94730 - "Kill slave may cause start slave to report an error.". This bug was declared a duplicate of a nice Bug #93397 - "Replication does not start if restart MySQL after init without start slave." reported by Jean-François Gagné earlier. Both bugs were reported for MySQL 5.7.x, but I do not see any public attempt to verify if MySQL 5.6 or 8.0 is also affected. In the past it was required to check/verify bug on all GA versions supported if the test case applies. Nowadays this approach is not followed way too often, even when bug reporter cared enough to provide MTR test case.
  • Bug #94737 - "MySQL uses composite hash index when not possible and returns wrong result". Yet another optimizer bug was reported by Simon Banaan. Again, MariaDB 10.3.7 is NOT affected. I can freely and happily state this here if it's inappropriate to state so in the bug report itself. By the way, other MySQL versions were probably not checked. Also, unlike Oracle engineer who verified the bug, I do not hesitate to copy/paste the entire results of my testing here:
    MariaDB [test]> show create table tmp_projectdays_4\G*************************** 1. row ***************************
           Table: tmp_projectdays_4
    Create Table: CREATE TABLE `tmp_projectdays_4` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `project` int(11) NOT NULL,
      `datum` date NOT NULL,
      `voorkomen` tinyint(1) NOT NULL DEFAULT 1,
      `tijden` tinyint(1) NOT NULL DEFAULT 0,
      `personeel` tinyint(1) NOT NULL DEFAULT 0,
      `transport` tinyint(1) NOT NULL DEFAULT 0,
      `materiaal` tinyint(1) NOT NULL DEFAULT 0,
      `materiaaluit` tinyint(1) NOT NULL DEFAULT 0,
      `materiaalin` tinyint(1) NOT NULL DEFAULT 0,
      `voertuigen` varchar(1024) DEFAULT '',
      `medewerkers` varchar(1024) DEFAULT '',
      `personeel_nodig` int(11) DEFAULT 0,
      `personeel_gepland` int(11) DEFAULT 0,
      `voertuigen_nodig` int(11) DEFAULT 0,
      `voertuigen_gepland` int(11) DEFAULT 0,
      `created` datetime DEFAULT NULL,
      `modified` datetime DEFAULT NULL,
      `creator` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `project` (`project`,`datum`) USING HASH
    1 row in set (0.001 sec)

    MariaDB [test]> explain SELECT COUNT(1) FROM `tmp_projectdays_4` WHERE `project`

    | id   | select_type | table             | type | possible_keys | key  | key_len
     | ref  | rows | Extra       |
    |    1 | SIMPLE      | tmp_projectdays_4 | ALL  | project       | NULL | NULL
     | NULL | 2544 | Using where |
    1 row in set (0.004 sec)

    MariaDB [test]> SELECT COUNT(1) FROM `tmp_projectdays_4` WHERE `project` IN(1540

    | COUNT(1) |
    |     2544 |
    1 row in set (0.025 sec)

    MariaDB [test]> select version();
    | version()          |
    | 10.3.7-MariaDB-log |
    1 row in set (0.021 sec)
    When the job was done properly I see no reasons NOT to share the results.
  • Bug #94747 - "4GB Limit on large_pages shared memory set-up". My former colleague Nikolai Ikhalainen from Percona noted this nice undocumented "feature" (Had I forgotten to advertise Percona recently? Sorry about that...) He proved with a C program that one can create shared memory segments on Linux large than 4GB, one just had to use proper data type, unsigned long integer, in MySQL's code. Still, this report ended up as non-critical bug in "MySQL Server: Documentation" category, or even maybe a feature request internally. What a shame!
Spring in Paris is nice, as this photo made 3 years ago proves. The way MySQL bug reports are handled this spring is not any nice in some cases.
To summarize:
  1. It seems recently the fact that there is some limited workaround already published somewhere is a good enough reason NOT to accept valid feature request. Noted.
  2. Regression bugs (reports about drop in performance or problem that had not happened with older version but happens with some recent) are still not marked with regression tag sometimes. Moreover, clear performance regressions in MySQL 8.0.x vs MySQL 5.7.x may end up as just feature requests... A request to "Make MySQL Great Again" maybe?
  3. MySQL engineers who verify bugs often do not care to check all major versions and/or share the results of their tests. This is unfortunate.
  4. Some bugs are not classified properly upon verification. The fact that wrong data type is used is anything but severity 3 documentation problem, really.

Saturday, March 30, 2019

On References to MariaDB and MariaDB Bugs (MDEVs) in MySQL Bug Reports

Recently I noted that some of my comments to public MySQL bug reports got hidden by somebody from Oracle with privileges to do so. I was not able to find out who did that and when, as this information is not communicated to bug subscribers (this may change if my feature requests, Bug #94807 - "Subscriber should be notified when comment is made private", is eventually implemented).

When it happened for the first time I thought it was probably non-intentional. When it happened for a second time I complained with a tweet that got few likes and zero comments. Recently this happened again and yet another tweet had not got much attention, but at least I've got a comment via Bug #94797 that my comment there (where I checked test case on MariaDB version I had at hand to find out it's not affected, something I often do for bugs mentioned in my blog posts here) was hidden as irrelevant and "an attempt to advertise MariaDB".

Snow hides everything, good and bad, dog shit, holes in the road and autumn flowers... Do we really want information provided in comments to public MySQL bugs got hidden just because someone once decided it's "bad"?
I really wonder if any of my readers think that I advertise MariaDB with my public posts or public comments anywhere or specifically in MySQL bug reports?

I'd also like to share here, where no one besides me can hide or delete comments (I hope), what was hidden in the case that caused me to tweet about censorship I have to deal with. In Bug #94610 - "Server stalls because ALTER TABLE on partitioned table holds dict mutex" that ended up as "Not a Bug" (not even a duplicate of a verified Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table" it referred to and extended with a global mutex usage highlighted and impact explained), I've added the following comment:
"[12 Mar 7:30] Valeriy Kravchuk
Not only it stalls, but if it stalls for long enough time it will crash :)

Useful related reading is here:
The comment was hidden very soon. Now, if you check that link, you'll see confirmed, unresolved MariaDB bug report. I mostly had this comment to the MDEV-15641 in mind, were my colleague and well known InnoDB developer Marko Mäkelä stated:
"The row_log_table_apply() is actually invoked while holding both dict_sys->mutex and dict_operation_lock. If there is a lot of log to apply, this may actually cause InnoDB to crash."
I may be mistaking in linking these two bug reports together, but isn't highlighting the possibility of crash due to long semaphore wait important to understand the impact of the bug report and triage it properly? What wrong MySQL users and bug report readers may see if they follow the link to MariaDB bug I considered relevant? What was advertised by this comment that is harmful or useless for MySQL Community?

I was even more surprised by these recent actions on my comments because in the past I had never noted similar approach. Check the following bug reports, for example (I searched for those with "MDEV" and "Kravchuk" in them to get these):
  • Bug #80919 - "MySQL Crashes when Droping Indexes - Long semaphore wait". In this bug report (real bug fixed in 5.7.22) I've added a comment that refers to MDEV-14637. The comment still remains public and, IMHO, is still useful. Providing this link helped to get proper attention to the bug, so it was re-opened and got comments from Oracle engineers finally. Was it an attempt to advertise MariaDB? How this case is different from my comment in Bug #94610 quoted above? 
  • Bug #84185 - "Not all "Statements writing to a table with an auto-increment..." are unsafe". I reported this "upstream" MySQL bug based on MDEV-10170 - "Misleading "Statements writing to a table with an auto-increment column after selecting from another table are unsafe" on DELETE ... SELECT", previously found by my colleague Hartmut Holzgraefe. I've also added link to the "upstream" MySQL bug report to that MDEV. Does anybody in MySQL or MariaDB user communities think that such cross-references are useless, harmful or may be considered as and "attempt to advertise competitor" if any of vendors fixes the bug first?
  • Bug #48392 - "mysql_upgrade improperly escapes passwords with single quotes". I verified this bug in 2009 while working for MySQL at Sun, and it still remains "Verified" (I had not re-checked if it's still repeatable with current MySQL versions). Then in 2013 community user added a comment referring to the MariaDB bug, MDEV-4664 - "mysql_upgrade crashes if root's password contains an apostrophe/single quotation mark" that was fixed later, in 2015. This comment still remains public and is useful!
So, had my comments that mention MDEVs or MariaDB in general became so irrelevant and MariaDB advertising recently comparing to the previous ones? What exact community standards or rules they break? Is it now forbidden to any user of MySQL bugs database to mention MariaDB or bugs in it, use MariaDB in tests to make some point and share the results in public in MySQL bugs database, or the problem is with me personally doing this?

I'd be happy to read explanations or opinions from MySQL community users and my former Oracle colleagues in comments to this blog post.

Saturday, March 9, 2019

Fun with Bugs #81 - On MySQL Bug Reports I am Subscribed to, Part XVII

Two weeks passed since my previous review of public MySQL bug reports I consider interesting enough to subscribe to them. Over this period I picked up a dozen or so new public bug reports that I'd like to briefly review today.

Here is my recent subscriptions list, starting from the oldest bug reports:
  • Bug #94431 - "Can't upgrade from 5.7 to 8.0 if any database have a hyphen in their name". It seems one actually needs a database like that created in MySQL 5.6 with at least one InnoDB table having FULLTEXT index to hit the problem. Great finding by Phil Murray. Note that after several unsuccessful attempts by others the bug was eventually reproduced and verified by Jesper Wisborg Krogh. Let's hope we'll see it fixed in MySQL 8.0.16.
  • Bug #94435 - "mysql command hangs up and cosume CPU almost 100%". It was reported by Masaaki HIROSE, whose previous related/similar Bug #94219 - "libmysqlclient enters and infinite loop and consume CPU usage 100%" ended up as "Not a bug" (wrongly, IMHO, as nobody cared enough to reproduce the steps instead of commenting on their correctness and checking something else). Bug reporter had not only insisted and provided all the details, but also tried to analyze the reasons of the bug and provided links to other potentially related bug reports (Bug #88428 - "mysql_real_query hangs with EINTR errno (using YASSL)" and Bug #92394 - "libmysqlclient enters infinite loop after signal (race condition)"). Great job and nice to see the bug "Verified" eventually.
  • Bug #94441 - "empty ibuf aio reads in innodb status". This regression vs MySQL 5.6 was noted by Nikolai Ikhalainen from Percona. MariaDB 10.3.7 is also affected, unfortunately:
    I/O thread 9 state: native aio handle (write thread)
    Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
     ibuf aio reads:, log i/o's:, sync i/o's:Pending flushes (fsync) log: 0; buffer pool: 0
    1344 OS file reads, 133 OS file writes, 2 OS fsyncs
  • Bug #94448 - "Rewrite LOG_BLOCK_FIRST_REC_GROUP during recovery may be dangerous.". Yet another MySQL 8 regression (not marked with "regression" tag) was found by Kang Wang.
  • Bug #94476 - "mysql semisync replication stuck with master in Waiting to finalize termination". It has "Need feedback" status at the moment. I've subscribed to this report from Shirish Keshava Murthy mostly to find out how a report that may look like a free support request will be processed by Oracle engineers. Pure curiosity, for now.
  • Bug #94504 - "AIO::s_log seems useless". This problem was reported by Yuhui Wang. It's a regression in a sense that part of the code is no longer needed (and seems not to be used) in MySQL 8, but still remains.
  • Bug #94541 - "Assertion on import via Transportable Tablespace". This bug reported by  Daniël van Eeden was verified based on code review and some internal discussion. We do not know if any other version besides 5.7.25 is affected, though. The assertion itself:
    InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->
    does not seem to be unique. We can find it in MDEV-18455 also (in other context).
  • Bug #94543 - "MySQL does not compile with protobuf 3.7.0". I care about build/compiling bugs historically, as I mostly use MySQL binaries that I built myself from GitHub source. So, I've immediately subscribed to this bug report from Laurynas Biveinis.
  • Bug #94548 - "Optimizer error evaluating JSON_Extract". This bug was reported by Dave Pullin. From my quick test it seems MariaDB 10.3.7 is also affected. Error message is different in the failing case, but the point is the same - the function is not evaluated if the column from derived table that is built using the function is not referenced in the SELECT list. This optimization is questionable and may lead to hidden "bombs" in the application code.
  • Bug #94550 - "generated columns referring to current_timestamp fail". I tried to check simple test case in this bug report by Mario Beck on MariaDB 10.3.7, but it does not seem to accept NOT NULL constraint for generated stored columns at all:
    MariaDB [test]> CREATE TABLE `t2` (
        ->   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ->   `content` varchar(42) DEFAULT NULL,
        ->   `bucket` tinyint(4) GENERATED ALWAYS AS ((floor((to_seconds(`created_at
    `) / 10)) % 3)) STORED NOT NULL);
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
    corresponds to your MariaDB server version for the right syntax to use near 'NOT
     NULL)' at line 4
    I do not see this option in formal syntax described here as well. But in case of MariaDB we can actually make sure the generated column is never NULL by adding CHECK constraint like this:
    MariaDB [test]> CREATE TABLE `t2` (    ->   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ->   `content` varchar(42) DEFAULT NULL,
        ->   `bucket` tinyint(4) GENERATED ALWAYS AS ((floor((to_seconds(`created_at`) / 10)) % 3)) STORED);
    Query OK, 0 rows affected (0.434 sec)

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("taraaaa");
    Query OK, 1 row affected (0.070 sec)

    MariaDB [test]> alter table t2 add constraint cnn CHECK (`bucket` is NOT NULL);
    Query OK, 1 row affected (1.159 sec)
    Records: 1  Duplicates: 0  Warnings: 0

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("tarabbb");
    Query OK, 1 row affected (0.029 sec)

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("");
    Query OK, 1 row affected (0.043 sec)

    MariaDB [test]> select * from t2;
    | created_at          | content | bucket |
    | 2019-03-09 17:28:03 | taraaaa |      0 |
    | 2019-03-09 17:29:43 | tarabbb |      1 |
    | 2019-03-09 17:29:50 |         |      2 |
    3 rows in set (0.002 sec)

    MariaDB [test]> show create table t2\G*************************** 1. row ***************************
           Table: t2
    Create Table: CREATE TABLE `t2` (
      `created_at` timestamp NOT NULL DEFAULT current_timestamp(),
      `content` varchar(42) DEFAULT NULL,
      `bucket` tinyint(4) GENERATED ALWAYS AS (floor(to_seconds(`created_at`) / 10)
    MOD 3) STORED,
      CONSTRAINT `cnn` CHECK (`bucket` is not null)

    1 row in set (0.011 sec)
    So, maybe after all we can state that MariaDB is NOT affected.
  • Bug #94552 - "innodb.virtual_basic fails when valgrind is enabled". I still wonder if anyone in Oracle runs MTR test suite on Valgrind-enabled (-DWITH_VALGRIND=1 cmake option) at least in the process of official release (and if they check the failures). It seems not to be the case based on this bug report from Manuel Ung.
  • Bug #94553 - "Crash in trx_undo_rec_copy". Bernardo Perez noted that as a side effect of still "Verified" Bug #82734 - "trx_undo_rec_copy needlessly relies on buffer pool page alignment" (that affects both MySQL 5.7 and 8.0) we may get crashes while working with generated columns. I hope to see them both fixed soon, but for now Bug #94553 has status "Need Feedback", probably in a hope to get a repeatable test case. I'll watch it carefully.
  • Bug #94560 - "record comparison in spatial index non-leaf rtree node seems incorrect". I doubt spatial indexes of InnoDB are widely used, and I have no doubts there are many bugs waiting to be discovered in this area. This specific bug was reported by Jie Zhou who had also suggested a fix.
  • Bug #94610 - "Server stalls because ALTER TABLE on partitioned table holds dict mutex". My former colleague Justin Swanhart reported this bug just yesterday, so no wonder it is not verified yet. It refers to a well known verified old Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table"  (that I've also subscribed to immediately) from Roel Van de Paar, affecting both MySQL 5.6 and 5.7. I hope to see this bug verified and fixed soon, as recently I see this kind of state for main thread:
    Main thread process no. 3185, id 140434206619392, state: enforcing dict cache limit
    too often in INNODB STATUS outputs to my liking...
As you could note, I still try to check (at least in some cases) if MariaDB is also affected by the same problem. I think it's a useful check both for me (as I work mostly with MariaDB as a support engineer) and for the reader (to know if switching to MariaDB may help in any way or if there are any chances for MariaDB engineers to contribute anything useful, like a fix).

"Hove, actually". For years residents of Hove used this humorous reply when they live in Brighton... "Regression, actually" is what I want to say (seriously) about every other MySQL bug report I subscribe to... So, you see Hove and many regression bugs above!
To summarize:
  1. Sometimes Oracle engineers demonstrate proper collective effort to understand and carefully verify public bug reports. Good to know they are not ready to give up fast!
  2. I have to copy-paste this item from my previous post. As the list above proves, Oracle engineers still do not use "regression" tag when setting "Verified" status for obviously regression bugs. I think bug reporters should care then to always set it when they report regression of any kind.
  3. It seems there no regular MTR test runs for Valgrind builds performed by Oracle engineers, or maybe they just ignore failures.