Saturday, March 30, 2019

On References to MariaDB and MariaDB Bugs (MDEVs) in MySQL Bug Reports

Recently I noted that some of my comments to public MySQL bug reports got hidden by somebody from Oracle with privileges to do so. I was not able to find out who did that and when, as this information is not communicated to bug subscribers (this may change if my feature requests, Bug #94807 - "Subscriber should be notified when comment is made private", is eventually implemented).

When it happened for the first time I thought it was probably non-intentional. When it happened for a second time I complained with a tweet that got few likes and zero comments. Recently this happened again and yet another tweet had not got much attention, but at least I've got a comment via Bug #94797 that my comment there (where I checked test case on MariaDB version I had at hand to find out it's not affected, something I often do for bugs mentioned in my blog posts here) was hidden as irrelevant and "an attempt to advertise MariaDB".

Snow hides everything, good and bad, dog shit, holes in the road and autumn flowers... Do we really want information provided in comments to public MySQL bugs got hidden just because someone once decided it's "bad"?
I really wonder if any of my readers think that I advertise MariaDB with my public posts or public comments anywhere or specifically in MySQL bug reports?

I'd also like to share here, where no one besides me can hide or delete comments (I hope), what was hidden in the case that caused me to tweet about censorship I have to deal with. In Bug #94610 - "Server stalls because ALTER TABLE on partitioned table holds dict mutex" that ended up as "Not a Bug" (not even a duplicate of a verified Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table" it referred to and extended with a global mutex usage highlighted and impact explained), I've added the following comment:
"[12 Mar 7:30] Valeriy Kravchuk
Not only it stalls, but if it stalls for long enough time it will crash :)

Useful related reading is here: https://jira.mariadb.org/browse/MDEV-15641
"
The comment was hidden very soon. Now, if you check that link, you'll see confirmed, unresolved MariaDB bug report. I mostly had this comment to the MDEV-15641 in mind, were my colleague and well known InnoDB developer Marko Mäkelä stated:
"The row_log_table_apply() is actually invoked while holding both dict_sys->mutex and dict_operation_lock. If there is a lot of log to apply, this may actually cause InnoDB to crash."
I may be mistaking in linking these two bug reports together, but isn't highlighting the possibility of crash due to long semaphore wait important to understand the impact of the bug report and triage it properly? What wrong MySQL users and bug report readers may see if they follow the link to MariaDB bug I considered relevant? What was advertised by this comment that is harmful or useless for MySQL Community?

I was even more surprised by these recent actions on my comments because in the past I had never noted similar approach. Check the following bug reports, for example (I searched for those with "MDEV" and "Kravchuk" in them to get these):
  • Bug #80919 - "MySQL Crashes when Droping Indexes - Long semaphore wait". In this bug report (real bug fixed in 5.7.22) I've added a comment that refers to MDEV-14637. The comment still remains public and, IMHO, is still useful. Providing this link helped to get proper attention to the bug, so it was re-opened and got comments from Oracle engineers finally. Was it an attempt to advertise MariaDB? How this case is different from my comment in Bug #94610 quoted above? 
  • Bug #84185 - "Not all "Statements writing to a table with an auto-increment..." are unsafe". I reported this "upstream" MySQL bug based on MDEV-10170 - "Misleading "Statements writing to a table with an auto-increment column after selecting from another table are unsafe" on DELETE ... SELECT", previously found by my colleague Hartmut Holzgraefe. I've also added link to the "upstream" MySQL bug report to that MDEV. Does anybody in MySQL or MariaDB user communities think that such cross-references are useless, harmful or may be considered as and "attempt to advertise competitor" if any of vendors fixes the bug first?
  • Bug #48392 - "mysql_upgrade improperly escapes passwords with single quotes". I verified this bug in 2009 while working for MySQL at Sun, and it still remains "Verified" (I had not re-checked if it's still repeatable with current MySQL versions). Then in 2013 community user added a comment referring to the MariaDB bug, MDEV-4664 - "mysql_upgrade crashes if root's password contains an apostrophe/single quotation mark" that was fixed later, in 2015. This comment still remains public and is useful!
So, had my comments that mention MDEVs or MariaDB in general became so irrelevant and MariaDB advertising recently comparing to the previous ones? What exact community standards or rules they break? Is it now forbidden to any user of MySQL bugs database to mention MariaDB or bugs in it, use MariaDB in tests to make some point and share the results in public in MySQL bugs database, or the problem is with me personally doing this?

I'd be happy to read explanations or opinions from MySQL community users and my former Oracle colleagues in comments to this blog post.

Saturday, March 9, 2019

Fun with Bugs #81 - On MySQL Bug Reports I am Subscribed to, Part XVII

Two weeks passed since my previous review of public MySQL bug reports I consider interesting enough to subscribe to them. Over this period I picked up a dozen or so new public bug reports that I'd like to briefly review today.

Here is my recent subscriptions list, starting from the oldest bug reports:
  • Bug #94431 - "Can't upgrade from 5.7 to 8.0 if any database have a hyphen in their name". It seems one actually needs a database like that created in MySQL 5.6 with at least one InnoDB table having FULLTEXT index to hit the problem. Great finding by Phil Murray. Note that after several unsuccessful attempts by others the bug was eventually reproduced and verified by Jesper Wisborg Krogh. Let's hope we'll see it fixed in MySQL 8.0.16.
  • Bug #94435 - "mysql command hangs up and cosume CPU almost 100%". It was reported by Masaaki HIROSE, whose previous related/similar Bug #94219 - "libmysqlclient enters and infinite loop and consume CPU usage 100%" ended up as "Not a bug" (wrongly, IMHO, as nobody cared enough to reproduce the steps instead of commenting on their correctness and checking something else). Bug reporter had not only insisted and provided all the details, but also tried to analyze the reasons of the bug and provided links to other potentially related bug reports (Bug #88428 - "mysql_real_query hangs with EINTR errno (using YASSL)" and Bug #92394 - "libmysqlclient enters infinite loop after signal (race condition)"). Great job and nice to see the bug "Verified" eventually.
  • Bug #94441 - "empty ibuf aio reads in innodb status". This regression vs MySQL 5.6 was noted by Nikolai Ikhalainen from Percona. MariaDB 10.3.7 is also affected, unfortunately:
    ...
    I/O thread 9 state: native aio handle (write thread)
    Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
     ibuf aio reads:, log i/o's:, sync i/o's:Pending flushes (fsync) log: 0; buffer pool: 0
    1344 OS file reads, 133 OS file writes, 2 OS fsyncs
    ...
  • Bug #94448 - "Rewrite LOG_BLOCK_FIRST_REC_GROUP during recovery may be dangerous.". Yet another MySQL 8 regression (not marked with "regression" tag) was found by Kang Wang.
  • Bug #94476 - "mysql semisync replication stuck with master in Waiting to finalize termination". It has "Need feedback" status at the moment. I've subscribed to this report from Shirish Keshava Murthy mostly to find out how a report that may look like a free support request will be processed by Oracle engineers. Pure curiosity, for now.
  • Bug #94504 - "AIO::s_log seems useless". This problem was reported by Yuhui Wang. It's a regression in a sense that part of the code is no longer needed (and seems not to be used) in MySQL 8, but still remains.
  • Bug #94541 - "Assertion on import via Transportable Tablespace". This bug reported by  Daniël van Eeden was verified based on code review and some internal discussion. We do not know if any other version besides 5.7.25 is affected, though. The assertion itself:
    InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()
    does not seem to be unique. We can find it in MDEV-18455 also (in other context).
  • Bug #94543 - "MySQL does not compile with protobuf 3.7.0". I care about build/compiling bugs historically, as I mostly use MySQL binaries that I built myself from GitHub source. So, I've immediately subscribed to this bug report from Laurynas Biveinis.
  • Bug #94548 - "Optimizer error evaluating JSON_Extract". This bug was reported by Dave Pullin. From my quick test it seems MariaDB 10.3.7 is also affected. Error message is different in the failing case, but the point is the same - the function is not evaluated if the column from derived table that is built using the function is not referenced in the SELECT list. This optimization is questionable and may lead to hidden "bombs" in the application code.
  • Bug #94550 - "generated columns referring to current_timestamp fail". I tried to check simple test case in this bug report by Mario Beck on MariaDB 10.3.7, but it does not seem to accept NOT NULL constraint for generated stored columns at all:
    MariaDB [test]> CREATE TABLE `t2` (
        ->   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ->   `content` varchar(42) DEFAULT NULL,
        ->   `bucket` tinyint(4) GENERATED ALWAYS AS ((floor((to_seconds(`created_at
    `) / 10)) % 3)) STORED NOT NULL);
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
    corresponds to your MariaDB server version for the right syntax to use near 'NOT
     NULL)' at line 4
    I do not see this option in formal syntax described here as well. But in case of MariaDB we can actually make sure the generated column is never NULL by adding CHECK constraint like this:
    MariaDB [test]> CREATE TABLE `t2` (    ->   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ->   `content` varchar(42) DEFAULT NULL,
        ->   `bucket` tinyint(4) GENERATED ALWAYS AS ((floor((to_seconds(`created_at`) / 10)) % 3)) STORED);
    Query OK, 0 rows affected (0.434 sec)

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("taraaaa");
    Query OK, 1 row affected (0.070 sec)

    MariaDB [test]> alter table t2 add constraint cnn CHECK (`bucket` is NOT NULL);
    Query OK, 1 row affected (1.159 sec)
    Records: 1  Duplicates: 0  Warnings: 0

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("tarabbb");
    Query OK, 1 row affected (0.029 sec)

    MariaDB [test]> INSERT INTO t2 (content) VALUES ("");
    Query OK, 1 row affected (0.043 sec)

    MariaDB [test]> select * from t2;
    +---------------------+---------+--------+
    | created_at          | content | bucket |
    +---------------------+---------+--------+
    | 2019-03-09 17:28:03 | taraaaa |      0 |
    | 2019-03-09 17:29:43 | tarabbb |      1 |
    | 2019-03-09 17:29:50 |         |      2 |
    +---------------------+---------+--------+
    3 rows in set (0.002 sec)

    MariaDB [test]> show create table t2\G*************************** 1. row ***************************
           Table: t2
    Create Table: CREATE TABLE `t2` (
      `created_at` timestamp NOT NULL DEFAULT current_timestamp(),
      `content` varchar(42) DEFAULT NULL,
      `bucket` tinyint(4) GENERATED ALWAYS AS (floor(to_seconds(`created_at`) / 10)
    MOD 3) STORED,
      CONSTRAINT `cnn` CHECK (`bucket` is not null)

    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.011 sec)
    So, maybe after all we can state that MariaDB is NOT affected.
  • Bug #94552 - "innodb.virtual_basic fails when valgrind is enabled". I still wonder if anyone in Oracle runs MTR test suite on Valgrind-enabled (-DWITH_VALGRIND=1 cmake option) at least in the process of official release (and if they check the failures). It seems not to be the case based on this bug report from Manuel Ung.
  • Bug #94553 - "Crash in trx_undo_rec_copy". Bernardo Perez noted that as a side effect of still "Verified" Bug #82734 - "trx_undo_rec_copy needlessly relies on buffer pool page alignment" (that affects both MySQL 5.7 and 8.0) we may get crashes while working with generated columns. I hope to see them both fixed soon, but for now Bug #94553 has status "Need Feedback", probably in a hope to get a repeatable test case. I'll watch it carefully.
  • Bug #94560 - "record comparison in spatial index non-leaf rtree node seems incorrect". I doubt spatial indexes of InnoDB are widely used, and I have no doubts there are many bugs waiting to be discovered in this area. This specific bug was reported by Jie Zhou who had also suggested a fix.
  • Bug #94610 - "Server stalls because ALTER TABLE on partitioned table holds dict mutex". My former colleague Justin Swanhart reported this bug just yesterday, so no wonder it is not verified yet. It refers to a well known verified old Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table"  (that I've also subscribed to immediately) from Roel Van de Paar, affecting both MySQL 5.6 and 5.7. I hope to see this bug verified and fixed soon, as recently I see this kind of state for main thread:
    Main thread process no. 3185, id 140434206619392, state: enforcing dict cache limit
    too often in INNODB STATUS outputs to my liking...
As you could note, I still try to check (at least in some cases) if MariaDB is also affected by the same problem. I think it's a useful check both for me (as I work mostly with MariaDB as a support engineer) and for the reader (to know if switching to MariaDB may help in any way or if there are any chances for MariaDB engineers to contribute anything useful, like a fix).

"Hove, actually". For years residents of Hove used this humorous reply when they live in Brighton... "Regression, actually" is what I want to say (seriously) about every other MySQL bug report I subscribe to... So, you see Hove and many regression bugs above!
To summarize:
  1. Sometimes Oracle engineers demonstrate proper collective effort to understand and carefully verify public bug reports. Good to know they are not ready to give up fast!
  2. I have to copy-paste this item from my previous post. As the list above proves, Oracle engineers still do not use "regression" tag when setting "Verified" status for obviously regression bugs. I think bug reporters should care then to always set it when they report regression of any kind.
  3. It seems there no regular MTR test runs for Valgrind builds performed by Oracle engineers, or maybe they just ignore failures.