Saturday, October 27, 2018

Fun with Bugs #71 - On Some Public Bugs Fixed in MySQL 5.7.24

Oracle released many new MySQL versions back on Monday, but I had no time during this very busy week to check anything related (besides the fact that MySQL 8.0.13 can be complied from source on my Fedora 27 box). I am sure you've read a lot about MySQL 8.0.13 elsewhere already, even patches contributed by Community are already presented in a separate post by Jesper Krogh.

I am still mostly interested in MySQL 5.7. So, here is my typical quick review of some selected bugs reported in public by MySQL Community users and fixed in MySQL 5.7.24.

My wife noticed this nice spider in the garden and reported it to me via this photo. Spider is formally not a bug, while in this post I discuss pure bugs...
Let me start with fixes in Performance Schema (that is supposed to be mostly bugs free):
  • Bug #90264 - "Some file operations in mf_iocache2.c are not instrumented". This bug reported by Yura Sorokin from Percona, who also contributed patches, is fixed in all recent Oracle releases, from 5.5.62 to 8.0.13.
  • Bug #77519 - "Reported location of Innodb Merge Temp File is wrong". This bug was reported by Daniël van Eeden back in 2015. Let's hope files are properly reported in @@tmpdir now.
  • Bug #80777 - "order by on LAST_SEEN_TRANSACTION results in empty set". Yet another bug report from Daniël van Eeden got fixed.
Let's continue with InnoDB bugs:
  • Bug #91032 - "InnoDB 5.7 Primary key scan lack data". Really weird bug was reported by Raolh Rao back in May.
  • Bug #95045 - Release notes are referring to public bug that does not exist! So, we have a bug in them. Related text:
    "It was possible to perform FLUSH TABLES FOR EXPORT on a partitioned table created with innodb_file_per_table=1 after discarding its tablespace. Attempting to do so now raises ER_TABLESPACE_DISCARDED. (Bug #95045, Bug #27903881)"
    and refer to Bug #80669 - "Failing assert: fil_space_get(table->space) != __null in row0quiesce.cc line 724", reported by Ramesh Sivaraman from Percona. In the comment from Roel there we see that actual bug was Bug #90545 that is, surprise, still private!
    Recently I found out (here) that some community members think that keeping crashing bugs private after the fixed version is released is still better than publish test cases for them before all affected versions are fixed... I am not so sure.
What about replication (group replication aside, I have enough Galera problems to deal with in my life to even think about it)? There are some interesting bug fixes:
  • Bug #90551 - "[MySQL 8.0 GA Debug Build] Assertion `!thd->has_gtid_consistency_violation'". Good to know that Oracle engineers still pay attention to debug assertions, as in this report (with nice simple test case involving XA transactions)  reported by Roel Van de Paar from Percona.
  • Bug #89370 - "semi-sync replication doesn't work for minutes after restart replication". This bug was reported by Yan Huang, who had contributed a patch for it.
  • Bug #89143 - "Commit order deadlock + retry logic is not considering trx error cases". Nice bug report from Jean-François Gagné.
  • Bug #83232 - "replication breaks after bug #74145 happens in master". FLUSH SLOW LOGS that failed on master (because of file permission problem, for example) was still written to the binary log. Nice finding by Jericho Rivera from Percona.
There are interesting bugs fixed in other categories as well. For example:
  • Bug #91914 - "Mysql 5.7.23 cmake fail with 'Unknown CMake command "ADD_COMPILE_FLAGS".'" Now thanks to this report by Tomasz Kłoczko one can build MySQL 5.7 with gcc 8.
  • Bug #91080 - "sql_safe_updates behaves inconsistently between delete and select". The fix is described as follows:
    "For DELETE and UPDATE that produced an error due to sql_safe_updates being enabled, the error message was insufficiently informative. The message now indicates that data truncation occurred or the range_optimizer_max_mem_size value was exceeded.

    Additionally: (1) Using EXPLAIN for such statements does not produce an error, enabling users to see from EXPLAIN output why an index is not used; (2) For multiple-table deletes and updates, an error is produced with safe updates enabled only if the target table or tables use a table scan."
    I am NOT sure this is the fix that bug reporter, Nakoa Mccullough, was expecting. He asked to be consistent with SELECT (that works). The bug is still closed :(
  • Bug #90624 - "Restore dump created with 5.7.22 on 8.0.11". It seems Emmanuel CARVIN asked for the working way to upgrade from 5.7.x to 8.0.x. Last comment seems to state that upgrade from 5.7.24 to 8.0.13 is still not possible. I had not checked this.
  • Bug #90505 is private. Release notes say:
    "If flushing the error log failed due to a file permission error, the flush operation did not complete. (Bug #27891472, Bug #90505) References: This issue is a regression of: Bug #26447825"
    OK, we have a private regression bug, fixed. Nice.
  • Bug #90266 - "No warning when truncating a string with data loss". It was when making BLOB/TEXT columns smaller. Nice finding by Carlos Tutte.
  • Bug #89537 - "Regression in FEDERATED storage engine after GCC 7 fixes". Yet another by report with patch contributed by Yura Sorokin.
  • Bug #88670 - "Subquery incorrectly shows duplicate values on subqueries.". Simple from results bug in optimizer affecting all versions starting from 5.6. Fixed now thanks to Mark El-Wakil.
That's all bugs I wanted to mention today. To summarize my feelings after reading the release notes:
  1. I'd surely consider upgrade to 5.7.24 in any environment where replication is used. Some InnoDB fixes also matter.
  2. We still see not only private bugs (with questionable security impact) mentioned in the release notes, but this time also a typo in bug number that makes it harder to find out what was really fixed and why.
  3. I think it would be fair for Oracle to mention Percona as a major contributor to MySQL 5.7, in a same way as Facebook is mentioned in many places with regards to 8.0.13.
  4. It's good to know that some debug assertions related bugs are still fixed. More on this later...

Sunday, October 21, 2018

On Some Recent MySQL Optimizer Bugs

Yet another topic I missed in my blog post on problematic MySQL features back in July is MySQL optimizer. Unlike with XA transactions, it was done in purpose, as known bugs, limitations and deficiencies of MySQL optimizer is a topic for a series of blog posts if not a separate blog. At the moment the list of known active bug reports in optimize category consists of 380(!) items (mostly "Verified"), aside from feature requests and bugs not considered "production" ones by current "rules" of MySQL public bugs database. I try to check optimizer bugs often in my posts, I reported many of them, but I am still not ready to cover this topic entirely.

What I can do in frames of one blog post is a quick review of some "Verified" optimizer bugs reported over last year. I'll present them one by one in a list, with some comments (mostly related to my checks of the same test case with MariaDB 10.3.7 that I have at hand) and, hopefully, some conclusions about current state of MySQL optimizer.

I'll try to shed some light on current state of MySQL optimizer, but it's huge and dark area, with many details hidden...
So, here is the list, starting from most recently reported bugs:
  • Bug #92654 - "GROUP BY fails for queries when a temporary table is involved". This bug affects recent MySQL 8.0.12 and 5.7.23, but does not affect MariaDB 10.3, for example, from what I see:
    MariaDB [test]> insert into domain_tree values (1), (2), (3);
    Query OK, 3 rows affected (0.080 sec)
    Records: 3  Duplicates: 0  Warnings: 0

    MariaDB [test]> insert into host_connection_info values (1), (3);
    Query OK, 2 rows affected (0.054 sec)
    Records: 2  Duplicates: 0  Warnings: 0

    MariaDB [test]> SELECT
        ->   COUNT(1),
        ->   host_connection_status.connection_time
        -> FROM
        ->   (SELECT id
        ->    FROM domain_tree) AS hosts_with_status
        ->   LEFT OUTER JOIN
        ->   (SELECT
        ->      domain_id,
        ->      'recent' AS connection_time
        ->    FROM
        ->      host_connection_info) AS host_connection_status
        ->     ON hosts_with_status.id = host_connection_status.domain_id
        -> GROUP BY host_connection_status.connection_time;
    +----------+-----------------+
    | COUNT(1) | connection_time |
    +----------+-----------------+
    |        1 | NULL            |
    |        2 | recent          |
    +----------+-----------------+
    2 rows in set (0.003 sec)
  • Bug #92524 - "Left join with datetime join condition produces wrong results". The bug was reported by Wei Zhao, who contributed a patch. Again, MariaDB 10.3 is not affected:
    MariaDB [test]> select B.* from h1 left join g B on h1.a=B.a where B.d=str_to_date('99991231',"%Y%m%d") and h1.a=1;
    +---+---------------------+
    | a | d                   |
    +---+---------------------+
    | 1 | 9999-12-31 00:00:00 |
    +---+---------------------+
    1 row in set (0.151 sec)

    MariaDB [test]> select B.* from h1 left join g B on h1.a=B.a and B.d=str_to_date
    ('99991231',"%Y%m%d") where h1.a=1;
    +---+---------------------+
    | a | d                   |
    +---+---------------------+
    | 1 | 9999-12-31 00:00:00 |
    +---+---------------------+
    1 row in set (0.002 sec)
  • Bug #92466 - "Case function error on randomly generated values". See also related older Bug #86624 - "Subquery's RAND() column re-evaluated at every reference". These are either regressions comparing to MySQL 5.6 (and MariaDB), or unclear and weird change in behavior that can be workarounded with some tricks (suggested by Oracle developers) to force materialization of derived table. Essentially, result depends on execution plan - what else could we dream about?
  • Bug #92421 - "Queries with views and operations over local variables don't use indexes". Yet another case when MySQL 5.6 worked differently. As Roy Lyseng explained in comments:
    "... this is due to a deliberate choice that was taken when rewriting derived tables and views in 5.7: When a user variable was assigned a value in a query block, merging of derived tables was disabled.
    ...
    In 8.0, you can override this with a merge hint: /*+ merge(v_test) */, but this is unfortunately not implemented in 5.7.
    "
  • Bug #92209 - "AVG(YEAR(datetime_field)) makes an error result because of overflow". All recent MySQL versions and MariaDB 10.3.7 are affected.
  • Bug #92020 - "Introduce new SQL mode rejecting queries with results depending on query plan". Great feature request by Sveta Smirnova that shows current state of optimizer development properly. We need a feature for MySQL to stop accepting queries that may return different results depending on the execution plan. So, current MySQL considers different results when different execution plans are used normal! Sveta refers to her Bug #91878 - "Wrong results with optimizer_switch='derived_merge=ON';" as an example. MariaDB 10.3 is NOT affected by that bug.
  • Bug #91418 - "derived_merge causing incorrect results with distinct subquery and uuid()". From what I see in my tests, MariaDB 10.3.7 produce wrong results with derived_merge both ON and OFF, unfortunately.
  • Bug #91139 - "use index dives less often". In MySQL 5.7+ the default value of eq_range_index_dive_limit increased from to 10 to 200, and this may negatively affect performance. As Mark Callaghan noted, when there is only one possible index exists optimizer doesn't need to evaluate the query to figure out how to evaluate the query.
  • Bug #90847 - "Query returns wrong data if order by is present". This is definitely a corner case, but still. MariaDB 10.3 returns correct result in my tests.
  • Bug #90398 - "Duplicate entry for key '<group_key>' error". I can not reproduce the last public test case on MariaDB 10.3.
  • Bug #89419 - "Incorrect use of std::max". It was reported based on code analysis by Zsolt Parragi. See also Bug #90853 - "warning: taking the max of a value and unsigned zero is always equal to the other value [-Wmax-unsigned-zero]". Proper compiler detects this.
  • Bug #89410 - "delete from ...where not exists with table alias creates an ERROR 1064 (42000)". MariaDB 10.3 is also affected. Both Oracle and PostrgeSQL accepts the syntax, while in MySQL and MariaDB we can use multi-table delete syntax-based workaround as suggested by Roy Lyseng.
  • Bug #89367 - "Storing result in a variable(UDV) causes query on a view to use derived tables", was reported by Jaime Sicam. This is a kind of regression in MySQL 5.7. MariaDB 10.3 and MySQL 8.0 are not affected. Let me quote a comment by Roy Lyseng:
    "In 5.7, we added a heuristic so that queries that assign user variables are by default materialized and not merged. However, we should have let the ALGORITHM=MERGE override this decision. This is a bug."
  • Bug #89182 - "Optimizer unable to ignore index part entering less optimal query plan". Nice report from Przemyslaw Malkowski. One of many case when "ref" vs "range" decision seems to be wrong based on costs. Looks like optimizer still have parts that are heuristics/rules based and/or do not take costs into account properly.
  • Bug #89149 - "SELECT DISTINCT on multiple TEXT columns is slow". Yet another regression in MySQL 5.7+.
That's all optimizer bugs reported in 2018 and still "Verified" that I wanted to discuss.

From the list above I can conclude the following:
  1. There are many simple enough cases when queries provide wrong results or get not optimal execution plans in MySQL. For many of them MariaDB's optimizer does a better job.
  2. Behavior of optimizer for some popular use cases changed after MySQL 5.6, so take extra care to check queries and their results after upgrade to MySQL 5.7+.
  3. derived_merge optimization seems to cause a lot of problems for users in MySQL 5.7 and 8.0.
  4. It seems optimizer developers care enough to comment on bugs, suggest workarounds and explain decisions made.

Saturday, October 6, 2018

On MySQL XA Transactions

One of the features I missed in my blog post on problematic MySQL features back in July is XA transactions. Probably I was too much in a hurry, as this feature is known to be somewhat buggy, limited and not widely used outside of Java applications. My first related feature request, Bug #40445 - "Provide C functions library implementing X/OPEN XA interface for Bea Tuxedo", was created almost 10 years ago, based on the issue from one of MySQL/Sun customers of that time. I remember some internal debates on how much time and efforts the implementation may require, but the decision was not made anyway, and one still can not directly integrate MySQL with Tuxedo transaction manager (that the idea of XA transactions originally came from). It's even more funny to see that feature request still just "Verified" when taking into account the fact that BEA Tuxedo software is Oracle's software since... 2008.

XA Transactions support is a useful MySQL feature, but I wonder if one day it may just become abandoned as that West Pier in Brighton, or overwhelmed with many small bugs in a same way as this stairs to the beach in Hove...

But maybe XA transactions are not widely used and nobody cares much about them?

Let me try to do a quick review of related active bug reports and feature requests before making any conclusions:
  • Bug #91702 - "Feature Request: JOIN support for XA START command". This feature request was added less than 3 months ago and is still "Open". It means there are users interested in this feature, but Oracle engineers do not care much even to verify related requests, even less - to give them some priority. 
    See also Bug #78498 - "XA issue or limitation with 5.6.19 engine", reported 3 years ago, that is essentially about the same limitation. As bug reporter explained:
    "... it prevents us to use MySQL with Weblogic on 2 phase commit scenarii..."
  • Yet another example of a request ignored for a long time is Bug #90659 - "implicit commit and unlocking with xa start", that is about the inconsistency of current implementation. Even less (as we already know) they care about XA support outside of Java as one can conclude from the fact that Connector/Net related request, Bug #70587 - "Dot Net Distributed transaction not supported in MySql Server 5.6", had not got any attention since July, 2015...
  • Bug #91646 - "xa command still operation when super_read_only is true". This bug was reported in July by Zhenghu Wen. It seems nobody cares much about XA transactions integration when new features are added to MySQL server.
  • Bug #89860 - "XA may lost prepared transaction and cause different between master and slave". This bug reported by Michael Yang (See also Bug #88534) sounds really serious and was previously reported by Andrei Elkin (who works for MariaDB now) as Bug #76233 - "XA prepare is logged ahead of engine prepare". See also Bug #87560 - "XA PREPARE log order error in replication and binlog recovery" by Wei Zhao, who also contributed a patch. See also Bug #83983 - "innodb fail to recover the prepared xa transaction" (the bug reported by Dennis Gao is still "Open", while it's clearly related to or is a duplicate of "Verified" bugs mentioned above).
    So many related/duplicate problem reports, but no fix so far!
  • Bug #88748 - "InnoDB: Failing assertion: trx->conc_state == 1". This assertion failure was reported by Roel Van de Paar back in December, 2017. See also his Bug #84754 - "oid String::chop(): Assertion `strlen(m_ptr) == m_length' failed."
    I noted that Oracle recently invented new "low" severity levels, and this bug is S6 (Debug Builds). I do not really agree that assertions in debug builds are of so low severity - they are in the code for a reason, to prevent crashes in non-debug builds and all kinds of inconsistencies.
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". This bug reported by Sveta Smirnova caused a lot of troubles to poor users with prepared transactions hanging around for weeks after crash, as it prevented any easy way to get rid of them (and related locks) in some cases. The bug is still "Verified" in MySQL and "On hold" in Percona Server, while MariaDB fixed it in 10.3, see MDEV-14593.
  • Bug #87130 - "XA COMMIT not taken as transaction boundary". Yet another bug report with a patch from Wei Zhao.
  • Bug #75205 - "Master should write a LOST_EVENTS entry on xa commit after recovery." Daniël van Eeden reported this at early 5.7 pre-GA stage, and manual explains now that:
    "In MySQL 5.7.7 and later, there is a change in behavior and an XA transaction is written to the binary log in two parts. When XA PREPARE is issued, the first part of the transaction up to XA PREPARE is written using an initial GTID. A XA_prepare_log_event is used to identify such transactions in the binary log. When XA COMMIT or XA ROLLBACK is issued, a second part of the transaction containing only the XA COMMIT or XA ROLLBACK statement is written using a second GTID. Note that the initial part of the transaction, identified by XA_prepare_log_event, is not necessarily followed by its XA COMMIT or XA ROLLBACK, which can cause interleaved binary logging of any two XA transactions. The two parts of the XA transaction can even appear in different binary log files. This means that an XA transaction in PREPARED state is now persistent until an explicit XA COMMIT or XA ROLLBACK statement is issued, ensuring that XA transactions are compatible with replication."
    but the bug report is still "Verified".
    By the way, the need to deal with such prepared transactions recovered from the binary log caused problems like those listed above (with XA RECOVER CONVERT and order of preparing in the binary log vs engines that support XA...
  • Bug #71351 - "select hit query cache after xa commit, no result return". This bug probably affects only MySQL 5.5, so no wonder it's ignored now. Nobody tried to fix it while MySQL 5.5 was still supported, though.
There are some more bugs originally filed in other categories, but still related to XA:
  • Bug #72036 - "XA isSameRM() shouldn't take database into account". This Connecotr/J bug was reported in 2014 by Jess Balint.
  • Bug #78050 - "Crash on when XA functions activated by a storage engine". It happens when binary log not enabled. This bug was reported by Zhenye Xie, who also contributed a patch later. Still this crashing bug remains "Verified".
  • Bug #87385 - "Partial external XA transactions are not rolled back correctly". Yet another bug report with a patch from Wei Zhao. See also his Bug #87389 - "Replication position not persisted correctly for XA transactions".
  • Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". This bug reported by Lukas Sydorowski got recent comment from other community member yesterday. So, the feature is used these days, still.
Now time for conclusions:
  1. Take extra care while using XA transactions in replication environments or with point in time recovery - you may easily end up with slaves out of sync with master and data lost.
  2. Feature requests related to XA transactions are mostly ignored, sometimes for a decade... 
  3. Patches contributed do not seem to speed up XA bug fixing.
  4. I'd say that Oracle does not care much about XA Transactions since MySQL 5.7 GA release in 2015.
  5. MySQL Community still use XA transactions with MySQL (and they will be used even more as corporate users migrate from Oracle RDBMS), find bugs and even try to fix them. But probably will have to use forks rather than MySQL itself if current attitude towards XA bugs processing and fixing remains.