Saturday, October 6, 2018

On MySQL XA Transactions

One of the features I missed in my blog post on problematic MySQL features back in July is XA transactions. Probably I was too much in a hurry, as this feature is known to be somewhat buggy, limited and not widely used outside of Java applications. My first related feature request, Bug #40445 - "Provide C functions library implementing X/OPEN XA interface for Bea Tuxedo", was created almost 10 years ago, based on the issue from one of MySQL/Sun customers of that time. I remember some internal debates on how much time and efforts the implementation may require, but the decision was not made anyway, and one still can not directly integrate MySQL with Tuxedo transaction manager (that the idea of XA transactions originally came from). It's even more funny to see that feature request still just "Verified" when taking into account the fact that BEA Tuxedo software is Oracle's software since... 2008.

XA Transactions support is a useful MySQL feature, but I wonder if one day it may just become abandoned as that West Pier in Brighton, or overwhelmed with many small bugs in a same way as this stairs to the beach in Hove...

But maybe XA transactions are not widely used and nobody cares much about them?

Let me try to do a quick review of related active bug reports and feature requests before making any conclusions:
  • Bug #91702 - "Feature Request: JOIN support for XA START command". This feature request was added less than 3 months ago and is still "Open". It means there are users interested in this feature, but Oracle engineers do not care much even to verify related requests, even less - to give them some priority. 
    See also Bug #78498 - "XA issue or limitation with 5.6.19 engine", reported 3 years ago, that is essentially about the same limitation. As bug reporter explained:
    "... it prevents us to use MySQL with Weblogic on 2 phase commit scenarii..."
  • Yet another example of a request ignored for a long time is Bug #90659 - "implicit commit and unlocking with xa start", that is about the inconsistency of current implementation. Even less (as we already know) they care about XA support outside of Java as one can conclude from the fact that Connector/Net related request, Bug #70587 - "Dot Net Distributed transaction not supported in MySql Server 5.6", had not got any attention since July, 2015...
  • Bug #91646 - "xa command still operation when super_read_only is true". This bug was reported in July by Zhenghu Wen. It seems nobody cares much about XA transactions integration when new features are added to MySQL server.
  • Bug #89860 - "XA may lost prepared transaction and cause different between master and slave". This bug reported by Michael Yang (See also Bug #88534) sounds really serious and was previously reported by Andrei Elkin (who works for MariaDB now) as Bug #76233 - "XA prepare is logged ahead of engine prepare". See also Bug #87560 - "XA PREPARE log order error in replication and binlog recovery" by Wei Zhao, who also contributed a patch. See also Bug #83983 - "innodb fail to recover the prepared xa transaction" (the bug reported by Dennis Gao is still "Open", while it's clearly related to or is a duplicate of "Verified" bugs mentioned above).
    So many related/duplicate problem reports, but no fix so far!
  • Bug #88748 - "InnoDB: Failing assertion: trx->conc_state == 1". This assertion failure was reported by Roel Van de Paar back in December, 2017. See also his Bug #84754 - "oid String::chop(): Assertion `strlen(m_ptr) == m_length' failed."
    I noted that Oracle recently invented new "low" severity levels, and this bug is S6 (Debug Builds). I do not really agree that assertions in debug builds are of so low severity - they are in the code for a reason, to prevent crashes in non-debug builds and all kinds of inconsistencies.
  • Bug #87526 - "The output of 'XA recover convert xid' is not useful". This bug reported by Sveta Smirnova caused a lot of troubles to poor users with prepared transactions hanging around for weeks after crash, as it prevented any easy way to get rid of them (and related locks) in some cases. The bug is still "Verified" in MySQL and "On hold" in Percona Server, while MariaDB fixed it in 10.3, see MDEV-14593.
  • Bug #87130 - "XA COMMIT not taken as transaction boundary". Yet another bug report with a patch from Wei Zhao.
  • Bug #75205 - "Master should write a LOST_EVENTS entry on xa commit after recovery." DaniĆ«l van Eeden reported this at early 5.7 pre-GA stage, and manual explains now that:
    "In MySQL 5.7.7 and later, there is a change in behavior and an XA transaction is written to the binary log in two parts. When XA PREPARE is issued, the first part of the transaction up to XA PREPARE is written using an initial GTID. A XA_prepare_log_event is used to identify such transactions in the binary log. When XA COMMIT or XA ROLLBACK is issued, a second part of the transaction containing only the XA COMMIT or XA ROLLBACK statement is written using a second GTID. Note that the initial part of the transaction, identified by XA_prepare_log_event, is not necessarily followed by its XA COMMIT or XA ROLLBACK, which can cause interleaved binary logging of any two XA transactions. The two parts of the XA transaction can even appear in different binary log files. This means that an XA transaction in PREPARED state is now persistent until an explicit XA COMMIT or XA ROLLBACK statement is issued, ensuring that XA transactions are compatible with replication."
    but the bug report is still "Verified".
    By the way, the need to deal with such prepared transactions recovered from the binary log caused problems like those listed above (with XA RECOVER CONVERT and order of preparing in the binary log vs engines that support XA...
  • Bug #71351 - "select hit query cache after xa commit, no result return". This bug probably affects only MySQL 5.5, so no wonder it's ignored now. Nobody tried to fix it while MySQL 5.5 was still supported, though.
There are some more bugs originally filed in other categories, but still related to XA:
  • Bug #72036 - "XA isSameRM() shouldn't take database into account". This Connecotr/J bug was reported in 2014 by Jess Balint.
  • Bug #78050 - "Crash on when XA functions activated by a storage engine". It happens when binary log not enabled. This bug was reported by Zhenye Xie, who also contributed a patch later. Still this crashing bug remains "Verified".
  • Bug #87385 - "Partial external XA transactions are not rolled back correctly". Yet another bug report with a patch from Wei Zhao. See also his Bug #87389 - "Replication position not persisted correctly for XA transactions".
  • Bug #91633 - "Replication failure (errno 1399) on update in XA tx after deadlock". This bug reported by Lukas Sydorowski got recent comment from other community member yesterday. So, the feature is used these days, still.
Now time for conclusions:
  1. Take extra care while using XA transactions in replication environments or with point in time recovery - you may easily end up with slaves out of sync with master and data lost.
  2. Feature requests related to XA transactions are mostly ignored, sometimes for a decade... 
  3. Patches contributed do not seem to speed up XA bug fixing.
  4. I'd say that Oracle does not care much about XA Transactions since MySQL 5.7 GA release in 2015.
  5. MySQL Community still use XA transactions with MySQL (and they will be used even more as corporate users migrate from Oracle RDBMS), find bugs and even try to fix them. But probably will have to use forks rather than MySQL itself if current attitude towards XA bugs processing and fixing remains.