Saturday, December 31, 2016

Fun with Bugs #46 - On Some Bugs I've Reported During the Year of 2016

It's time to summarize the year of 2016. As a kind of a weird summary, in this post I'd like to share a list of MySQL bug reports I've created in 2016 that are still remaining "Verified" today:

  • Bug #79831 - "Unexpected error message on crash-safe slave with max_relay_log_size set". According to Umesh this is not repeatable with 5.7. The fact that I've reported the bug on January 4 probably means I was working at that time. I should not repeat this mistake again next year.
  • Bug #80067 - "Index on BIT column is NOT used when column name only is used in WHERE clause". People say the same problem happens with INT and, what may be even less expected, BOOLEAN columns.
  • Bug #80424 - "EXPLAIN output depends on binlog_format setting". Who could expect that?
  • Bug #80619 - "Allow slave to filter replication events based on GTID". In this feature request I've suggested to implement filtering by GTID pattern, so that we can skip all events originating from specific master on some slave in a complex replication chain.
  • Bug #82127 - "Deadlock with 3 concurrent DELETEs by UNIQUE key". It's clear that manual is not even close to explaining how the locks are really set "by design" in this weird case. See comments in MDEV-10962 for some explanations. Nobody from Oracle event tried to really explain how things are designed to work.
  • Bug #82212 - "mysqlbinlog can produce events larger than max_allowed_packet for mysql". This happens for encoded row-based events. There should be some way to take this overhead into account while creating binary log, IMHO.
  • Bug #83024 - "Internals manual does not explain COM_SLEEP in details". One day you'll see Sleep for some 17 seconds logged into the slow query log, and may start to wonder why...
  • Bug #83248 - "Partition pruning is not working with LEFT JOIN". You may find some interesting related ideas in MDEV-10946.
  • Bug #83640 - "Locks set by DELETE statement on already deleted record". This case shows that design of locking in InnoDB does produce really weird outcomes sometimes. This is not about "missing manual", this is about extra lock set that is absolutely NOT needed (a gap X lock on a record in the secondary unique index is set when the same transaction transaction already has the next key lock on it). As a side note, I keep finding, explaining and reporting weird or undocumented details in InnoDB locking for years, still my talk about InnoDB locks was not accepted by Oracle once again for OOW in 2016. What do I know about the subject and who even cares about those locks... 
  • Bug #83708 - "uint expression is used for the value that is passed as my_off_t for DDL log". I was really shocked by this finding. I assumed that all uint vs unsigned long long improper casts are already found. It seems I was mistaking.
  • Bug #83912 - "Time spent sleeping before entering InnoDB is not measured/reported separately". The use case that led me to reporting this bug is way more interesting than the fact that some wait is not instrumented in performance_schema. You may see more related bug reports from me next year.
  • Bug #83950 - "LOAD DATA INFILE fails with an escape character followed by a multi-byte one". This single bug (and related bugs and stories) were original topic for issue #46 of my "Fun With Bugs" series. I was not able to write everything I want properly over last 3 weeks, but trust me: it's a great story, of "Let's Make America Great Again" style. With the goal for LOAD DATA to behave exactly as INSERT when wrong utf8 data are inserted, Oracle changed the way LOAD DATA works back and forth, with the last change (back) happened in 5.7.17:
     "Incompatible Change: A change made in MySQL 5.7.8 for handling of multibyte character sets by LOAD DATA was reverted due to the replication incompatibility (Bug #24487120, Bug #82641)"
    I just can not keep up with all the related fun people have in replication environments thanks to these ongoing changes... It's incredible.
  • Bug #84004 - "Manual misses details on MDL locks set and released for online ALTER TABLE". Nothing new: locks in MySQL are not properly/completely documented, metadata locks included. yes, they are documented better now, after 11+ years of my continuous efforts (of a kind), but we are "not there yet". I am still waiting for a job offer to join MySQL Documentation Team, by the way :)
  • Bug #84173 - "mysqld_safe --no-defaults & silently does NOT work any more". Recent MySQL 5.7.17 release had not only given us new Group Replication plugin and introduced incompatible changes. In a hope to fix security issues it comes with pure regression - for the first time in last 11 years mysqld_safe --no-defaults stopped working for me! By the way, mysqld_safe is still NOT safe in a sense that 5.7.17 tried to enforce, and one day (really soon) you will find out why.
  • Bug #84185 - "Not all "Statements writing to a table with an auto-increment..." are unsafe". If you do something like DELETE FROM `table` WHERE some_col IN (SELECT some_id FROM `other_table`) where `table` has auto_increment column, why should anyone care about it? We do not generate the value, we delete rows...
    This bug report was actually created by Hartmut Holzgraefe and test case comes from Elena Stepanova (see MDEV-10170). I want to take this opportunity to thank them and other colleagues from MariaDB for their hard work and cooperation during the year of 2016. Thanks to Umesh (who processed most of my bug reports),  Sinisa Milivojevic and Miguel Solorzano for their verifications of my bug reports this year.

In conclusion I should say that, no matter how pointless you may consider this activity, I still suggest you to report each and every problem that you have with MySQL and can not understand after reading the manual, as a public MySQL bug. Now, re-read my 4 years old post on this topic and have a Happy and Fruitful New Year 2017!

Saturday, December 3, 2016

MySQL Support Engineer's Chronicles, Issue #4

This week I had to deal with some unusual problems. But let me start with Percona's xtrabackup, software that I consider a key component of many current production MySQL setups and use regularly. Recently new minor versions of XtraBackup were released, check the details on 2.4.5, for example. It made a step towards support of MariaDB 10.2, but it's still a long way to go, see this pull request #200.

My main problem with xtrabackup, though, is not with lack of support of MariaDB 10,2-specific features. Why should they care, after all... The problem is that old well known bugs and problems are not resolved, those that may affect all MySQL versions, forks and environments. Check lp:1272329 , "innobackupex dies if a directory is not readable. i.e.: lost+found", for example. Why not to read and take into account ignore_db_dir option (as server does) and let those poor souls who used mount point as a datadir to make backups? Check even older problem, passwords that are not hidden in the command lines, see lp:907280, "innobackupex script shows the password in the ps output, when its passed as a command line argument". My colleague Hartmut even suggested the fix recently, see pull request #289.

Because of these old, known problems (some of them being low hanging fruits) that are not fixed users still suffer while using xtrabackup way more often than they would like to. One day, as a result, they'll have to switch to some other online backup tools or approaches. One may dream about extended myrocks_hotbackup to cover InnoDB one day (when MyRocks and InnoDB will work together in one instance), or just use Percona TokuBackup (after adding script to go SST for Galera with it, maybe), or try something totally different. Anyway, I feel that if more bugs (including low hanging fruits) in xtrabackup are not getting fixed and pull requests are not actively accepted, the tool may become much less relevant and used soon.

I had to deal with MaxScale-related issues this week, so I'd like to remind those who use Docker for testing about Personally I prefer to build from source. In any case, I'd like us all to remember that in older versions one may have to set strip_db_esc option explicitly for service to deal with database names containing underscore (_). Recent 2.0.x versions have it enabled by default (see MXS-801).

I also had to explain how online ALTER TABLE works, specifically, when it sets exclusive metadata locks in the process. I still do not see this topic properly explained in the manual, so I had to report Bug #84004, "Manual misses details on MDL locks set and released for online ALTER TABLE".

By no means I am a developer for 11 years already, even less one should expect writing Java code from me. Anyway, I had to explain how to replace Oracle's ref_cursors (a.k.a cursor variables) in MySQL, both in stored procedures and in Java code that calls them. If you are wondering what is this about, check this fine manual. Note that this feature is missing in MySQL, even though it was suggested to implement it here. In general, MySQL allows just to run SELECTs in stored procedures and then in Java you can process each of the result sets returned any way you want. Things may get more complicated when more than one result set is produced, and they are even more complicated in Oracle with nested cursor expressions. So, I plan to devote a separate blog post to this topic one day. Stay tuned.

I answer questions coming not only from customers. Old friends, community users out of nowhere and, even more, colleagues are welcomed to discuss whatever MySQL- or MariaDB-related  problem they may have. If I know how to help, I'll do this, otherwise I'll quickly explain that I am of no good use. This is how I ended up testing MariaDB's CONNECT storage engine quickly as a replacement for the Federated engine, that is, to link table to a remote MySQL table. Basic instructions on how to set it up and use MySQL type looked simple, but when I tried to test on Fedora 23 and hit a problem of missing

MariaDB [(none)]> INSTALL SONAME 'ha_connect';
ERROR 1126 (HY000): Can't open shared library '/home/openxs/dbs/mariadb10.1/lib/plugin/'
  (errno: 2, cannot open shared object file: No such file or directory)
the solution was not really straightforward. First of all I had to install unixODBC.x86_64 2.3.4-1.fc23 RPM, but it also does not provide

[openxs@fc23 node2]$ find / -name libodbc.* 2>/dev/null
So, I had to apply a quick and dirty hack:

[openxs@fc23 node2]$ sudo ln -s /usr/lib64/  /usr/lib64/
As a result CONNECT engine worked as expected, as long as proper account and IP-address where used:
MariaDB [test]> INSTALL SONAME 'ha_connect';
Query OK, 0 rows affected (0.27 sec)

MariaDB [test]> create table r_t2(id int primary key, c1 int) engine=connect table_type=mysql connection='mysql://msandbox:msandbox@';
Query OK, 0 rows affected (0.04 sec)

MariaDB [test]> select * from r_t2;                     
| id | c1   |
|  1 |    2 |
|  2 |    3 |
2 rows in set (0.00 sec)
From configuring MaxScale to work with database having underscore in the name to re-writing Java code that used to work with Oracle RDBMS for MySQL, with many missing details in the manuals and software bugs identified or reported in between, and all that with ongoing studies of performance problems and lost quorums, rolling upgrades and failed SSTs in Galera clusters - this is what support engineers here in MariaDB have to deal with during a typical working week.