Saturday, December 31, 2016

Fun with Bugs #46 - On Some Bugs I've Reported During the Year of 2016

It's time to summarize the year of 2016. As a somewhat weird summary, in this post I'd like to share a list of MySQL bug reports I created in 2016 that still remain "Verified" today:

  • Bug #79831 - "Unexpected error message on crash-safe slave with max_relay_log_size set". According to Umesh this is not repeatable with 5.7. The fact that I reported the bug on January 4 probably means I was working at that time. I should not repeat this mistake next year.
  • Bug #80067 - "Index on BIT column is NOT used when column name only is used in WHERE clause". People say the same problem happens with INT and, what may be even less expected, BOOLEAN columns.
  • Bug #80424 - "EXPLAIN output depends on binlog_format setting". Who could expect that?
  • Bug #80619 - "Allow slave to filter replication events based on GTID". In this feature request I suggested implementing filtering by GTID pattern, so that we can skip all events originating from a specific master on some slave in a complex replication chain.
  • Bug #82127 - "Deadlock with 3 concurrent DELETEs by UNIQUE key". It's clear that the manual is not even close to explaining how the locks are really set "by design" in this weird case. See comments in MDEV-10962 for some explanations. Nobody from Oracle even tried to really explain how things are designed to work.
  • Bug #82212 - "mysqlbinlog can produce events larger than max_allowed_packet for mysql". This happens for encoded row-based events. There should be some way to take this overhead into account while creating the binary log, IMHO.
  • Bug #83024 - "Internals manual does not explain COM_SLEEP in details". One day you'll see Sleep for some 17 seconds logged into the slow query log, and may start to wonder why...
  • Bug #83248 - "Partition pruning is not working with LEFT JOIN". You may find some interesting related ideas in MDEV-10946.
  • Bug #83640 - "Locks set by DELETE statement on already deleted record". This case shows that the design of locking in InnoDB does produce really weird outcomes sometimes. This is not about a "missing manual", this is about an extra lock being set that is absolutely NOT needed (a gap X lock on a record in the secondary unique index is set when the same transaction already has the next-key lock on it). As a side note, I have been finding, explaining and reporting weird or undocumented details in InnoDB locking for years, yet my talk about InnoDB locks was once again not accepted by Oracle for OOW in 2016. What do I know about the subject and who even cares about those locks...
  • Bug #83708 - "uint expression is used for the value that is passed as my_off_t for DDL log". I was really shocked by this finding. I assumed that all improper uint vs unsigned long long casts had already been found. It seems I was mistaken.
  • Bug #83912 - "Time spent sleeping before entering InnoDB is not measured/reported separately". The use case that led me to reporting this bug is way more interesting than the fact that some wait is not instrumented in performance_schema. You may see more related bug reports from me next year.
  • Bug #83950 - "LOAD DATA INFILE fails with an escape character followed by a multi-byte one". This single bug (and related bugs and stories) was the original topic for issue #46 of my "Fun With Bugs" series. I was not able to write up everything I wanted properly over the last 3 weeks, but trust me: it's a great story, in the "Let's Make America Great Again" style. With the goal of making LOAD DATA behave exactly as INSERT when wrong utf8 data are inserted, Oracle changed the way LOAD DATA works back and forth, with the last change (back) happening in 5.7.17:
     "Incompatible Change: A change made in MySQL 5.7.8 for handling of multibyte character sets by LOAD DATA was reverted due to the replication incompatibility (Bug #24487120, Bug #82641)"
    I just cannot keep up with all the related fun people have in replication environments thanks to these ongoing changes... It's incredible.
  • Bug #84004 - "Manual misses details on MDL locks set and released for online ALTER TABLE". Nothing new: locks in MySQL are not properly/completely documented, metadata locks included. Yes, they are documented better now, after 11+ years of my continuous efforts (of a kind), but we are "not there yet". I am still waiting for a job offer to join the MySQL Documentation Team, by the way :)
  • Bug #84173 - "mysqld_safe --no-defaults & silently does NOT work any more". The recent MySQL 5.7.17 release not only gave us the new Group Replication plugin and introduced incompatible changes. In the hope of fixing security issues, it also comes with a pure regression: for the first time in the last 11 years, mysqld_safe --no-defaults stopped working for me! By the way, mysqld_safe is still NOT safe in the sense that 5.7.17 tried to enforce, and one day (really soon) you will find out why.
  • Bug #84185 - "Not all "Statements writing to a table with an auto-increment..." are unsafe". If you do something like DELETE FROM `table` WHERE some_col IN (SELECT some_id FROM `other_table`) where `table` has auto_increment column, why should anyone care about it? We do not generate the value, we delete rows...
    This bug report was actually created by Hartmut Holzgraefe, and the test case comes from Elena Stepanova (see MDEV-10170). I want to take this opportunity to thank them and other colleagues from MariaDB for their hard work and cooperation during the year of 2016. Thanks to Umesh (who processed most of my bug reports), Sinisa Milivojevic and Miguel Solorzano for their verifications of my bug reports this year.
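
To illustrate the concern behind Bug #83708 in the list above, here is a minimal Python sketch (not MySQL code) of what can go wrong when an offset is computed in 32-bit unsigned arithmetic and only then widened to a 64-bit type such as my_off_t. The names io_size and entry_no, as well as the values used, are assumptions made up for this illustration:

```python
# Illustrative sketch only: simulates C unsigned integer arithmetic in Python.
# io_size and entry_no are hypothetical names and values, not MySQL source.

UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def offset_computed_as_uint32(io_size: int, entry_no: int) -> int:
    """Multiply in 32-bit unsigned arithmetic, then widen to 64 bits."""
    product = (io_size * entry_no) & UINT32_MASK  # wraps modulo 2**32
    return product & UINT64_MASK  # widening happens too late to help


def offset_computed_as_uint64(io_size: int, entry_no: int) -> int:
    """Widen the operands first, then multiply in 64-bit arithmetic."""
    return (io_size * entry_no) & UINT64_MASK


if __name__ == "__main__":
    io_size = 4096      # assumed block size
    entry_no = 2 ** 20  # assumed entry number, large enough to overflow
    print(offset_computed_as_uint32(io_size, entry_no))  # prints 0
    print(offset_computed_as_uint64(io_size, entry_no))  # prints 4294967296
```

With small entry numbers both functions agree, which is probably why this kind of problem can stay unnoticed for a long time.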

In conclusion I should say that, no matter how pointless you may consider this activity, I still suggest that you report each and every problem that you have with MySQL and cannot understand after reading the manual as a public MySQL bug. Now, re-read my 4-year-old post on this topic and have a Happy and Fruitful New Year 2017!

3 comments:

  1. Wondering if in your travels you've come across a bug like the following:

    Percona-Server 5.6.21, yesterday `show create table [users]` shows column name flipping between `sign_up_method` and `word_salt`. Replication is not disrupted, but the replica column name does not change. Also the .frm timestamp does not change. There is a column named `password_salt` in the table as well. The activity stopped about an hour after it started with no other logs yielding any clues.

    It smells like a bug to me, something corrupt in the table cache memory. Searching the releases done since .21 and google I haven't found any reports that look similar (but lots of buffer overflows). I'm hoping to find a way to recreate it or find a related bug report.

    The obvious choice is to upgrade the minor version, but to recommend that I want to be able to provide some assurance this will truly fix the issue.

    Replies
    1. Hi TPol,

      Sounds like a bug, indeed. Can you, please, check if there is anything suspicious in the error log of the Percona server where this happened? Can you, please, share the output of SHOW CREATE TABLE?

    2. Sorry for the delay in getting back to you, the notification apparently didn't work.

      Unfortunately there wasn't anything valuable in the mysql error logs or the system logs. We ended up concluding that it was a bug and identified the most likely candidate as being:

      InnoDB: A FLUSH TABLES operation raised an assertion. (Bug #19803418)

      This particular issue was fixed in the 5.6.23 build of MySQL.

      The issue with the table structure started right around the time of an EBS snapshot, and resolved around 1 hour later during a subsequent EBS snapshot. One of the first steps of that process (after confirming there are no long running queries) is to perform a `flush tables with read lock` to ensure the volume is in a consistent state for non-transactional tables and binary logs. We didn't see an assertion error, but this did seem close enough to the same functionality to be related.

      We recommended the minor version upgrade, and advised that should the issue arise again prior to that to run the flush with read lock to see if the issue resolves.
