Showing posts with label hash join. Show all posts

Sunday, June 28, 2020

Fun with Bugs #100 - On MySQL Bug Reports I am Subscribed to, Part XXXIV

I delayed post #100 in the "Fun with Bugs" series for a few weeks; the previous one was published 4 weeks ago. The idea was to make it the last one, and for that I needed something to celebrate. Two days ago the proper event happened: MySQL Bug #100000 was reported! Here it is:
  • Bug #100000 - "Provide an index hint that only affects the choice of index for NL join". This nice feature request was added by Øystein Grøvlen, a former MySQL optimizer developer. Hundreds of other feature requests are waiting for attention from both the MySQL Verification Team and developers, so it is good to see a feature request getting the number that nobody will ever forget!
Actually, Øystein Grøvlen created several interesting bug reports that day:
  • Bug #99994 - "Index range scan is chosen where table scan takes 40% less time". A clear and simple bug report that relies on the world sample database.
  • Bug #99995 - "Histogram is not used for filtering estimate when index is disabled".
  • Bug #99996 - "Prefer histogram over index statistics when eq_range_index_dive_limit is exceeded". This was verified as a feature request.
  • Bug #99997 - "Range estimates are usually off by a factor of 2 for large ranges". It was declared a duplicate of an older bug report I am also subscribed to, Bug #73386 - "For ranges, innodb doubles estimates, or caps estimates to half the table". See also this MariaDB bug report, MDEV-19424 - "InnoDB's records_in_range estimates are capped at about 50%", and the links from it to a lot of related discussions. Let's wait and see which vendor resolves this faster...
    The other report, Bug #99998 - "For large ranges, the range estimate will never exceed 50%", is probably also a duplicate of the same old bug.
  • Bug #99999 - "EXPLAIN FORMAT=TREE does not show cost/rows for semijoin materialization". Yet another nice and clear bug report.
So hardly anyone else had a chance to file #100000. As far as I can see, all of these were reported within a very short period of 2 minutes, from "26 Jun 7:57" till "26 Jun 7:58"! I am not sure how to do this without some automation, or at least all the details ready for quick copy/pasting!
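The 50% cap named in the titles of Bug #99998 and MDEV-19424 can be illustrated with a toy model. This is a sketch based only on those bug titles, not on InnoDB's actual records_in_range code; the function name is hypothetical and the cap is assumed to be a plain min() against half of the table rows.

```python
def capped_range_estimate(raw_estimate: int, table_rows: int) -> int:
    """Toy model: a range estimate that is never allowed to exceed half the table.

    Assumption (from the bug titles only): the cap is a simple min() against
    table_rows / 2. Real InnoDB code is more involved.
    """
    return min(raw_estimate, table_rows // 2)


# A range that really matches 90% of a 1,000,000-row table
# is reported as only 500,000 rows by this model.
print(capped_range_estimate(900_000, 1_000_000))  # prints 500000

# Small ranges are not affected by the cap at all.
print(capped_range_estimate(1_000, 1_000_000))  # prints 1000
```

In this model, any range matching more than half of the table looks the same to the optimizer, which is why large ranges end up underestimated.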

Now back to some older bugs I've subscribed to over the last 4 weeks:
  • Bug #99791 - "MySQL 8 orphaned table due to unchecked non-existent row format check." As reported by Marc Reilly, tables created in versions before MySQL 8 that use row_format COMPRESSED or REDUNDANT, where row_format is not set explicitly in the table DDL, allow users to create un-prefixed indexes on columns that exceed the maximum index column size of 767 bytes. Upgrading to MySQL 8 does nothing to these tables, but as soon as a new index is added and the server is restarted, such a table becomes inaccessible. What a surprise!
  • Bug #99794 - "MySQL 57 client is inefficient at bulkloads/binlog replay". In this bug report Marc Reilly basically asks to backport the fix from MySQL 8.0.13.
  • Bug #99800 - "ps_truncate_all_tables() does not work in super_read_only mode". This regression bug was reported by Lalit Choudhary.
  • Bug #99805 - "mysql async client is incomplete". There is no way to determine the file descriptor state (whether it should block on read or on write), so the client cannot be used in asynchronous contexts without busy looping. This bug report by Domas Mituzas was used in one discussion as an argument that the MySQL bugs database still gets proper attention from MySQL engineers. It truly does.
  • Bug #99892 - "initialize with innodb_page_size=4096 gets "Specified key was too long" errors". This is a regression compared to 5.7 (not tagged as such). As Mark Callaghan found out, one cannot initialize a MySQL 8 instance with such a small innodb_page_size without errors.
  • Bug #99924 - "The record per key value from InnoDB is not suitable when n_diff is zero". As reported by Ze Yang, due to lack of locking when the server reads innodb_rec_per_key, the n_diff value may be 0 (not set) while table->stat_n_rows is > 0. As a result (see the great comment by Øystein Grøvlen), if a table object is opened during the recalculation of statistics, the rec_per_key for a column/index may be quite misleading: it will be interpreted as if all rows have the same value, and the index will probably not be chosen for any non-covering scans. A patch is suggested (to set rec_per_key to 1 or 10 in such a case), as well as another suggestion to set the value to REC_PER_KEY_UNKNOWN. Useful reading!
  • Bug #99933  - "In-memory hash join will only use two-thirds of join buffer". Yet another bug report related to hash joins from Øystein Grøvlen, with a fix suggested. See also his Bug #99934 - "Hash join adds columns to the hash table that is not needed." There is a lot of work ahead to improve the implementation of this new feature in MySQL 8.
  • Bug #99935 - "innodb_doublewrite_files is not correct when innodb_buffer_pool_size > 1G". Just 2 files are created instead of the 16 expected according to the manual. This bug was reported by Satya Bodapati.
  • Bug #99943 - "Hash join does not work for Semijoin and Antijoin". This bug report from Tibor Korocz was "Verified", but later comments suggest that it's more a case of wrong expectations about when the feature is supposed to be used (hash join replaces BNL, not semijoin materialization or subquery materialization). Let's wait and see how it ends up...
  • Bug #99966 - "Switching to use NUMA-SMART Counter for statistical counters". Great bug report from Krunal Bauskar, with a patch suggested. I hope to get a NUMA system one day myself to understand the challenges and performance problems there better.  
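The rec_per_key problem from Bug #99924 boils down to dividing by a statistic that may not be set yet. Below is a simplified model, assuming rec_per_key is roughly stat_n_rows / n_diff and that a zero n_diff is clamped to 1 (my assumption, not InnoDB's actual code). REC_PER_KEY_UNKNOWN here is a hypothetical stand-in for the server constant, and both function names are made up for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the server's REC_PER_KEY_UNKNOWN marker;
# the real constant lives in the MySQL source and is not modeled here.
REC_PER_KEY_UNKNOWN: Optional[float] = None


def rec_per_key_naive(stat_n_rows: int, n_diff: int) -> float:
    """Average rows per distinct key value, clamping n_diff == 0 to 1."""
    # With n_diff not yet set (0), clamping it to 1 makes every row look
    # like it shares one key value: rec_per_key == stat_n_rows, so the
    # index appears useless for lookups.
    return stat_n_rows / max(n_diff, 1)


def rec_per_key_guarded(stat_n_rows: int, n_diff: int) -> Optional[float]:
    """Same estimate, but report 'unknown' while statistics are being rebuilt."""
    if n_diff == 0 and stat_n_rows > 0:
        return REC_PER_KEY_UNKNOWN
    return stat_n_rows / max(n_diff, 1)


print(rec_per_key_naive(1_000_000, 500_000))  # prints 2.0
print(rec_per_key_naive(1_000_000, 0))        # prints 1000000.0
print(rec_per_key_guarded(1_000_000, 0))      # prints None
```

The guarded variant follows the REC_PER_KEY_UNKNOWN suggestion from the bug report; the alternative patch (a fixed fallback of 1 or 10) would simply return that constant instead.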
So, that's it: my very last post in the "Fun with Bugs" series that started more than 7 years ago, the series where I listed most of the interesting bug reports I kept an eye on, ever since Bug #2. It was a long way with a lot of fun and a lot of (rarely appreciated) work in the process, but now my watch has ended. I am not going to try to micromanage MySQL bug processing any more, and will finally let the MySQL Verification Team do their job without my regular attention. Good luck!



Percona has recently started to blog about bugs, so I am sure they will keep an eye on them and share lists of important bugs on a regular basis. They should really care more than I do these days.

Saturday, May 30, 2020

Fun with Bugs #99 - On MySQL Bug Reports I am Subscribed to, Part XXXIII

In my previous post in this series I've commented on some interesting MySQL bug reports that were added during the second half of April. Time to move on to bugs reported in May, 2020, as we are quickly approaching MySQL Bug #100000 and I want to write a separate post for this great achievement :)

Here is the list:
  • Bug #99432 - "Improving memory barrier during rseg allocation". A nice contribution by my former colleague from Percona, Krunal Bauskar, who now works on making MySQL better for ARM processors. According to his findings, the use of a relaxed memory model improves performance on ARM by up to 2%. See also yet another bug report with a contribution that matters for ARM, Bug #99556 - "Avoid sequentially consistent atomics for atomic counters" (contributed by Sergey Glushchenko from Percona).
  • Bug #99444 - "New HASH JOIN order problem". One should not expect or rely on any specific order unless an explicit ORDER BY is used, so formally this report by Gabor Aron is "Not a Bug". I put it on this list because several community members helped him a lot to understand why the results with the HASH_JOIN optimization in newer versions are still valid, and what the ways are to get the results in the desired order efficiently. Guilhem Bichot, for instance, suggested two different ways, using a window function and a lateral table. Useful reading in any case!
  • Bug #99458 - "i_s_fts_index_cache_fill_one_index() is not protect by the lock". Based on the comments, it looks like even crashes are possible as a result. Nice finding by Haiqing Sun.
  • Bug #99459 - "SQL run with GROUP_MIN_MAX may infinite loop and never return". After some discussion around the validity and severity of bug reports where the test case involves adding DEBUG_SYNC() to show the problem in a predictable way, this great bug report by Ze Yang was verified. All MySQL GA versions are affected, including 8.0.20! As a side note, I'd prefer NOT to read such discussions any more. They waste the time of all parties involved.
  • Bug #99499 - "Incorrect result when constant equailty expression is used in LEFT JOIN condition". This bug, which affects only MySQL 5.7.x (it was fixed in MySQL 8.0.17+, and the 5.6 code was different), was reported by Marcos Albe from Percona.
  • Bug #99504 - "Generated column incorrect on INSERT when based on column w/ expression DEFAULT". Several problems are highlighted in the fairly complex test case submitted by Brad Lanier.
  • Bug #99582 - "Reduce logging of new doublewrite buffer initialization which is confusing". About 180 lines are added to the error log when --log-error-verbosity is set to 3. As a workaround, one can add:
    log-error-suppression-list="MY-011950"
    to the [mysqld] section of the .cnf file. This problem was reported by Simon Mudd. Make sure to read all comments.
  • Bug #99591 - "Option --tc-heuristic-recover documentation wrong, missing details". In reality, it does not work when more than one XA-capable engine is installed. I wish the fine manual documented reality, not the good intentions of the past. This documentation request was added by Sami Ahlroos.
  • Bug #99593 - "Performance issues in 8.0.20". It seems to be yet another TempTable engine problem that caused a regression compared to MySQL 5.7. At least this:

    SET GLOBAL internal_tmp_mem_storage_engine=MEMORY;

    is a workaround. The bug (a duplicate of internal Bug #30562964) was reported by billy noah and is fixed in the upcoming MySQL 8.0.21.
  • Bug #99601 - "Broken Performance using EXIST function, increasing execution time each loop". This regression bug (without the tag, but who cares...) in MySQL 8.0 was reported by Ronny Görner, and a minimal test case demonstrating that the problem is actually with the function call was contributed by Shane Bester.
  • Bug #99643 - "innobase_commit_by_xid/innobase_rollback_by_xid is not thread safe". This bug was reported by Zhai Weixiang, who had also suggested the fix in the code.
  • Bug #99717 - "Performance regression of parallel count". Great bug report with code analysis and ready to use MTR test case from Ze Yang. Sunny Bains already confirmed that the problematic code is going to be removed.
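Why a hash join may legitimately return rows in a different order than a nested-loop join (the point behind Bug #99444) can be seen with a minimal, textbook build/probe hash join. This is a Python sketch under simplifying assumptions (in-memory lists, equi-join on the first column); it is not MySQL's implementation, and which input MySQL actually uses as the build side is not modeled here. The table contents are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def nested_loop_join(outer: List[Row], inner: List[Row]) -> List[Tuple[str, str]]:
    """Join on the key column; output order follows the outer table."""
    result = []
    for o_key, o_val in outer:
        for i_key, i_val in inner:
            if o_key == i_key:
                result.append((o_val, i_val))
    return result


def hash_join(build: List[Row], probe: List[Row]) -> List[Tuple[str, str]]:
    """Textbook build/probe hash join; output order follows the probe table."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, val in build:  # build phase: hash the first input
        table[key].append(val)
    result = []
    for key, p_val in probe:  # probe phase: scan the second input in its order
        for b_val in table.get(key, []):
            result.append((b_val, p_val))
    return result


t1: List[Row] = [(1, "a"), (2, "b"), (3, "c")]
t2: List[Row] = [(3, "x"), (1, "y"), (2, "z")]

nl = nested_loop_join(t1, t2)
hj = hash_join(t1, t2)
print(nl)  # prints [('a', 'y'), ('b', 'z'), ('c', 'x')]
print(hj)  # prints [('c', 'x'), ('a', 'y'), ('b', 'z')]
print(sorted(nl) == sorted(hj))  # prints True
```

Both joins return the same set of pairs, only in different orders; an explicit ORDER BY (played here by sorted()) is the only thing that makes the order well-defined.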

To summarize:
  1. I am happy to see Oracle engineers explaining to community bug reporters the reasons for, and possible solutions to, problems they hit that are not actually caused by any bug in MySQL. I tried to do this as well, whenever possible, while working on MySQL bugs...
  2. We can still find claims that if a bug is repeatable only by adding DEBUG_SYNC() or similar debug lines, then it cannot be verified or should get a lower severity... IMHO this is nonsense, as there are many verified, high-severity real bug reports where this method is used to demonstrate the problem clearly. Just stop it!



Saturday, February 22, 2020

Fun with Bugs #94 - On MySQL Bug Reports I am Subscribed to, Part XXVIII

I may get a chance to speak about proper bug processing for open source projects later this year, so I have to keep reviewing recent MySQL bugs to be ready for that. In my previous post in this series I listed some interesting MySQL bug reports created in December, 2019. Time to move on to January, 2020! Belated Happy New Year of cool MySQL Bugs!

As usual, I mostly care about InnoDB, replication and optimizer bugs, and I explicitly mention the bug reporter by name and give a link to their other active reports (if any). I also pick examples of proper (or improper) attitudes of reporters and Oracle engineers. Here is the list:
  • Bug #98103 - "unexpected behavior while logging an aborted query in the slow query log". A query that was killed while waiting for a table metadata lock not only gets logged, but its lock wait time is also saved as the query execution time. I'd like to highlight how the bug reporter, Pranay Motupalli, used gdb to study what really happens in the code in this case. Perfect bug report!
  • Bug #98113 - "Crash possible when load & unload a connection handler". The (quite obvious) bug was verified based on code review, but only after some effort was spent by an Oracle engineer on refusing to accept the problem and its importance. This bug was reported by Fangxin Flou.
  • Bug #98132 - "Analyze table leads to empty statistics during online rebuild DDL ". Nice addition to my collections! This bug with a nice and clear test case was reported by Albert Hu, who also suggested a fix.
  • Bug #98139 - "Committing a XA transaction causes a wrong sequence of events in binlog". This bug reported by Dehao Wang was verified as a "documentation" one, but I doubt that documenting the current behavior properly is an acceptable fix. The bug reporter suggested, for example, committing to the binary log first. The current implementation, which allows users to commit or roll back an XA transaction from another connection if the original connection is closed or killed, is risky. A lot of arguing happened in the comments, and my comment asking for a clear quote from the manual:
    Would you be so kind to share some text from this page you mentioned:

    https://dev.mysql.com/doc/refman/8.0/en/xa.html

    or any other fine MySQL 8 manual page stating that XA COMMIT is NOT supported when executed from session/connection/thread other than those prepared the XA transaction? I am doing something wrong probably, but I can not find such text anywhere.
    was hidden. Let's see what happens to this bug report next.
  • Bug #98211 - "Auto increment value didn't reset correctly.". I am not sure what this bug reported by Zhao Jianwei has to do with "Data Types"; IMHO it's more about DDL or the data dictionary. Again, some sarcastic comments from Community users were needed to put work on this bug back on track...
  • Bug #98220 - "with log_slow_extra=on Errno: info not getting updated correctly for error". This bug was reported by Lalit Choudhary from Percona.
  • Bug #98227 - "innodb_stats_method='nulls_ignored' and persistent stats get wrong cardinalities". I think the category is wrong for this bug: it's a bug in InnoDB's persistent statistics implementation, one of many. The bug was reported by Agustín G from Percona.
  • Bug #98231 - "show index from a partition table gets a wrong cardinality value". Yet another bug report by Albert Hu that ended up as a "documentation" bug for now, even though older MySQL versions provided better cardinality estimates than MySQL 8.0 in this case (so this is a regression of a kind). I hope the bug will be re-classified and properly processed later.
  • Bug #98238 - "I_S.KEY_COLUMN_USAGE is very slow". I am surprised to see such a bug in MySQL 8. According to the bug reporter, Manuel Mausz, this is also a kind of regression compared to older MySQL versions, where these queries used to run faster. Surely, no "regression" tag was added in this case.
  • Bug #98284 - "Low sysbench score in the case of a large number of connections". This notable performance regression of MySQL 8 vs 5.7 was reported by zanye zjy. perf profiling pointed to ppoll(), where a lot of time is spent. Fangxin Flou suggested a fix (to use poll() instead), but the bug is still "Open".
  • Bug #98287 - "Explanation of hash joins is inconsistent across EXPLAIN formats". This bug was reported by Saverio M and ended up marked as a duplicate of Bug #97299, fixed in the upcoming 8.0.20. Use EXPLAIN FORMAT=TREE in the meantime to see proper information about hash join usage in the plan.
  • Bug #98288 - "xa commit crash lead mysql replication error". This bug report from Phoenix Zhang (who also suggested a patch) was declared a duplicate of Bug #76233 - "XA prepare is logged ahead of engine prepare" (that I've already discussed among other XA transactions bugs here).
  • Bug #98324 - "Deadlocks more frequent since version 5.7.26". A nice regression bug report by Przemyslaw Malkowski from Percona, with an additional test provided later by Stephen Wei. Interestingly enough, test results shared by Umesh Shastry show that MySQL 8.0.19 is affected in the same way as 5.7.26+, but 8.0.19 is NOT listed as one of the affected versions. This is a mistake to fix, along with the missing regression tag.
  • Bug #98427 - "InnoDB FullText AUX Tables are broken in 8.0". Yet another regression in MySQL 8 was found by Satya Bodapati. It seems this was caused by the change in the default collation for the utf8mb4 character set. InnoDB FULLTEXT search was far from perfect anyway...
There are clouds in the sky of MySQL bug processing.
To summarize:
  1. Too much time and effort is still sometimes spent arguing with bug reporters instead of accepting and processing bugs properly. This is unfortunate.
  2. Sometimes bugs are wrongly classified when verified (documentation vs code bug, wrong category, wrong severity, not all affected versions listed, regressions ignored, etc.). This is also unfortunate.
  3. Percona engineers still help to make MySQL better.
  4. There are some fixes in upcoming MySQL 8.0.20 that I am waiting for :)
  5. XA transactions in MySQL are badly broken (they are not atomic across the storage engine and the binary log) and hardly safe to use in reality.