Blog of (former?) MySQL Entomologist: March 2017

Sunday, March 26, 2017

Fun with Bugs #51 - My Bug Reports that Oracle doesn't Want to Fix

This week I noticed (yet another) customer issue related to the output produced by mysqladmin debug command (or when mysqld process gets SIGHUP signal). I mean the output generated by the mysql_print_status() function. In this issue the content of the output was misinterpreted. I've seen this in the past more than once, and requested to document the output properly, but it never happened for a reason that there is an internal feature request to put this information elsewhere, in Performance Schema or Information Schema. The bug ended up with "Won't fix" status.

Surely I complained in a comment and on Facebook, and then decided to check if there are any other my bug reports and documentation request that Oracle explicitly decided not to fix after accepting the fact that there is a problem.

I've ended up with the following short list:

Bug #69399 - "Inconsistency in crash report". Here I've got a perfect reason to keep things as they are currently implemented. Functions called from signal handlers must be async signal safe, and time() is like that, but it always outputs in UTC. It would be great to print time in UTC in some messages as well, so that timezone difference is 100% clear, but it's truly not a big deal.
Bug #71300 - "Manual does not explain that statement/abstract/* instruments appeared in 5.6.15". This change in naming that happened in 5.6.15 is, indeed, explained in the manual, even if there is no highlighted statements about incompatible change etc. I can live with that.
Bug #71303 - "Manual page on P_S build configuration does not provide enough details". I really missed the details at that time on how to instrument individual buffer's mutexes/rwlocks, after getting a hint during my talk that I had no chance to see real most important waits in Bug #68079 with Performance Schema without recompiling it properly. I've got a useful comment at the end of the bug report, but I truly do not understand why this detail ("The only way to enable it is by removing the line which defines PFS_SKIP_BUFFER_MUTEX_RWLOCK in storage/innobase/include/sync0sync.h. Seems to be no compiler flags to enable or disable the above mentioned symbol.") was not added to the small enough page as a note.
Bug #71304 - "Manual does not provide enough details about automatic sizing of P_S parameters ". Here my suggestions were refused. Go figure yourself, check the output of mysqld --verbose --help 2>/dev/null | grep performance | grep "\-1" to find out what parameters are auto-sized and go read the code to find out how exactly, if you care. They don't.
Bug #71274 - "Manual does not provide enough details about background threads in P_S.threads". All I've got in reply is: "The purpose of the page is to describe the table structure, not enumerate the (subject to change) set of background threads." You may be satisfied with this remark, but I am not.
Bug #71346 - "Manual does not provide details on mysqladmin debug output". As you can check here, even for MySQL 8 the command is still there, but all we have about the output is: "Tell the server to write debug information to the error log. Format and content of this information is subject to change. This includes information about the Event Scheduler."
Bug #75366 - "mysql_install_db requires explicit --lc-messages-dir if non-default PREFIX used". This was reported at early MySQL 5.7.x development stage, and I've got a recommendation to use mysqld --initialize instead. I do so now, but sometimes some related problem still happen, see Bug #84173 and Bug #80351. I think that even deprecated commands must be properly documented, including any incompatible changes in behavior, options, binaries location etc, until they still exist in GA and supported versions of MySQL.
Bug #78822 - "Materialized table from semijoin may change join order and lead to bad plan". In MySQL 5.7 the problem must be fixed, and for 5.6 the following obvious workaround was suggested: set optimizer_switch="semijoin=off";
Bug #80601 - "Manual is wrong/not clear while explaining kill flag check for ALTER TABLE". Even though it was stated that "this is an implementation detail subject to change", some clarifications happened in the manual.

To summarize, out of 310 bug reports I've created since 2005, Oracle decided not to fix just 9, and in many cases provided proper explanations about the reasons to do this, or made some changes in the manual. The remaining cases all are related to MySQL manual and mostly happened in 2014, when nice people found a way to shut me up (temporary) on the topic of MySQL bugs...

Wednesday, March 22, 2017

Fun with Bugs #50 - On Bugs Tagged as "missing manual"

Back in January 2014, some time after many nice people kindly asked me to ~~shut up~~ stop writing about MySQL bugs on Facebook several times per day, I decided to start reading the fine MySQL Manual more carefully than before and report not only typos there, but also any topic or detail not properly explained. Usually these reports, tagged as "missing manual", were the result of careful study of the documentation based on real user question or customer issue. So, most of these reports came from real life, and missing information badly affected poor MySQL users.

Today, for this issue #50 in my series of posts about MySQL bugs, I decided to list and summarize 20 currently active (of 66 total) bugs (mostly documentation requests) tagged as "missing manual", starting from the oldest:

Bug #71293 - "Manual page for P_S.FILE_INSTANCES table does not explain EVENT_NAME values". Performance Schema was one of my favorite topics back then, as I was working on my second talk and presentation about it. No single comment since the bug was verified by Umesh Shastry.

Bug #71294 - "Manual page for P_S.FILE_INSTANCES table does not explain '~' in FILE_NAME". The bug was re-classified as server one, but still no further activity since then. Go figure what does this output may mean:

mysql> select * from performance_schema.file_instances where event_name like '%parse%';
+-----------------------------+------------------------------+------------+
| FILE_NAME                   | EVENT_NAME                   | OPEN_COUNT |
+-----------------------------+------------------------------+------------+
| /var/lib/mysql/test/ti.TRG  | wait/io/file/sql/file_parser |          0 |
| /var/lib/mysql/test/v2.frm~ | wait/io/file/sql/file_parser |          0 |
+-----------------------------+------------------------------+------------+
2 rows in set (0,00 sec)

Bug #71521 - "Manual does not list all valid values for innodb_flush_method". Actually, it seems the manual now lists them all, but the bug was not properly closed.
Bug #71732 - "Garbage value in output when MASTER_LOG_FILE='' is set". The bug was re-classified as Replication one, but I doubt that current state is documented in details.
Bug #71808 - "Manual does not explain what TICK timer is and why it's different on Windows". Still waiting for something... Had not checked if anything was documented, but TICK timer still exists in 5.7.17.
Bug #72368 - "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". This is the InnoDB bug, and it seems there was some work performed on it internally, but the only information in the manual about the need to run ANALYZE is in user comment dated October, 2014. Had I already informed you that I hate persistent InnoDB statistics, the way it is implemented, for many reasons (including this bug)? Now you know. Statistics must be stored, engine-independent and re-estimated only upon explcit DBA request, if you ask me...
Bug #73299 - "DEFAULT value for PRIMARY KEY column depends on the way to declare it PRIMARY". It's probably a server bug, but maybe, until it is fixed, manual should explain current server's behavior in some note?
Bug #73305 - "Manual does not explain all cases when SHOW VIEW privilege is needed". SHOW VIEW privilege may be needed to run EXPLAIN against query referring the view. See Bug #73306 also ("Manual does not explain what privileges are needed for EXPLAIN explainable_stmt"). I still remember user's confusion that led to these report...
Bug #73413 - "Manual does not explain MTS implementation in details". Try to find out in the manual what threads are created for multi-threaded slave, what are their typical statuses, does replication event format (ROW vs STATEMENT) matter for MTS or not...
Bug #76563 - "Manual does not explain when exactly AUTO-INC lock is set for "bulk inserts"". There are reasons to think that when target table is different from the source one, AUTO-INC lock is set on the target table after reading the first row from the source one. Check my old blog post for more details. This is the first still "Verified" bug in this list that is explicitly devoted to InnoDB locking. You'll see several more below.
Bug #77390 - "Manual does not explain a "deadlock" case of online ALTER". Trust me, online ALTER sets metadata lock at early stage, but it is not exclusive. Check some of my posts about MDL and this documentation request: Bug #84004 - "Manual misses details on MDL locks set and released for online ALTER TABLE".
Bug #79665 - "Manual does not explain locks set by INSERT ... ON DUPLICATE KEY UPDATE properly". It wouldbe great to see the manual describing all the locks set by INSERT ... ON DUPLICATE KEY UPDATE carefully and properly, covering both the duplicate on PRIMARY key case and duplicate on secondary UNIQUE key case.
Bug #80067 - "Index on BIT column is not used when column name only is used in WHERE clause". It's a pure optimizer bug/problem, but while it is not resolved it would be nice for the manual to describe current behavior.
Bug #82127 - "Deadlock with 3 concurrent DELETEs by UNIQUE key". Manual does not explain locks set on secondary indexes properly, for too many cases, including this one. InnoDB does work as designed, and you can find some explanations (by my colleague Jan Lindström) of this design and reasons behind it in MDEV-10962. Check Bug #83640 - "Locks set by DELETE statement on already deleted record" also for the idea of how one may (mis-)interpret what really happens in similar cases. This is because InnoDB's implementation of locking is not properly explained, including implicit locks (see some details and links here), locking of secondary indexes etc. This missing information leads to all kinds of misunderstanding and speculations about "lock upgrading" etc, for decades already.
Bug #82212 - "mysqlbinlog can produce events larger than max_allowed_packet for mysql". This is a server problem, but, as I put it, please, describe "safe" setting of max_allowed_packet in case of row-based replication in the manual clearly, as well as any workarounds for the case when max_allowed_packet was 1G on the server that produced binary long with huge row based event that one needs to restore now.

Bug #83024 - "Internals manual does not explain COM_SLEEP in details". One may argue that this is truly irrelevant for most users, but it's hard to explain slow log content sometimes:

SET timestamp=1473712798;
# administrator command: Sleep;
# Time: 160912 20:39:59
# User@Host: user[host] @ [192.168.1.51]
# Thread_id: 36310042 Schema: somedb QC_hit: No
# Query_time: 17.201526 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0

without this.

Bug #85557 - "Manual does not explain locks set by UPDATE with subquery referring other table". I had to report it yesterday, as some users considers current behavior (proper, but not documented at all) a bug and complained. My dear friend Sinisa Milivojevic verified it promptly.

I understand that it's hard to keep the quality of MySQL manual, and some of the documentation requests mentioned above stay active for years just because it's really a lot of work to document things properly. Documentation team is, probably, concentrated on desribing new and shiny features of MySQL 8.0 (one day I'll start to read its manual) or InnoDB clusters/Group replication.

If the team needs somebody to help, please, get in touch with me as I may have a suggestion whom you can hire. (It's not me, I am not qualified as I am not a native speaker. I'd better report problems and missing details here and there...)

Sunday, March 19, 2017

Testing MyRocks vs InnoDB Performance Using sysbench 1.x oltp_point_select.lua

It seems MyRocks is going to become a hot topic in April 2017. Previously (here and there) I tried to compare its performance and scalability vs InnoDB from MySQL 5.7.17 using test case from famous bug #68079. It's an interesting case that took a lot of efforts from Oracle to make InnoDB scale properly, and InnoDB (on my QuadCore box at least, others reported different results on other hardware in comments) still outperformed MyRocks. But maybe it's corner case that is not a big deal in general?

Earlier this month I decided to give MyRocks another chance and try it with "industry-standard" benchmarks, like those provided by sysbench tool. At the same time, I studied the impact of adaptive hash indexing (AHI) on InnoDB (for the reason i am not yet ready to share), along the lines of this great post by Peter Zaitsev. The study is not yet complete, and I am not yet sure that it makes sense to continue doing it on my ages old QuadCore box with Fedora 25, but in the process I've got one interesting and repeatable result that I'd like to share in any case.

For that study I decided to use recent sysbench 1.1.x, so I had to build it from source to begin with. I did the following:

[openxs@fc23 git]$ git clone https://github.com/akopytov/sysbench.git

but then during ./configure run I've got a small problem:

...
checking for pkg-config... yes
checking for xxd... no
configure: error: "xxd is required to build sysbench (usually comes with the vim package)"

So, I had to install vim package:

[openxs@fc23 sysbench]$ sudo yum install vim
...
Installed:
gpm-libs.x86_64 1.20.7-9.fc24 vim-common.x86_64 2:8.0.386-1.fc25
vim-enhanced.x86_64 2:8.0.386-1.fc25 vim-filesystem.x86_64 2:8.0.386-1.fc25

Complete!

and then build and installation process (with all defaults and MariaDB software provided by Fedora present) completed without any problem, and I've ended up with nice new sysbench version:

[openxs@fc23 sysbench]$ /usr/local/bin/sysbench --version
sysbench 1.1.0-2343e4b

[openxs@fc23 sysbench]$ ls /usr/local/share/sysbench/
bulk_insert.lua oltp_point_select.lua oltp_update_non_index.lua tests
oltp_common.lua oltp_read_only.lua oltp_write_only.lua
oltp_delete.lua oltp_read_write.lua select_random_points.lua
oltp_insert.lua oltp_update_index.lua select_random_ranges.lua

As I use all default settings for both MyRocks and InnoDB, I decided to start testing with the oltp_point_select.lua simplest test and table size that does NOT fit into the default 128M of buffer pool in InnoDB case:

[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --report-interval=1 --oltp-table-size=1000000 --max-time=0 --oltp-read-only=off --max-requests=0 --num-threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

invalid option: --oltp-table-size=1000000

Note that good old command lines copied from older sysbench versions verbatim may NOT work any more in 1.1.x. Some options changed, now the names are shorter:

[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua help
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

oltp_point_select.lua options:
--distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
--sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
--skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
--secondary[=on|off]          Use a secondary index in place of the PRIMARY KEY [off]
--create_secondary[=on|off]   Create a secondary index in addition to the PRIMARY KEY [on]
--index_updates=N             Number of UPDATE index queries per transaction [1]
--range_size=N                Range size for range SELECT queries [100]
--auto_inc[=on|off]           Use AUTO_INCREMENT column as Primary Key (for MySQL), or its alternatives in other DBMS. When disabled, use client-generated IDs [on]
--delete_inserts=N            Number of DELETE/INSERT combination per transaction [1]
--tables=N                    Number of tables [1]
--mysql_storage_engine=STRING Storage engine, if MySQL is used [innodb]
--non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
--table_size=N                Number of rows per table [10000]
--pgsql_variant=STRING        Use this PostgreSQL variant when running with the PostgreSQL driver. The only currently supported variant is 'redshift'. When enabled, create_secondary is automatically disabled, and delete_inserts is set to 0
--simple_ranges=N             Number of simple range SELECT queries per transaction [1]
--order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
--range_selects[=on|off]      Enable/disable all range SELECT queries [on]
--point_selects=N             Number of point SELECT queries per transaction [10]

I've ended up creating the table like this for InnoDB case:

[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --report-interval=1 --table-size=1000000 --num-threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'

to end up with the following table:

mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 986400
Avg_row_length: 228
    Data_length: 225132544
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
Auto_increment: 1000001
    Create_time: 2017-03-02 16:18:57
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
Create_options:
        Comment:
1 row in set (0.00 sec)

For MyRocks I also had to specify storage engine explicitly:

[openxs@fc23 fb56]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --table-size=1000000 --threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= --mysql_storage_engine=rocksdb prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...

to end up with the following table:

mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: ROCKSDB
        Version: 10
     Row_format: Fixed
           Rows: 1000000
Avg_row_length: 198
    Data_length: 198545349
Max_data_length: 0
   Index_length: 16009534
      Data_free: 0
Auto_increment: 1000001
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
Create_options:
        Comment:
1 row in set (0.00 sec)

Note that in case of InnoDB I've used MySQL 5.7.17 from Oracle, and MyRocks was built from this commit using my usual cmake options:

[openxs@fc23 mysql-5.6]$ git log -1
commit 01c386be8b02e6469b934c063aefdf8403844d99
Author: Herman Lee <herman@fb.com>
Date: Wed Mar 1 18:14:25 2017 -0800

[openxs@fc23 mysql-5.6]$ cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DWITH_EMBEDDED_SERVER=OFF -DENABLED_LOCAL_INFILE=1 -DENABLE_DTRACE=0 -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/fb56

I've run the tests for InnoDB with adaptive hash indexing set to ON (by default) and OFF (changed at run time), and then for MyRocks, using 1, 2, 4, 8, 16, 32 and 64 (all cases but InnoDB with AHI ON) concurrent threads, with sysbench command line like this to run the test for 60 seconds (note new options syntax of sysbench 1.x: --time, --threads etc):

[openxs@fc23 fb56]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --table-size=1000000 --time=60 --threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root run
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            821511
        write:                           0
        other:                           0
        total:                           821511
    transactions:                        821511 (13691.77 per sec.)
    queries:                             821511 (13691.77 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.0003s
    total number of events:              821511

Latency (ms):
         min:                                  0.06
         avg:                                  0.07
         max:                                  1.11
         95th percentile:                      0.08
         sum:                              59537.46

Threads fairness:
    events (avg/stddev):           821511.0000/0.00
    execution time (avg/stddev):   59.5375/0.00

and then summarized the results into the following chart:

One day I'll share raw results, as a gist or somehow else, but for now let me summarize my findings as of March 3, 2017:

MyRocks really rocks with this oltp_point_select.lua --table-size=1000000 test of sysbench 1.1.0! With default settings of server variables it outperformed InnoDB from MySQL 5.7.17 at all number of threads tested, from 1 to 64, and proved good scalability on up to 64 threads on my QuadCore box. I've got more than 45K QPS starting from 4 threads.
InnoDB with disabled AHI is somewhat faster for this test than with enabled AHI, highest result was almost 44K QPS with AHI OFF on 4 threads.
It seems my QuadCore is not relevant any more for serious benchmarks, as for quite a some time people use 8 to 18 cores per socket etc and start with 200K QPS with 8 threads already.

Saturday, March 11, 2017

Fun with Bugs #49 - Applying PMP to MySQL

As you maybe noted, in several recent posts I've provided some additional details for slides used during my FOSDEM talk on profiling MySQL. The only part not covered yet is related to using Poor Man's Profiler (and pt-pmp version of it). I see no reason to explain what it does and how to use it once again, but would like to show several recent enough MySQL bug reports where this tool was essential to find, explain or demonstrate the problem.

Quick search for active bugs with "pt-pmp" in MySQL bugs database produces 8 hits at the moment:

Bug #85304 - "Reduce mutex contention of fil_system->mutex". It was reported by Zhai Weixiang few days ago, and pt-pmp output was used as a starting point for the analysis that ended up with a patch suggested.
Bug #85191 - "performance regression with HANDLER READ syntax", from the same reporter. In this report pt-pmp was used to prove the point and show what exactly threads were doing.
Bug #80979 - "Doublewrite buffer event waits should be annotated for Performance Schema", by Laurynas Biveinis. One more case when PMP shows where the time is spent by threads in some specific case, while there is no instrumentation (yet) for the related wait in Perfomance Schema.
Bug #77827 - "reduce cpu time costs of creating dummy index while change buffer is enabled", again by Zhai Weixiang. In this bug report he had used both perf to show that some notable time was spent on the operation, and pt-pmp to show the related backtraces.
Bug #73803 - "don't acquire global_sid_lock->rdlock if gtid is disabled". Once again, Zhai Weixiang used pt-pmp output as a starting point for further code analysis.I wonder why this bug is still "Open", by the way...
Bug #70237 - "the mysqladmin shutdown hangs". Guess who reported it after applying PMP when something hanged. As I stated in all my 3 recent FOSDEM talks, this is exactly what you have to do before killing and restarting MySQL server in production - get backtraces of all threads, raw or at least aggregated with pt-pmp... I am not sure why the bug was not noted in time, there are even ideas of patches shared. Time for somebody to process it formally.
Bug #69812 - "the server stalls at function row_vers_build_for_consistent_read". Same reporter, same tool used, same result - the bug report is still "Open". Looks like I know what my next post(s) in this "Fun with Bugs" series will be devoted to...
Bug #62018 - "innodb adaptive hash index mutex contention". It was reported by Mark Callaghan and PMP outputs were used as a part of the evidence. The bug is "Verified" and even got a patch suggested for 5.7.x by Percona engineers, but still had not got any proper attention from Oracle. I may have some more results related to the "cost" and "benefits" of adaptive hash indexing to share soon, so stay tuned...

Surely, there are way more bugs where PMP was used. Let me share one more that I noted while working on my recent talk on profiling (bug had not found time to put it on slides and comment on):

Bug #78277 - "InnoDB deadlock, thread stuck on kernel calls from transparent page compression", by Mark Callaghan. Again, PMP outputs were provided to prove the point and show where threads are stuck. The bug is "Open".

For many performance related cases applying pt-pmp and sharing the results becomes a de facto community requirement, as you can see, for example, in Bug #84025. Note that Umesh Shastry, who verified the bug, provided pt-pmp outputs in hist testing results. I'd suggest to have gdb and pt-pmp installed and ready to use on any production system using any version and fork of MySQL. Even if your bug will be ignored by Oracle, these outputs are useful for other community members who may hit similar cases or is not lazy to check and work on the code to provide a patch.