Sunday, March 26, 2017

Fun with Bugs #51 - My Bug Reports that Oracle doesn't Want to Fix

This week I noticed (yet another) customer issue related to the output produced by mysqladmin debug command (or when mysqld process gets SIGHUP signal). I mean the output generated by the mysql_print_status() function. In this issue the content of the output was misinterpreted. I've seen this in the past more than once, and requested to document the output properly, but it never happened for a reason that there is an internal feature request to put this information elsewhere, in Performance Schema or Information Schema. The bug ended up with "Won't fix" status.

Surely I complained in a comment and on Facebook, and then decided to check if there are any other my bug reports and documentation request that Oracle explicitly decided not to fix after accepting the fact that there is a problem.

I've ended up with the following short list:
  • Bug #69399 - "Inconsistency in crash report". Here I've got a perfect reason to keep things as they are currently implemented. Functions called from signal handlers must be async signal safe, and time() is like that, but it always outputs in UTC. It would be great to print time in UTC in some messages as well, so that timezone difference is 100% clear, but it's truly not a big deal.
  • Bug #71300 - "Manual does not explain that statement/abstract/* instruments appeared in 5.6.15". This change in naming that happened in 5.6.15 is, indeed, explained in the manual, even if there is no highlighted statements about incompatible change etc. I can live with that.
  • Bug #71303 - "Manual page on P_S build configuration does not provide enough details". I really missed the details at that time on how to instrument individual buffer's mutexes/rwlocks, after getting a hint during my talk that I had no chance to see real most important waits in Bug #68079  with Performance Schema without recompiling it properly. I've got a useful comment at the end of the bug report, but I truly do not understand why this detail ("The only way to enable it is by removing the line which defines PFS_SKIP_BUFFER_MUTEX_RWLOCK in storage/innobase/include/sync0sync.h. Seems to be no compiler flags to enable or disable the above mentioned symbol.") was not added to the small enough page as a note.
  • Bug #71304 - "Manual does not provide enough details about automatic sizing of P_S parameters ". Here my suggestions were refused. Go figure yourself, check the output of mysqld --verbose --help 2>/dev/null | grep performance | grep "\-1" to find out what parameters are auto-sized and go read the code to find out how exactly, if you care. They don't.
  • Bug #71274 - "Manual does not provide enough details about background threads in P_S.threads". All I've got in reply is: "The purpose of the page is to describe the table structure, not enumerate the (subject to change) set of background threads." You may be satisfied with this remark, but I am not.
  • Bug #71346 - "Manual does not provide details on mysqladmin debug output". As you can check here, even for MySQL 8 the command is still there, but all we have about the output is: "Tell the server to write debug information to the error log. Format and content of this information is subject to change. This includes information about the Event Scheduler."
  • Bug #75366 - "mysql_install_db requires explicit --lc-messages-dir if non-default PREFIX used". This was reported at early MySQL 5.7.x development stage, and I've got a recommendation to use mysqld --initialize instead. I do so now, but sometimes some related problem still happen, see Bug #84173 and Bug #80351. I think that even deprecated commands must be properly documented, including any incompatible changes in behavior, options, binaries location etc, until they still exist in GA and supported versions of MySQL.
  • Bug #78822 - "Materialized table from semijoin may change join order and lead to bad plan". In MySQL 5.7 the problem must be fixed, and for 5.6 the following obvious workaround was suggested: set optimizer_switch="semijoin=off";
  • Bug #80601 - "Manual is wrong/not clear while explaining kill flag check for ALTER TABLE".  Even though it was stated that "this is an implementation detail subject to change", some clarifications happened in the manual.
To summarize, out of 310 bug reports I've created since 2005, Oracle decided not to fix just 9, and in many cases provided proper explanations about the reasons to do this, or made some changes in the manual. The remaining cases all are related to MySQL manual and mostly happened in 2014, when nice people found a way to shut me up (temporary) on the topic of MySQL bugs...

Wednesday, March 22, 2017

Fun with Bugs #50 - On Bugs Tagged as "missing manual"

Back in January 2014, some time after many nice people kindly asked me to shut up stop writing about MySQL bugs on Facebook several times per day, I decided to start reading the fine MySQL Manual more carefully than before and report not only typos there, but also any topic or detail not properly explained. Usually these reports, tagged as "missing manual", were the result of careful study of the documentation based on real user question or customer issue. So, most of these reports came from real life, and missing information badly affected poor MySQL users.

Today, for this issue #50 in my series of posts about MySQL bugs, I decided to list and summarize 20 currently active (of 66 total) bugs (mostly documentation requests) tagged as "missing manual", starting from the oldest:
  • Bug #71293 - "Manual page for P_S.FILE_INSTANCES table does not explain EVENT_NAME values". Performance Schema was one of my favorite topics back then, as I was working on my second talk and presentation about it. No single comment since the bug was verified by Umesh Shastry.
  • Bug #71294 - "Manual page for P_S.FILE_INSTANCES table does not explain '~' in FILE_NAME". The bug was re-classified as server one, but still no further activity since then. Go figure what does this output may mean:

    mysql> select * from performance_schema.file_instances where event_name like '%parse%';
    +-----------------------------+------------------------------+------------+
    | FILE_NAME                   | EVENT_NAME                   | OPEN_COUNT |
    +-----------------------------+------------------------------+------------+
    | /var/lib/mysql/test/ti.TRG  | wait/io/file/sql/file_parser |          0 |
    | /var/lib/mysql/test/v2.frm~ | wait/io/file/sql/file_parser |          0 |
    +-----------------------------+------------------------------+------------+
    2 rows in set (0,00 sec)
  • Bug #71521 - "Manual does not list all valid values for innodb_flush_method". Actually, it seems the manual now lists them all, but the bug was not properly closed.
  • Bug #71732 - "Garbage value in output when MASTER_LOG_FILE='' is set". The bug was re-classified as Replication one, but I doubt that current state is documented in details.
  • Bug #71808 - "Manual does not explain what TICK timer is and why it's different on Windows". Still waiting for something... Had not checked if anything was documented, but TICK timer still exists in 5.7.17.
  • Bug #72368 - "Empty/zero statistics for imported tablespace until explicit ANALYZE TABLE". This is the InnoDB bug, and it seems there was some work performed on it internally, but the only information in the manual about the need to run ANALYZE is in user comment dated October, 2014. Had I already informed you that I hate persistent InnoDB statistics, the way it is implemented, for many reasons (including this bug)? Now you know. Statistics must be stored, engine-independent and re-estimated only upon explcit DBA request, if you ask me...
  • Bug #73299 - "DEFAULT value for PRIMARY KEY column depends on the way to declare it PRIMARY". It's probably a server bug, but maybe, until it is fixed, manual should explain current server's behavior in some note?
  • Bug #73305 - "Manual does not explain all cases when SHOW VIEW privilege is needed". SHOW VIEW privilege may be needed to run EXPLAIN against query referring the view. See Bug #73306 also ("Manual does not explain what privileges are needed for EXPLAIN explainable_stmt"). I still remember user's confusion that led to these report...
  • Bug #73413 - "Manual does not explain MTS implementation in details". Try to find out in the manual what threads are created for multi-threaded slave, what are their typical statuses, does replication event format (ROW vs STATEMENT) matter for MTS or not...
  • Bug #76563 - "Manual does not explain when exactly AUTO-INC lock is set for "bulk inserts"". There are reasons to think that when target table is different from the source one, AUTO-INC lock is set on the target table after reading the first row from the source one. Check my old blog post for more details. This is the first still "Verified" bug in this list that is explicitly devoted to InnoDB locking. You'll see several more below.
  • Bug #77390 - "Manual does not explain a "deadlock" case of online ALTER". Trust me, online ALTER sets metadata lock at early stage, but it is not exclusive. Check some of my posts about MDL and this documentation request: Bug #84004 - "Manual misses details on MDL locks set and released for online ALTER TABLE".
  • Bug #79665 - "Manual does not explain locks set by INSERT ... ON DUPLICATE KEY UPDATE properly". It wouldbe great to see  the manual describing all the locks set by INSERT ... ON DUPLICATE KEY UPDATE carefully and properly, covering both the duplicate on PRIMARY key case and duplicate on secondary UNIQUE key case.
  • Bug #80067 - "Index on BIT column is not used when column name only is used in WHERE clause". It's a pure optimizer bug/problem, but while it is not resolved it would be nice for the manual to describe current behavior.
  • Bug #82127 - "Deadlock with 3 concurrent DELETEs by UNIQUE key". Manual does not explain locks set on secondary indexes properly, for too many cases, including this one. InnoDB does work as designed, and you can find some explanations (by my colleague Jan Lindström) of this design and reasons behind it in MDEV-10962. Check Bug #83640 - "Locks set by DELETE statement on already deleted record" also for the idea of how one may (mis-)interpret what really happens in similar cases. This is because InnoDB's implementation of locking is not properly explained, including implicit locks (see some details and links here), locking of secondary indexes etc. This missing information leads to all kinds of misunderstanding and speculations about "lock upgrading" etc, for decades already.
  • Bug #82212 - "mysqlbinlog can produce events larger than max_allowed_packet for mysql". This is a server problem, but, as I put it, please, describe "safe" setting of max_allowed_packet in case of row-based replication in the manual clearly, as well as any workarounds for the case when max_allowed_packet was 1G on the server that produced binary long with huge row based event that one needs to restore now.
  • Bug #83024 - "Internals manual does not explain COM_SLEEP in details". One may argue that this is truly irrelevant for most users, but it's hard to explain slow log content sometimes:
    SET timestamp=1473712798;
    # administrator command: Sleep;
    # Time: 160912 20:39:59
    # User@Host: user[host] @ [192.168.1.51]
    # Thread_id: 36310042 Schema: somedb QC_hit: No
    # Query_time: 17.201526 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
    
    without this.
  • Bug #85557 - "Manual does not explain locks set by UPDATE with subquery referring other table". I had to report it yesterday, as some users considers current behavior (proper, but not documented at all) a bug and complained. My dear friend Sinisa Milivojevic verified it promptly.
I understand that it's hard to keep the quality of MySQL manual, and some of the documentation requests mentioned above stay active for years just because it's really a lot of work to document things properly. Documentation team is, probably, concentrated on desribing new and shiny features of MySQL 8.0 (one day I'll start to read its manual) or InnoDB clusters/Group replication.

If the team needs somebody to help, please, get in touch with me as I may have a suggestion whom you can hire. (It's not me, I am not qualified as I am not a native speaker. I'd better report problems and missing details here and there...)

Sunday, March 19, 2017

Testing MyRocks vs InnoDB Performance Using sysbench 1.x oltp_point_select.lua

It seems MyRocks is going to become a hot topic in April 2017. Previously (here and there) I tried to compare its performance and scalability vs InnoDB from MySQL 5.7.17 using test case from famous bug #68079. It's an interesting case that took a lot of efforts from Oracle to make InnoDB scale properly, and InnoDB (on my QuadCore box at least, others reported different results on other hardware in comments) still outperformed MyRocks. But maybe it's corner case that is not a big deal in general?

Earlier this month I decided to give MyRocks another chance and try it with "industry-standard" benchmarks, like those provided by sysbench tool. At the same time, I studied the impact of adaptive hash indexing (AHI) on InnoDB (for the reason i am not yet ready to share), along the lines of this great post by Peter Zaitsev. The study is not yet complete, and I am not yet sure that it makes sense to continue doing it on my ages old QuadCore box with Fedora 25, but in the process I've got one interesting and repeatable result that I'd like to share in any case.

For that study I decided to use recent sysbench 1.1.x, so I had to build it from source to begin with. I did the following:
[openxs@fc23 git]$ git clone https://github.com/akopytov/sysbench.git
but then during ./configure run I've got a small problem:
...
checking for pkg-config... yes
checking for xxd... no
configure: error: "xxd is required to build sysbench (usually comes with the vim package)"
So, I had to install vim package:
[openxs@fc23 sysbench]$ sudo yum install vim
...
Installed:
  gpm-libs.x86_64 1.20.7-9.fc24         vim-common.x86_64 2:8.0.386-1.fc25
  vim-enhanced.x86_64 2:8.0.386-1.fc25  vim-filesystem.x86_64 2:8.0.386-1.fc25

Complete!
and then build and installation process (with all defaults and MariaDB software provided by Fedora present) completed without any problem, and I've ended up with nice new sysbench version:
[openxs@fc23 sysbench]$ /usr/local/bin/sysbench --version
sysbench 1.1.0-2343e4b

[openxs@fc23 sysbench]$ ls /usr/local/share/sysbench/
bulk_insert.lua  oltp_point_select.lua  oltp_update_non_index.lua  tests
oltp_common.lua  oltp_read_only.lua     oltp_write_only.lua
oltp_delete.lua  oltp_read_write.lua    select_random_points.lua
oltp_insert.lua  oltp_update_index.lua  select_random_ranges.lua
As I use all default settings for both MyRocks and InnoDB, I decided to start testing with the oltp_point_select.lua simplest test and table size that does NOT fit into the default 128M of buffer pool in InnoDB case:
[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --report-interval=1 --oltp-table-size=1000000 --max-time=0 --oltp-read-only=off --max-requests=0 --num-threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

invalid option: --oltp-table-size=1000000
Note that good old command lines copied from older sysbench versions verbatim may NOT work any more in 1.1.x. Some options changed, now the names are shorter:
[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua help
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

oltp_point_select.lua options:
  --distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
  --sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
  --skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
  --secondary[=on|off]          Use a secondary index in place of the PRIMARY KEY [off]
  --create_secondary[=on|off]   Create a secondary index in addition to the PRIMARY KEY [on]
  --index_updates=N             Number of UPDATE index queries per transaction [1]
  --range_size=N                Range size for range SELECT queries [100]
  --auto_inc[=on|off]           Use AUTO_INCREMENT column as Primary Key (for MySQL), or its alternatives in other DBMS. When disabled, use client-generated IDs [on]
  --delete_inserts=N            Number of DELETE/INSERT combination per transaction [1]
  --tables=N                    Number of tables [1]
  --mysql_storage_engine=STRING Storage engine, if MySQL is used [innodb]
  --non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
  --table_size=N                Number of rows per table [10000]
  --pgsql_variant=STRING        Use this PostgreSQL variant when running with the PostgreSQL driver. The only currently supported variant is 'redshift'. When enabled, create_secondary is automatically disabled, and delete_inserts is set to 0
  --simple_ranges=N             Number of simple range SELECT queries per transaction [1]
  --order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
  --range_selects[=on|off]      Enable/disable all range SELECT queries [on]
  --point_selects=N             Number of point SELECT queries per transaction [10]
I've ended up creating the table like this for InnoDB case:
[openxs@fc23 sysbench]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --report-interval=1 --table-size=1000000 --num-threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'
to end up with the following table:
mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 986400
 Avg_row_length: 228
    Data_length: 225132544
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 1000001
    Create_time: 2017-03-02 16:18:57
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
For MyRocks I also had to specify storage engine explicitly:
[openxs@fc23 fb56]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --table-size=1000000 --threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root --mysql-password= --mysql_storage_engine=rocksdb prepare
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
to end up with the following table:
mysql> show table status like 'sbtest1'\G
*************************** 1. row ***************************
           Name: sbtest1
         Engine: ROCKSDB
        Version: 10
     Row_format: Fixed
           Rows: 1000000
 Avg_row_length: 198
    Data_length: 198545349
Max_data_length: 0
   Index_length: 16009534
      Data_free: 0
 Auto_increment: 1000001
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
Note that in case of InnoDB I've used MySQL 5.7.17 from Oracle, and MyRocks was built from this commit using my usual cmake options:
[openxs@fc23 mysql-5.6]$ git log -1
commit 01c386be8b02e6469b934c063aefdf8403844d99
Author: Herman Lee <herman@fb.com>
Date:   Wed Mar 1 18:14:25 2017 -0800

[openxs@fc23 mysql-5.6]$ cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DWITH_EMBEDDED_SERVER=OFF -DENABLED_LOCAL_INFILE=1 -DENABLE_DTRACE=0 -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/fb56
I've run the tests for InnoDB with adaptive hash indexing set to ON (by default) and OFF (changed at run time), and then for MyRocks, using 1, 2, 4, 8, 16, 32 and 64 (all cases but InnoDB with AHI ON) concurrent threads, with sysbench command line like this to run the test for 60 seconds (note new options syntax of sysbench 1.x: --time, --threads etc):
[openxs@fc23 fb56]$ sysbench /usr/local/share/sysbench/oltp_point_select.lua --table-size=1000000 --time=60 --threads=1 --rand-type=uniform --db-driver=mysql --mysql-db=test --mysql-socket=/tmp/mysql.sock --mysql-user=root run
sysbench 1.1.0-2343e4b (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            821511
        write:                           0
        other:                           0
        total:                           821511
    transactions:                        821511 (13691.77 per sec.)
    queries:                             821511 (13691.77 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.0003s
    total number of events:              821511

Latency (ms):
         min:                                  0.06
         avg:                                  0.07
         max:                                  1.11
         95th percentile:                      0.08
         sum:                              59537.46

Threads fairness:
    events (avg/stddev):           821511.0000/0.00
    execution time (avg/stddev):   59.5375/0.00
and then summarized the results into the following chart:


One day I'll share raw results, as a gist or somehow else, but for now let me summarize my findings as of March 3, 2017:
  1. MyRocks really rocks with this oltp_point_select.lua --table-size=1000000 test of sysbench 1.1.0! With default settings of server variables it outperformed InnoDB from MySQL 5.7.17 at all number of threads tested, from 1 to 64, and proved good scalability on up to 64 threads on my QuadCore box. I've got more than 45K QPS starting from 4 threads.
  2. InnoDB with disabled AHI is somewhat faster for this test than with enabled AHI, highest result was almost 44K QPS with AHI OFF on 4 threads.
  3. It seems my QuadCore is not relevant any more for serious benchmarks, as for quite a some time people use 8 to 18 cores per socket etc and start with 200K QPS with 8 threads already.

Saturday, March 11, 2017

Fun with Bugs #49 - Applying PMP to MySQL

As you maybe noted, in several recent posts I've provided some additional details for slides used during my FOSDEM talk on profiling MySQL. The only part not covered yet is related to using Poor Man's Profiler (and pt-pmp version of it). I see no reason to explain what it does and how to use it once again, but would like to show several recent enough MySQL bug reports where this tool was essential to find, explain or demonstrate the problem.

Quick search for active bugs with "pt-pmp" in MySQL bugs database produces 8 hits at the moment:
  •  Bug #85304 - "Reduce mutex contention of fil_system->mutex". It was reported by Zhai Weixiang few days ago, and pt-pmp output was used as a starting point for the analysis that ended up with a patch suggested.
  • Bug #85191 - "performance regression with HANDLER READ syntax", from the same reporter. In this report pt-pmp was used to prove the point and show what exactly threads were doing.
  • Bug #80979 - "Doublewrite buffer event waits should be annotated for Performance Schema", by Laurynas Biveinis. One more case when PMP shows where the time is spent by threads in some specific case, while there is no instrumentation (yet) for the related wait in Perfomance Schema.
  • Bug #77827 - "reduce cpu time costs of creating dummy index while change buffer is enabled", again by Zhai Weixiang. In this bug report he had used both perf to show that some notable time was spent on the operation, and pt-pmp to show the related backtraces.
  • Bug #73803 - "don't acquire global_sid_lock->rdlock if gtid is disabled". Once again, Zhai Weixiang used pt-pmp output as a starting point for further code analysis.I wonder why this bug is still "Open", by the way...
  • Bug #70237 - "the mysqladmin shutdown hangs". Guess who reported it after applying PMP when something hanged. As I stated in all my 3 recent FOSDEM talks, this is exactly what you have to do before killing and restarting MySQL server in production - get backtraces of all threads, raw or at least aggregated with pt-pmp... I am not sure why the bug was not noted in time, there are even ideas of patches shared. Time for somebody to process it formally.
  • Bug #69812 - "the server stalls at function row_vers_build_for_consistent_read". Same reporter, same tool used, same result - the bug report is still "Open". Looks like I know what my next post(s) in this "Fun with Bugs" series will be devoted to...
  • Bug #62018 - "innodb adaptive hash index mutex contention". It was reported by Mark Callaghan and PMP outputs were used as a part of the evidence. The bug is "Verified" and even got a patch suggested for 5.7.x by Percona engineers, but still had not got any proper attention from Oracle. I may have some more results related to the "cost" and "benefits" of adaptive hash indexing to share soon, so stay tuned...
Surely, there are way more bugs where PMP was used. Let me share one more that I noted while working on my recent talk on profiling (bug had not found time to put it on slides and comment on):

  • Bug #78277 - "InnoDB deadlock, thread stuck on kernel calls from transparent page compression", by Mark Callaghan. Again, PMP outputs were provided to prove the point and show where threads are stuck. The bug is "Open".

For many performance related cases applying pt-pmp and sharing the results becomes a de facto community requirement, as you can see, for example, in Bug #84025. Note that Umesh Shastry, who verified the bug, provided pt-pmp outputs in hist testing results. I'd suggest to have gdb and pt-pmp installed and ready to use on any production system using any version and fork of MySQL. Even if your bug will be ignored by Oracle, these outputs are useful for other community members who may hit similar cases or is not lazy to check and work on the code to provide a patch.

Saturday, February 25, 2017

MySQL Support Engineer's Chronicles, Issue #5

A lot of time passed since my previous post in this series. I was busy with work, participating in FOSDEM, blogging about profilers and sharing various lists of MySQL bugs. But I do not plan to stop writing about my usual weeks of doing support engineer's job. So, time for the next post in this series, based on my random notes taken during the week here and there.

This week started for me with checking recent MySQL bug reports (actually I do it every day). I noted recent report by Roel, Bug #85065. Unfortunately it was quickly marked as "Won't fix", and I tend to agree with Laurynas Biveinis that this was probably a wrong decision. See a discussion here. Out of memory conditions happen in production and it would be great for MySQL to process them properly, not just crash randomly. Roel does a favor to all of us by checking debug builds with additional unusual failure conditions introduced, and this work should not be ignored based on some formal approach.

It's a common knowledge already that I try to avoid not only all kind of clusters, but all kinds of connection pools as well, by all means. Sometimes I still fail, and when I find myself in unlucky situation of dealing with connection pool in Apache Tomcat, I consider this reference useful.

This week I had a useful discussion on the need for xtrabackup (version 2.4.6 was released this week) on Windows (customers ask about it once in a while and some even try to use older binaries of version 1.6.x or so from Percona) and any alternatives. I was pointed out to this blog post by Anders Karlsson. I remember reading something about using Volume Snapshot Service on Windows to backup MySQL back in Oracle in 2012, and really have to just try how it works based on the blog above and this reference. But I still think that, at least without a command line wrapper like mylvmbackup, this approach is hardly easy to follow for average Windows user (like me) and is not on pair with xtrabackup for ease of use etc.

I spent some time building new releases of Percona Server, MariaDB and MySQL 5.6 from Facebook on my Fedora 25 box (this is also one of the first things I do every morning, for software that got updates on GitHub), so just to remind myself, here is my usual cmake command line for 5.7-based builds:

cmake . -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/5.7 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CONFIG=mysql_release -DFEATURE_SET=community -DWITH_EMBEDDED_SERVER=OFF -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/home/openxs/boost
I do not have it automated, so sometimes the exact command line gets lost in the history of bash... Never happens with MySQL from Facebook or MariaDB (as we see new commits almost every day), but easily happens with MySQL 5.7 (because they push commit only after the official releases) or Percona Server (as I do not care that much about it for a year already...). This time I noted that one actually needs numactl-devel package to build Percona Server 5.7. Was not the case last time I tried.

I also paid attention to blog posts on two hot topics this week. The first one is comparing ProxySQL (here and there) to recently released MaxScale 2.1. I probably have to build and test both myself, but ProxySQL seems to rock, still, and I have no reasons NOT to trust my former colleague René Cannaò, who had provided a lot of details about his tests.

Another topic that is hot in February, 2017 is group replication vs Galera. Przemek's recent post on this topic caused quite a hot discussion here. I still consider the post as a good, useful and sincere attempt to compare these technologies and highlight problems typical experienced Galera user may expect, suspect or note in the group replication in MySQL 5.7.17. Make sure you read comments to the blog post, as they help to clarify the implementation details, differences and strong features of group replication. IMHO, this post, along with great series by LeFred and comments from community, really helps to set proper expectations and setup proper understanding of both technologies.

The last but not the least, this week Mark Callaghan had confirmed that InnoDB is still faster than MyRocks on all kinds of read-only all-in-memory workloads (he used sysbench 1.0 tests) on "small server"/older hardware. I've noted this on a specific use case of Bug #68079 almost a month ago...

This week I worked on several interesting customer issues (involving Galera, CONNECT engine, security concerns, MySQL 5.1 and the limit of 64 secondary indexes per InnoDB table etc) that are not yet completely resolved, so I expect a lot of fun and useful links noted next week. Stay tuned!

Thursday, February 16, 2017

Fun with Bugs #48 - Group Replication Bugs and Missing Features as of MySQL 5.7.17

It seems recent post on Group Replication by Vadim caused an interesting discussion on Facebook. I am NOT going to continue it here, but decided to present some facts, specifically, list of public bug reports and feature requests for Group Replication (mostly "Verified", that is, accepted by Oracle engineers as valid) as of MySQL 5.7.17 (where the feature is promoted as GA), with just few comments to some of the bugs.

The goal is to double check this post when next Oracle MySQL 5.7.x release appears, to find out how much Oracle carews to fix the problems already identified by MySQL Community.

So, here are the bugs and feature requests in MySQL Server: Group Replication category (I've included only those reported for 5.7.17 into this post), as of today:
  • Bug #84315 - "add mysql group replication bootstrap option to init script". Fair enough, we have a way to boottrap a cluster for a node in all implementations of Galera cluster. This feature request is still "Open".
  • Bug #84329 - "Some Class B private address is not permitted automatically". Manual seems to say it should be.
  • Bug #84337 - "Configure mgr with out root user encounter an error". Seems to be a duplicate of Bug #82687, that is, known for quite a some time before GA.
  • Bug #84367 - "START GROUP_REPLICATION should be possible without error". You get error if it was already started. Not a big deal, but some optional clause like IF NOT STARTED may help (in a similar way to DROP TABLE ... IF EXISTS) to code scripts. Still "Open".
  • Bug #84710 - "group replication does not respect localhost IP". may be a problem for some MySQL Sandbox setups in a hope to test group replication.
  • Bug #84727 - "GR: Partitioned Node Should Get Updated Status and not accept writes". Writes on partitioned node are accepted and hang (forever?). As a side note, I think Kenny Gryp deserves a special recognition as an early adopter of Group Replication feature who cared to report many problems noted in the process.
  • Bug #84728 - "Not Possible To Avoid MySQL From Starting When GR Cannot Start". We need this fixed to avoid split brain situations by default.
  • Bug #84729 - "Group Replication: Block Reads On Partitioned Nodes". In Galera reads are blocked by default when node is not considered a member of cluster.
  • Bug #84730 - "Feature: Troubleshoot Group Replication Transaction Rollbacks". You can get a lot of information about conflicts in case of Galera. Sorry that I have to compare, but when discussing the ways of dealing with common problems in cluster environments related to MySQL one can not ignore existing solutions (NDB clusters and Galera clusters), so I just picked up a (somewhat similar) technology that I know and used (a bit). I think readers will do the same, try to base their conclusions on known examples.
  • Bug #84731 - "Group Replication: mysql client connections hang during group replication start". There is no reason to hang more than needed. We should just make sure reads and writes are NOT accepted until the node is a member of cluster and in sync.
  • Bug #84733 - "Cannot Start GR with super_read_only=1". But this setting may be needed to make sure that there is only one master node in cluster, no matter what happens ot them...
  • Bug #84773 - "Flow control slows down throughput while a node join the cluster". Let me quote: "This is done this way by design, that is, request group to slow down while the member is fetching and applying data.
  • Bug #84774 - "Performance drop every 60 seconds". Now this sounds like a performance problem to work on, maybe by adding some tuning options.
  • Bug #84784 - "Group Replication nodes do not rejoin cluster after network connectivity issues". It would be really nice for nodes to try to re-join the cluster in case of short term connectivity issues. Galera nodes do not give up that fast. The bug is still not verified.
  • Bug #84785  - "Prevent Large Transactions in Group Replication". Galera somehow allows to limit transaction size. Not that there were no related bugs, but still options exist.
  • Bug #84792 - "Idle GR Cluster: Member CPU hog". Not yet verified, but it seems in some cases node can use a notable share of CPU time for no clear/good reason.
  • Bug #84794 - "Cannot kill query inside GR". Weel, you can do STOP GROUP_REPLICATION, but then it can be dangerous, because...
  • Bug #84795 - "STOP GROUP_REPLICATION sets super_read_only=off" - the node with stopped replication may allow to change the data...
  • Bug #84798 - "Group Replication can use some verbosity in the error log". Galera cluster nodes are too verbose, one gets kilometers of log records about everything, anything and nothing. Still, better to skip some usual outputs in the logs than get no evidence at all on what was going on...
  • Bug #84900 - "Getting inconsistent result on different nodes". Now, this is really unfortunate (for a "cluster") and somewhat similar problem was reported by Vadim before, see Bug #82481 and was supposed to be fixed. Anyway, the inconsistency is repatable and looks somewhat scary.
  • Bug #84901 - "Can't execute transaction on second node". Best practice is to write on one and only one node (see Bug #83218). An attempt to write on the other node may fail...
Now, make your own conclusions about the maturity of Group Replication in MySQL 5.7.17. I manage to avoid it so far, as I try to keep myself as far from any kinds of "clusters" as possible... Had not worked well with Galera, unfortunately - I have to read its verbose logs on a daily basis.

Wednesday, February 8, 2017

More on MyRocks Performance for Bug #68079 Case

My previous post on MyRocks was intended to provide some background details for a couple of slides for my FOSDEM talk on profiling MySQL. The results and performance demonstrated by MyRocks vs InnoDB from MySQL 5.7.17 were not really important to show how perf helps to understand where the time was spent while executing of one specific query vs the other (with the same results, but different plan), but they still caused a lot of comments from people who care, so I decided to double check and clarify few more details.

First of all, it was noted that one may get the same execution plan for both queries (even in INNER JOIN case we may end up with filesort and access only via PRIMARY keys), like this:
# bin/mysql test -e 'explain select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category\G'
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: task
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 96
ref: NULL
rows: 8092
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: incident
type: eq_ref
possible_keys: PRIMARY,incident_category
key: PRIMARY
key_len: 96
ref: test.task.sys_id
rows: 1
Extra: NULL
Indeed, I've ended up with this plan easily, sometimes, after executing ANALYZE for the tables involved. But this was not always the case (here we see for InnoDB) and next ANALYZE or OPTIMIZE may change the plan:

mysql> analyze table task;
+-----------+---------+----------+----------+
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.task | analyze | status   | OK       |
+-----------+---------+----------+----------+
1 row in set (0.06 sec)

mysql> analyze table incident;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| test.incident | analyze | status   | OK       |
+---------------+---------+----------+----------+
1 row in set (0.15 sec)

mysql> show table status like 'task'\G
*************************** 1. row ***************************
           Name: task
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 8452
 Avg_row_length: 60
    Data_length: 507904
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2017-01-31 12:00:38
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

mysql> show table status like 'incident'\G
*************************** 1. row ***************************
           Name: incident
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 7786
 Avg_row_length: 204
    Data_length: 1589248
Max_data_length: 0
   Index_length: 507904
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2017-01-31 12:00:35
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

mysql> explain select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category;
+----+-------------+----------+------------+--------+---------------------------+-------------------+---------+----------------------+------+----------+-------------+
| id | select_type | table    | partitions | type   | possible_keys             | key               | key_len | ref                  | rows | filtered | Extra       |
+----+-------------+----------+------------+--------+---------------------------+-------------------+---------+----------------------+------+----------+-------------+
|  1 | SIMPLE      | incident | NULL       | index  | PRIMARY,incident_category | incident_category | 123     | NULL                 | 7786 |   100.00 | Using index |
|  1 | SIMPLE      | task     | NULL       | eq_ref | PRIMARY                   | PRIMARY           | 96      | test.incident.sys_id |    1 |   100.00 | Using index |
+----+-------------+----------+------------+--------+---------------------------+-------------------+---------+----------------------+------+----------+-------------+
2 rows in set, 1 warning (0.02 sec)
Surely I can always force the plan needed, this way or that:

mysql> explain select count(*), category from task inner join incident force index(primary) on task.sys_id=incident.sys_id group by incident.category;
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table    | partitions | type   | possible_keys             | key     | key_len | ref              | rows | filtered | Extra                                        |
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | task     | NULL       | index  | PRIMARY                   | PRIMARY | 96      | NULL             | 8452 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | incident | NULL       | eq_ref | PRIMARY,incident_category | PRIMARY | 96      | test.task.sys_id |    1 |   100.00 | NULL                                         |
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category;
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table    | partitions | type   | possible_keys             | key     | key_len | ref              | rows | filtered | Extra                                        |
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | task     | NULL       | index  | PRIMARY                   | PRIMARY | 96      | NULL             | 8452 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | incident | NULL       | eq_ref | PRIMARY,incident_category | PRIMARY | 96      | test.task.sys_id |    1 |   100.00 | NULL                                         |
+----+-------------+----------+------------+--------+---------------------------+---------+---------+------------------+------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
I was interested more in comparing scalability and performance of these different ways to execute the same query, just because usually it's considered good to have covering secondary index used instead of creating temporary table and filesort for this kind of queries... The reality shows this is not always the case.

I've got suggestion to maybe tune something for MyRocks case or use jemalloc library. I am open to any tuning suggestions from real experts based on the fact that these are default settings in my build:

[openxs@fc23 fb56]$ bin/mysql -uroot -e"show variables like 'rocks%'"
+-------------------------------------------------+----------------------+
| Variable_name                                   | Value                |
+-------------------------------------------------+----------------------+
| rocksdb_access_hint_on_compaction_start         | 1                    |
| rocksdb_advise_random_on_open                   | ON                   |
| rocksdb_allow_concurrent_memtable_write         | OFF                  |
| rocksdb_allow_mmap_reads                        | OFF                  |
| rocksdb_allow_mmap_writes                       | OFF                  |
| rocksdb_background_sync                         | OFF                  |
| rocksdb_base_background_compactions             | 1                    |
| rocksdb_block_cache_size                        | 536870912            || rocksdb_block_restart_interval                  | 16                   |
| rocksdb_block_size                              | 4096                 || rocksdb_block_size_deviation                    | 10                   |
| rocksdb_bulk_load                               | OFF                  |
| rocksdb_bulk_load_size                          | 1000                 |
| rocksdb_bytes_per_sync                          | 0                    |
| rocksdb_cache_index_and_filter_blocks           | ON                   |
| rocksdb_checksums_pct                           | 100                  |
| rocksdb_collect_sst_properties                  | ON                   |
| rocksdb_commit_in_the_middle                    | OFF                  |
| rocksdb_compact_cf                              |                      |
| rocksdb_compaction_readahead_size               | 0                    |
| rocksdb_compaction_sequential_deletes           | 0                    |
| rocksdb_compaction_sequential_deletes_count_sd  | OFF                  |
| rocksdb_compaction_sequential_deletes_file_size | 0                    |
| rocksdb_compaction_sequential_deletes_window    | 0                    |
| rocksdb_create_checkpoint                       |                      |
| rocksdb_create_if_missing                       | ON                   |
| rocksdb_create_missing_column_families          | OFF                  |
| rocksdb_datadir                                 | ./.rocksdb           |
| rocksdb_db_write_buffer_size                    | 0                    |
| rocksdb_deadlock_detect                         | OFF                  |
| rocksdb_debug_optimizer_no_zero_cardinality     | ON                   |
| rocksdb_default_cf_options                      |                      |
| rocksdb_delete_obsolete_files_period_micros     | 21600000000          |
| rocksdb_disabledatasync                         | OFF                  |
| rocksdb_enable_2pc                              | ON                   |
| rocksdb_enable_bulk_load_api                    | ON                   |
| rocksdb_enable_thread_tracking                  | OFF                  |
| rocksdb_enable_write_thread_adaptive_yield      | OFF                  |
| rocksdb_error_if_exists                         | OFF                  |
| rocksdb_flush_memtable_on_analyze               | ON                   |
| rocksdb_force_flush_memtable_now                | OFF                  |
| rocksdb_force_index_records_in_range            | 0                    |
| rocksdb_hash_index_allow_collision              | ON                   |
| rocksdb_index_type                              | kBinarySearch        |
| rocksdb_info_log_level                          | error_level          |
| rocksdb_is_fd_close_on_exec                     | ON                   |
| rocksdb_keep_log_file_num                       | 1000                 |
| rocksdb_lock_scanned_rows                       | OFF                  |
| rocksdb_lock_wait_timeout                       | 1                    |
| rocksdb_log_file_time_to_roll                   | 0                    |
| rocksdb_manifest_preallocation_size             | 4194304              |
| rocksdb_max_background_compactions              | 1                    |
| rocksdb_max_background_flushes                  | 1                    |
| rocksdb_max_log_file_size                       | 0                    |
| rocksdb_max_manifest_file_size                  | 18446744073709551615 |
| rocksdb_max_open_files                          | -1                   |
| rocksdb_max_row_locks                           | 1073741824           |
| rocksdb_max_subcompactions                      | 1                    |
| rocksdb_max_total_wal_size                      | 0                    |
| rocksdb_merge_buf_size                          | 67108864             |
| rocksdb_merge_combine_read_size                 | 1073741824           |
| rocksdb_new_table_reader_for_compaction_inputs  | OFF                  |
| rocksdb_no_block_cache                          | OFF                  |
| rocksdb_override_cf_options                     |                      |
| rocksdb_paranoid_checks                         | ON                   |
| rocksdb_pause_background_work                   | OFF                  |
| rocksdb_perf_context_level                      | 0                    |
| rocksdb_persistent_cache_path                   |                      |
| rocksdb_persistent_cache_size                   | 0                    |
| rocksdb_pin_l0_filter_and_index_blocks_in_cache | ON                   |
| rocksdb_print_snapshot_conflict_queries         | OFF                  |
| rocksdb_rate_limiter_bytes_per_sec              | 0                    |
| rocksdb_read_free_rpl_tables                    |                      |
| rocksdb_records_in_range                        | 0                    |
| rocksdb_seconds_between_stat_computes           | 3600                 |
| rocksdb_signal_drop_index_thread                | OFF                  |
| rocksdb_skip_bloom_filter_on_read               | OFF                  |
| rocksdb_skip_fill_cache                         | OFF                  |
| rocksdb_skip_unique_check_tables                | .*                   |
| rocksdb_stats_dump_period_sec                   | 600                  |
| rocksdb_store_row_debug_checksums               | OFF                  |
| rocksdb_strict_collation_check                  | ON                   |
| rocksdb_strict_collation_exceptions             |                      |
| rocksdb_table_cache_numshardbits                | 6                    |
| rocksdb_table_stats_sampling_pct                | 10                   |
| rocksdb_tmpdir                                  |                      |
| rocksdb_trace_sst_api                           | OFF                  |
| rocksdb_unsafe_for_binlog                       | OFF                  |
| rocksdb_use_adaptive_mutex                      | OFF                  |
| rocksdb_use_direct_reads                        | OFF                  |
| rocksdb_use_direct_writes                       | OFF                  |
| rocksdb_use_fsync                               | OFF                  |
| rocksdb_validate_tables                         | 1                    |
| rocksdb_verify_row_debug_checksums              | OFF                  |
| rocksdb_wal_bytes_per_sync                      | 0                    |
| rocksdb_wal_dir                                 |                      |
| rocksdb_wal_recovery_mode                       | 2                    |
| rocksdb_wal_size_limit_mb                       | 0                    |
| rocksdb_wal_ttl_seconds                         | 0                    |
| rocksdb_whole_key_filtering                     | ON                   |
| rocksdb_write_disable_wal                       | OFF                  |
| rocksdb_write_ignore_missing_column_families    | OFF                  |
| rocksdb_write_sync                              | OFF                  |
+-------------------------------------------------+----------------------+
and hardware details of my QuadCore box with HDD were shared previously. I tend to think no tuning is needed for such a small tables and corner read-only use case (if maybe READ COMMITTED isolation level, that I plan to study separately, as initial attempts had not shown much difference).

Just for completeness, this is the minimal configuration file I've used and the commit I've built from (after creating a new clone yesterday, this time the default branch was fb-mysql-5.6.35):
[openxs@fc23 fb56]$ cat ~/fb56.cnf
[mysqld]
rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM

log-bin
binlog-format=ROW

#transaction-isolation=READ-COMMITTED
#rocksdb_max_open_files=-1
#rocksdb_block_cache_size=128M

[openxs@fc23 mysql-5.6]$ git log -1commit 95dd650dcfb07b91971b20cbac82d61f244a05a9
Author: Gunnar Kudrjavets <gunnarku@fb.com>
Date:   Fri Feb 3 10:07:25 2017 -0800
...
You can guess that I tried to add some more options along the lines of the fine manual, but they had not changes things much to better, so I've commented them out.

So, with this fresh build based on MySQL 5.6.35 I've decided to study the influence of jemalloc for both queries with 4, 8 and 16 threads (most interesting settings for the QuadCore). I've used the --malloc-lib option for mysqld_safe to force use of jemalloc, and checked pmap ouput for mysqld process for jemalloc entries to confirm that it was really preloaded:



[openxs@fc23 fb56]$ ps aux | grep mysqld
openxs   20444  0.0  0.0 119164  3264 pts/0    S    11:49   0:00 /bin/sh bin/mysqld_safe --defaults-file=/home/openxs/fb56.cnf --malloc-lib=/usr/lib64/libjemalloc.so
openxs   20630  0.8  0.2 170704 24284 pts/0    Sl   11:49   0:00 /home/openxs/dbs/fb56/bin/mysqld --defaults-file=/home/openxs/fb56.cnf --basedir=/home/openxs/dbs/fb56 --datadir=/home/openxs/dbs/fb56/data --plugin-dir=/home/openxs/dbs/fb56/lib/plugin --log-error=/home/openxs/dbs/fb56/data/fc23.err --pid-file=/home/openxs/dbs/fb56/data/fc23.pid
openxs   20750  0.0  0.0 118520   904 pts/0    S+   11:49   0:00 grep --color=auto mysqld

[openxs@fc23 fb56]$ sudo pmap 20630 | grep jemalloc
00007fb8637db000    268K r-x-- libjemalloc.so.2
00007fb86381e000   2044K ----- libjemalloc.so.2
00007fb863a1d000     12K r---- libjemalloc.so.2
00007fb863a20000      4K rw--- libjemalloc.so.2
So, here are the raw results of mysqlslap runs with jemalloc preloaded after I made sure queries use different plans (STRAIGHT_JOIN uses PRIMARY keys and filesort, while INNER JOIN uses covering secondary index and avoids filesort):

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=4 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 19.695 seconds
        Minimum number of seconds to run all queries: 19.627 seconds
        Maximum number of seconds to run all queries: 19.769 seconds
        Number of clients running queries: 4
        Average number of queries per client: 250

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=8 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 19.332 seconds
        Minimum number of seconds to run all queries: 19.306 seconds
        Maximum number of seconds to run all queries: 19.349 seconds
        Number of clients running queries: 8
        Average number of queries per client: 125

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=16 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 19.276 seconds
        Minimum number of seconds to run all queries: 19.225 seconds
        Maximum number of seconds to run all queries: 19.309 seconds
        Number of clients running queries: 16
        Average number of queries per client: 62

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=4 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 18.553 seconds
        Minimum number of seconds to run all queries: 18.336 seconds
        Maximum number of seconds to run all queries: 18.748 seconds
        Number of clients running queries: 4
        Average number of queries per client: 250

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=8 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 18.227 seconds
        Minimum number of seconds to run all queries: 18.198 seconds
        Maximum number of seconds to run all queries: 18.269 seconds
        Number of clients running queries: 8
        Average number of queries per client: 125

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=16 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 17.941 seconds
        Minimum number of seconds to run all queries: 17.886 seconds
        Maximum number of seconds to run all queries: 18.000 seconds
        Number of clients running queries: 16
        Average number of queries per client: 62


Now, the same but with default memory allocator picked up (no jemalloc):

[openxs@fc23 fb56]$ ps aux | grep mysqld
openxs   27696  0.0  0.0 119164  3372 pts/0    S    13:50   0:00 /bin/sh bin/mysqld_safe --defaults-file=/home/openxs/fb56.cnf
openxs   27834  0.3  0.2 607012 23096 pts/0    Sl   13:50   0:00 ./bin/mysqld --defaults-file=/home/openxs/fb56.cnf --basedir=. --datadir=./data --plugin-dir=./lib/plugin --log-error=./data/fc23.err --pid-file=./data/fc23.pid
openxs   28029  0.0  0.0 118520   880 pts/0    S+   13:50   0:00 grep --color=auto mysqld
[openxs@fc23 fb56]$ pmap 27834 | grep jemalloc
[openxs@fc23 fb56]$

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=4 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 20.207 seconds
        Minimum number of seconds to run all queries: 19.833 seconds
        Maximum number of seconds to run all queries: 20.543 seconds
        Number of clients running queries: 4
        Average number of queries per client: 250

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=8 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 19.746 seconds
        Minimum number of seconds to run all queries: 19.500 seconds
        Maximum number of seconds to run all queries: 20.402 seconds
        Number of clients running queries: 8
        Average number of queries per client: 125

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=16 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 19.150 seconds
        Minimum number of seconds to run all queries: 19.084 seconds
        Maximum number of seconds to run all queries: 19.292 seconds
        Number of clients running queries: 16
        Average number of queries per client: 62

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=4 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 18.692 seconds
        Minimum number of seconds to run all queries: 18.516 seconds
        Maximum number of seconds to run all queries: 18.889 seconds
        Number of clients running queries: 4
        Average number of queries per client: 250

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=8 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 18.290 seconds
        Minimum number of seconds to run all queries: 18.250 seconds
        Maximum number of seconds to run all queries: 18.342 seconds
        Number of clients running queries: 8
        Average number of queries per client: 125

[openxs@fc23 fb56]$ bin/mysqlslap -uroot --iterations=10 --concurrency=16 --create-schema=test --no-drop --number-of-queries=1000 --query='select count(*), category from task straight_join incident on task.sys_id=incident.sys_id group by incident.category'
Benchmark
        Average number of seconds to run all queries: 18.155 seconds
        Minimum number of seconds to run all queries: 18.066 seconds
        Maximum number of seconds to run all queries: 18.397 seconds
        Number of clients running queries: 16
        Average number of queries per client: 62


Finally, this is how it looks like on a chart:


I can make the following conclusions:
  1. Using jemalloc helps to get a bit better throughput for both queries (I tend to consider the INNER case with 16 threads a fluctuation for now), but not much.
  2. New build based on fb-mysql-5.6.35 branch as of yesterday's morning demonstrates worse performance for STRAIGHT_JOIN case (access via PRIMARY keys + filesort) comparing to the previous build from January 30, 2017 based on webscalesql-5.6.27.75 branch. I am yet to find out why is it so, as INNER JOIN case performs a bit better.
  3. I am open to any MyRocks runing suggestions for my old QuadCore box, as for now MySQL 5.7.17 performs notably better on this same hardware and data in the tables, with --no-defaults. No wonder, a lot of efforts were spent by Oracle on Bug #68079, while all I've got so far for MyRocks is public "Can't repeat" of a kind, plus simple tuning advices with limited impact for my case.
I am sure I am doing something wrong with MyRocks, just tell me what exactly...