Sunday, June 24, 2018

On InnoDB Data Compression in MySQL

Another story that I've prepared back in April for my meeting with one of customers in London was a "compression story". We spent a lot of time on it in several support issues in the past, with only limited success.

In case of InnoDB tables, there are actually two ways to compress data (besides relying on filesystem compression or compressing individual columns at server or application side). Historically the first one was introduced by the Barracuda InnoDB file format and ROW_FORMAT=COMPRESSED it supported. Notable number of related bugs were reported with time, and it may be not that easy to identify them all (you can find current list of bugs tagged with "compression" here). I've picked up the following bugs for my "story":
  • Bug #88220 - "compressing and uncompressing InnoDB tables seems to be inconsistent". Over years Simon Mudd, Monty Solomon (see related Bug #70534 - "Removing table compression leaves compressed keys") and other community members reported several bugs related to inconsistencies and surprises with key_block_size option. It is used for both MyISAM and InnoDB storage engines (for compressed tables) and it seems nobody is going to fix the remaining problems until they are gone with MyISAM engine.
  • Bug #69588 - "MyISAM to InnoDB compressed slower than MyISAM to InnoDB, Then InnoDB to Compressed". Just a detail to take into account, noted 5 years ago by Joffrey MICHAIE, verified almost 4 years ago and then getting zero public attention from Oracle engineers.
  • Bug #62431 - "What is needed to make innodb compression work for 32KB pages?". Nothing can be done according to the manual:
    "In particular, ROW_FORMAT=COMPRESSED in the Barracuda file format assumes that the page size is at most 16KB and uses 14-bit pointers."
  • Bug #78827 - "Speedup replication of compressed tables". Come on, Daniël van Eeden, nobody cares that
    "Replication and InnoDB compressed tables are not efficiently working together."
    The bug is still "Open".
  • Bug #75110 - "Massive, to-be-compressed not committed InnoDB table is total database downtime". This problem was reported by Jouni Järvinen back in 2014. Surely this is not a bug, but it seems nobody even tried to speed up compression in any way on multiple cores.
  • Bug #84439 - "Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1". It was reported by Jean-François Gagné, who asked for a reasonable error message at least. Nothing happens after verification.
  • Bug #77089 - "Misleading innochecksum error for compressed tables with key_block_size=16". This problem was reported by Laurynas Biveinis more than three years ago, immediately verified and then got zero attention.
The boats above do not use the space for mooring efficiently. They need better compression.
Transparent Page Compression for InnoDB tables was added later and looked promising. If you are lucky to use filesystem with sparse file and hole punching support and proper OS or kernel version, then you could expect notable saving of disk space with very few additional keystrokes (like COMPRESSION="zlib") when defining the table. Different compression libraries were supported. Moreover (see here), only uncompressed pages are stored in memory in this case, and this improved the efficiency of buffer pool usage. Sounded promising originally, but there are still bugs to consider:
  • Bug #78277 - "InnoDB deadlock, thread stuck on kernel calls from transparent page compression". This bug alone (reported by Mark Callaghan back in 2015) may be a reason to NOT use the feature in production, as soon as you hit it (chances are high). there are many interesting comments that there are environments where the feature works as fast as expected, but I think this summary is good enough for most users:
    "[19 Oct 2015 15:56] Mark Callaghan
    Slow on XFS, slow on ext4, btrfs core team tells me it will be slow there. But we can celebrate that it isn't slow on NVMFS - closed source, not GA, can't even find out where to buy it, not aware of anyone running it."
    The bug is still "Open".
  • Bug #81145 - "Sparse file and punch hole compression not working on Windows". Not that I care about Windows that much, but still. The bug is "Verified" for 2 years.
  • Bug #87723 - "mysqlbackup cannot work with mysql5.7 using innodb page-level compression" Now this is awesome! Oracle's own MySQL Enterprise Backup does NOT support the feature. Clearly they cared about making it useful...
    As a side note, same problem affects Percona's xtrabackup (see PXB-1394). MariaDB resolved the problem (and several related ones like MDEV-13023) with mariabackup tool.
  • Bug #87603 - "compression/tablespace ignored in create/alter table when not using InnoDB". COMPRESSION='.../' option is supported for MyISAM tables as well, and this again leads to problems when switching to another storage engine, as Tomislav Plavcic noted.
  • Bug #78672 - "assert fails in fil_io during linkbench with transparent innodb compression". This crash (assertion failure) was noted by Mark Callaghan back in 2015. May not crash anymore since 5.7.10 according to the last comment, but nobody cares to close the bug or comment anything useful. The bug is still "Verified".
That's almost all I prepared for my "compression story". It had to be sad one.

What about the moral of the story? For me it's the following:
  1. Classical InnoDB compression (page_format=compressed) has limited efficiency and does not get any attention from developers recently. If you hit some problem with this feature you have to live with it.
  2. Transparent page compression for InnoDB seems to be originally more like a proof of concept in MySQL that may not work well in production on commodity hardware, and software and was not integrated with backup tools. MariaDB improved it, added support for backing up page compressed tables efficiently with the same familiar xtrabackup-based approach, but there are still open problems to resolve (see MDEV-15527 and MDEV-15528 that I also picked up for my "story").
  3. It seems (based on public sources review at least) that both compression options do not get much attention from Oracle developers recently. If you check new features of MySQL 8.0 GA here,  you may notice that zlib version is updated, compressed temporary InnoDB tables are no longer supported and... that's all about compression for InnoDB!
This story could probably be shortened to just one link to the summary post by Mark Callaghan from Facebook (who studied the efficiency of data compression by various engines a lot, among other performance metrics), or by simple statement that if you want data to be compressed efficiently at server side do NOT use current InnoDB implementations and better use RocksDB engine (with MariaDB or Percona Server if you need other modern features also). But I can not write any story about MySQL without referring to some bugs, and this is how I've ended up with the above.

What if you just switched to MySQL 8.0 GA and need some new features from it badly? Then just wait for a miracle to happen (and hope Percona will make it one day :)

Saturday, June 23, 2018

On Partitioning in MySQL

Back in April I was preparing for vacations that my wife and I planned to spend in UK. Among other things planned I wanted to visit a customer's office in London and discuss few MySQL and MariaDB related topics, let's call them "stories". I tried to prepare myself for the discussion and collected a list of known active bugs (what else could I do as MySQL entomologist) for each of them. Surely live discussion was not suitable to share lists of bugs (and for some "stories" they were long), so I promised to share them later, in my blog. Time to do what I promised had finally come!

One of the stories we briefly discussed was "partitioning story". Right now I can immediately identify at least 47 active MySQL bugs in the related category.  While preparing I checked the same list and picked up 15 or so bug reports that had to illustrate my points. Let me share them here in no specific order, and add few more.
In April the latest still active bug in partitioning reported by MySQL community was  Bug #88916 - "Assertion `table->s->db_create_options == part_table->s->db_create_options'", from my colleague Elena Stepanova. Note a very simple test case that leads to assertion in debug builds, immediately verified.

Recently two more bugs were reported. Reporter of Bug #91190 - "DROP PARTITION and REORGANIZE PARTITION are slow" suspects a performance regression in MySQL 8.0.11. I've subscribed to this bug and is following the progress carefully. Same with Bug #91203 - "For partitions table, deal with NULL with is mismatch with reference guide". I think what happens with NULL value and range partitioning perfectly matches the manual, but the fact that INFORMATION_SCHEMA.PARTITIONS table may return wrong information after dropping partition with NULL value is somewhat unexpected.

Now back to the original lists for the "story" I prepared in April:
  • Bug #60023 - "No Loose Index Scan for GROUP BY / DISTINCT on InnoDB partitioned table". It was reported by Rene' Cannao' and since 2013 I strongly suspect that it's fixed in MySQL 5.6+ or, as noted in another comment, may depend on statistics properly collected for the table. Still the status remains "Verified".
  • Bug #78164 - "alter table command affect partitioned table data directory". Your custom DATA DIRECTORY settings may get lost when ALTER is applied to the whole table. Quick test shows that at least in MariaDB 10.3.7 this is no longer the case. The bug is still "Verified".
  • Bug #85126 - "Delete by range in presence of partitioning and no PK always picks wrong index". It was reported by Riccardo Pizzi 16 months ago, immediately verified (without explicit list of versions affected, by the way). One more case when ordering of indexes in CREATE TABLE may matter...
  • Bug #81712 - "lower_case_table_names=2 ignored on ADD PARTITION on Windows". Who cares about Windows these days?
  • Bug #84356 - "General tablespace table encryption". It seems partitioning allows to overcome documented limitation. If this is intended, then the manual is wrong, otherwise I suspect the lack of careful testing of partitioning integration with other features.
  • Bug #88673 - "Regression CREATE TBL from 5.7.17 to 20 (part #1: innodb_file_per_table = ON)." I've probably mentioned this bug reported by Jean-François Gagné in more than one blog post already. Take care and do not use long partition names.
  • Bug #85413 - "Failing to rename a column involved in partition". As simple as it sounds, and it still happens.
  • Bug #83435 - "ALTER TABLE is very slow when using PARTITIONED table". It was reported by Roel Van de Paar back in 2016 and still remains "Verified".
  • Bug #73084 - "Exchanging partitions defined with DATA DIRECTORY and INDEX DIRECTORY options". The bug still remains "Open" (see Bug #77772 also).
  • Bug #73648 - "innodb table replication is very slow with some of the partitioned table". It seems to be fixed last year as internal Bug #25687813 (see release notes for 5.6.38), but nobody cares to find this older duplicate and change its status or re-verify it.
  • Bug #83750 - "Import via TTS of a partitioned table only uses 1 cpu core". This feature requested by Daniël van Eeden makes a lot of sense. I truly hope to see parallel operations implemented for partitioned tables in GA MySQL versions (as I saw some parallel processing for partitions done for some upcoming "6.1" or so version back in 2008 in Riga during the MySQL's last company meeting I've attended).
  • Bug #64498 - "Running out of file handles when ALTERing partitioned MyISAM table". Too many file handles are needed. This is a documented limitation that DBAs should still take into account.
I also prepared a separate small list of partition pruning bugs:
  • Bug #83248 - "Partition pruning is not working with LEFT JOIN". I've reported it back in 2016 and it is still not fixed. There are reasons to think it is not so easy.
  • Bug #75085 - "Partition pruning on key partitioning with ENUM". It was reported by  Daniël van Eeden back in 2014!
  • Bug #77318 - "Selects waiting on MDL when altering partitioned table". One of the worst expectations DBA may have is that partitioned tables help to workaround "global" MDL locks because of partition pruning! This is not the case.
Does this story have any moral? I think so, and for me it's the following:
  1. Partitioning bugs do not get proper attention from Oracle engineers. We see bugs with wrong status and even a bug with a clear test case and a duplicate that is "Open" for 4 years. Some typical use cases are affected badly, and still no fixes (even though since 5.7 we have native partitioning in InnoDB and changing implementation gave good chance to review and either fix or re-check these bugs).
  2. MySQL DBAs should expect all kinds of surprises when running usual DDL statements (ALTER TABLE to add column even) with partitioned tables. In the best case DDL is just unexpectedly slow for them.
  3. Partition pruning may not work they way one expects.
  4. We miss parallel processing for partitioned tables. They should allow to speed up queries and DDL, not to slow them down instead...
  5. One can suspect that there is no careful internal testing performed on integration of partitioning with other features, or even basic partition maintenance operations.