Tuesday, February 9, 2016

My First Steps in Exploring RocksDB

RocksDB and the MySQL storage engine based on it (the so-called "MyRocks") have been widely discussed in my circles since at least August 2015, so I decided to spend some time checking them out. The easy way to get MyRocks running is to use Facebook's MySQL 5.6, so I just cloned it and built it from source with minor customization, following their instructions (which just work on Fedora Core 23):
mkdir git
cd git
git clone https://github.com/facebook/mysql-5.6.git
cd mysql-5.6/
git submodule init
git submodule update
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/fb56
...
time make
make install && make clean
...
cd
vi fb56.cnf
cd dbs/fb56/
...
scripts/mysql_install_db --defaults-file=/home/openxs/fb56.cnf
ls data/mysql/
bin/mysqld_safe --defaults-file=/home/openxs/fb56.cnf &
bin/mysql -uroot test
On Ubuntu 14.04.3 last week I was hit by a problem similar to Issue #147, but with today's code (commit e9d85381d22a2c3a2f8cea614baa70f7e0cef7b7) it built there without any problems as well.

The fb56.cnf file is quite simple:
[openxs@fc23 ~]$ cat fb56.cnf
[mysqld]
rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM

log-bin
binlog-format=ROW
The reasons for these settings are explained on their wiki (here and there).
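
Before starting mysqld it may be worth checking that the options file really contains the settings listed above. The following is only a minimal sketch: the function name check_myrocks_cnf is hypothetical, and the list of options is simply copied from the fb56.cnf shown in this post, not from any official checklist.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL or MyRocks): report which of the
# options used in this post are missing from the given options file.
check_myrocks_cnf() {
    cnf="$1"
    missing=0
    for opt in rocksdb default-storage-engine=rocksdb skip-innodb \
               default-tmp-storage-engine=MyISAM log-bin binlog-format=ROW; do
        # -x matches the whole line, -F treats the option as a fixed string
        if ! grep -qxF -- "$opt" "$cnf"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    # Exit status 0 means every option was found
    return "$missing"
}
```

It could be called as check_myrocks_cnf /home/openxs/fb56.cnf. The check is deliberately strict: it expects each option on a line of its own, spelled exactly as above.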

As a result, I've got my first RocksDB table(s) created in a matter of minutes (spent mostly on building from source), and my first bug reports filed almost immediately:
  • Issue #159 - "Indexes on RocksDB table are listed as BTREE ones in SHOW INDEXES". It was closed very soon.
  • Issue #160 - "ANALYZE TABLE does not seem to update statistics for the RocksDB table". It was closed less than a day ago. Unfortunately, it seems that ANALYZE TABLE now updates some statistics but does not set the data size properly; see my Issue #164, reported today.
  • Issue #163 - "Strange EXPLAIN output for UPDATE ("Using temporary")". I reported this today, and it seems to be something in the optimizer (maybe just a feature that I am not aware of) that produces the result I consider strange. In the process I also noted that ICP (index condition pushdown) is NOT used for the PRIMARY key of RocksDB tables (unlike for MyISAM ones), but the same limitation is known and documented for InnoDB tables.
Besides some testing, I of course executed the command I expected to be there, SHOW ENGINE ROCKSDB STATUS:
mysql> show engine rocksdb status\G
*************************** 1. row ***************************
  Type: DBSTATS
  Name: rocksdb
Status:
** DB Stats **
Uptime(secs): 106.0 total, 106.0 interval
Cumulative writes: 6 writes, 524K keys, 6 batches, 0.9 writes per batch, ingest: 0.01 GB, 0.08 MB/s
Cumulative WAL: 6 writes, 4 syncs, 1.20 writes per sync, written: 0.01 GB, 0.08 MB/s
Cumulative compaction: 0.01 GB write, 0.08 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.6 seconds
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 6 writes, 524K keys, 6 batches, 0.9 writes per batch, ingest: 8.00 MB, 0.08 MB/s
Interval WAL: 6 writes, 4 syncs, 1.20 writes per sync, written: 0.01 MB, 0.08 MB/s
Interval compaction: 0.01 GB write, 0.08 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.6 seconds
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Level 0 read latency histogram (micros):
Count: 5  Average: 10.2000  StdDev: 8.70
Min: 1.0000  Median: 6.5000  Max: 22.0000
Percentiles: P50: 6.50 P75: 19.50 P99: 22.00 P99.9: 22.00 P99.99: 22.00
------------------------------------------------------
[       0,       1 )        1  20.000%  20.000% ####
[       1,       2 )        1  20.000%  40.000% ####
[       6,       7 )        1  20.000%  60.000% ####
[      18,      20 )        1  20.000%  80.000% ####
[      20,      25 )        1  20.000% 100.000% ####
...
and checked the content of the datadir related to RocksDB:
[openxs@fc23 ~]$ ls -la dbs/fb56/data/
total 1228
drwxrwxr-x.  6 openxs openxs    4096 Feb  9 11:27 .
drwxrwxr-x. 13 openxs openxs    4096 Feb  4 12:36 ..
-rw-rw----.  1 openxs openxs      56 Feb  4 12:37 auto.cnf
-rw-rw----.  1 openxs openxs   24872 Feb  4 12:36 fc23-bin.000001
-rw-rw----.  1 openxs openxs 1148080 Feb  4 12:36 fc23-bin.000002
-rw-rw----.  1 openxs openxs     488 Feb  4 13:26 fc23-bin.000003
-rw-rw----.  1 openxs openxs     139 Feb  4 13:26 fc23-bin.000004
-rw-rw----.  1 openxs openxs    1990 Feb  9 10:55 fc23-bin.000005
-rw-rw----.  1 openxs openxs     684 Feb  9 11:30 fc23-bin.000006
-rw-rw----.  1 openxs openxs     108 Feb  9 11:27 fc23-bin.index
-rw-r-----.  1 openxs openxs   24184 Feb  9 11:30 fc23.err
-rw-rw----.  1 openxs openxs       6 Feb  9 11:27 fc23.pid
drwx------.  2 openxs openxs    4096 Feb  4 12:36 mysql
drwx------.  2 openxs openxs    4096 Feb  4 12:36 performance_schema
drwxr-x--x.  2 openxs openxs    4096 Feb  9 11:30 .rocksdb
drwxrwxr-x.  2 openxs openxs    4096 Feb  9 11:30 test
[openxs@fc23 ~]$ ls -la dbs/fb56/data/.rocksdb/
total 176
drwxr-x--x. 2 openxs openxs  4096 Feb  9 11:30 .
drwxrwxr-x. 6 openxs openxs  4096 Feb  9 11:27 ..
-rw-r-----. 1 openxs openxs   702 Feb  9 10:35 000040.sst
-rw-r-----. 1 openxs openxs   219 Feb  9 11:30 000047.log
-rw-r-----. 1 openxs openxs  1622 Feb  9 11:27 000054.sst
-rw-r-----. 1 openxs openxs    16 Feb  9 11:27 CURRENT
-rw-r-----. 1 openxs openxs    37 Feb  4 12:36 IDENTITY
-rw-r-----. 1 openxs openxs     0 Feb  4 12:36 LOCK
-rw-rw----. 1 openxs openxs 28361 Feb  9 11:30 LOG
-rw-rw----. 1 openxs openxs 19212 Feb  4 12:36 LOG.old.1454582184760374
-rw-rw----. 1 openxs openxs 19438 Feb  4 12:36 LOG.old.1454582234660587
-rw-rw----. 1 openxs openxs 20023 Feb  4 13:26 LOG.old.1455004201986761
-rw-rw----. 1 openxs openxs 38739 Feb  9 10:55 LOG.old.1455010078294084
-rw-r-----. 1 openxs openxs   556 Feb  9 11:30 MANIFEST-000045
-rw-r-----. 1 openxs openxs  5414 Feb  9 11:27 OPTIONS-000051
-rw-r-----. 1 openxs openxs  5415 Feb  9 11:27 OPTIONS-000053
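
The listing above can be summarized by file kind. This is a rough sketch that relies on the usual RocksDB file naming (numbered .sst files hold table data, the numbered .log file is the write-ahead log, while LOG and LOG.old.* are informational logs); the function name and the labels it prints are my own assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the files in a RocksDB directory by kind,
# based only on their names.
summarize_rocksdb_dir() {
    dir="$1"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue        # skip the literal pattern if dir is empty
        name=${f##*/}                  # strip the directory part
        case "$name" in
            *.sst)          echo "sst" ;;       # sorted string table (data)
            *.log)          echo "wal" ;;       # write-ahead log
            MANIFEST-*)     echo "manifest" ;;
            OPTIONS-*)      echo "options" ;;
            LOG|LOG.old.*)  echo "info-log" ;;  # human-readable log files
            *)              echo "other" ;;     # CURRENT, IDENTITY, LOCK, ...
        esac
    done | sort | uniq -c
}
```

For the directory shown above it would be called as summarize_rocksdb_dir dbs/fb56/data/.rocksdb and prints one line per kind with the number of files of that kind.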
I've also checked some messages in the error log related to RocksDB:
2016-02-09 11:27:58 14773 [Warning] The option innodb (skip-innodb) is deprecated and will be removed in a future release
2016-02-09 11:27:58 14773 [Note] Plugin 'InnoDB' is disabled.
2016-02-09 11:27:58 14773 [Note] Plugin 'FEDERATED' is disabled.
2016-02-09 11:27:58 14773 [Note] RocksDB: 2 column families found
2016-02-09 11:27:58 14773 [Note] RocksDB: Column Families at start:
2016-02-09 11:27:58 14773 [Note]   cf=default
2016-02-09 11:27:58 14773 [Note]     write_buffer_size=4194304
2016-02-09 11:27:58 14773 [Note]     target_file_size_base=2097152
2016-02-09 11:27:58 14773 [Note]   cf=__system__
2016-02-09 11:27:58 14773 [Note]     write_buffer_size=4194304
2016-02-09 11:27:58 14773 [Note]     target_file_size_base=2097152
2016-02-09 11:27:58 14773 [Note] RocksDB: Table_store: loaded DDL data for 6 tables
2016-02-09 11:27:58 14773 [Note] RocksDB instance opened
2016-02-09 11:27:58 14773 [Note] Starting crash recovery...
RocksDB: Last binlog file position 1772, file name fc23-bin.000005
2016-02-09 11:27:58 14773 [Note] Crash recovery finished.
All these details will be discussed in the next blog posts eventually.
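
In the meantime, the per-column-family settings can be pulled out of the error log with a few lines of awk. This is just a sketch that assumes the log format shown in the excerpt above (a "cf=name" line followed by "option=number" lines); list_cf_settings is a hypothetical name, and any later [Note] line of the form option=number would be attributed to the last column family seen.

```shell
#!/bin/sh
# Hypothetical helper: print "<column family> <option>=<value>" for every
# column family setting found in a MySQL error log written by MyRocks.
list_cf_settings() {
    awk '
        # A line like "... [Note]   cf=default" starts a new column family
        /\[Note\] +cf=/ { sub(/.*cf=/, ""); cf = $0; next }
        # A line like "... [Note]     write_buffer_size=4194304" is a setting
        /\[Note\] +[a-z_]+=[0-9]+$/ && cf != "" { print cf, $NF }
    ' "$1"
}
```

With the file names from this post it would be run as list_cf_settings dbs/fb56/data/fc23.err.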

So, these were my very first steps with RocksDB as a storage engine for MySQL. I am really impressed by the speed of bug fixing and adding missing features.

The real fun started when I attached gdb to mysqld and set some proper breakpoints to find out how RocksDB locks the data it accesses. Stay tuned - it's a topic for the next (maybe several) posts about RocksDB.

10 comments:

  1. Thanks for your feedback. I think MyRocks is a big opportunity for Percona and MariaDB.

  2. You had more luck w/ building. When I started to build on fc23 during the RocksDB talk at FOSDEM I had to make a minor change to get it to compile.

  3. I personally do mkdir build; cd build; cmake .. ...

    That way you get an out-of-tree build that is easier to cleanup/rebuild if things go wrong.

    Replies
    1. I also do make -j8 (for a six core box) - this compiles MUCH faster.

    2. Yes, I usually use make -j N where N is number of cores, but when I was writing this the box was used by my wife for her browsing (and getting familiar with new Fedora UI, previous was FC 14), so I tried not to slow it down much.

  4. In FC23 cmake didn't properly detect bz2, snappy, and lz4 libraries so I had to install them because linking failed.

    Replies
    1. After playing a lot on Ubuntu a week ago, finding and solving problems with dependencies myself, for the fresh Fedora I just copy-pasted their instructions:

      sudo yum install cmake gcc-c++ bzip2-devel libaio-devel bison zlib-devel snappy-devel
      sudo yum install gflags-devel readline-devel ncurses-devel openssl-devel lz4-devel gdb git

      and everything worked (it was even automatically translated to that new package manager of Fedora properly)

  5. This is what I had to change to get it to compile
    https://gist.github.com/dveeden/d0da92da20452f118f41

    Replies
    1. My change for Ubuntu 14.04 on February 2 was different (the problem was more or less like https://github.com/facebook/mysql-5.6/issues/147), and on recently installed Fedora Core 23 and with the code cloned on February 9 there was no need to change anything.
