Blog of (former?) MySQL Entomologist: April 2016

Saturday, April 30, 2016

Fun with Bugs #42 - Bugs Fixed in MySQL 5.7.12

MySQL 5.7.12 was released more than 2 weeks ago. New features introduced there in the form of "rapid plugins" are widely discussed, but I am more interested in the bugs reported by MySQL Community users that are fixed there. Unfortunately, I have not seen MySQL Community Release Notes by Morgan (like this) for quite some time, so I have to continue describing key bug fixes and naming the people who reported and verified bugs in my "Fun with Bugs" series.

As usual, let's start with InnoDB bugs fixed:

Bug #80070 - "allocated_size and file_size differ if create general tablespace outside datadir". It was reported by my former colleague from Percona Shahriyar Rzayev and verified by Bogdan Kecman. Nice to see more people from Oracle involved in processing community bug reports!
Bug #79185 - "Innodb freeze running REPLACE statements". This bug (which affected many users, also on versions 5.5.x and 5.6.x, and was a kind of regression) was reported by Will Bryant and verified (probably) and fixed by Shaohua Wang. The fix is also included in versions 5.5.49 and 5.6.30.
Bug #73816 - "MySQL instance stalling “doing SYNC index”". It was reported by Denis Jedig, and a lot of additional evidence was provided by my former colleague Aurimas Mikalauskas. This bug was fixed (and probably verified) by Shaohua Wang.
Bug #79200 - "InnoDB: "data directory" option of create table fails with pwrite() OS error 22" is a widely noted regression (this week I saw a customer issue with a potentially related MariaDB problem). This bug was reported by Frank Ullrich and verified by Bogdan Kecman. It is also fixed in MySQL 5.6.30.
Bug #79725 - "Check algorithm=innodb on crc32 checksum mismatch before crc32(big endian)". This bug was created to track the patch contributed by Daniel Black at GitHub. It was verified by Umesh.

Next, let's review replication bugs fixed in 5.7.12:

Bug #79504 - "STOP SLAVE IO THREAD prints wrong LOST CONNECTION message in error log file". It was reported by Venkatesh Duggirala.
Bug #78722 - "Relay log info currently_executing_gtid is not properly initialized or protected". This bug was reported by Pedro Gomes. It contains a nice, simple test case and a suggested fix.
Bug #78445 is private. So, I can only quote the release notes:
"RESET SLAVE ALL could delete a channel even when master_pos_wait and wait_until_sql_thread_after_gtid were still waiting for binlog to be applied. This could cause a MySQL server exit when the functions tried to access the channel that was deleted. Now, a channel reference counter was added that is increased if the channel should not be deleted when the functions are running. RESET SLAVE ALL will wait for no reference, and then it will delete the channel."
I am not sure this crash is a "security" bug of any kind, but what do I know...
Bug #78352 - "Slow startup of 5.7.x slave with relay_log_recovery = ON and many relay logs". I reported it based on a regression compared to 5.6.x that a Percona customer had hit, and it was verified by Umesh. Nice to see it fixed, as it was really annoying for almost anyone who upgraded a production replication setup to 5.7.
Bug #78133 - "Slave_worker::write_info() incorrect DBUG_ENTER (contribution)". This bug was created to track the patch contributed by Stewart Smith at GitHub. It was verified by Umesh.
Bug #77740 - "silent failure to start if mysql.gtids_executed gets HA_ERR_LOCK_WAIT_TIMEOUT ". It was reported and verified by Shane Bester.
Bug #77237 - "Multi-threaded slave log spamming on failure". This bug was reported by Davi Arnaut and verified by Umesh. Fix is also included in MySQL 5.6.30.
Bug #78963 - "super_read_only aborts STOP SLAVE if relay_log_info_repository=TABLE, dbg crash". It was reported by my former colleague at Percona Laurynas Biveinis and verified by Umesh. See also the related Bug #79328 - "super_read_only broken as a server option".
Bug #77684 - "DROP TABLE IF EXISTS may brake replication if slave has replication filters". This bug was reported by my former colleague at Percona Fernando Laudares Camargos for MySQL 5.6.x and verified by Umesh. MySQL 5.6.30 also got this fixed.
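The release note quoted for Bug #78445 above describes a classic reference-counting fix. Purely as an illustration, here is a minimal Python sketch of that pattern under a simplified model: the hypothetical Channel class stands in for a replication channel, acquire()/release() for what functions like master_pos_wait would do, and reset_all() for RESET SLAVE ALL. It is not MySQL's actual C++ code, and the extra "resetting" flag that refuses new references while the reset waits is my own assumption.

```python
import threading


class Channel:
    """Hypothetical model of a replication channel protected by a reference counter."""

    def __init__(self, name):
        self.name = name
        self.deleted = False
        self._refs = 0
        self._resetting = False
        self._cond = threading.Condition()

    def acquire(self):
        """Take a reference before using the channel (think: a MASTER_POS_WAIT() call)."""
        with self._cond:
            if self.deleted or self._resetting:
                return False  # channel is gone or about to go away
            self._refs += 1
            return True

    def release(self):
        """Drop a reference and wake up a reset_all() that may be waiting."""
        with self._cond:
            self._refs -= 1
            if self._refs == 0:
                self._cond.notify_all()

    def reset_all(self, timeout=None):
        """Think RESET SLAVE ALL: wait until no references remain, then delete."""
        with self._cond:
            self._resetting = True  # refuse new references while we wait
            done = self._cond.wait_for(lambda: self._refs == 0, timeout)
            if done:
                self.deleted = True
            self._resetting = False
            return done
```

Using one Condition for both the counter and the deleted flag means no new reference can slip in between the moment the count drops to zero and the moment the channel is marked as deleted.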

We all remember that Performance Schema is perfect and the greatest thing since sliced bread, but sometimes bugs are noted even there. Check Bug #79934 - "i_perfschema.table_leak random result failure", reported and verified by Magnus Blåudd. Another example is Bug #79784 - "update setup_instruments do not affect the global mutex/rwlock/cond", reported by Zhang Yingqiang and verified by Umesh. The latter, IMHO, is related to (or a superset of) my good old report, Bug #68097 - "Manual does not explain that some P_S instruments must be enabled at startup", which has remained open as a feature request (after some changes in the manual) for more than 3 years already. I truly hope 5.7.12 fixed this for the general case: it's really important to be able to enable instruments dynamically if we expect Performance Schema to be used as the main tool for troubleshooting.

I'd also like to highlight a couple of fixes related to the optimizer:

Bug #77209 - "Update may use index merge without any reason (increasing chances for deadlock)". It was reported and verified by my former colleague from Oracle, Andrii Nikitin. MySQL 5.6.30 also includes the fix.
Bug #72858 - "EXPLAIN .. SELECT .. FOR UPDATE takes locks". This bug was reported by my former colleague at Percona (and, I hope, my colleague again soon) Justin Swanhart, who has a birthday today. Happy Birthday to you, Justin! The bug was verified by Umesh and is also fixed in MySQL 5.6.30. Justin also reported another bug fixed in 5.7.12, Bug #69375 - "LOAD DATA INFILE claims to be holding 'System Lock' in processlist".

Several more bugs reported by community were also fixed, but they were in the areas (or for platforms) I am not particularly interested in.

To summarize, MySQL 5.7.12 contains important bug fixes in replication and InnoDB, and it makes sense to consider an upgrade even if you do not care about any "rapid plugins", X protocol, encryption of data at rest, MySQL Keyring and other "cool" new shiny features.

Wednesday, April 27, 2016

Building MaxScale 1.4.2 from GitHub on Fedora 23

MariaDB MaxScale has been mentioned in many blog posts recently. It's the Application of the Year 2016, after all! I'd like to test it, follow posts like this etc., all that on my favorite and readily available testing platforms, which are now Ubuntu of all kinds and, surely, Fedora 23 (on my wife's workstation, the most powerful hardware at hand).

My old habits force me to build the open source software I test from source, and I do not even want to discuss the topic of "MaxScale binaries availability" that was quite "popular" some time ago. So, after building MaxScale 1.4.1 on CentOS 6.7 back on March 31, 2016 (mostly just following the MariaDB KB article on the topic) using libmysqld.a from MariaDB 10.0.23, this morning I decided to check the new branch, 1.4.2, and build it on Fedora 23, following that same KB article (which unfortunately does not even mention Fedora after the fix to MXS-248). The thing is, Fedora is not officially supported as a platform for MaxScale 1.4.x, but why should those of us who can build things from source for testing purposes care about this?

I started with cloning MaxScale:

git clone https://github.com/mariadb-corporation/MaxScale.git
cd MaxScale

and then:

[openxs@fc23 MaxScale]$ git branch -r
...
origin/HEAD -> origin/develop
...
origin/release-1.4.2
...

I remember spending a lot of time fighting with the develop branch while building on CentOS 6.7, mostly with the sqlite-related things it contained, so this time I proceeded immediately to the branch I wanted to build:

[openxs@fc23 MaxScale]$ git checkout release-1.4.2
Branch release-1.4.2 set up to track remote branch release-1.4.2 from origin.
Switched to a new branch 'release-1.4.2'
[openxs@fc23 MaxScale]$ git branch
develop
* release-1.4.2

[openxs@fc23 MaxScale]$ mkdir build
[openxs@fc23 MaxScale]$ cd build

The last two steps come from the KB article. We are almost ready to build, but what about the prerequisites? I collected all the packages listed as required for CentOS in that article and tried to install them all:

[openxs@fc23 build]$ sudo yum install mariadb-devel mariadb-embedded-devel libedit-devel gcc gcc-c++ ncurses-devel bison flex glibc-devel cmake libgcc perl make libtool openssl-devel libaio libaio-devel librabbitmq-devel libcurl-devel pcre-devel rpm-build
[sudo] password for openxs:
Yum command has been deprecated, redirecting to '/usr/bin/dnf install mariadb-devel mariadb-embedded-devel libedit-devel gcc gcc-c++ ncurses-devel bison flex glibc-devel cmake libgcc perl make libtool openssl-devel libaio libaio-devel librabbitmq-devel libcurl-devel pcre-devel rpm-build'.
See 'man dnf' and 'man yum2dnf' for more information.
To transfer transaction metadata from yum to DNF, run:
'dnf install python-dnf-plugins-extras-migrate && dnf-2 migrate'

Last metadata expiration check: 0:26:04 ago on Wed Apr 27 10:43:24 2016.
Package gcc-5.3.1-6.fc23.x86_64 is already installed, skipping.
...
Package pcre-devel-8.38-7.fc23.x86_64 is already installed, skipping.
Dependencies resolved.
================================================================================
Package                  Arch     Version                      Repository
                                                                           Size
================================================================================
Installing:
autoconf                 noarch   2.69-21.fc23                 fedora    709 k
automake                 noarch   1.15-4.fc23                  fedora    695 k
dwz                      x86_64   0.12-1.fc23                  fedora    106 k
flex                     x86_64   2.5.39-2.fc23                fedora    328 k
ghc-srpm-macros          noarch   1.4.2-2.fc23                 fedora    8.2 k
gnat-srpm-macros         noarch   2-1.fc23                     fedora    8.4 k
go-srpm-macros           noarch   2-3.fc23                     fedora    8.0 k
libcurl-devel            x86_64   7.43.0-6.fc23                updates   590 k
libedit-devel            x86_64   3.1-13.20150325cvs.fc23      fedora     34 k
librabbitmq              x86_64   0.8.0-1.fc23                 updates    43 k
librabbitmq-devel        x86_64   0.8.0-1.fc23                 updates    52 k
libtool                  x86_64   2.4.6-8.fc23                 updates   707 k
mariadb-common           x86_64   1:10.0.23-1.fc23             updates    74 k
mariadb-config           x86_64   1:10.0.23-1.fc23             updates    25 k
mariadb-devel            x86_64   1:10.0.23-1.fc23             updates   869 k
mariadb-embedded         x86_64   1:10.0.23-1.fc23             updates   4.0 M
mariadb-embedded-devel   x86_64   1:10.0.23-1.fc23             updates   8.3 M
mariadb-errmsg           x86_64   1:10.0.23-1.fc23             updates   199 k
mariadb-libs             x86_64   1:10.0.23-1.fc23             updates   637 k
ocaml-srpm-macros        noarch   2-3.fc23                     fedora    8.1 k
patch                    x86_64   2.7.5-2.fc23                 fedora    123 k
perl-Thread-Queue        noarch   3.07-1.fc23                  updates    22 k
perl-generators          noarch   1.06-1.fc23                  updates    15 k
perl-srpm-macros         noarch   1-17.fc23                    fedora    9.7 k
python-srpm-macros       noarch   3-7.fc23                     updates   8.1 k
redhat-rpm-config        noarch   36-1.fc23.1                  updates    59 k
rpm-build                x86_64   4.13.0-0.rc1.13.fc23         updates   137 k

Transaction Summary
================================================================================
Install 27 Packages

Total download size: 18 M
Installed size: 64 M
Is this ok [y/N]: Y

...

Complete!

Now, let's try the simple approach:

[openxs@fc23 build]$ cmake ..
...
-- MySQL version: 10.0.23
-- MySQL provider: MariaDB
-- Looking for pcre_stack_guard in MYSQL_EMBEDDED_LIBRARIES_STATIC-NOTFOUND
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
MYSQL_EMBEDDED_LIBRARIES_STATIC
linked by target "cmTC_2494a" in directory /home/openxs/git/MaxScale/build/CMakeFiles/CMakeTmp

CMake Error: Internal CMake error, TryCompile configure of cmake failed
-- Looking for pcre_stack_guard in MYSQL_EMBEDDED_LIBRARIES_STATIC-NOTFOUND - not found
-- PCRE libs: /usr/lib64/libpcre.so
-- PCRE include directory: /usr/include
-- Embedded mysqld does not have pcre_stack_guard, linking with system pcre.
CMake Error at cmake/FindMySQL.cmake:115 (message):
Library not found: libmysqld. If your install of MySQL is in a non-default
location, please provide the location with -DMYSQL_EMBEDDED_LIBRARIES=<path
to library>
Call Stack (most recent call first):
CMakeLists.txt:37 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/openxs/git/MaxScale/build/CMakeFiles/CMakeOutput.log".
See also "/home/openxs/git/MaxScale/build/CMakeFiles/CMakeError.log".

It failed: cmake cannot find libmysqld.a, it seems. Let me try to find it:

[openxs@fc23 build]$ sudo find / -name libmysqld.a 2>/dev/null
/home/openxs/git/percona-xtrabackup/libmysqld/libmysqld.a
/home/openxs/dbs/5.7/lib/libmysqld.a
/home/openxs/dbs/p5.6/lib/libmysqld.a
/home/openxs/dbs/fb56/lib/libmysqld.a
/home/openxs/10.1.12/lib/libmysqld.a

That's all, even though I installed all the packages that looked required based on the article! I have the library in many places (in my own builds and even in a sandbox with MariaDB 10.1.12), but it's not installed where expected. After some more desperate attempts (installing MariaDB server with sudo yum install mariadb-server, searching for a package that provides libmysqld.a etc.) and a chat with MariaDB engineers, I ended up with the fact that my packages are from Fedora (not MariaDB) and they just do not include the static library. Looks like a bug in Fedora packaging, if you ask me.
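For what it's worth, the same search can be scripted when you need to pick a candidate path to pass to cmake. A minimal Python sketch, assuming you only care about files named exactly libmysqld.a; the function name find_files is my own and not part of any tool mentioned here.

```python
import os


def find_files(root, name="libmysqld.a"):
    """Return sorted paths of all files called `name` below `root`.

    Roughly what `find <root> -name libmysqld.a 2>/dev/null` does:
    os.walk() skips directories it cannot read instead of failing.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # filenames holds non-directory entries only
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Called as find_files("/home/openxs"), it should list the same files as the find output above, just sorted.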

I was not ready to add MariaDB's repository at the moment (to get MariaDB-devel etc., something the KB article also suggests for supported platforms), so I decided it would be fair just to build the current MariaDB 10.1.13 from source and use everything needed from there. Last time I had built the 10.2 branch, so I had to check out 10.1 first:

[openxs@fc23 server]$ git checkout 10.1
Switched to branch '10.1'
Your branch is behind 'origin/10.1' by 2 commits, and can be fast-forwarded.
(use "git pull" to update your local branch)
[openxs@fc23 server]$ git pull
Updating 1cf852d..071ae30
Fast-forward
client/mysqlbinlog.cc                    | 523 ++++++++++++++++++++++---------
mysql-test/r/mysqlbinlog_raw_mode.result | 274 ++++++++++++++++
mysql-test/t/mysqlbinlog_raw_mode.test   | 387 +++++++++++++++++++++++
sql/sql_priv.h                           |   3 +-
storage/innobase/dict/dict0boot.cc       | 20 +-
storage/xtradb/dict/dict0boot.cc         | 20 +-
6 files changed, 1062 insertions(+), 165 deletions(-)
create mode 100644 mysql-test/r/mysqlbinlog_raw_mode.result
create mode 100644 mysql-test/t/mysqlbinlog_raw_mode.test

Then I executed the following while in the server directory:

make clean

rm CMakeCache.txt

cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DWITH_JEMALLOC=system -DWITH_WSREP=ON -DWITH_INNODB_DISALLOW_WRITES=ON -DWITH_EMBEDDED_SERVER=ON -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/maria10.1

make

make install && make clean

Note that I explicitly asked to build the embedded server. I checked that the library is in the location I need:

[openxs@fc23 server]$ sudo find / -name libmysqld.a 2>/dev/null
/home/openxs/git/percona-xtrabackup/libmysqld/libmysqld.a
/home/openxs/dbs/maria10.1/lib/libmysqld.a
/home/openxs/dbs/5.7/lib/libmysqld.a
/home/openxs/dbs/p5.6/lib/libmysqld.a
/home/openxs/dbs/fb56/lib/libmysqld.a
/home/openxs/10.1.12/lib/libmysqld.a
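Since the embedded server was installed under one prefix, the three paths MaxScale's cmake needs can be derived from it. A small Python sketch, assuming the install layout produced by make install above (include/mysql, lib/libmysqld.a, share/english/errmsg.sys); embedded_paths, missing_paths and maxscale_cmake_args are hypothetical helper names, and the options are just the ones used in this post.

```python
import os


def embedded_paths(mariadb_prefix):
    """Headers, static library and errmsg.sys under a MariaDB install prefix."""
    return {
        "MYSQL_EMBEDDED_INCLUDE_DIR": os.path.join(mariadb_prefix, "include", "mysql"),
        "MYSQL_EMBEDDED_LIBRARIES": os.path.join(mariadb_prefix, "lib", "libmysqld.a"),
        "ERRMSG": os.path.join(mariadb_prefix, "share", "english", "errmsg.sys"),
    }


def missing_paths(mariadb_prefix):
    """Paths that do not exist; cmake is likely to fail if any of them is missing."""
    return [p for p in embedded_paths(mariadb_prefix).values() if not os.path.exists(p)]


def maxscale_cmake_args(mariadb_prefix, install_prefix):
    """cmake command line, to be run from MaxScale/build, pointing at the embedded server."""
    args = ["cmake", ".."]
    args += [f"-D{var}={path}" for var, path in embedded_paths(mariadb_prefix).items()]
    args += [f"-DCMAKE_INSTALL_PREFIX={install_prefix}", "-DWITH_MAXSCALE_CNF=N"]
    return args
```

The resulting list could be passed to subprocess.run(..., cwd="/home/openxs/git/MaxScale/build", check=True) once missing_paths() returns an empty list.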

Then I moved back to the MaxScale/build directory and explicitly pointed out the location of the headers, the library and the error messages file that I want to use with MaxScale:

[openxs@fc23 build]$ cmake .. -DMYSQL_EMBEDDED_INCLUDE_DIR=/home/openxs/dbs/maria10.1/include/mysql -DMYSQL_EMBEDDED_LIBRARIES=/home/openxs/dbs/maria10.1/lib/libmysqld.a -DERRMSG=/home/openxs/dbs/maria10.1/share/english/errmsg.sys -DCMAKE_INSTALL_PREFIX=/home/openxs/maxscale -DWITH_MAXSCALE_CNF=N
...
-- Build files have been written to: /home/openxs/git/MaxScale/build

[openxs@fc23 build]$ make
...
[ 95%] [BISON][ruleparser] Building parser with bison 3.0.4
ruleparser.y:34.1-13: warning: deprecated directive, use ‘%name-prefix’ [-Wdeprecated]
%name-prefix="dbfw_yy"
^^^^^^^^^^^^^
[ 96%] Building C object server/modules/filter/dbfwfilter/CMakeFiles/dbfwfilter.dir/ruleparser.c.o
[ 96%] Building C object server/modules/filter/dbfwfilter/CMakeFiles/dbfwfilter.dir/token.c.o
[ 97%] Linking C shared library libdbfwfilter.so
[ 97%] Built target dbfwfilter
Scanning dependencies of target maxadmin
[ 98%] Building C object client/CMakeFiles/maxadmin.dir/maxadmin.c.o
[ 98%] Linking C executable maxadmin
[100%] Built target maxadmin

It seems the build completed without problems this time. We can try to test it (some tests do fail):

[openxs@fc23 build]$ make testcore
...
1/22 Test #1: Internal-TestQueryClassifier .....***Exception: Other 0.35 sec
      Start 2: Internal-CanonicalQuery
2/22 Test #2: Internal-CanonicalQuery ..........***Failed    0.25 sec
      Start 3: Internal-CanonicalQuerySelect
3/22 Test #3: Internal-CanonicalQuerySelect ....***Failed    0.04 sec
      Start 4: Internal-CanonicalQueryAlter
4/22 Test #4: Internal-CanonicalQueryAlter .....***Failed    0.04 sec
      Start 5: Internal-CanonicalQueryComment
5/22 Test #5: Internal-CanonicalQueryComment ...***Failed    0.04 sec
      Start 6: Internal-TestAdminUsers
6/22 Test #6: Internal-TestAdminUsers ..........   Passed    0.44 sec
      Start 7: Internal-TestBuffer
7/22 Test #7: Internal-TestBuffer ..............   Passed    0.01 sec
      Start 8: Internal-TestDCB
8/22 Test #8: Internal-TestDCB .................   Passed    0.01 sec
      Start 9: Internal-TestFilter
9/22 Test #9: Internal-TestFilter ..............   Passed    0.03 sec
...
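With 22 tests it is easy to lose track of which ones failed. A Python sketch that splits ctest result lines into passed and failed test names, assuming the line format shown above (N/M Test #N: name .....status); anything other than Passed, such as Failed or Exception, is counted as a failure.

```python
import re

# One ctest result line, e.g.
#   "2/22 Test #2: Internal-CanonicalQuery ..........***Failed    0.25 sec"
RESULT_RE = re.compile(r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(\S+)\s+\.+\s*(?:\*\*\*)?(\w+)")


def summarize_ctest(output):
    """Return (passed, failed) lists of test names, in output order."""
    passed, failed = [], []
    for line in output.splitlines():
        m = RESULT_RE.match(line)
        if not m:
            continue  # "Start N: ..." lines and anything else
        name, status = m.groups()
        (passed if status == "Passed" else failed).append(name)
    return passed, failed
```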

(As a side note, make install in my case did NOT install anything to /home/openxs/maxscale, something to deal with later, as on CentOS 6.7 it worked...)

In any case, I now have binaries to work with, of version 1.4.2:

[openxs@fc23 build]$ ls bin/
maxadmin maxbinlogcheck maxkeys maxpasswd maxscale
[openxs@fc23 build]$ bin/maxscale --version

MariaDB Corporation MaxScale 1.4.2 Wed Apr 27 13:24:01 2016
------------------------------------------------------
MaxScale 1.4.2

[openxs@fc23 build]$ bin/maxadmin --version
bin/maxadmin Version 1.4.2

To be continued one day... Stay tuned!

Sunday, April 24, 2016

Building MariaDB 10.1.x and Galera from Source for Multiple Node Cluster Testing Setup

My Facebook followers probably noticed that I left Percona some time ago and have worked for MariaDB since March 1, 2016. I changed companies, but neither my job role (I am still a Support Engineer) nor my approach to doing my job. I still prefer to test everything I suggest to customers, and I usually use software I build from source myself for these tests.

While I have tried to avoid all kinds of clusters as much as possible for 15 years or so already (it does not matter if it's Oracle RAC, MySQL Cluster or Percona XtraDB Cluster, all of them), it's really hard to avoid Galera clusters while working for MariaDB. One of the reasons for this is that, starting from MariaDB 10.1, Galera can be easily "enabled"/used with any MariaDB 10.1.x instance, at any time (at least when we speak about official binaries or those properly built: they are all "Galera ready"). Most MariaDB customers do use Galera or can try to use it at any time, so I have to be ready to test something Galera-specific at any moment.

For simple cases I decided to use a setup with several (2 to begin with) cluster nodes on one box. This approach is described in the manual for Percona XtraDB Cluster and was also used by my former colleague Fernando Laudares for his blog post and many real life related tests.

So, I decided to proceed with a mix of ideas from the sources above and MariaDB's KB article on building Galera from source. As I decided to do this on my wife's Fedora 23 workstation, I also checked this KB article for some details. It lists the prerequisites (boost-devel check-devel glibc-devel openssl-devel scons), and some of these packages (like scons in one of my cases) could be missing even on a system previously used for building all kinds of MySQL-related software. You can find out that something is missing and fix the problem at a later stage, but reading and following the manual or KB articles may save some time otherwise spent on trial and error.

I started by making directories in my home directory (/home/openxs) for this Galera-related testing setup, like these:

[openxs@fc23 ~]$ mkdir galera
[openxs@fc23 ~]$ cd galera
[openxs@fc23 galera]$ mkdir node1
[openxs@fc23 galera]$ mkdir node2
[openxs@fc23 galera]$ mkdir node3
[openxs@fc23 galera]$ ls
node1 node2 node3

I plan to use 3 nodes one day, but for this blog post I'll set up only 2, to have the smallest possible and simplest cluster as a proof of concept.

Then I proceeded with cloning Galera from Codership's GitHub (this is supposed to be the latest and greatest). I changed the current directory to my usual git repository and executed git clone https://github.com/codership/galera.git. When this command completed, I got a subdirectory named galera.

Assuming that all prerequisites are installed, to build the current Galera library version it's enough to execute a simple script, ./scripts/build.sh, while in the galera directory. I ended up with the following:

[openxs@fc23 galera]$ ls -l libgalera_smm.so
-rwxrwxr-x. 1 openxs openxs 40204824 Mar 31 12:21 libgalera_smm.so
[openxs@fc23 galera]$ file libgalera_smm.so
libgalera_smm.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=11457fa9fd69dabe617708c0dd288b218255a886, not stripped

[openxs@fc23 galera]$ pwd
/home/openxs/git/galera
[openxs@fc23 galera]$ cp libgalera_smm.so ~/galera/

and copied the library to the target directory for my testing setup (which should NOT conflict with whatever software I may install later from packages).

Now it's time to build MariaDB properly, to let it use Galera if needed. I already had the recent (at the moment) 10.1.13 in the server subdirectory of my git repository. I then executed the following commands:

[openxs@fc23 server]$ cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1 -DWITH_JEMALLOC=system -DWITH_WSREP=ON -DWITH_INNODB_DISALLOW_WRITES=ON -DCMAKE_INSTALL_PREFIX=/home/openxs/dbs/maria10.1
-- Running cmake version 3.4.1
-- MariaDB 10.1.13
...

[openxs@fc23 server]$ time make -j 4
...
real    9m28.164s
user    32m43.960s
sys     2m45.637s

This was my usual command line to build MariaDB 10.x with only 2 extra options added: -DWITH_WSREP=ON -DWITH_INNODB_DISALLOW_WRITES=ON. After make completed, I executed make install && make clean and was ready to use my shiny new Galera-ready MariaDB 10.1.13.

To take into account the directories I am going to use for my cluster nodes, and to make sure the nodes can start and communicate as separate mysqld instances, I have to create configuration files for them. I changed the working directory to /home/openxs/dbs/maria10.1 and started with this configuration file for the first node:

[openxs@fc23 maria10.1]$ cat /home/openxs/galera/mynode1.cnf
[mysqld]
datadir=/home/openxs/galera/node1
port=3306
socket=/tmp/mysql-node1.sock
pid-file=/tmp/mysql-node1.pid
log-error=/tmp/mysql-node1.err
binlog_format=ROW
innodb_autoinc_lock_mode=2

wsrep_on=ON # this is important for 10.1!
wsrep_provider=/home/openxs/galera/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node1
# wsrep_cluster_address=gcomm://
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020?pc.wait_prim=no

It's one of the shortest possible. I had to specify a unique datadir, error log location, pid file, port and socket for the instance, set the binlog format, point out the Galera library location, and set the cluster name and node name. With proper planning I was able to specify a wsrep_cluster_address that refers to all other nodes, but for the initial setup of the first node I can leave it "empty", as in the commented-out line above, so that it starts as a new cluster. There is one essential setting for MariaDB 10.1.x that is not needed for "cluster-specific" builds like Percona XtraDB Cluster or the older 10.0.x Galera packages from MariaDB (where it's ON by default). This is wsrep_on=ON. Without it, MariaDB works as a normal, non-cluster instance and ignores anything cluster-related. You can save a lot of time when upgrading to 10.1.x if you put it in your configuration file explicitly right now, no matter which version you use.
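Because a missing wsrep_on=ON silently gives you a standalone server, it may be worth checking configuration files for it. A minimal Python sketch using the standard configparser module, assuming a single file with a [mysqld] group; it ignores !include directives, other groups MariaDB may also read (such as [server] or [mariadb]) and command-line overrides, so treat it as a quick sanity check only.

```python
import configparser


def wsrep_enabled(cnf_text):
    """Return True if the [mysqld] group of an option file turns wsrep_on on."""
    parser = configparser.ConfigParser(
        allow_no_value=True,              # bare options like skip-name-resolve
        inline_comment_prefixes=("#",),   # "wsrep_on=ON # this is important for 10.1!"
        interpolation=None,
        strict=False,                     # repeated options: the last one wins
    )
    # MySQL/MariaDB treat "wsrep-on" and "wsrep_on" as the same option
    parser.optionxform = lambda opt: opt.lower().replace("-", "_")
    parser.read_string(cnf_text)
    if not parser.has_option("mysqld", "wsrep_on"):
        return False
    value = parser.get("mysqld", "wsrep_on")
    if value is None:
        return True  # a bare boolean option means "enabled"
    return value.strip().upper() in ("ON", "1", "TRUE")
```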

Then I copied and modified the configuration file for the second node:

[openxs@fc23 maria10.1]$ cp /home/openxs/galera/mynode1.cnf /home/openxs/galera/mynode2.cnf
[openxs@fc23 maria10.1]$ vi /home/openxs/galera/mynode2.cnf
[openxs@fc23 maria10.1]$ cat /home/openxs/galera/mynode2.cnf

[mysqld]
datadir=/home/openxs/galera/node2
port=3307
socket=/tmp/mysql-node2.sock
pid-file=/tmp/mysql-node2.pid
log-error=/tmp/mysql-node2.err
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_on=ON # this is important for 10.1!
wsrep_provider=/home/openxs/galera/libgalera_smm.so
wsrep_cluster_name = singlebox
wsrep_node_name = node2
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020?pc.wait_prim=no
wsrep_provider_options = "base_port=5020;"

Note that while a Galera node uses 4 ports, I specified only 2 unique ones explicitly: the port for MySQL clients, and the base port for all Galera-related communication like IST and SST, with the base_port setting. Note also how I referred to all cluster nodes with wsrep_cluster_address: this same value can actually be used in the configuration file of the first node. We can just start it as the first node of a new cluster (see below).
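Since the node files differ only in a few values, they can also be generated. A Python sketch that renders a file in the same shape as mynode1.cnf/mynode2.cnf and checks that nodes on one box do not share ports. The helper names are hypothetical, the generator always writes base_port explicitly (for node1 that is just the default 4567), and it assumes IST listens on base_port + 1, which as far as I know is Galera's default; the SST port is not modelled at all.

```python
from collections import Counter


def node_cnf(n, client_port, base_port, peer_ports,
             galera_dir="/home/openxs/galera", cluster_name="singlebox"):
    """Render a minimal [mysqld] group for Galera node number n."""
    peers = ",".join(f"127.0.0.1:{p}" for p in peer_ports)
    lines = [
        "[mysqld]",
        f"datadir={galera_dir}/node{n}",
        f"port={client_port}",
        f"socket=/tmp/mysql-node{n}.sock",
        f"pid-file=/tmp/mysql-node{n}.pid",
        f"log-error=/tmp/mysql-node{n}.err",
        "binlog_format=ROW",
        "innodb_autoinc_lock_mode=2",
        "wsrep_on=ON",
        f"wsrep_provider={galera_dir}/libgalera_smm.so",
        f"wsrep_cluster_name = {cluster_name}",
        f"wsrep_node_name = node{n}",
        f"wsrep_cluster_address=gcomm://{peers}?pc.wait_prim=no",
        f'wsrep_provider_options = "base_port={base_port};"',
    ]
    return "\n".join(lines) + "\n"


def ports_used(client_port, base_port):
    """Ports one node listens on in this sketch (IST assumed at base_port + 1)."""
    return {client_port, base_port, base_port + 1}


def shared_ports(nodes):
    """Ports claimed by more than one (client_port, base_port) pair."""
    counts = Counter(p for client, base in nodes for p in ports_used(client, base))
    return {p for p, k in counts.items() if k > 1}
```

Picking base ports far apart (4567 and 5020 here) leaves room for the base_port + 1 neighbour of each node.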

Now we have configuration files for 2 nodes ready (we can always add node3 later in the same way). But before starting a new cluster we have to install the system databases. For node1 it was done in the following way:

[openxs@fc23 maria10.1]$ scripts/mysql_install_db --defaults-file=/home/openxs/galera/mynode1.cnf
Installing MariaDB/MySQL system tables in '/home/openxs/galera/node1' ...
2016-03-31 12:51:34 139766046820480 [Note] ./bin/mysqld (mysqld 10.1.13-MariaDB) starting as process 28297 ...
...

[openxs@fc23 maria10.1]$ ls -l /home/openxs/galera/node1
-rw-rw----. 1 openxs openxs    16384 Mar 31 12:51 aria_log.00000001
-rw-rw----. 1 openxs openxs       52 Mar 31 12:51 aria_log_control
-rw-rw----. 1 openxs openxs 12582912 Mar 31 12:51 ibdata1
-rw-rw----. 1 openxs openxs 50331648 Mar 31 12:51 ib_logfile0
-rw-rw----. 1 openxs openxs 50331648 Mar 31 12:51 ib_logfile1
drwx------. 2 openxs openxs     4096 Mar 31 12:51 mysql
drwx------. 2 openxs openxs     4096 Mar 31 12:51 performance_schema
drwx------. 2 openxs openxs     4096 Mar 31 12:51 test

Then I started node1 as a new cluster:

[openxs@fc23 maria10.1]$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &

and created a table, t1, with some data in it. After that I repeated the installation of system tables etc. for node2, just referencing the proper configuration file, and started node2, which was supposed to join the cluster:

[openxs@fc23 maria10.1]$ bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
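The start order matters: only the first node bootstraps a new cluster with --wsrep-new-cluster, the others are started plainly and join via wsrep_cluster_address. A tiny Python sketch that builds the mysqld_safe command lines in that order; start_commands is a hypothetical helper, and actually launching each command (and waiting for the previous node to be ready) is left out.

```python
def start_commands(cnf_files, basedir="."):
    """Return mysqld_safe command lines (as argument lists) in start order."""
    cmds = []
    for i, cnf in enumerate(cnf_files):
        # --defaults-file must be the first option on the command line
        cmd = [f"{basedir}/bin/mysqld_safe", f"--defaults-file={cnf}"]
        if i == 0:
            cmd.append("--wsrep-new-cluster")  # only the first node bootstraps
        cmds.append(cmd)
    return cmds
```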

Let's check if we do have both instances running and communicating in Galera cluster:

[openxs@fc23 maria10.1]$ tail /tmp/mysql-node2.err
2016-03-31 13:40:29 139627414767744 [Note] WSREP: Signalling provider to continue.
2016-03-31 13:40:29 139627414767744 [Note] WSREP: SST received: c91d17b6-f72b-11e5-95de-96e95167f593:0
2016-03-31 13:40:29 139627117668096 [Note] WSREP: 1.0 (node2): State transfer from 0.0 (node1) complete.
2016-03-31 13:40:29 139627117668096 [Note] WSREP: Shifting JOINER -> JOINED (TO: 0)
2016-03-31 13:40:29 139627117668096 [Note] WSREP: Member 1.0 (node2) synced with group.
2016-03-31 13:40:29 139627117668096 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2016-03-31 13:40:29 139627414452992 [Note] WSREP: Synchronized with group, ready for connections
2016-03-31 13:40:29 139627414452992 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-31 13:40:29 139627414767744 [Note] /home/openxs/dbs/maria10.1/bin/mysqld: ready for connections.
Version: '10.1.13-MariaDB' socket: '/tmp/mysql-node2.sock' port: 3307 Source distribution
[openxs@fc23 maria10.1]$ tail /tmp/mysql-node1.err
2016-03-31 13:40:27 140071390934784 [Note] WSREP: Provider resumed.
2016-03-31 13:40:27 140072133322496 [Note] WSREP: 0.0 (node1): State transfer to 1.0 (node2) complete.
2016-03-31 13:40:27 140072133322496 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 0)
2016-03-31 13:40:27 140072133322496 [Note] WSREP: Member 0.0 (node1) synced with group.
2016-03-31 13:40:27 140072133322496 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2016-03-31 13:40:27 140072429247232 [Note] WSREP: Synchronized with group, ready for connections
2016-03-31 13:40:27 140072429247232 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-03-31 13:40:27 140072141715200 [Note] WSREP: (c91c99ec, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-03-31 13:40:29 140072133322496 [Note] WSREP: 1.0 (node2): State transfer from 0.0 (node1) complete.
2016-03-31 13:40:29 140072133322496 [Note] WSREP: Member 1.0 (node2) synced with group.
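When tailing the logs of several nodes, the "Shifting X -> Y" lines are the quickest way to follow each node's state. A Python sketch that extracts these transitions, assuming the message format seen in the logs above; it scans the whole text with re.finditer, so it also copes with lines glued together.

```python
import re

# e.g. "... [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 0)"
SHIFT_RE = re.compile(r"WSREP: Shifting (\S+) -> (\S+) \(TO: -?\d+\)")


def state_transitions(log_text):
    """Return (from_state, to_state) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in SHIFT_RE.finditer(log_text)]


def reached_synced(log_text):
    """True if the last recorded transition ends in SYNCED."""
    transitions = state_transitions(log_text)
    return bool(transitions) and transitions[-1][1] == "SYNCED"
```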

Familiar messages (unfortunately...) that prove the second node joined and performed a state transfer from the first one. Now it's time to connect and test how the cluster works. This is what I had after node1 started and a table with some data was created there, but before node2 started:

[openxs@fc23 maria10.1]$ bin/mysql -uroot --socket=/tmp/mysql-node1.sock
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.1.13-MariaDB Source distribution

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show variables like 'wsrep_cluster%';
+-----------------------+-------------------------------------------------------+
| Variable_name         | Value                                                 |
+-----------------------+-------------------------------------------------------+
| wsrep_cluster_address | gcomm://127.0.0.1:4567,127.0.0.1:5020?pc.wait_prim=no |
| wsrep_cluster_name    | singlebox                                             |
+-----------------------+-------------------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [(none)]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 1                                    |
| wsrep_cluster_size       | 1                                    |
| wsrep_cluster_state_uuid | c91d17b6-f72b-11e5-95de-96e95167f593 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.00 sec)

MariaDB [(none)]> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [test]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

Then, when node2 joined the cluster, I checked that the data we had added on node1 were there:

[openxs@fc23 maria10.1]$ bin/mysql -uroot --socket=/tmp/mysql-node2.sock
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.13-MariaDB Source distribution

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [test]> select * from t1;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)

MariaDB [test]> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 2                                    |
| wsrep_cluster_size       | 2                                    |
| wsrep_cluster_state_uuid | c91d17b6-f72b-11e5-95de-96e95167f593 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.01 sec)
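The values above were compared by eye; with more nodes it may help to compare them programmatically. A Python sketch, assuming the SHOW STATUS LIKE 'wsrep_cluster%' rows have already been fetched from every node into dicts (fetching them, e.g. with a MySQL client library, is not shown; check_cluster is a hypothetical helper). Note that the node1 output shown earlier was captured before node2 joined, so its wsrep_cluster_size of 1 is exactly what such a check would flag.

```python
def check_cluster(statuses, expected_size):
    """Compare wsrep_cluster% status values collected from every node.

    statuses maps a node name to a dict of Variable_name -> Value strings.
    Returns a list of problems; an empty list means the nodes agree.
    """
    problems = []
    uuids = {s.get("wsrep_cluster_state_uuid") for s in statuses.values()}
    if len(uuids) != 1:
        problems.append(f"nodes report different cluster state UUIDs: {sorted(map(str, uuids))}")
    for node, s in sorted(statuses.items()):
        if s.get("wsrep_cluster_status") != "Primary":
            problems.append(f"{node}: wsrep_cluster_status is {s.get('wsrep_cluster_status')}")
        if s.get("wsrep_cluster_size") != str(expected_size):
            problems.append(
                f"{node}: wsrep_cluster_size is {s.get('wsrep_cluster_size')}, expected {expected_size}"
            )
    return problems
```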

So, the first basic test with a Galera cluster of 2 nodes (both running on the same box), built from the current sources of Galera and MariaDB 10.1.x on Fedora 23, completed successfully. I plan to play with it more in the future, use the current xtrabackup built from source for SST and so on, and write blog posts about these steps and any interesting tests in this setup. Stay tuned.

From the dates above you can conclude that it took me 3 weeks to publish this post. That's because I was busy with the company meeting in Berlin and some usual Support work, and was not sure whether it was really a good idea for me to write any post with the words "Galera" or "MariaDB" used in it even once...