Blog of (former?) MySQL Entomologist: Fun with Bugs #48 - Group Replication Bugs and Missing Features as of MySQL 5.7.17

It seems a recent post on Group Replication by Vadim caused an interesting discussion on Facebook. I am NOT going to continue it here, but I decided to present some facts: specifically, a list of public bug reports and feature requests for Group Replication (mostly "Verified", that is, accepted by Oracle engineers as valid) as of MySQL 5.7.17 (where the feature is promoted as GA), with just a few comments on some of the bugs.

The goal is to double check this post when the next Oracle MySQL 5.7.x release appears, to find out how much Oracle cares to fix the problems already identified by the MySQL Community.

So, here are the bugs and feature requests in the MySQL Server: Group Replication category (I've included only those reported for 5.7.17 in this post), as of today:

Bug #84315 - "add mysql group replication bootstrap option to init script". Fair enough, we have a way to bootstrap a cluster from a node in all implementations of Galera cluster. This feature request is still "Open".
Bug #84329 - "Some Class B private address is not permitted automatically". Manual seems to say it should be.
Bug #84337 - "Configure mgr with out root user encounter an error". Seems to be a duplicate of Bug #82687, that is, known for quite some time before GA.
Bug #84367 - "START GROUP_REPLICATION should be possible without error". You get error if it was already started. Not a big deal, but some optional clause like IF NOT STARTED may help (in a similar way to DROP TABLE ... IF EXISTS) to code scripts. Still "Open".
Bug #84710 - "group replication does not respect localhost IP". May be a problem for some MySQL Sandbox setups created in the hope of testing group replication.
Bug #84727 - "GR: Partitioned Node Should Get Updated Status and not accept writes". Writes on a partitioned node are accepted and hang (forever?). As a side note, I think Kenny Gryp deserves special recognition as an early adopter of the Group Replication feature who cared to report many problems noted in the process.
Bug #84728 - "Not Possible To Avoid MySQL From Starting When GR Cannot Start". We need this fixed to avoid split brain situations by default.
Bug #84729 - "Group Replication: Block Reads On Partitioned Nodes". In Galera reads are blocked by default when a node is not considered a member of the cluster.
Bug #84730 - "Feature: Troubleshoot Group Replication Transaction Rollbacks". You can get a lot of information about conflicts in the case of Galera. Sorry that I have to compare, but when discussing ways of dealing with common problems in MySQL-related cluster environments one cannot ignore existing solutions (NDB clusters and Galera clusters), so I just picked a (somewhat similar) technology that I know and have used (a bit). I think readers will do the same and try to base their conclusions on known examples.
Bug #84731 - "Group Replication: mysql client connections hang during group replication start". There is no reason to hang more than needed. We should just make sure reads and writes are NOT accepted until the node is a member of the cluster and in sync.
Bug #84733 - "Cannot Start GR with super_read_only=1". But this setting may be needed to make sure that there is only one master node in the cluster, no matter what happens to them...
Bug #84773 - "Flow control slows down throughput while a node join the cluster". Let me quote: "This is done this way by design, that is, request group to slow down while the member is fetching and applying data."
Bug #84774 - "Performance drop every 60 seconds". Now this sounds like a performance problem to work on, maybe by adding some tuning options.
Bug #84784 - "Group Replication nodes do not rejoin cluster after network connectivity issues". It would be really nice for nodes to try to rejoin the cluster after short-term connectivity issues. Galera nodes do not give up that fast. The bug is still not verified.
Bug #84785 - "Prevent Large Transactions in Group Replication". Galera allows limiting transaction size in some way. Not that there were no related bugs, but the options still exist.
Bug #84792 - "Idle GR Cluster: Member CPU hog". Not yet verified, but it seems that in some cases a node can use a notable share of CPU time for no clear/good reason.
Bug #84794 - "Cannot kill query inside GR". Well, you can do STOP GROUP_REPLICATION, but then it can be dangerous, because...
Bug #84795 - "STOP GROUP_REPLICATION sets super_read_only=off" - the node with stopped replication may allow the data to be changed...
Bug #84798 - "Group Replication can use some verbosity in the error log". Galera cluster nodes are too verbose: one gets kilometers of log records about everything, anything and nothing. Still, it is better to skip over some routine output in the logs than to get no evidence at all of what was going on...
Bug #84900 - "Getting inconsistent result on different nodes". Now, this is really unfortunate (for a "cluster"), and a somewhat similar problem was reported by Vadim before (see Bug #82481) and was supposed to be fixed. Anyway, the inconsistency is repeatable and looks somewhat scary.
Bug #84901 - "Can't execute transaction on second node". Best practice is to write on one and only one node (see Bug #83218). An attempt to write on the other node may fail...
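
As a side note on Bug #84367: until something like IF NOT STARTED exists, a script can check the node's state before issuing the statement. Below is a minimal sketch in Python, not a tested recipe. It assumes a DB-API 2.0 cursor connected to the node, and it assumes that the local member's row in performance_schema.replication_group_members reports ONLINE or RECOVERING when Group Replication is already running. The function and constant names are hypothetical, made up for illustration only.

```python
# Assumption: these MEMBER_STATE values mean the node is already in
# (or joining) the group, so START GROUP_REPLICATION would raise an error.
ACTIVE_STATES = {"ONLINE", "RECOVERING"}

# Look up only the local member; MEMBER_ID holds the server UUID.
MEMBER_STATE_QUERY = (
    "SELECT MEMBER_STATE FROM performance_schema.replication_group_members "
    "WHERE MEMBER_ID = @@server_uuid"
)


def start_group_replication_if_needed(cursor):
    """Issue START GROUP_REPLICATION only if this node is not already active.

    `cursor` is any DB-API 2.0 cursor supplied by the caller.
    Returns True if the statement was issued, False if it was skipped.
    """
    cursor.execute(MEMBER_STATE_QUERY)
    row = cursor.fetchone()
    state = row[0] if row else None  # no row: local member is not reported
    if state in ACTIVE_STATES:
        return False
    cursor.execute("START GROUP_REPLICATION")
    return True
```

Note that the check and the START are two separate statements, so this is not atomic: another session may start Group Replication in between, and the script should still be ready to handle the error.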

Now, draw your own conclusions about the maturity of Group Replication in MySQL 5.7.17. I have managed to avoid it so far, as I try to keep myself as far from any kind of "clusters" as possible... That has not worked well with Galera, unfortunately - I have to read its verbose logs on a daily basis.


Thursday, February 16, 2017
