Showing posts with label Oracle. Show all posts
Showing posts with label Oracle. Show all posts

Sunday, July 1, 2018

What's Right and What's Wrong With Oracle's Way of MySQL Server Development

Recently it's quite common to state that "Oracle's Acquisition Was Actually the Best Thing to Happen to MySQL". I am not going to argue with that - Oracle proved over years that they are committed to continue active development of this great open source RDBMS, and they have invested a lot into making it better and implementing features that were missed or became important recently. Unlike Sun Microsystems, they seem to clearly know what to do with this software to make it more popular and make money on it.

Among the right things Oracle does for MySQL server development I'd like to highlight the following:
  1. MySQL server development continues, with new features added, most popular OSes supported, regular releases happened and source code still published at GitHub under GPL license.
  2. Oracle continues to maintain public MySQL bugs database and fix bugs reported there.
  3. Oracle accepts external contributions to MySQL server under clear conditions. They acknowledge contributions in public. The release notes, in particular, mention authors of each community-provided patch.
  4. Oracle works hard on improving performance and scalability of MySQL server.
  5. Oracle tries to provide good background for their new MySQL designs (check new InnoDB redo logging, for example).
  6. Oracle cooperates with MySQL Community. They organize their own community events and participate in numerous related conferences, including (but not limited to) the biggest Percona Live ones. Oracle engineers speak and write about their work in progress. Oracle seems to actively support some open source tools that work with MySQL server, like ProxySQL.
  7. Oracle still keeps and maintains pluggable storage engine architecture and plugin APIs, even though their own development is recently mostly related to InnoDB storage engine.
  8. Oracle still maintains and improves public MySQL Manual.
So, Oracle is doing good with MySQL, and dozens of their customers, community members and MySQL experts keep stating this all the time. But, as a former and current "MySQL Entomologist" (somebody who worked on processing MySQL bug reports from community and reported MySQL bugs for 13 years), I clearly see problems with the way Oracle handles MySQL server development. I write and speak about these problems in public since the end of 2012 or so, and would like to summarize them in this post.
MySQL's future is bright, but there are some clouds


Here is the list of problems I see:
  1. Oracle does not develop MySQL server in a true open source way.
  2. Oracle does not care enough to maintain public bugs database properly.
  3. Some older MySQL features remain half-backed, not well tested, not properly integrated with each other and new features, and not documented properly, for years.
    In general, Oracle's focus seem to be more on new developments and cool features for MySQL (with some of them got ignored and going nowhere with time).
  4. Oracle's internal QA efforts still seem to be somewhat limited.
    We get regression bugs, ASAN failures, debug assertions, crashes, test failures etc in the official releases, and Oracle MySQL still relies a lot on QA by MySQL Community (while not highlighting this fact that much in public).
  5. MySQL Manual still have many details missing and is not fixed fast enough.
    Moreover, it is not open source, so there is no other way for community to fix or improve it other than add comments or report documentation bugs, and wait.
In the upcoming weeks I am going to explain each of these items in a separate post, with some links to my older blog posts, MySQL server bug reports and other sources that should illustrate my points. In the meantime I am open for comments from those who disagree with the theses presented above.

Sunday, April 9, 2017

Accessing Oracle tables via MariaDB CONNECT engine and ODBC

In my previous working environment it was typical to consider MariaDB Server as just a set of "random" storage engines mixed together, sometimes for unclear reasons and with questionable extra value. But even at that times I was already impressed by some features of one of the engines supported only by MariaDB, CONNECT. PIVOT table type specifically was of interest in a couple of customer cases I had to work on, but I quickly found out that the engine does not work with Oracle's MySQL or Percona Server, and thus had to forget about it for some time.

Recently while working in MariaDB I've got several more chances to deal with some new CONNECT use cases (and some related problems, that ended up mostly minor or caused by ignorance). This engine can be (and is, widely) used to access tables from virtually any other database management system or data source, as long as there is an ODBC or JDBC driver for it. I had to set up a testing environment to reproduce some of the problems noted while accessing remote tables in Oracle RDBMS, and eventually decided to document setup steps, mistakes and findings for myself and my readers.

I think the topic may require a series of posts, and in this first one I plan to concentrate on creating a basic testing environment to access Oracle RDBMS via ODBC driver and CONNECT engine on my main testing box running Fedora 25, and resolving simplest access issues, like those I described in MDEV-12355 (ended as not a bug thanks to quick help and clarifications from engine creator, Olivier Bertrand).

For pure testing purposes I had not planned to spend time installing and setting up Oracle RDBMS on that Fedora 25 box directly (later I tried and failed), but decided to rely on some existing Docker image. So, I started with installing Docker and dependencies (see this article for some hints, if needed):
[openxs@fc23 ~]$ sudo dnf install docker
...
Installed:
  container-selinux.noarch 2:2.10-1.fc25
  docker.x86_64 2:1.12.6-6.gitae7d637.fc25
  docker-common.x86_64 2:1.12.6-6.gitae7d637.fc25
  oci-register-machine.x86_64 0-2.7.gitbb20b00.fc25
  oci-systemd-hook.x86_64 1:0.1.6-1.gitfe22236.fc25
  skopeo-containers.x86_64 0.1.17-1.dev.git2b3af4a.fc25

Complete!

[openxs@fc23 ~]$ sudo systemctl start docker
I am going to skip details on images that had not worked well (not a topic for this blog). Eventually I ended up working with https://hub.docker.com/r/sath89/oracle-12c/:
[openxs@fc23 server]$ sudo docker run -d -p 8080:8080 -p 1521:1521 -e ORACLE_ALLOW_REMOTE=true sath89/oracle-12c:latest
4515bc4a6fb4804ac8d714115de12e47a9e8f9baabb2300cde672aedb4843c2b
[openxs@fc23 server]$ sudo docker ps
CONTAINER ID        IMAGE                      COMMAND             CREATED             STATUS              PORTS                                            NAMES
4515bc4a6fb4        sath89/oracle-12c:latest   "/entrypoint.sh "   14 seconds ago      Up 8 seconds        0.0.0.0:1521->1521/tcp, 0.0.0.0:8080->8080/tcp   adoring_turing

[openxs@fc23 server]$ sudo docker logs -f 4515bc4a6fb4
ls: cannot access /u01/app/oracle/oradata: No such file or directory
Database not initialized. Initializing database.
Starting tnslsnr
Copying database files
1% complete
3% complete
11% complete
18% complete
26% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
56% complete
60% complete
62% complete
Completing Database Creation
66% complete
70% complete
73% complete
85% complete
96% complete
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/xe/xe.log" for further details.
Configuring Apex console
Database initialized. Please visit http://#containeer:8080/em http://#containeer:8080/apex for extra configuration if needed
Starting web management console
PL/SQL procedure successfully completed.
Starting import from '/docker-entrypoint-initdb.d':
found file /docker-entrypoint-initdb.d//docker-entrypoint-initdb.d/*
[IMPORT] /entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
Import finished
Database ready to use. Enjoy! ;)
^C
This image initializes new database by default (unless you pass a volume on host, then it is initialized for the first time if the database is not there), so you have to wait (maybe for many minutes) and check the logs before trying to access the database. Note that container's port 1521 is mapped to host's port 1521 (and my Fedora 25 IP address, 192.168.1.85, is then used later below to access "remote" Oracle server on this port).

So, my "Oracle server" is ready (probably). Now, my Fedora 25 host will work as a client, so I had downloaded and installed basic Oracle client software RPMs (including ODBC driver and SQL*Plus) from OTN:
[openxs@fc23 maria10.1]$ rpm -q -a | grep oracle
oracle-instantclient11.2-sqlplus-11.2.0.4.0-1.x86_64
oracle-xe-11.2.0-1.0.x86_64
oracle-instantclient11.2-basic-11.2.0.4.0-1.x86_64
oracle-instantclient11.2-odbc-11.2.0.4.0-1.x86_64
I also installed unixODBC (including -devel, as I built MariaDB from source and one needs unixODBC-devel to have CONNECT engine build properly) as fine manual stated at the beginning:
[openxs@fc23 maria10.1]$ rpm -q -a | grep unixODBC
unixODBC-2.3.4-3.fc25.x86_64
unixODBC-devel-2.3.4-3.fc25.x86_64
One has to set up tnsnames.ora for Oracle client to work with (remote) instance, and /etc/odbcinst.ini + /etc/odbc.ini to inform unixODBC about the known drivers and system data sources (this discussion had some useful hints):
[openxs@fc23 maria10.1]$ cat /etc/oracle/tnsnames.ora
XE =
 ( DESCRIPTION =
  (ADDRESS_LIST =
  (ADDRESS =
  (PROTOCOL = TCP)
  (Host = 192.168.1.85)
  (Port = 1521)
  )
 )
 (CONNECT_DATA = (SID = XE)
 )
)

[openxs@fc23 maria10.1]$ cat /etc/odbcinst.ini
...

[OracleODBC]
Description = Oracle ODBC driver for Oracle 11g
Driver64 = /usr/lib/oracle/11.2/client64/lib/libsqora.so.11.1
FileUsage = 1
Driver Logging = 7

[openxs@fc23 maria10.1]$ cat /etc/odbc.ini
[oracle]
Driver = OracleODBCDSN = OracleODBC
ServerName = XE
UserID = system
Password = oracle
 Surely, I tried to access my Oracle server in Docker container via sqlplus command first:
[openxs@fc23 server]$ /usr/lib/oracle/11.2/client64/bin/sqlplus system/oracle@localhost:1521/xe.oracle.docker/usr/lib/oracle/11.2/client64/bin/sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory
[openxs@fc23 server]$ ldd /usr/lib/oracle/11.2/client64/bin/sqlplus
        linux-vdso.so.1 (0x00007ffe8813b000)
        libsqlplus.so => not found
        libclntsh.so.11.1 => not found
        libnnz11.so => not found
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f0aadc93000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f0aad98a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0aad76a000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f0aad551000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f0aad18b000)
        /lib64/ld-linux-x86-64.so.2 (0x0000563b14bfb000)
This is what happens when one does not care to read all the details and forgets to set some environment variables and inform dynamic loader about libraries location. So, I did the following to resolve the problem:
[openxs@fc23 server]$ export ORACLE_HOME=/usr/lib/oracle/11.2/client64
[openxs@fc23 server]$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib
[openxs@fc23 server]$ export PATH=$PATH:$ORACLE_HOME/bin
[openxs@fc23 server]$ sudo ldconfig
[openxs@fc23 server]$ ldd /usr/lib/oracle/11.2/client64/bin/sqlplus
        linux-vdso.so.1 (0x00007ffd31b50000)
        libsqlplus.so => /usr/lib/oracle/11.2/client64/lib/libsqlplus.so (0x00007f88f680f000)
        libclntsh.so.11.1 => /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1 (0x00007f88f3ea0000)
        libnnz11.so => /usr/lib/oracle/11.2/client64/lib/libnnz11.so (0x00007f88f3ad3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f88f38b2000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f88f35a9000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f88f3389000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f88f3170000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f88f2daa000)
        libaio.so.1 => /lib64/libaio.so.1 (0x00007f88f2ba8000)
        /lib64/ld-linux-x86-64.so.2 (0x000056123639a000)

[openxs@fc23 server]$ sqlplus system/oracle@localhost:1521/xe.oracle.docker    
SQL*Plus: Release 11.2.0.4.0 Production on Fri Mar 24 11:00:21 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production

SQL> select count(*) from scott.emp;

  COUNT(*)
----------
        14

SQL> create table scott.t1(c1 number(30));

Table created.

SQL> insert into scott.t1 values (12345);

1 row created.

SQL> desc scott.t1;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 C1                                                 NUMBER(30)

SQL>

SQL> exit
Disconnected from Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production
So, SQL*Plus (that I still miss sometimes, after using it maybe only 10 times over last 12 years) client works. Let's try to access Oracle via ODBC and isql now:
[openxs@fc23 server]$ isql -v oracle
[08004][unixODBC][Oracle][ODBC][Ora]ORA-12154: TNS:could not resolve the connect identifier specified

[ISQL]ERROR: Could not SQLConnect
Surely, one more environment variable is not set, the one pointing out the location of tnsnames.ora file:
[openxs@fc23 server]$ export TNS_ADMIN=/etc/oracle
[openxs@fc23 server]$ isql -v oracle
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select * from scott.t1;
+---------------------------------+
| C1                              |
+---------------------------------+
| 12345                           |
+---------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL> select count(*) from scott.emp;+-----------------------------------------+
| COUNT(*)                                |
+-----------------------------------------+
| 14                                      |
+-----------------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL> quit
[openxs@fc23 server]$

Looks like now we should be able to use ODBC table type of CONNECT storage engine of MariaDB (that was already loaded with INSTALL SONAME 'ha_connect.so'; check the details here) to access remote Oracle table:
MariaDB [test]> create table oracle_t1 engine=connect table_type=ODBC tabname='t1' dbschema='scott' connection='dsn=oracle';
ERROR 1105 (HY000): Cannot get columns from t1
MariaDB [test]> show warnings\G
*************************** 1. row ***************************
  Level: Error
   Code: 1105
Message: Cannot get columns from t1
*************************** 2. row ***************************
  Level: Error
   Code: 1030
Message: Got error 122 "Internal (unspecified) error in handler" from storage engine CONNECT
2 rows in set (0.00 sec)
Do not be like me and do not report bugs when you see these messages! Go read the fine manual that explicitly mentions Oracle as being case sensitive and converting identifiers to upper case unless they are quoted.

As soon as you use proper case in tabname (and dbname) settings, everything works as expected, at least for my primitive use case:
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='t1' dbname='scott' connection='dsn=oracle';
ERROR 1105 (HY000): Cannot get columns from t1
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='T1' dbname='scott' connection='dsn=oracle';ERROR 1105 (HY000): Cannot get columns from T1
MariaDB [test]> create table t1_oracle engine=connect table_type=ODBC tabname='T1' dbname='SCOTT' connection='dsn=oracle';Query OK, 0 rows affected (1.29 sec)

MariaDB [test]> select * from t1_oracle;
+-------+
| C1    |
+-------+
| 12345 |
+-------+
1 row in set (0.07 sec)

MariaDB [test]> create table t2_oracle engine=connect table_type=ODBC tabname='SCOTT.T1' connection='dsn=oracle';
Query OK, 0 rows affected (3.54 sec)

MariaDB [test]> select * from t2_oracle;
+-------+
| C1    |
+-------+
| 12345 |
+-------+
1 row in set (0.06 sec)
To summarize, with some initial setup efforts (that should include reading the official CONNECT manual) it is easy to access Oracle tables from MariaDB server via ODBC driver. It may be very useful if you plan to migrate or use some data from Oracle RDBMS.

As a side note, Docker really rocks sometimes. I spent maybe 15+ years installing Oracle RDBMS (starting from version 7.0.16 or so) on all kinds of OSes dozens of times, and I do not miss this experience and do not mind to have it up and running with one simple command any time (rarely) I need it.

Saturday, November 12, 2016

MySQL Support Engineer's Chronicles, Issue #3

The original idea of this series was to publish one post per week, but it seems every other week I have some special topic that well deserves a dedicated post. Last week I had no time to complete my writing because of long (and, I hope, useful) Howto post on replacing corrupted partition using non-corrupted one from other server in replication setup. But I had links and notes collected in a draft that I am going to complete now.

First of all, during the previous week I had time to submit two more talks for the  "MySQL and Friends Devroom" at FOSDEM 2017, "Applying profilers to MySQL" and "Background Story of a MySQL Bug". Call for papers is still open, as far as I understand, so I may come up with few more ideas on what to talk about.

Strange to admit this, but sometimes I can be excited with something MySQL-related. Two weeks ago I've added a note to myself about great, detailed changelogs that MariaDB publishes, with proper links to GitHub commits. Check this one for 10.0.28, for example. I wish Oracle provides the same level of details for MySQL releases as a part of their official release notes.

Still, sometimes important changes happen in upstream MySQL, get merged, and details about inherited incompatible change (and its real impact) are still missing in any release notes of any kind. Let's consider recent example. Historically MySQL treated incorrect utf8 bytes sequences differently for INSERT than for LOAD DATA. INSERT failed, LOAD DATA just had the value truncated on the first incorrect character and continued. Eventually (in MySQL 5.6.32) this was fixed by upstream MySQL (it was also fixed in a wider context in MariaDB 10.2 in frames of MDEV-9823). MySQL 5.6.32 release notes says about the incompatible change:
  • "Incompatible Change: For multibyte character sets, LOAD DATA could fail to allocate space correctly and ignore input rows as a result. (Bug #76237, Bug #20683959, Bug #23080148)
    References: This issue is a regression of: Bug #14653594."
But it surely says nothing about the impact for replication or another Bug #78758 that is fixed in 5.6.32 (by making LOAD DATA fail with error). It costed me some time to figure out all the details. Read MDEV-11217 for the historical details, nice test case and different views on the decisions made. Note also that now error message about the bad utf8 character from LOAD DATA looks (IMHO) weird enough, as it contains actually only a valid part of the string. See my MDEV-11216 about this.

I am still having fun with InnoDB locks. This week I checked what locks are set when the same DELETE (for the table with the primary key and unique secondary index) is executed twice in the same transaction. Check Bug #83640 and tell me how this sequence of locks set in one transaction may make any sense:
---TRANSACTION 636201, ACTIVE 202 sec
5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 1
MySQL thread id 1, OS thread handle 0x7f9e513a7700, query id 92 localhost root init
show engine innodb status
TABLE LOCK table `test`.`tu` trx id 636201 lock mode IX
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 3 n bits 72 index `PRIMARY` of table `test`.`tu` trx id 636201 lock_mode X locks rec but not gap
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X
RECORD LOCKS space id 11 page no 4 n bits 72 index `u` of table `test`.`tu` trx id 636201 lock_mode X locks gap before rec
My colleague Jan Lindström was also surprised, so we have MDEV-11215 as well, and a chance to see this studies and maybe changed by MariaDB engineers. Related problems were discussed in the past, see Bug #19762 and Bug #55717.

Some days I keep wondering what happens to XtraBackup these days in Percona. As far as I remember I was not able to reproduce lp:1461833 while working there, but the fact that the bug is still open and got no comments since I re-opened it gives me no additional confidence.

I report bugs and missing details in MySQL way too often even for my own liking. But MySQL manual really misses many details to explain results that users see in production. This week I'd like to remind about one of my bug reports about missing details in MySQL documentation, Bug #77390, and my request there:
"Please, explain all metadata and InnoDB locks set by online ALTER, with examples and details enough to explain non-trivial cases..."
Honestly, until this month I never noted that DDL log exists in MySQL. This is a ddl_log.log binary file that can be "dumped" into a somewhat readable form using a script by Mattias Jonsson from Bug #47343:
[openxs@fc23 5.7]$ perl ~/ddl_log_dump.pl data/ddl_log.log
Header: Num entries: 4 Name length: 512 Block length 4096
Entry 1 type i action s next 0
  name ./test/trange2
not 'e' entry (i)
Entry 2 type l action d next 0
  name ./test/#sql-trange2
not 'e' entry (l)
Entry 3 type l action d next 2
  name ./test/trange2#P#pmax
not 'e' entry (l)

This file may grow until MySQL server restart completes, but what's worse, when it grows over 4GB in size it becomes unusable and effectively blocks any further concurrent DDL until we get rid of it. I had a lot of fun reading the code and reporting related Bug #83708. Unlucky users who do a lot of partitioning-related DDL may find the situation less funny when they hit this bug.


I plan to describe what I had to work on this week soon, while I still remember all the relevant details and feelings. So, stay tuned!

Sunday, June 2, 2013

Fun with Bugs #8 - what's wrong with Oracle's way of handling public MySQL bugs database

Many people seem unhappy with the way Oracle develops MySQL. I am not one of them. I think very few really important things are missing and in this post I'd like to concentrate on one of them: having internal and public bugs databases not in sync in too many cases.

Let me quote myself to explain where problem starts:

"Now the most important thing you should know about MySQL bugs processing the way it is done now in Oracle. When bug is "Verified" and(!) considered serious enough, it is copied to the Oracle internal bugs database and further processing, including status changes etc, is done there. All further comments to the public bug report are then copied to internal bug report automatically, but no comments or status changes in internal bug report are copied to public bug in any other way but by explicit action of some Oracle engineer."
Why is this a problem? Check Bug #42415 for example. It had a lot of activity and even patches pushed until December, 2010, but it is still "Verified". The problem is real and there is even a workaround implemented for cases like this in Percona Server. On October 2012 Sherii asked for status update:

"It says a patch has been committed, but the "status" is "patch pending". Was this actually put in a version of MySQL? If so, which version?

I am interested in this fix, and put in my "vote" to get this fixed."
Then bug became just "Verified" and we know it is not fixed. But why, what prevents this or when it will be fixed? Nobody knows. Well, I have a hint based on a comment from former colleague upon explicit request... But what if you are NOT hanging around on my Facebook page on a regular basis and you are NOT an Oracle customer? You may have to just wait fishing in the dark or keep asking in public (or investing time into workarounds for this problem in your software, or keep telling others that Oracle "kills" MySQL and creating foundations to "save" it, up to you to decide).

Anything wrong with this? I am not sure it's totally wrong in general. There should be some reason for you to become Oracle customer, and having access to more information may be a good enough reason (having a way to request and even demand fixes you need is even better reason). But I think this is wrong if in the corresponding bug in the internal Oracle database there is public comment explaining current status of the bug.

This leads to my point: public bug reports are usually NOT in sync with corresponding internal bug reports.

Status may be not in sync and useful public (and not any confidential or security-related) comments in the internal bugs database are not accessible to the community until some kind soul in Oracle closes the public bug and documents the fix. We know from some examples that this may not happen for months after the bug is really closed. Public bug may be declared a "Duplicate" of something that we know nothing about, check Bug #65326 (here I've got no hints after public request unfortunately), and so on.

How this can be solved (assuming Oracle is going to solve this and still plan to rely on public bugs database and community QA efforts - who knows if they really want these and for how long)? I see two options:

  1. Automatically copy back (with a delay, with some manual approval work if needed) all public comments and status changes from internal bug to corresponding public bug report. Technically this is probably easy and there should be scripts almost ready (as forwarding from public bugs database is automated by scripts). Exceptions like security bugs are easy to identify. Delayed replication should help to prevent mistakes. I asked for this more than once when I still worked in Oracle...
     
  2. Manually copy back all essential comments and status changes. This is probably a full time job for an engineer or technical writer (or maybe even several), or member of Bugs Verification team, or some independent contractors with non-disclosure agreement signed and trainings passed. I'd happily participate in this kind of activity as an independent contractor in my free time and, I hope, Percona would not be against this.

If nothing like this (investment of time or costs from Oracle side to increase value of public bugs database, to put it simple) happens any time soon it will be a clear indication (for me alt least) of decreasing interest in community bug reports and QA efforts in general. Small step to the direction I personally do not like.

Sunday, May 26, 2013

Fun with bugs #7, still mostly about 5.6.11

It looks like now Oracle will release new 5.6.x every 2 months, so while I'd happily write about bugs fixed in 5.6.12 we still have to wait for the official release to happen. I am too impatient to wait for 5.6.12, so let me write this post of a classical kind - just a summary of my MySQL bugs' related posts on Facebook since the previous one.

I have to start with replication-related bugs in 5.6.11 pointed out by Giuseppe Maxia:
  • Bug #69095 -  replication fails with GTID enabled and master changes from SBR to RBR. This bug is "Verified" since April 30, but got no public comments since that.
  • Bug #69135 - mysql.slave_master_info is not updated. This serious bug is verified since May 6, but, again, got no visible attention since that.
It's really sad, but looks like most of new replication features (GTID-based replication and "crash safe" replication are among them) in 5.6 works well only in simple cases and, in general, were not really well tested before GA (no matter what explicit or hidden messages about MySQL-related QA activities Oracle makes).

While searching for something else, I've noted Bug #60101 in "Patch pending" status for more than 2 years. As usual, I've complained in public and got this bug... "Closed" with a comment saying nothing about specific versions where the fix first appeared. Well done, Sinisa, really, but this bug report is still a yet another example showing that Oracle does NOT always care to keep public bugs database in sync with real development and follow their own procedures of bugs processing. Unfortunately...

And now back to performance that 5.6.x is all about. Mark Callaghan had not only reported in public about his findings on singe-thread performance of 5.6.11 (in Bug #69236), but later had provided a detailed enough analysis showing that it's not only a client or UTF-8 related problem, but something else as well... Still the bug is "Open" since May 15. I had written about it at least 3 times, but it seems this does not help any more to get anybody's attention. I wonder should I become more annoying blogger again and publish new "Fun with Bugs" issue every week.

Obviously I was so impressed by "300 man-years of QA experience" Oracle now invests into MySQL (statement from famous keynote session Tomas Ulin gave during PLMCE 2013) that now I pay extra attention to every bug report that points out some problem with tests that QA engineers are supposed to run. Here are the recent examples:

Bug #69252 - All the parts.partition_max* tests are broken with MTR --parallel. If nobody ever tried to run these tests with --parallel (as it seems), then we can safely assume that nobody cared to run them at all in production, as "--parallel" feature was added (long time ago) to make test runs faster in production environment...
Bug #69265 - -DBUILD_CONFIG=mysql_release -DWITH_DEBUG=ON fails 4 and skips 27 MTR tests. Similar situation - if nobody runs tests on debug builds on a regular basis (as it seems), what QA is really doing instead? This bug report is still "Open".

One may say that all these problems are known internally and are in the internal bugs database (reported long time ago), but I have no way to verify this kind of statement, so feel free to ignore it if it ever made. Make your own conclusions, but personally I do not understand how or why quality may increase without regular testing. Fortunately, many community members had NOT given up yet and continue to report bugs and test failures - this gives me some more hope to see MySQL 5.6 as the "best release ever" one day.

On a positive side, InnoDB team rocks as usual when they see useful bug report. Check Bug #69276 - it was accepted and verified by developers in a matter of hours.

Surely there are things to note in older MySQL server versions as well. Recently my Bug #67259 had got more evidence from my colleague (as customers continue to hit it or something similar to it), so note that the problem is still repeatable on Oracle's MySQL 5.5.31 as well.

I'd like to end this post with a word of warning about Federated storage engine. It is still able to provoke crashes, like in Bug #68354, and users hit these crashes in production. Unfortunately Oracle is hardly going to fix any bug in this storage engine (or any other storage engine but InnoDB actually), so here MariaDB may be our only hope. Others would just happily remove it once and forever, while I am not sure there are good alternatives for all use cases it covers...

That's all for now. I skipped "funny" reports, "not a bugs" and hot discussions not really interesting to anybody but participants this time. Let's hope next release of "Fun with Bugs" will be about great fixes in MySQL 5.6.12.