Showing posts with label Percona Server. Show all posts
Showing posts with label Percona Server. Show all posts

Tuesday, January 26, 2021

What mysql_upgrade really does in MariaDB, Part II, Bugs and Missing Features

Both /proc sampling and bpftrace are cool topics to write about, but I should not forget my third upcoming talk at FOSDEM 2021 in less than 2 weeks, "Upgrading to a newer major version of MariaDB", that is mostly devoted to mysql_upgrade internals and is based on this blog post. Today I am going to provide some more background details and, probably for the first time, act as MariaDB entomologist and study some bug reports and feature requests related to mysql_upgrade (a.k.a. mariadb-upgrade in 10.5+) that were added over last 10 months or so.

I tried different ways to search for MariaDB JIRA issues related to mysql_upgrade. For examle, this is how I tried to find any bugs in mysql_upgrade closed over last two months (I recall there were few):

text ~ mysql_upgrade AND project = "MariaDB Server" AND status = Closed AND createdDate >= "2020-12-01" ORDER BY createdDate DESC

I've got a list of 7 reports, with 2 relevant bugs that are already fixed:

  • MDEV-24566 - "mysql_upgrade failed with "('mariadb.sys'@'localhost') does not exist" and mariadb 10.4/10.5 on docker". This is fixed in current docker images at https://hub.docker.com/r/mariadb/server. The problem was related to the Docker image only, and mysql_upgrade was actually affected by the initial database content there.
  • MDEV-24452 - "ALTER TABLE event take infinite time which for example breaks mysql_upgrade". Now this was a real blocker bug in recent 10.5. If any event managed to start before you executed mysql_upgrade, the utility and proper upgrade process was blocked. Good to see this fixed in upcoming 10.5.9.

So, Monty really fixes related bugs when they are reported, as promised. Let's consider now the following query, still not closed issues reported since April 1, 2020:

text ~ "mysql_upgrade" AND project = "MariaDB Server" AND status != Closed AND createdDate >= "2020-04-01" ORDER BY createdDate DESC

I checked then one by one (as they may be related to upgrade process but not to mysql_upgrade specifically) and placed into two lists, bugs and feature requests. Let me start with tasks (feature requests), so you know what was kind of missing by design or has to be done differently:

  • MDEV-24586 - "remove scripts/mysql_to_mariadb.sql". Actually proper logic is implemented in scripts/mysql_system_tables_fix.sql that forms part of mysql_upgrade, so separate script is no longer needed.
  • MDEV-24540 - "Detect incompatible MySQL virtual columns and report in error log." MariaDB does not support migration from MySQL-generated physical database tables containing virtual columns, and produces column mismatch errors, failures in mysql_upgrade etc. It would be great to have proper error messages from mysql_upgrade in this case, explaining the real problem and possible solutions (dump, drop and reload or whatever).
  • MDEV-24453 - "mysql_upgrade does not honor --verbose parameter". It is not passed to other binaries called and this may make debugging upgrade issues more complex.
  • MDEV-24316 - "cross upgrade from MySQL - have precheck tool". According to the reporter, Daniel Black, the goal would be to check on a database/global scale looking at tables, at features used, at settings, at character sets in table and determine the "migratablilty" of a given MySQL instance. I voted for this feature!
  • MDEV-24093 - "Detect during mysql_upgrade if type_mysql_json.so is needed and load it". After MDEV-18323, MYSQL_JSON type is available as a dynamically loadable plugin. To make mysql_upgrade runs seamlessly we need to make sure it is loaded appropriately and unloaded when done with upgrade). This is already in review, so will be implemented really soon.
  • MDEV-23962 - "Remove arc directory support". I think only Monty (bug reporter) knows what is this about. I don't :)
  • MDEV-23008 - "store mysql_upgrade version info in system table instead of local file". One of the really important feature requests from my colleague since 2005, Hartmut Holzgraefe.
  • MDEV-22357 - "Clearing InnoDB autoincrement counter when upgrading from MariaDB < 10.2.4". CHECK TABLE ... FOR UPGRADE should work differently for InnoDB tables, for mysql_upgrade to work properly.
  • MDEV-22323 - "Upgrading MariaDB". This is the "umbrella" task to cover all problematic  cases of MySQL to MariaDB, Percona Server to MariaDB and minor MariaDB server upgrades.
  • MDEV-22322 - "Percona Server -> Mariadb Upgrades". Summary of all the related issues. See MDEV-22679 etc.

Now back to more or less serious bugs that are still waiting for the fix:

  • MDEV-24579 - "Error table->get_ref_count() after update to 10.5.8". It seems DDL executed on mysql.* tables with InnoDB persistent statistics (like those executed by mysql_upgrade!) may cause problems for concurrent queries (up to assertion failure in this case). So we either should remove those tables (I wish!) or do something with mysql_upgrade, or (IMHO even better) do not let users connect and execute queries while mysql_upgrade is running, like MySQL 8 does when the server is started for the first time on older datadir and performs upgrade. Take care in the meantime...
  • MDEV-23652 - "Assertion failures upon reading InnoDB system table after normal upgrade from 10.2". Now this is a real bug :) Assertion failure during mysql_upgrade, this is surely something to fix!
  • MDEV-23636 - "mysql_upgrade [ERROR] InnoDB: Fetch of persistent statistics requested for table". I am not sure what's going on here, and why.
  • MDEV-23392 - "main.mysql_upgrade_view failed in buildbot with another wrong result". MTR test case failures is something to care about.
  • MDEV-22683 - "mysql_upgrade misses some changes to mysql schema". Over ttime and different versions, some structures in mysql schema get changed, but not all the changes make it to the scripts executed by mysql_ugrade. As a result a schema freshly created by mysql_install_db on a versions 10.x.y differs from a schema created on an earlier version and upgraded to 10.x.y by mysql_ugrade. The real diffs are listed, per version, in this bug report from Elena Stepanova.
  • MDEV-22655 - "CHECK TABLE ... FOR UPGRADE fails to report old datetime format". That's my favorite, unfortunately. Makes running mysql_upgrade useless for some cases of upgrade from pre-5.6 MySQL versions and leads to problems for tables partioned by datetime etc columns. See also MDEV-24499 - "Server upgrade causes compound index and related query to fail.".
  • MDEV-22645 - "default_role gets removed when migrating from 10.1 to 10.4". May have something to do with mydumper/myloader used, but still a problem.
  • MDEV-22482 - "pam v2: mysql_upgrade doesn't fix the ownership/privileges of auth_pam_tool". No comments.
  • MDEV-22477 - "mysql_upgrade fails with sql_notes=OFF". mysql_ugrade, or, more exactly mysql_system_tables.sql, uses @@warning_count variable in the upgrade logic. The variable, in turn, depends on the value of sql_notes. When it is OFF, @@warnings_count is not incremented, and mysql_upgrade doesn't work as expected.
After Monty's post in April 2020 many new mysql_upgrade bugs were reported, and some were already fixed. So, we are on the way...

* * *

To summarize:

  1. There are still some bugs and missing features in mysql_ugrade.
  2. MariaDB actively works on fixing them, as once promised by Monty.
  3. Check the lists in this blog post if you plan to upgrade to recent MariaDB 10.x.y versions, carefully.
  4. Please, report any problem with mysql_upgrade or upgrades in general to our JIRA.

Monday, July 17, 2017

Fun with Bugs #53 - On Some Percona Software Bugs I've Reported

So far in this series I had written only/mostly about MySQL server bugs. Does it mean that there are no unique or interesting bugs in other MySQL-related software or MySQL forks and projects that use MySQL code?

Surely no, it doesn't. So, today I'd like to present a short list of currently active (that is, new or probably not yet resolved) bugs that I had reported for Percona's software at Launchpad over the years I use it. The list is based on this link, but I excluded minor reports and those that probably are just forgotten, while the problem is fixed long time ago.

Let me start with bugs in XtraBackup, the tool that plays a key role in most MySQL environments that are not using Oracle's Enterprise subscriptions:
  • lp:1704636 - "xtrabackup 2.4.7 crashes while backing up MariaDB 10.2.x with ftwrl-* options". I've reported it yesterday. Probably some unique background thread of MariaDB 10.2.x is not taken into account and some calculations are performed for it that lead to crash. Sorry, but there is no way to use --ftwrl-* options of XtraBackup with MariaDB 10.2.x.
  • lp:1418150 - "Provide a way for xtrabackup to communicate with storage managers via XBSA API". This is a feature request created based on customer's request to store backups in Tivoli Storage Manager. Similar request was recently noted from MariaDB customer as well. MySQL's Enterprise Backup allows to work with storage managers, even if not in the most direct way. I've used TSM a lot in the past in production as Informix DBA, so I care about this feature more than about many others...
Let's continue with another unique software package from Percona, Percona Toolkit. I still use some of the tools every day (and do not hesitate to write about this), and would like to see more attention to the toolkit in areas other than MongoDB. I've reported the following two bugs in the past that are still active:
  • lp:1566939 - "pt-stalk should not call mysqladmin debug by default". Having in mind documentation Bug #71346 that Oracle just refuses to fix, I'd say there may be a dozen of people in the world who can make a good use of the output even when it's correct, useful and really needed. For the rest of us it just pollutes error logs and pt-stalk data collections.
  • lp:1533271 - "pt-online-schema-change does not report anything when it waits for slave to replicate new table creation". The problem is clear enough from description. There is nothing useful to find out the reason for the hang in this case without PTDEBUG=1.
Finally, let's check several Percona Server bugs. Note that in many cases i had to file bug for Percona Server even if upstream MySQL was also affected, in a hope that Percona may fix the problem faster (and this happened sometimes). The following bugs were not inherited from MySQL directly:
  • lp:1455432 - "Audit log plugin should allow to include or exclude logging for specific users". This was request for the same functionality that was originally provided in MariaDB's implementation. Actually the bug was fixed in Percona Server 5.6.32-78.0 and 5.7.14-7, but remains in the list as it is still just "Triaged" for 5.5.x branch. Time to close it completely, I'd say.
  • lp:1408971 - "Manual does not describe Innodb_current_row_locks in any useful way". Nothing changed, check the manual.
  • lp:1401528 - "Add option to switch to "old" get_lock() behavior". It's hard to introduce new features in minor release during GA period, and do it right. Somebody had to think about backward compatibility, upgrades and downgrades, and about the completeness of implementation (see lp:1396336 about that). In this specific case Percona failed to do it right.
  • lp:1396035 - "Percona-Server-test-56 RPM package does NOT check /usr/share/percona-server directory and thus is unusable "as is". Packaging is another thing that is hard to do right. I was not lazy today, so I re-check this on CentOS 6.x. with 5.7.18, and ./mtr is somewhat useful there:

    [root@centos mysql-test]# ls -l ./mtr
    lrwxrwxrwx. 1 root root 19 Jun  7 18:20 ./mtr -> ./mysql-test-run.pl
    [root@centos mysql-test]# perl ./mtr analyze
    Logging: ./mtr  analyze
    MySQL Version 5.7.18
    Checking supported features...
     - SSL connections supported
    Collecting tests...
    Checking leftover processes...
    Removing old var directory...
    Creating var directory '/usr/share/mysql-test/var'...
    Installing system database...
    Using parallel: 1

    ==============================================================================

    TEST                                      RESULT   TIME (ms) or COMMENT
    --------------------------------------------------------------------------

    worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
    worker[1] mysql-test-run: WARNING: running this script as _root_ will cause some tests to be skipped
    main.analyze                             [ pass ]  10920
    --------------------------------------------------------------------------
    The servers were restarted 0 times
    Spent 10.920 of 58 seconds executing testcases

    Completed: All 1 tests were successful.
  • lp:1263968 - "DOCS: http://www.percona.com/doc/percona-server/5.6/installation.html does not provide enough details on further steps after installing .deb". I'd still say that some details requested are missing. Writing proper manuals is hard.
I had surely reported a lot more bugs, but many of them have related upstream MySQL ones, so there are no reasons to discuss them separately now. Also, note that in Percona I was involved in setting priorities for the bugs and mostly reported them based on customer issues or complains. So, no wonder that my bug reports were fixed fast, and not so many active ones left by now.

There was yet another Percona tool that I spent as lot of time on, both on building it from current sources, using it and  reporting problems with it. It's Percona Playback, Based on this, they seem to work on other but similar tool now. Still, I consider this request useful for any tool that allows to replay specific load: lp:1365538.

Monday, February 29, 2016

Exploring Metadata Locks with gdb - Studying Simple Case on Percona Server 5.7

I had forgotten the topic of studying metadata locks with gdb for quite a some time. Now that I've upgraded to Percona Server 5.7 and enjoy the remaining of my last free day as an independent MySQL Support Engineer doing nothing but some blogging, I think it's time to get back to gdb interactive sessions and try to answer the question I've got from customer back in Percona but had no chance to answer.

In a bit rewritten way, it sounded as follows:
"Do ALTER TABLE ... STATS_AUTO_RECALC=1" and the reverse ALTER
TABLE...STATS_AUTO_RECALC=default block select, update, delete, and insert on the table?
"
There are two ways to answer this question. First, we can just try to set up a simple test with 3 sessions and simple table, like this:
mysql> create table ti(id int primary key, c1 int, key(c1)) engine=InnoDB stats_auto_recalc=default;Query OK, 0 rows affected (0.16 sec)

mysql> insert into ti values (1,1), (2,2);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0
Now, in one session we can SELECT from the table in transaction that is not completed:
mysql> start transaction;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from ti;
+----+------+
| id | c1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
+----+------+
2 rows in set (0.00 sec)
and then in another session run the ALTER TABLE in question:
mysql> alter table ti stats_auto_recalc=1;
to find out in the first session that it actually hangs waiting on some metadata lock:
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                               | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
|  3 | root | localhost | test | Query   |   15 | Waiting for table metadata lock | alter table ti stats_auto_recalc=1 |         0 |             0 ||  4 | root | localhost | test | Query   |    0 | starting                        | show processlist                   |         0 |             0 |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
2 rows in set (0.00 sec)
So, one of metadata locks set by ALTER TABLE even in this very simple case (when we try to modify the way of statistics recalculation for InnoDB table) even in 5.7 is too strict, so that it conflicts with one of metadata locks set by simple SELECT for the duration of the entire transaction. What's more interesting, now in third session we can try to run whatever statements we want to check (those that were "compatible" with SELECT at metadata locks level) and see them also blocked, like this:
mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                               | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
|  3 | root | localhost | test | Query   |   52 | Waiting for table metadata lock | alter table ti stats_auto_recalc=1 |         0 |             0 |
|  4 | root | localhost | test | Query   |    0 | starting                        | show processlist                   |         0 |             0 |
|  5 | root | localhost | test | Query   |    4 | Waiting for table metadata lock | select * from ti                   |         0 |             0 |+----+------+-----------+------+---------+------+---------------------------------+------------------------------------+-----------+---------------+
3 rows in set (0.00 sec)
This way we can already conclude that ALTER TABLE of this kind does block even SELECT from the table! Good that it runs fast (there is no need to make a table copy), so will not block for a long time, but if ALTER itself is blocked by some transaction (like in my case) - we are still in trouble!

That simple test would give the answer to customer that he could be satisfied with. But I know that ALTER TABLE sets many metadata locks at different stages and while for this trivial one it may not matter much, I still wanted to find some time to attach gdb and see what locks are requested and when/in what order.

I had not done this with 5.7, so tried the same approach as in the initial post in this series (please, check it carefully for the details from the source code, like types of the locks etc), setting breakpoint on MDL_request::init, but it seems this method does NOT exist in MySQL 5.7. So, I ended up with the following breakpoints set:
(gdb) b MDL_context::acquire_lockBreakpoint 1 at 0xbf9083: file /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc, line 3506.
(gdb) b lock_rec_lock
Breakpoint 2 at 0xf24ffd: file /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/storage/innobase/lock/lock0lock.cc, line 2343.
(gdb) c
Continuing.
With these breakpoints in place, I've executed alter table ti stats_auto_recalc=1; and ended up with the following in gdb:
Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1b2df95b0, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);
(gdb) p mdl_request
$1 = (MDL_request *) 0x7fc1b2df95b0
(gdb) p *mdl_request
$2 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_STATEMENT,
  next_in_list = 0x7fc1e27f7f70, prev_in_list = 0x7fc1b2df9778, ticket = 0x0,
  key = {m_length = 3, m_db_name_length = 0,
    m_ptr = "\000\000\000\000L\274q\324\301...
We see something not that much useful immediately, some metadata lock request at the weakest possible level for the duration of the statement. Just to see how we can see all of requests in the same "batch", let's do the following in gdb:
(gdb) p *(mdl_request->next_in_list)
$5 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0x7fc1e27f7d70, prev_in_list = 0x7fc1b2df95b8, ticket = 0x0,
  key = {m_length = 7, m_db_name_length = 4,
    m_ptr = "\002test", '\000' <repeats 175 times>"\377, \377\377\037\000\000\000\000\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 31 times>, "\a", '\000' <repeats 15 times>, "\004\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\001\001\000\001", '\000' <repeats 44 times>, "\001", '\000' <repeats 69 times>, static m_namespace_to_wait_state_name = {{m_key = 111,
...
So, the next one is also the weakest, but is set for the duration of transaction at the database (named "test") level. We can see the next after it as well:
(gdb) p *(mdl_request->next_in_list->next_in_list)
$6 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.
Here we can see a stronger metadata lock request for the duration of transaction on the test.ti table. The next_in_list pointer is NULL, so we are done with this "batch" of requests. If we continue, we'll see the same locks requested one by one while hitting breakpoint:
Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1e27f7f70, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);

(gdb) p *mdl_request
$7 = {type = MDL_INTENTION_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0x7fc1e27f7d70, prev_in_list = 0x7fc1b2df95b8, ticket = 0x0,
  key = {m_length = 7, m_db_name_length = 4,
    m_ptr = "\002test", '\000' <repeats 175 times>"\377, \377\377\037\000\000\000\000\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 31 times>, "\a",
...

(gdb) p *(mdl_request->next_in_list)
$8 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.

Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1e27f7d70, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);

(gdb) p *mdl_request
$9 = {type = MDL_SHARED_UPGRADABLE, duration = MDL_TRANSACTION,
  next_in_list = 0x0, prev_in_list = 0x7fc1e27f7f78, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti", '\000' <repeats 378 times>,
    static m_namespace_to_wait_state_name = {{m_key = 111,
...

(gdb) c
Continuing.
So, we have two ways to study the locks and new one in this post is to check the list of the lock requests in a single mdl_request list of pointers.

Eventually we also get the following breakpoint hit:
(gdb) c
Continuing.

Breakpoint 1, MDL_context::acquire_lock (this=0x7fc1e27ec0c8,
    mdl_request=0x7fc1b2df98a0, lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.7.10-3/percona-server-5.7.10-3/sql/mdl.cc:3506
3506      set_timespec(&abs_timeout, lock_wait_timeout);
(gdb) p *mdl_request
$15 = {type = MDL_EXCLUSIVE, duration = MDL_TRANSACTION,
  next_in_list = 0xd044f7, prev_in_list = 0x7fc1b2dfa914, ticket = 0x0, key = {
    m_length = 9, m_db_name_length = 4,
    m_ptr = "\003test\000ti\000_3\000\001\000\000\000\003\000\000\000<\240V\001\
...
Now, this is a real problem for any concurrent activity with the table, it's the strongest MDL_EXCLUSIVE lock requested on test.ti table that is hold (when obtained) till the end of ALTER statement...

To summarize this post:
  1. We see that we still can use gdb to study metadata locks in MySQL (or Percona Server) 5.7.x, but the breakpoint should be set on MDL_context::acquire_lock (at least until I'll find time to study changes in the source code for any alternatives).
  2. When breakpoint is hit you can study the content of structure pointed by the mdl_request parameter to find out what kind of lock is requested.
  3. At table level ALTER TABLE ... STATS_AUTO_RECALC=1 in MySQL 5.7 starts with weak enough MDL_SHARED_UPGRADABLE lock, but then eventually we end up with the request for the strongest MDL_EXCLUSIVE lock that is either blocked by any concurrent access to the table until transaction there is completed, and/or blocks everything (including SELECT from the table) until ALTER finally completes.
Next time in this series I plan to check what happens with metadata locks for partitioned tables and how much maintenance of individual partitions may affect other partitions. Other idea would be to check what metadata locks are set when statistics for the InnoDB table is recalculated automatically when 10% of data changes. Stay tuned!

Friday, February 26, 2016

On Upgrades to Percona Server 5.7 GA: Ubuntu 14.04 Case

After upgrading Percona Server 5.6.28 (and "full stack" of Percona software) to 5.7.10-3 GA on CentOS 6 and building it from Git Source on Fedora Core 23 yesterday, I've decided to spend some time upgrading Percona Server on my netbook that plays a role of "home build/test/VMs server" and runs under Ubuntu 14.04:
openxs@ao756:~$ uname -a
Linux ao756 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
openxs@ao756:~$ cat /etc/issue
Ubuntu 14.04.4 LTS \n \l
It had the following set of Percona packages (as a result of some tests with Percona XtraDB Cluster I had to use sometime while working there) and mysqld process running after OS startup:
openxs@ao756:~$ dpkg -l | grep -i percona | grep ii
ii  libmysqlclient18                                      1:5.5.35-rel33.0-611.raring                         amd64        Percona Server database client library
ii  libmysqlclient18-dev                                  1:5.5.35-rel33.0-611.raring                         amd64        Percona Server database development files
ii  libperconaserverclient18                              5.5.47-rel37.7-1.trusty                             amd64        Percona Server database client library
ii  libperconaserverclient18.1                            5.6.28-76.1-1.trusty                                amd64        Percona Server database client library
ii  percona-server-5.6-dbg                                5.6.28-76.1-1.trusty                                amd64        Debugging package for Percona Server
ii  percona-server-client-5.6                             5.6.28-76.1-1.trusty                                amd64        Percona Server database client binaries
ii  percona-server-common-5.5                             5.5.47-rel37.7-1.trusty                             amd64        Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii  percona-server-common-5.6                             5.6.28-76.1-1.trusty                                amd64        Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii  percona-server-server-5.6                             5.6.28-76.1-1.trusty                                amd64        Percona Server database server binaries
ii  percona-toolkit                                       2.2.16-1                                            all          Advanced MySQL and system command-line tools
ii  percona-xtrabackup                                    2.3.3-1.trusty                                      amd64        Open source backup tool for InnoDB and XtraDB
ii  percona-xtradb-cluster-common-5.6                     5.6.28-25.14-1.trusty                               amd64        Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii  percona-xtradb-cluster-galera-3.x                     3.14-1.trusty                                       amd64        Galera components of Percona XtraDB Cluster

openxs@ao756:~$ ps aux | grep mysqld
root      1283  0.0  0.0   4440   756 ?        S    10:35   0:00 /bin/sh /usr/bin/mysqld_safe
mysql     1505  0.1 13.0 895592 505516 ?       Sl   10:35   0:01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/ao756.err --pid-file=/var/lib/mysql/ao756.pid
openxs    2938  0.0  0.0  14640   972 pts/4    S+   10:55   0:00 grep --color=auto mysqld
 This time I decided to follow the manual closely, so I checked for all usual/possible locations of configuration files, stopped MySQL server first and then installed the packages for 5.7 (I've noted TokuDB package was NOT installed, so I've added it to the list, as TokuDB storage engine is one of the reasons to use Percona Server 5.7 instead of well tested and known upstream MySQL 5.7):
openxs@ao756:~$ ls -l /etc/mysql/
total 16
drwxr-xr-x 2 root root 4096 тра 22  2014 conf.d
-rw------- 1 root root  333 тра 22  2014 debian.cnf
-rwxr-xr-x 1 root root 1285 січ  9 02:24 debian-start
-rw-r--r-- 1 root root  120 січ 12 13:03 my.cnf

openxs@ao756:~$ ls -l /etc/mysql/conf.d/
total 0

openxs@ao756:~$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 180 лис 17 14:22 /etc/my.cnf

openxs@ao756:~$ sudo service mysql stop
[sudo] password for openxs:
 * Stopping MySQL (Percona Server) mysqld                                [ OK ]
openxs@ao756:~$ ps aux | grep mysqld
openxs    3102  0.0  0.0  14636   968 pts/4    S+   11:01   0:00 grep --color=auto mysqld

openxs@ao756:~$ sudo apt-get install percona-server-server-5.7 percona-server-tokudb-5.7Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libperconaserverclient18.1 percona-server-common-5.6
  percona-xtradb-cluster-galera-3.x socat
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  libmecab2 percona-server-client-5.7 percona-server-common-5.7
The following packages will be REMOVED:
  percona-server-5.6-dbg percona-server-client-5.6 percona-server-server-5.6
The following NEW packages will be installed:
  libmecab2 percona-server-client-5.7 percona-server-common-5.7
  percona-server-server-5.7 percona-server-tokudb-5.7
0 upgraded, 5 newly installed, 3 to remove and 3 not upgraded.
Need to get 21,6 MB of archives.
After this operation, 166 MB disk space will be freed.
...
Setting up libmecab2 (0.996-1.1) ...
Setting up percona-server-server-5.7 (5.7.10-3-1.trusty) ...
Installing new version of config file /etc/default/mysql ...
Installing new version of config file /etc/init.d/mysql ...
...
.................... * Percona Server 5.7.10-3 did not start. Please check logs for more details.
Processing triggers for ureadahead (0.100.0-16) ...
Setting up percona-server-tokudb-5.7 (5.7.10-3-1.trusty) ...


 * This release of Percona Server is distributed with TokuDB storage engine.
 * Run the following script to enable the TokuDB storage engine in Percona Server:

        ps_tokudb_admin --enable -u <mysql_admin_user> -p[mysql_admin_pass] [-S <socket>] [-h <host> -P <port>]

 * See http://www.percona.com/doc/percona-server/5.7/tokudb/tokudb_installation.html for more installation details

 * See http://www.percona.com/doc/percona-server/5.7/tokudb/tokudb_intro.html for an introduction to TokuDB


Processing triggers for libc-bin (2.19-0ubuntu6.7) ...
openxs@ao756:~$ ps aux | grep mysqldopenxs   17729  0.0  0.0  14636   972 pts/4    S+   11:20   0:00 grep --color=auto mysqld
So, even though the server was stopped (by the manual) at the moment of installation, there was an attempt to start server in the process that failed after painful 900 seconds of waiting... I've noted that new /etc/init.d/mysql script was installed and decided to check it (starting it manually under bash -x etc, reading and so on). While 900 seconds startup and 300 second shutdown default timeouts were visible in the code, it was not clear what's wrong. I suspected Percona Server Bug #1490897 again (it was a problem for me on Ubuntu 15.04), but here usual init.d approach to startup was used and none of the configuration files I checked had a single pid_file setting. Attempts to start using the script directly also ended up with 900 dots printed:
openxs@ao756:~$ sudo /etc/init.d/mysql start
No directory, logging in with HOME=/
...
Getting a bit tired of all these, I've checked that at least one way to start server up works, without defaults:
openxs@ao756:~$ sudo mysqld_safe --no-defaults &
[1] 21048
openxs@ao756:~$ 160226 11:51:32 mysqld_safe Adding '/usr/lib/x86_64-linux-gnu/libjemalloc.so.1' to LD_PRELOAD for mysqld
160226 11:51:32 mysqld_safe Logging to '/var/lib/mysql/ao756.err'.
160226 11:51:32 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

openxs@ao756:~$ ps aux | grep mysqld
root     21048  0.0  0.0  74060  2204 pts/4    S    11:51   0:00 sudo mysqld_safe --no-defaults
root     21049  0.0  0.0   4440   784 pts/4    S    11:51   0:00 /bin/sh /usr/bin/mysqld_safe --no-defaults
mysql    21124  1.8 14.8 1042020 573708 pts/4  Sl   11:51   0:00 /usr/sbin/mysqld --no-defaults --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/ao756.err --pid-file=/var/lib/mysql/ao756.pid
openxs   21154  0.0  0.0  14636   972 pts/4    S+   11:51   0:00 grep --color=auto mysqld

openxs@ao756:~$ mysql -uroot -proot test
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.10-3 Percona Server (GPL), Release '3', Revision '63dafaf'

Copyright (c) 2009-2015 Percona LLC and/or its affiliates
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
 So, the problem was surely with some settings in the configuration files and/or init.d script. One of the problems was actually something to expect from 5.7, based on the messages in the error log:
2016-02-26T09:54:47.927969Z 0 [ERROR] You have enabled the binary log, but you haven't provided the mandatory server-id. Please refer to the proper server start-up parameters documentation
2016-02-26T09:54:47.928010Z 0 [ERROR] Aborting

2016-02-26T09:54:47.928052Z 0 [Note] Binlog end
2016-02-26T09:54:47.928131Z 0 [Note] /usr/sbin/mysqld: Shutdown complete
and taking into account the content of /etc/my.cnf at the moment:
openxs@ao756:~$ cat /etc/my.cnf[mysqld_safe]
pid_file=/var/lib/mysql/ao756.pid
[mysql]
connect_timeout=2
[mysqld]
innodb_file_format=Barracuda
table_open_cache=1156
table_open_cache_instances=1
log_bin
gtid_mode=ON
enforce_gtid_consistency
log_slave_updates
 So, I've added mandatory (for my case) server_id=N setting to /etc/my.cnf and tried to start again, to find out error messages that I've seen too many times recently and this year:
2016-02-26T09:56:47.576196Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2016-02-26T09:56:47.666859Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2016-02-26T09:56:47.668162Z 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
2016-02-26T09:56:47.668218Z 0 [ERROR] InnoDB: Tried to read 2048 bytes at offset 0 was only able to read02016-02-26T09:56:47.668238Z 0 [ERROR] InnoDB: Operating system error number 22 in a file operation.
2016-02-26T09:56:47.668263Z 0 [ERROR] InnoDB: Error number 22 means 'Invalid argument'
2016-02-26T09:56:47.668274Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2016-02-26T09:56:47.668287Z 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 122. Cannot continue operation2016-02-26T09:56:47.668298Z 0 [ERROR] InnoDB: Cannot continue operation.
I know these messages by heart already thanks to Percona Server Bug #1131949. One of my configuration files had to have innodb_flush_method set to ALL_O_DIRECT:
openxs@ao756:~$ cat /etc/mysql/my.cnf
[mysqld]
innodb_flush_method = ALL_O_DIRECT
innodb_track_changed_pages = 1
innodb_log_file_size                    = 2G
After replacing it with usual O_DIRECT I was able to finally start the server as expected and execute mysql_upgrade:
openxs@ao756:~$ sudo mysqld_safe &
[1] 21711
openxs@ao756:~$ 160226 11:58:33 mysqld_safe Adding '/usr/lib/x86_64-linux-gnu/libjemalloc.so.1' to LD_PRELOAD for mysqld
160226 11:58:33 mysqld_safe Logging to '/var/lib/mysql/ao756.err'.
160226 11:58:33 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

openxs@ao756:~$ sudo mysql_upgrade -uroot -proot
mysql_upgrade: [Warning] Using a password on the command line interface can be insecure.
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
...
Repairing tables
mysql.proxies_priv                                 OK
Upgrade process completed successfully.
Checking if update is needed.

openxs@ao756:~$ ps aux | grep mysqld
root     21711  0.0  0.0  74060  2200 pts/4    S    11:58   0:00 sudo mysqld_safe
root     21712  0.0  0.0   4440   760 pts/4    S    11:58   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    21938  6.8 15.8 1066860 610576 pts/4  Sl   11:58   0:12 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/ao756.err --pid-file=/var/lib/mysql/ao756.pid
openxs   22292  0.0  0.0  14636   972 pts/4    S+   12:01   0:00 grep --color=auto mysqld
Moreover, now it can be started as a service also (even though the output looks unusual):
openxs@ao756:~$ sudo service mysql start
No directory, logging in with HOME=/
....
 * Percona Server 5.7.10-3 is started
Next step was to enable TokuDB based on instructions I've already seen:
openxs@ao756:~$ sudo ps_tokudb_admin --enable -uroot -proot
Checking if Percona Server is running with jemalloc enabled...
INFO: Percona Server is running with jemalloc enabled.

Checking transparent huge pages status on the system...
INFO: Transparent huge pages are enabled (should be disabled).

Checking if thp-setting=never option is already set in config file...
INFO: Option thp-setting=never is not set in the config file.
      (needed only if THP is not disabled permanently on the system)

Checking TokuDB engine plugin status...
INFO: TokuDB engine plugin is not installed.

Disabling transparent huge pages for the current session...
INFO: Successfully disabled transparent huge pages for this session.

Adding thp-setting=never option into /etc/my.cnf
INFO: Successfully added thp-setting=never option into /etc/my.cnf

Installing TokuDB engine...
INFO: Successfully installed TokuDB engine plugin.
and create my first TokuDB table on Percona Server 5.7:
openxs@ao756:~$ mysql -uroot -proot testmysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.7.10-3-log Percona Server (GPL), Release '3', Revision '63dafaf'

Copyright (c) 2009-2015 Percona LLC and/or its affiliates
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table tt(id int primary key) engine=tokudb;
Query OK, 0 rows affected (0.40 sec)

mysql> show create table tt\G
*************************** 1. row ***************************
       Table: tt
Create Table: CREATE TABLE `tt` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
As I want to check "full stack", I definitely need xtrabackup version 2.4 that is not installed by default (note the unusual package name for .deb packages, server has dot in the name before minor version):
openxs@ao756:~$ sudo apt-get install percona-xtrabackup-24
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libperconaserverclient18.1 percona-server-common-5.6
  percona-xtradb-cluster-galera-3.x socat
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  percona-xtrabackup
The following NEW packages will be installed:
  percona-xtrabackup-24
0 upgraded, 1 newly installed, 1 to remove and 3 not upgraded.
Need to get 6 084 kB of archives.
After this operation, 10,3 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://repo.percona.com/apt/ trusty/main percona-xtrabackup-24 amd64 2.4.1-1.trusty [6 084 kB]
Fetched 6 084 kB in 8s (691 kB/s)
(Reading database ... 801961 files and directories currently installed.)
Removing percona-xtrabackup (2.3.3-1.trusty) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Selecting previously unselected package percona-xtrabackup-24.
(Reading database ... 801947 files and directories currently installed.)
Preparing to unpack .../percona-xtrabackup-24_2.4.1-1.trusty_amd64.deb ...
Unpacking percona-xtrabackup-24 (2.4.1-1.trusty) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Setting up percona-xtrabackup-24 (2.4.1-1.trusty) ...

openxs@ao756:~$ xtrabackup --version
xtrabackup version 2.4.1 based on MySQL server 5.7.10 Linux (x86_64) (revision id: a2dc9d4)
It seems at least in my case xtrabackup was not able to find the socket by default, but eventually it worked:
openxs@ao756:~$ sudo xtrabackup --user=root --password=root --backup
160226 12:08:45  version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup' as 'root'  (using password: YES).
160226 12:08:45  version_check Connected to MySQL server
160226 12:08:45  version_check Executing a version check against the server...
160226 12:08:47  version_check Done.
160226 12:08:47 Connecting to MySQL server host: localhost, user: root, password: set, port: 0, socket: (null)
Failed to connect to MySQL server: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2).

openxs@ao756:~$ mysql -uroot -proot test -e "show variables like 'socket%'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-----------------------------+
| Variable_name | Value                       |
+---------------+-----------------------------+
| socket        | /var/run/mysqld/mysqld.sock |
+---------------+-----------------------------+

openxs@ao756:~$ sudo xtrabackup --user=root --password=root --socket=/var/run/mysqld/mysqld.sock  --backup
...
160226 12:11:43 Finished backing up non-InnoDB tables and files
160226 12:11:43 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
xtrabackup: The latest check point (for incremental): '7481732856'
xtrabackup: Stopping log copying thread.
.160226 12:11:43 >> log scanned up to (7481732865)

160226 12:11:43 Executing UNLOCK TABLES
160226 12:11:43 All tables unlocked
160226 12:11:43 [00] Copying ib_buffer_pool to /home/openxs/xtrabackup_backupfiles/ib_buffer_pool
160226 12:11:43 [00]        ...done
160226 12:11:43 Backup created in directory '/home/openxs/xtrabackup_backupfiles/'
160226 12:11:43 [00] Writing backup-my.cnf
160226 12:11:43 [00]        ...done
160226 12:11:43 [00] Writing xtrabackup_info
160226 12:11:43 [00]        ...done
xtrabackup: Transaction log of lsn (7481732815) to (7481732865) was copied.
160226 12:11:44 completed OK!
I've also installed debug package to have symbols while working with gdb, and checked that pt-mysql-summary works:
openxs@ao756:~$ sudo apt-get install percona-server-5.7-dbg
...
Fetched 160 MB in 59s (2 702 kB/s)
Selecting previously unselected package percona-server-5.7-dbg.
(Reading database ... 801962 files and directories currently installed.)
Preparing to unpack .../percona-server-5.7-dbg_5.7.10-3-1.trusty_amd64.deb ...
Unpacking percona-server-5.7-dbg (5.7.10-3-1.trusty) ...
Setting up percona-server-5.7-dbg (5.7.10-3-1.trusty) ...
openxs@ao756:~$ pt-mysql-summary
# Percona Toolkit MySQL Summary Report #######################
              System time | 2016-02-26 10:17:09 UTC (local TZ: EET +0200)
# Instances ##################################################
  Port  Data Directory             Nice OOM Socket
  ===== ========================== ==== === ======
        /var/lib/mysql             0    0
# MySQL Executable ###########################################
       Path to executable | /usr/sbin/mysqld
              Has symbols | No
# Report On Port 3306 ########################################
                     User | @localhost
                     Time | 2016-02-26 12:17:08 (EET)
                 Hostname | ao756
                  Version | 5.7.10-3-log Percona Server (GPL), Release '3', Revision '63dafaf'
                 Built On | debian-linux-gnu x86_64
                  Started | 2016-02-26 12:02 (up 0+00:14:11)
                Databases | 2
                  Datadir | /var/lib/mysql/
                Processes | 1 connected, 1 running
              Replication | Is not a slave, has 0 slaves connected
                  Pidfile | /var/lib/mysql/ao756.pid (does not exist)...
As a last note, service can not be stopped properly:
openxs@ao756:~$ sudo service mysql stop
 * Percona Server 5.7.10-3 is already stopped
openxs@ao756:~$ ps aux | grep mysqld
mysql    24382  0.1  0.0   4440   760 ?        S    12:21   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    24643  2.9  5.6 1234144 218252 ?      Sl   12:21   0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --log-error=/var/lib/mysql/ao756.err --pid-file=/var/lib/mysql/ao756.pid
openxs   24756  0.0  0.0  14636   972 pts/4    S+   12:21   0:00 grep --color=auto mysqld
so one has to use mysqladmin:
openxs@ao756:~$ sudo mysqladmin -uroot -proot shutdownmysqladmin: [Warning] Using a password on the command line interface can be insecure.

openxs@ao756:~$ ps aux | grep mysqld
openxs   24765  0.0  0.0  14636   972 pts/4    S+   12:21   0:00 grep --color=auto mysqld
To summarize my experience of a less naive user of Ubuntu 14.04 (the one who already upgraded to Percona Server 5.7.10-3 on other Linux versions), one should note the following:

  1. Please, check all your configuration files carefully for any possible incompatibility with 5.7 before attempting upgrade.
  2. Make sure ALL_O_DIRECT value is NOT used for innodb_flush_method. Check this my Facebook post for more details and discussion with Laurynas about the bugs involved (Percona Server Bug #1529885 and Percona Server Bug #1550404) and manual statements related.
  3. If you need to use xtrabackup with Percona Server 5.7, you have to install percona-xtrabackup-24 explicitly and may have to specify --socket option explicitly.
  4. I think /etc/init.d/mysql script from Percona Server 5.7.10-3 is somewhat broken on Ubuntu 14.04. Something to study in more details later.
I hope this step by step guide will help some of you to upgrade faster and provide some troubleshooting hints in case of problems.

Wednesday, January 6, 2016

Exploring Metadata Locks with gdb - Double Checking the Initial Results

Some results in my initial post in this series led me to questions that I'll try to answer here. First of all, I noted that SELECT from a single table ended up with just one metadata lock request:

(gdb) b MDL_request::init
Breakpoint 1 at 0x648f13: file /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc, line 1266.
Breakpoint 2 at 0x648e70: file /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc, line 1245.
warning: Multiple breakpoints were set.
Use the "delete" command to delete unwanted breakpoints.
(gdb) c
Continuing.
[Switching to Thread 0x7ff224c9f700 (LWP 2017)]

Breakpoint 2, MDL_request::init (this=0x7ff1fbe425a8,
    mdl_namespace=MDL_key::TABLE, db_arg=0x7ff1fbe421c8 "test",
    name_arg=0x7ff1fbe421d0 "t1", mdl_type_arg=MDL_SHARED_READ,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {


MDL_SHARED_READ lock on the table is expected, we read it after all. Let's check with backtrace where this request happens:






(gdb) bt
#0  MDL_request::init (this=0x7ff1fbe425a8, mdl_namespace=MDL_key::TABLE,
    db_arg=0x7ff1fbe421c8 "test", name_arg=0x7ff1fbe421d0 "t1",
    mdl_type_arg=MDL_SHARED_READ, mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x00000000006d3033 in st_select_lex::add_table_to_list (this=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:7479
#2  0x000000000077eb6d in MYSQLparse (YYTHD=0x7ff1fbb65000)
    at /var/lib/jenkins/jobs/percona-server-5.6-source-tarballs/workspace/sql/sql_yacc.yy:10888
#3  0x00000000006decef in parse_sql (thd=0x7ff1fbb65000,
    parser_state=0x7ff224c9e130, creation_ctx=0x0)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:9005
#4  0x00000000006df1b1 in mysql_parse (thd=0x7ff1fbb65000, rawbuf=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:6878
#5  0x00000000006e0d1f in dispatch_command (command=<value optimized out>,
    thd=0x7ff1fbb65000, packet=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:1442
#6  0x00000000006ad692 in do_handle_one_connection (thd_arg=Unhandled dwarf expression opcode 0xf3
)
...


So, it happens as soon as we noted a table to read from while parsing the query text. What surprised me initially is no other MDL lock requests, nothing at schema level, for example. I expected that metadata locks should prevent dropping the schema when I read from some table there, but had not seen the schema-level lock for this.

To get more details on this and try to make sure I do not miss anything by studying a corner case, I've added a row to the table and then tried to do the same SELECT in explicit transaction, and (after SELECT completed, but transaction remained active) I tried to DROP DATABASE from the second session while this transaction was still active. (This test will also let us understand what metadata locks are requested for DROP DATABASE, by the way.) I've got the following in gdb:

[New Thread 0x7ff224c5e700 (LWP 2048)]
[Switching to Thread 0x7ff224c5e700 (LWP 2048)]

Breakpoint 2, MDL_request::init (this=0x7ff224c5a040,
    mdl_namespace=MDL_key::GLOBAL, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff224c5a040, mdl_namespace=MDL_key::GLOBAL,
    db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x00000000007d1ed7 in lock_schema_name (thd=0x7ff1fbbb5000,
    db=0x7ff1e581d0b0 "test")
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/lock.cc:794
#2  0x00000000006b03d4 in mysql_rm_db (thd=0x7ff1fbbb5000,
    db=0x7ff1e581d0b0 "test", if_exists=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_db.cc:787
...

---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a1f0,
    mdl_namespace=MDL_key::BACKUP, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a3a0,
    mdl_namespace=MDL_key::SCHEMA, db_arg=0x7ff1e581d0b0 "test",
    name_arg=0xba2ae4 "", mdl_type_arg=MDL_EXCLUSIVE,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff224c5a3a0, mdl_namespace=MDL_key::SCHEMA,
    db_arg=0x7ff1e581d0b0 "test", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_EXCLUSIVE, mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x00000000007d1fb2 in lock_schema_name (thd=0x7ff1fbbb5000,
    db=0x7ff1e581d0b0 "test")
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/lock.cc:801
#2  0x00000000006b03d4 in mysql_rm_db (thd=0x7ff1fbbb5000,
    db=0x7ff1e581d0b0 "test", if_exists=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_db.cc:787
#3  0x00000000006daaa2 in mysql_execute_command (thd=0x7ff1fbbb5000)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:4371
#4  0x00000000006df518 in mysql_parse (thd=0x7ff1fbbb5000, rawbuf=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_parse.cc:6972
...

---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff1e58313b0,
    mdl_namespace=MDL_key::TABLE, db_arg=0x7ff1e5831570 "test",
    name_arg=0x7ff1e5831575 "t1", mdl_type_arg=MDL_EXCLUSIVE,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff1e58313b0, mdl_namespace=MDL_key::TABLE,
    db_arg=0x7ff1e5831570 "test", name_arg=0x7ff1e5831575 "t1",
    mdl_type_arg=MDL_EXCLUSIVE, mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x00000000006b0857 in find_db_tables_and_rm_known_files (
    thd=0x7ff1fbbb5000, db=0x7ff1e581d0b0 "test", if_exists=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_db.cc:1108
#2  mysql_rm_db (thd=0x7ff1fbbb5000, db=0x7ff1e581d0b0 "test", if_exists=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_db.cc:812
...


Now it's clear what happens, eventually we have to drop the table while dropping the schema, and here we surely hit a blocking metadata lock that SELECT set! Let's continue:

---Type <return> to continue, or q <return> to quit---q
Quit

(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff1e581ece0,
    mdl_namespace=MDL_key::SCHEMA, db_arg=0x7ff1e5831570 "test",
    name_arg=0xba2ae4 "", mdl_type_arg=MDL_INTENTION_EXCLUSIVE,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff1e581ece0, mdl_namespace=MDL_key::SCHEMA,
    db_arg=0x7ff1e5831570 "test", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x000000000068da14 in lock_table_names (thd=0x7ff1fbbb5000,
    tables_start=0x7ff1e5831010, tables_end=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:5033
#2  0x00000000006b05df in mysql_rm_db (thd=0x7ff1fbbb5000,
    db=0x7ff1e581d0b0 "test", if_exists=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_db.cc:834
...


So, while our request for MDL_EXCLUSIVE lock for the table surely could not be satisfied, server continued with some further requests:

---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a1f0,
    mdl_namespace=MDL_key::GLOBAL, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a3a0,
    mdl_namespace=MDL_key::BACKUP, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.


and only at this stage it hanged (waiting for the transaction where SELECT happened to complete). As soon as I committed there:

Breakpoint 2, MDL_request::init (this=0x7ff224c5a260,
    mdl_namespace=MDL_key::TABLE, db_arg=0xb925ef "mysql",
    name_arg=0xb94906 "proc", mdl_type_arg=MDL_SHARED_READ,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a360,
    mdl_namespace=MDL_key::TABLE, db_arg=0xb925ef "mysql",
    name_arg=0xb94906 "proc", mdl_type_arg=MDL_SHARED_WRITE,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {




So, we tried to read and then maybe change the mysql.proc table (to drop procedures created in this database, honestly I was not sure if there was any one there at the moment). Next:

(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c59cb0,
    mdl_namespace=MDL_key::GLOBAL, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c599c0,
    mdl_namespace=MDL_key::BACKUP, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5a250,
    mdl_namespace=MDL_key::TABLE, db_arg=0xb925ef "mysql",
    name_arg=0xc111a8 "event", mdl_type_arg=MDL_SHARED_WRITE,
    mdl_duration_arg=MDL_TRANSACTION)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.




That is, some lock requests and we try to delete events in this database from mysql.event table obviously. Let's continue:

Breakpoint 2, MDL_request::init (this=0x7ff224c59900,
    mdl_namespace=MDL_key::GLOBAL, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff224c59900, mdl_namespace=MDL_key::GLOBAL,
    db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x000000000068c13a in open_table (thd=0x7ff1fbbb5000,
    table_list=0x7ff224c59eb0, ot_ctx=0x7ff224c59c10)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:2895
#2  0x0000000000694585 in open_and_process_table (thd=0x7ff1fbbb5000,
    start=0x7ff224c59e48, counter=0x7ff224c59e50, flags=2048,
    prelocking_strategy=0x7ff224c59ea0)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:4797
#3  open_tables (thd=0x7ff1fbbb5000, start=0x7ff224c59e48,
    counter=0x7ff224c59e50, flags=2048, prelocking_strategy=0x7ff224c59ea0)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:5304
#4  0x0000000000694d74 in open_and_lock_tables (thd=0x7ff1fbbb5000,
    tables=0x7ff224c59eb0, derived=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:5960


#5  0x000000000085039d in open_and_lock_tables (thd=0x7ff1fbbb5000, lock_type=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.h:477
#6  Event_db_repository::open_event_table (thd=0x7ff1fbbb5000, lock_type=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/event_db_repository.cc:619
#7  0x0000000000850bfc in Event_db_repository::drop_schema_events (this=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/event_db_repository.cc:1011---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c59610,
    mdl_namespace=MDL_key::BACKUP, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) bt
#0  MDL_request::init (this=0x7ff224c59610, mdl_namespace=MDL_key::BACKUP,
    db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_STATEMENT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
#1  0x00000000007d1dfe in Global_backup_lock::acquire_protection (
    this=0x7ff1fbbb6a08, thd=0x7ff1fbbb5000, duration=MDL_STATEMENT,
    lock_wait_timeout=31536000)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/lock.cc:1225
#2  0x000000000068ce20 in open_table (thd=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:3258
#3  0x0000000000694585 in open_and_process_table (thd=0x7ff1fbbb5000,
    start=0x7ff224c59e48, counter=0x7ff224c59e50, flags=2048,
    prelocking_strategy=0x7ff224c59ea0)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:4797
#4  open_tables (thd=0x7ff1fbbb5000, start=0x7ff224c59e48,
    counter=0x7ff224c59e50, flags=2048, prelocking_strategy=0x7ff224c59ea0)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:5304
#5  0x0000000000694d74 in open_and_lock_tables (thd=0x7ff1fbbb5000,
    tables=0x7ff224c59eb0, derived=Unhandled dwarf expression opcode 0xf3
)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/sql_base.cc:5960

...
---Type <return> to continue, or q <return> to quit---q
 at /uQuit
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c5ab00,
    mdl_namespace=MDL_key::BINLOG, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_INTENTION_EXCLUSIVE, mdl_duration_arg=MDL_EXPLICIT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

We ended up with writing to the binary log and DROP DATABASE was completed.

Let's summarize this experience. First of all, it seems in current implementation there is no need to set metadata locks at schema level to prevent dropping the database we read from with SELECT - table level metadata lock will eventually block the DROP DATABASE request. If we have many tables in the database this may happen not at the very first one, so we can drop some tables before getting blocked. This is something to study later. I'd really prefer DROP DATABASE to be atomic and do not even start if it can not complete at the moment. See Bug #79610 also.

We see table-level metadata lock requests for mysql.proc and mysql.events tables when DROP DATABASE is executed. This is expected and reasonable - we can NOT drop the database (schema) if there are some stored functions, procedures or events are defined there that are used.

We had also noted that even if some metadata lock is requested but can not be obtained, processing may continue for some time until stopped, and then processing resumes when we finally get the required metadata lock.

This is the code to study in sql/lock.cc:

 760 /**
 761   Obtain an exclusive metadata lock on a schema name.
 762
 763   @param thd         Thread handle.
 764   @param db          The database name.
 765
 766   This function cannot be called while holding LOCK_open mutex.
 767   To avoid deadlocks, we do not try to obtain exclusive metadata
 768   locks in LOCK TABLES mode, since in this mode there may be
 769   other metadata locks already taken by the current connection,
 770   and we must not wait for MDL locks while holding locks.
 771
 772   @retval FALSE  Success.
 773   @retval TRUE   Failure: we're in LOCK TABLES mode, or out of memory,
 774                  or this connection was killed.
 775 */
 776
 777 bool lock_schema_name(THD *thd, const char *db)
 778 {
 779   MDL_request_list mdl_requests;
 780   MDL_request global_request;
 781   MDL_request backup_request;
 782   MDL_request mdl_request;
 783
 784   if (thd->locked_tables_mode)
 785   {
 786     my_message(ER_LOCK_OR_ACTIVE_TRANSACTION,
 787                ER(ER_LOCK_OR_ACTIVE_TRANSACTION), MYF(0));
 788     return TRUE;
 789   }
 790
 791   if (thd->global_read_lock.can_acquire_protection())
 792     return TRUE;
 793   global_request.init(MDL_key::GLOBAL, "", "", MDL_INTENTION_EXCLUSIVE,
 794                       MDL_STATEMENT);
 795
 796   if (thd->backup_tables_lock.abort_if_acquired())
 797     return true;
 798   thd->backup_tables_lock.init_protection_request(&backup_request,
 799                                                   MDL_STATEMENT);
 800
 801   mdl_request.init(MDL_key::SCHEMA, db, "", MDL_EXCLUSIVE, MDL_TRANSACTION)     ;
 802
 803   mdl_requests.push_front(&mdl_request);
 804   mdl_requests.push_front(&backup_request);
 805   mdl_requests.push_front(&global_request);
 806
 807   if (thd->mdl_context.acquire_locks(&mdl_requests,
 808                                      thd->variables.lock_wait_timeout))
 809     return TRUE;
 810
 811   DEBUG_SYNC(thd, "after_wait_locked_schema_name");
 812   return FALSE;


From this code we see that actually we acquire metadata locks in batches(!). This explaines why we had not stopped at our first "blocked" request immediately, and complicates tracing in gdb. We also see that we have some mdl_context object in THD structure, and we may use this later to study pending requests per thread. One day I'll start setting breakpoints at MDLcontext::acquire_locks as well.

It's also more or less clear what MDL_key::GLOBAL namespace is used for - this is a metadata lock requested by FLUSH TABLES WITH READ LOCK global lock, and thus we check it for any "write" operation, even before we try to do it. If I execute FLUSH TABLES WITH READ LOCK, I get the following in my gdb session with a breakpoint set:

[Switching to Thread 0x7ff224c9f700 (LWP 2017)]

Breakpoint 2, MDL_request::init (this=0x7ff224c9c9f0,
    mdl_namespace=MDL_key::GLOBAL, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_SHARED, mdl_duration_arg=MDL_EXPLICIT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.

Breakpoint 2, MDL_request::init (this=0x7ff224c9c9f0,
    mdl_namespace=MDL_key::COMMIT, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_SHARED, mdl_duration_arg=MDL_EXPLICIT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {
(gdb) c
Continuing.


and then it's completed. Now, on UNLOCK TABLES nothing happens at MDL level.

The last topic I'd like to discuss in this post are metadata locks with namespace defined like this: mdl_namespace=MDL_key::BACKUP, "backup locks". Manual for 5.7 does not mention them, but they still exist in the code even there:

[openxs@centos ~]$ grep -rn MDL_key::BACKUP ~/git/mysql-server/*
[openxs@centos ~]$ echo $?
1

In MySQL we see the following in mdl.h:

  enum enum_mdl_namespace { GLOBAL=0,
                            TABLESPACE,
                            SCHEMA,
                            TABLE,
                            FUNCTION,
                            PROCEDURE,
                            TRIGGER,
                            EVENT,
                            COMMIT,
                            USER_LEVEL_LOCK,
                            LOCKING_SERVICE,
                            /* This should be the last ! */
                            NAMESPACE_END };


So, backup locks is the unique feature of Percona Server, explained in the manual. It is implemented with MDL locks. As soon as we run LOCK TABLES FOR BACKUP, we see in gdb session:

Breakpoint 2, MDL_request::init (this=0x7ff224c9ca90,
    mdl_namespace=MDL_key::BACKUP, db_arg=0xba2ae4 "", name_arg=0xba2ae4 "",
    mdl_type_arg=MDL_SHARED, mdl_duration_arg=MDL_EXPLICIT)
    at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:1245
1245    {


This reminded me about the difference Percona Server can make, so I'll try to re-check my results with MySQL 5.6 next time as well. Stay tuned!