Sunday, July 23, 2017

Why Thread May Hang in "Waiting for table level lock" State - Part I

Last time I had to use gdb to check what's going on in MySQL server and found something useful with it to share in the blog post it was April 2017, and I miss this kind of experience already. So, today I decided to try it again in a hope to get some insights in cases when other tools may not apply or may not be as helpful as one could expect. Here is the long enough story, based on recent customer issue I worked on this week.
* * *
Had you seen anything like this output of SHOW PROCESSLIST statement:

Id User Host db Command Time State 
Info Progress
...
28 someuser01 xx.xx.xx.xx:39644 somedb001 Sleep 247  
NULL 0.000
29 someuser01 xx.xx.xx.yy:44100 somedb001 Query 276 
Waiting for table level lock DELETE FROM t1 WHERE (some_id = 'NNNNN') AND ... 0.000
...
33 someuser01 xx.xx.zz.tt:35886 somedb001 Query 275 
Waiting for table level lock DELETE FROM t2 WHERE (some_id = 'XXXXX') 0.000
...
38 someuser01 xx.xx.xx.xx:57055 somedb001 Query 246 
Waiting for table level lock DELETE FROM t3 WHERE (some_id in (123456)) AND ...  0.000
...
recently? That is, many threads accessing InnoDB(!) tables and hanging in the "Waiting for table level lock" state for a long time without any obvious reason in the SHOW PROCESSLIST or SHOW ENGINE INNODB STATUS?

I've seen it this week, along with a statement from customer that the server is basically unusable. Unlike in many other cases I had not found any obvious problem from the outputs mentioned above (other than high concurrency and normal InnoDB row lock waits etc for other sessions). There were no unexpected table level locks reported by InnoDB, and the longest running transactions were those in that waiting state.

Unfortunately fine MySQL manual is not very helpful when describing this thread state:
"Waiting for lock_type lock
The server is waiting to acquire a THR_LOCK lock or a lock from the metadata locking subsystem, where lock_type indicates the type of lock.
This state indicates a wait for a THR_LOCK:
  • Waiting for table level lock
These states indicate a wait for a metadata lock:
  • Waiting for event metadata lock
  • Waiting for global read lock
    ...
    "
What that "THR_LOCK" should tell the reader is beyond me (and probably most of MySQL users). Fortunately, I know from experience that this state usually means that thread is trying to access some MyISAM table. The tables mentioned in currently running statements were all InnoDB, though. Server had performance_schema enabled, but only with default instrumentation and consumers defined, so I had no chance to get anything related to metadata locks or much hope to see all previous statement executed by each thread anyway. I surely insisted on checking the source code etc, but this rarely gives immediate results.

So, to check my assumption and try to prove MyISAM tables are involved, and get some hint what sessions they were locked by, I suggested to run mysqladmin debug, the command that is rarely used these days, with the output that Oracle explicitly refused to document properly (see my Bug #71346)! I do know what to expect there (from the days when MyISAM was widely used), and when in doubts I can always check the source code.

The mysqladmin debug command outputs a lot of information into the error log. Among other things I found the following section:

Thread database.table_name          Locked/Waiting        Lock_type

...
928     somedb001.somewhat_seq      Waiting - write       High priority write lock
...
12940   somedb001.somewhat_seq      Locked - write        High priority write lock
where somedb001.somewhat_seq surely was the MyISAM table. There were actually many tables mentioned as locked, with waits on them, and names ending with _seq suffix. One can now check what MySQL thread with id=12940 is doing at all levels, where it comes from, should it be killed etc.

Now it became more clear what's going on. Probably application developers tried to implement sequences with MyISAM tables and, probably, triggers to use them when data are not provided for the columns! The idea is quite popular and described in many blog posts. Check this old blog post by Peter Zaitsev, for example. This assumption was confirmed, and essentially they did something like I did in the following deliberately primitive example:
-- this is our sequence table
mysql> show create table misam\G
*************************** 1. row ***************************
       Table: misam
Create Table: CREATE TABLE `misam` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> create table inno like misam;
Query OK, 0 rows affected (0.30 sec)

mysql> alter table inno engine=InnoDB;
Query OK, 0 rows affected (2.73 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table inno\G
*************************** 1. row ***************************
       Table: inno
Create Table: CREATE TABLE `inno` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.03 sec)

mysql> select * from misam;
Empty set (0.00 sec)

-- let's set up the first value for the sequence
mysql> insert into misam values(1);
Query OK, 1 row affected (0.05 sec)

-- now let's create trigger to insert the value from sequencemysql> delimiter //
mysql> create trigger tbi before insert on inno
    -> for each row
    -> begin
    -> if ifnull(new.id,0) = 0 then
    ->   update misam set id=last_insert_id(id+1);
    ->   set new.id = last_insert_id();
    -> end if;
    -> end
    -> //

Query OK, 0 rows affected (0.25 sec)

mysql> delimiter ;
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into inno values(0);
Query OK, 1 row affected (0.20 sec)

mysql> insert into inno values(0);
Query OK, 1 row affected (0.10 sec)

mysql> select * from inno;
+----+
| id |
+----+
|  2 |
|  3 |
+----+
2 rows in set (0.00 sec)

mysql> select * from misam;
+----+
| id |
+----+
|  3 |
+----+
1 row in set (0.00 sec)
So, this is how it is supposed to work, and, you know, it works from two concurrent sessions without any locking noticed etc.

This is the end of the real story, for now. We have a lot of things to discuss there with customer, including how to find out previous commands executed in each sessions, what useful details we can get from the performance_schema etc. The root cause was clear, but we still have to find out why the sessions are spending so much time holding the blocking locks on MyISAM tables and what we can do about that, or with architecture based on such implementation of sequences (that I kindly ask you to avoid whenever possible! Use auto_increment values, please, if you care about concurrency.) I expect more blog posts eventually inspired by that real story, but now it's time to move to more generic technical details.
* * *
I have two problems with this story if I try to approach it in a more generic way.

First, do we have any other way to see MyISAM table level locks, besides mysqladmin debug command? (Oracle kindly plans to replace with something, let me quote Shane Bester's comment the bug that originates from hist internal feature request: "We need to replace it with appropriate performance_schema or information_schema tables.")

Second, how exactly to reproduce the situation customer reported? My initial attempts to get such a status with the trigger in place failed, I was not able to capture threads spending any notable time in "Waiting for table level lock" state even with concurrent single row inserts and the trigger above in place. I was thinking about explicit LOCK TABLES misam WRITE etc, but I doubt real code does use that. Fortunately, the following INSERT running from one session:
mysql> insert into inno values(sleep(100));
Query OK, 1 row affected (1 min 40.20 sec)
allowed me to run INSERT in the other session that got blocked:
mysql> insert into inno values(0);
Query OK, 1 row affected (1 min 36.36 sec)
and while it was blocked I've seen the thread state I wanted in the output of the SHOW PROCESSLIST:
mysql> show processlist;+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                        | Info                                | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
| 16 | root | localhost | test | Query   |   17 | User sleep                   | insert into inno values(sleep(100)) |         0 |             0 |
| 17 | root | localhost | test | Query   |   13 | Waiting for table level lock | insert into inno values(0)          |         0 |             0 |
| 22 | root | localhost | test | Query   |    0 | starting                     | show processlist                    |         0 |             0 |
+----+------+-----------+------+---------+------+------------------------------+-------------------------------------+-----------+---------------+
3 rows in set (0.00 sec)
So, I know how exactly to reproduce the problem and what it can be caused by. Any slow running single INSERT statement (caused by something complex executed in trigger, some slow function used in the list of values inserted, or, maybe just INSERT ... SELECT ... from the big table) will give us the desired thread state.

Coming back to the first generic problem mentioned above, is there any way besides running mysqladmin debug and checking the output vs processlist etc, to identify the thread that holds MyISAM table level locks? One should expect performance_schema to help with this, at least as of 5.7 (I've used Percona Server 5.7.18-15 on my Ubuntu 14.04 netbook for testing today, while the original problem was on MariaDB 10.2.x. Percona is for fun, while MariaDB is for real work...). Indeed, we have table_handles there, and until recently I ignored its existence (it's new in 5.7, and I am not even sure if MariaDB 10.2.x has it already).

So, I reproduced the problem again and got the following there:
mysql> select * from performance_schema.table_handles;
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | OBJECT_INSTANCE_BEGIN | OWNER_THREAD_ID | OWNER_EVENT_ID | INTERNAL_LOCK | EXTERNAL_LOCK  |
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
| TABLE       | test          | t           |       140392772351024 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | t1          |       140392772352560 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | ttime       |       140392772355632 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | misam       |       140392772358704 |              55 |             10 | WRITE         | WRITE EXTERNAL |
| TABLE       | test          | misam       |       140392896981552 |              54 |             10 | WRITE         | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392772361776 |              55 |             10 | NULL          | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392896983088 |              54 |             10 | NULL          | WRITE EXTERNAL |
| TABLE       | test          | inno        |       140392826836016 |               0 |              0 | NULL          | NULL           |
| TABLE       | test          | misam       |       140392826837552 |               0 |              0 | NULL          | NULL           |
+-------------+---------------+-------------+-----------------------+-----------------+----------------+---------------+----------------+
9 rows in set (0.00 sec)
What I am supposed to do with the above to find out the MySQL thread id of the blocking session? I am supposed to join to performance_schema.threads table, looking for thread_id value that is the same as owner_thread_id above, 54 and 55. I get the following, note that processlist_id is what you are looking for in the processlist:
mysql> select * from performance_schema.threads where thread_id in (54, 55)\G
*************************** 1. row ***************************
          THREAD_ID: 54
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 27
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 90
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 32252
*************************** 2. row ***************************
          THREAD_ID: 55
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 28
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
   PROCESSLIST_TIME: 101
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: Socket
       THREAD_OS_ID: 27844
2 rows in set (0.10 sec)
I do not see a way to distinguish lock wait from actually holding the lock. All I know at the moment is that there is engine-level lock on misam table (and inno table, for that matter). If the MySQL thread id is NOT for the thread that is "Waiting for table level lock", then it must be the thread that holds the lock! Some smart join with proper WHERE clause would let me to find out what I need directly. Maybe in one of the next parts I'll even present it, but writing it from the top of my head in critical situation is above my current skills related to performance_schema.
* * *
Now, what those of us should do who do not have performance_schema enabled, or have to use version before 5.7? Or those with access to gdb and spare 5 minutes?

Surely we should attach gdb to the mysqld process and, if in doubts, read some parts of the source code to know where to start. I started with the following fragments of code that were easy to find (as long as you know that COM_DEBUG command is actually sent by mysqladmin debug to server):
openxs@ao756:~/git/percona-server$ grep -rni com_debug *
include/mysql/plugin_audit.h.pp:160:  COM_DEBUG,
include/mysql.h.pp:47:  COM_DEBUG,
include/my_command.h:39:  COM_DEBUG,
libmysql/libmysql.c:899:  DBUG_RETURN(simple_command(mysql,COM_DEBUG,0,0,0));
rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task_debug.h:62:  "[XCOM_DEBUG] ",
sql/sql_parse.cc:1884:  case COM_DEBUG:
^C
...
# from the line highlighted above...  case COM_DEBUG:
    thd->status_var.com_other++;
    if (check_global_access(thd, SUPER_ACL))
      break;                                    /* purecov: inspected */
    mysql_print_status();    query_logger.general_log_print(thd, command, NullS);
    my_eof(thd);
    break;

openxs@ao756:~/git/percona-server$ grep -rni mysql_print_status *
sql/sql_test.h:40:void mysql_print_status();
sql/sql_parse.cc:59:#include "sql_test.h"         // mysql_print_status
sql/sql_parse.cc:1888:    mysql_print_status();
sql/sql_test.cc:456:void mysql_print_status()sql/mysqld.cc:69:#include "sql_test.h"     // mysql_print_status
sql/mysqld.cc:2387:        mysql_print_status();   // Print some debug info
openxs@ao756:~/git/percona-server$

# from the line highlighted above...

void mysql_print_status()
{
  char current_dir[FN_REFLEN];
  STATUS_VAR current_global_status_var;

  printf("\nStatus information:\n\n");
  (void) my_getwd(current_dir, sizeof(current_dir),MYF(0));
  printf("Current dir: %s\n", current_dir);
  printf("Running threads: %u  Stack size: %ld\n",
         Global_THD_manager::get_instance()->get_thd_count(),
         (long) my_thread_stack_size);
  thr_print_locks();                            // Write some debug info
...
  printf("\nTable status:\n\
...
  display_table_locks();...
# finally, let's find the code of display_table_locks...openxs@ao756:~/git/percona-server$ grep -rni display_table_locks *
sql/sql_test.cc:371:static void display_table_locks(void)sql/sql_test.cc:501:  display_table_locks();
sql/mysqld.cc:9537:  { &key_memory_locked_thread_list, "display_table_locks", PSI_FLAG_THREAD},
openxs@ao756:~/git/percona-server$

static void display_table_locks(void)
{
  LIST *list;
  Saved_locks_array saved_table_locks(key_memory_locked_thread_list);
  saved_table_locks.reserve(table_cache_manager.cached_tables() + 20);

  mysql_mutex_lock(&THR_LOCK_lock);
  for (list= thr_lock_thread_list; list; list= list_rest(list))
  {
    THR_LOCK *lock=(THR_LOCK*) list->data;
    mysql_mutex_lock(&lock->mutex);
    push_locks_into_array(&saved_table_locks, lock->write.data, FALSE,
                          "Locked - write");
    push_locks_into_array(&saved_table_locks, lock->write_wait.data, TRUE,
                          "Waiting - write");
    push_locks_into_array(&saved_table_locks, lock->read.data, FALSE,
                          "Locked - read");
    push_locks_into_array(&saved_table_locks, lock->read_wait.data, TRUE,
                          "Waiting - read");
    mysql_mutex_unlock(&lock->mutex);
  }
...
That's enough to start. We'll have to finally study what THR_LOCK is. There is a global (double) linked lists of all them, thr_lock_thread_list, and this is where we can get details from, in a way similar (let's hope) to those server does while processing COM_DEBUG command for us. So, let's attach gdb and let's fun begin:
openxs@ao756:~/git/percona-server$ sudo gdb -p 23841
[sudo] password for openxs:
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
...
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007fafd93d9c5d in poll () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set $list=(LIST *)thr_lock_thread_list
(gdb) set $lock=(THR_LOCK*) $list->data
(gdb) p *($lock)
$1 = {list = {prev = 0x0, next = 0x7fafd2fa4780, data = 0x7fafbd545c80},
  mutex = {m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
        __nusers = 0, __kind = 3, __spins = 0, __elision = 0, __list = {
          __prev = 0x0, __next = 0x0}},
      __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
      __align = 0}, m_psi = 0x7fafd1f97c80}, read_wait = {data = 0x0,
    last = 0x7fafbd545cc8}, read = {data = 0x0, last = 0x7fafbd545cd8},
  write_wait = {data = 0x0, last = 0x7fafbd545ce8}, write = {
    data = 0x7fafbd69f188, last = 0x7fafbd69f190}
, write_lock_count = 0,
  read_no_write_count = 0, get_status = 0xf84910 <mi_get_status>,
  copy_status = 0xf84a70 <mi_copy_status>,
  update_status = 0xf84970 <mi_update_status>,
  restore_status = 0xf84a50 <mi_restore_status>,
  check_status = 0xf84a80 <mi_check_status>}
In the above we started from the first item in the list. If needed we can move on to the next with set $list=(LIST *)($list->next) etc. We then interpret $list->data as a (THD_LOCK*). There we can see read, read_wait, write and write_wait structures. One of them will have data item that is not 0x0. This way we can identify is it a lock or lock wait, and what kind of lock is it. In my case i was looking for a write lock, and here it is, the very first one. So, I continue:
(gdb) p *($lock)->write.data
$2 = {owner = 0x7fafbd468d38, next = 0x0, prev = 0x7fafbd545cf8,
  lock = 0x7fafbd545c80, cond = 0x0, type = TL_WRITE,
  status_param = 0x7fafbd69ee20, debug_print_param = 0x7fafbd515020,
  m_psi = 0x7fafa4c06cc0}
We can check the owner field:
(gdb) p *($lock)->write.data.owner
$3 = {thread_id = 16, suspend = 0x7fafbd469370}
and thread_id there is what we were looking for, the MySQL thread id of the blocking thread. If we want to get the details about the table locks we can do this as follows, by studying debug_print_param in data:
(gdb) set $table=(TABLE *) &(*($lock)->write.data->debug_print_param)
(gdb) p $table
$4 = (TABLE *) 0x7fafbd515020
(gdb) p *($table)
$5 = {s = 0x7fafbd4d7030, file = 0x7fafbd535230, next = 0x0, prev = 0x0,
  cache_next = 0x0, cache_prev = 0x7fafc4df05a0, in_use = 0x7fafbd468000,
  field = 0x7fafbd4d7440, hidden_field_count = 0, record = {
    0x7fafbd4d7430 "\377", 0x7fafbd4d7438 "\n"}, write_row_record = 0x0,
  insert_values = 0x0, covering_keys = {map = 0}, quick_keys = {map = 0},
  merge_keys = {map = 0}, possible_quick_keys = {map = 0},
  keys_in_use_for_query = {map = 0}, keys_in_use_for_group_by = {map = 0},
  keys_in_use_for_order_by = {map = 0}, key_info = 0x7fafbd4d7508,
  next_number_field = 0x0, found_next_number_field = 0x0, vfield = 0x0,
  hash_field = 0x0, fts_doc_id_field = 0x0, triggers = 0x0,
  pos_in_table_list = 0x7fafbd47a9b8, pos_in_locked_tables = 0x7fafbd4e5030,
  group = 0x0, alias = 0x7fafbbbad120 "misam",
  null_flags = 0x7fafbd4d7430 "\377", bitmap_init_value = 0x0, def_read_set = {
    bitmap = 0x7fafbd4d75a8, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75a8, mutex = 0x0}, def_write_set = {
    bitmap = 0x7fafbd4d75ac, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75ac, mutex = 0x0}, tmp_set = {
    bitmap = 0x7fafbd4d75b0, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75b0, mutex = 0x0}, cond_set = {
    bitmap = 0x7fafbd4d75b4, n_bits = 1, last_word_mask = 4294967294,
    last_word_ptr = 0x7fafbd4d75b4, mutex = 0x0},
  def_fields_set_during_insert = {bitmap = 0x7fafbd4d75b8, n_bits = 1,
    last_word_mask = 4294967294, last_word_ptr = 0x7fafbd4d75b8, mutex = 0x0},
---Type <return> to continue, or q <return> to quit---
  read_set = 0x7fafbd515128, write_set = 0x7fafbd515148,
  fields_set_during_insert = 0x7fafbd5151a8, query_id = 0, quick_rows = {
    0 <repeats 64 times>}, const_key_parts = {0 <repeats 64 times>},
  quick_key_parts = {0 <repeats 64 times>}, quick_n_ranges = {
    0 <repeats 64 times>}, quick_condition_rows = 0, lock_position = 0,
  lock_data_start = 0, lock_count = 1, temp_pool_slot = 0, db_stat = 39,
  current_lock = 1, nullable = 0 '\000', null_row = 0 '\000',
  status = 3 '\003', copy_blobs = 0 '\000', force_index = 0 '\000',
  force_index_order = 0 '\000', force_index_group = 0 '\000',
  distinct = 0 '\000', const_table = 0 '\000', no_rows = 0 '\000',
  key_read = 0 '\000', no_keyread = 0 '\000', locked_by_logger = 0 '\000',
  no_replicate = 0 '\000', locked_by_name = 0 '\000',
  fulltext_searched = 0 '\000', no_cache = 0 '\000',
  open_by_handler = 0 '\000', auto_increment_field_not_null = 0 '\000',
  insert_or_update = 0 '\000', alias_name_used = 0 '\000',
  get_fields_in_item_tree = 0 '\000', m_needs_reopen = 0 '\000',
  created = true, max_keys = 0, reginfo = {join_tab = 0x0, qep_tab = 0x0,
    lock_type = TL_WRITE, not_exists_optimize = false,
    impossible_range = false}, mem_root = {free = 0x7fafbd4d7420,
    used = 0x7fafbd535220, pre_alloc = 0x0, min_malloc = 32, block_size = 992,
    block_num = 6, first_block_usage = 0, max_capacity = 0,
    allocated_size = 2248, error_for_capacity_exceeded = 0 '\000',
    error_handler = 0xc4b240 <sql_alloc_error_handler()>, m_psi_key = 106},
---Type <return> to continue, or q <return> to quit---
  blob_storage = 0x0, grant = {grant_table = 0x0, version = 0,
    privilege = 536870911, m_internal = {m_schema_lookup_done = true,
      m_schema_access = 0x0, m_table_lookup_done = true,
      m_table_access = 0x0}}, sort = {filesort_buffer = {m_next_rec_ptr = 0x0,
      m_rawmem = 0x0, m_record_pointers = 0x0, m_sort_keys = 0x0,
      m_num_records = 0, m_record_length = 0, m_sort_length = 0,
      m_size_in_bytes = 0, m_idx = 0}, io_cache = 0x0, merge_chunks = {
      m_array = 0x0, m_size = 0}, addon_fields = 0x0,
    sorted_result_in_fsbuf = false, sorted_result = 0x0,
    sorted_result_end = 0x0, found_records = 0}, part_info = 0x0,
  all_partitions_pruned_away = false, mdl_ticket = 0x7fafbd4c4f10,
  m_cost_model = {m_cost_model_server = 0x0, m_se_cost_constants = 0x0,
    m_table = 0x0}, should_binlog_drop_if_temp_flag = false}
We already see table alias, but more details are in the s filed (of TABLE_SHARE * type):
(gdb) p $table->s$26 = (TABLE_SHARE *) 0x7fafbd4d7030
...
(gdb) p $table->s->table_cache_key
$27 = {str = 0x7fafbd4d73b8 "test", length = 11}
...
(gdb) p $table->s->db
$30 = {str = 0x7fafbd4d73b8 "test", length = 4}
(gdb) p $table->s->path
$31 = {str = 0x7fafbd4d73c8 "./test/misam", length = 12}
(gdb) p $table->s->table_name
$32 = {str = 0x7fafbd4d73bd "misam", length = 5}
# to remind you, this is how to get the thread id directly
(gdb) p (*($lock)->write.data.owner).thread_id
$37 = 16
I skilled some of my tests and intermediate results. To summarize, for me it took only a bit more time to search the code and try commands in gdb than to find out that metadata_locks does not help and one has to use table_handles in performance_schema. Next time in gdb I'll do it instantly, and the number of key strokes is far less :)

Surely, for production environments I'll have to write, test and note somewhere proper queries to the performance_schema. For next attempts with gdb and, maybe, FOSDEM talks I'll have to read the code to remember key details about the data structures used above... More posts to come, stay tuned!

Friday, July 21, 2017

Fun with Bugs #54 - On Some Bugs Fixed in MySQL 5.7.19

More than 3 months after 5.7.18 we' ve got MySQL 5.7.19 released recently. This is my quick review of the release notes with interesting fixed bug (reported in public) highlighted in the areas I am usually interested in.

Let's start with InnoDB. The following bug fixes attracted my attention:
  • Bug #85043 is still private. You know how much I hate those. At least we can see it was about the fact that "The server allocated memory unnecessarily for an operation that rebuilt the table." Let's hope this is no longer the case.
  • Bug #84038 - "Errors when restarting MySQL after FLUSH TABLES FOR EXPORT, RENAME and DROP.", was reported by Jean-François Gagné and verified by Umesh Shastry. Basically, InnoDB failed to update INNODB_SYS_DATAFILES internal data dictionary table properly while renaming tables.
  • Bug #80788 - "Reduce the time of looking for MLOG_CHECKPOINT during crash recovery". It was reported by  Zhai Weixiang (who had provided a patch) and quickly verified by Umesh Shastry.
  • Bug #83470 - "Reverse scan on a partitioned table does ICP check incorrectly, causing slowdown". This report that is related to both InnoDB, partitioning and optimizer, was created by my colleague from MariaDB, Sergey Petrunya, who had also suggested a patch under BSD license. Bug #84107 was considered a duplicate of this one it seems.
There were some public bugs fixed in group replication. While I do not really care, yet, about Group Replication, it's nice to see bugs in this area fixed so fast:
  • Bug #85667 - "GR node is in RECOVERING state if binlog_checksum configured on running server". This bug was reported by Ramana Yeruva.
  • Bug #85047 - "Secondaries allows transactions through ASYNC setup into the group". Yet another group replication bug recently fixed. It was reported by Narendra Chauhan who probably works for Oracle.
  • Bug #84728 - "Not Possible To Avoid MySQL From Starting When GR Cannot Start", was reported by Kenny Gryp from Percona.
  • Bug #84329 - "Some Class B private address is not permitted automatically". This funny bug was reported by Satoshi Mitani and verified by Umesh Shastry
  • Bug #84153 - "ASSERT "!contains_gtid(gtid)" rpl_gtid_owned.cc:94 Owned_gtids::add_gtid_owner". It was reported by Narendra Chauhan.
Usual async replication was also improved in 5.7.19, with the following related bugs fixed:
  • Bug #83184 - "Can't set slave_skip_errors > 3000". This bug was reported by Tsubasa Tanaka (who had provided a patch) and verified by Umesh Shastry.
  • Bug #82283 - "main.mysqlbinlog_debug fails with a LeakSanitizer error". Laurynas Biveinis from Percona found it and provided patches. The bug was verified by Umesh Shastry.
  • Bug #81232 - "Changing master_delay after stop slave results in loss of events". Ths regression bug (5.6.x was not affected) was reported by Parveez Baig. Bug #84249 (reported by Sergio Roysen) was declared a duplicate of this one.
  • Bug #77406 - "MTS must be able handle large packets with small slave_pending_jobs_size_max", was reported by my colleague Andrii Nikitin while he worked in Oracle.
  • Bug #84471 is still private. Why it could be so when the bug is supposed to be about "...the master could write to the binary log a last_committed value which was smaller than it should have been. This could cause the slave to execute in parallel transactions which should not have been, leading to inconsistencies or other errors." Let's hide the details about all bugs that may lead to data corruption,s why not?
  • Bug #83186 - "Request for slave_skip_errors = ER_INCONSISTENT_ERROR". Thanks to  Tsubasa Tanaka, who had provided a patch, we can do this now.
  • Bug #82848 - "Restarting a slave after seeding it with a mysqldump loses it's position". This serious problem with GTID-based replication was noted and resolved (with a patch) by Daniël van Eeden.
  • Bug #82209 is also private... Now, I am tired of them, just go real the release notes and hope they describe the problem and solution right. Great for any open source software, isn't it?
  • Bug #81119 - "multi source replication slave sql running error 1778". This serious bug (where GTIDs also play role) was reported by Mohamed Atef. Good to see it fixed finally.
I know long term query cache is doomed, Oracle is not going to work on it any more, but in the meantime, please, check this important bug fixed, Bug #86047 - "FROM sub query with 'group by' is cached by mistake", by Meiji Kimura. If you still use query cache and have complex queries with derived tables, please, consider upgrade. You might be getting wrong results all this time...

If you use xtrabackup or Oracle's MySQL Enterprise Backup in the environment with XA transactions, please, check Bug #84442 - "XA PREPARE inconsistent with XTRABACKUP", by David Zhao, who had also provided a patch. Your backups may be inconsistent until you upgrade...

Now, on some optimizer bugs fixed:
  • Bug #81031 - "count(*) on innodb sometimes returns 0". This happened when index merge was used by the optimizer. The bug was reported by Ashraf Amayreh and verified by Shane Bester.
  • Bug #84320 - "DISTINCT clause does not work in GROUP_CONCAT". It was reported by Varun Gupta and verified by Bogdan Kecman. See Bug #68145 also (that is still "Verified").
  • Bug #79675 - "index_merge_intersection optimization causes wrong query results". This bug was reported by Saverio M and verified by Miguel Solorzano.
There were a lot more bugs fixed in 5.7.19. I'd say that if one uses replication (especially GTID-based or group replication), upgrade is a must, but I had no tried this version myself, yet.

Monday, July 17, 2017

Fun with Bugs #53 - On Some Percona Software Bugs I've Reported

So far in this series I had written only/mostly about MySQL server bugs. Does it mean that there are no unique or interesting bugs in other MySQL-related software or MySQL forks and projects that use MySQL code?

Surely no, it doesn't. So, today I'd like to present a short list of currently active (that is, new or probably not yet resolved) bugs that I had reported for Percona's software at Launchpad over the years I use it. The list is based on this link, but I excluded minor reports and those that probably are just forgotten, while the problem is fixed long time ago.

Let me start with bugs in XtraBackup, the tool that plays a key role in most MySQL environments that are not using Oracle's Enterprise subscriptions:
  • lp:1704636 - "xtrabackup 2.4.7 crashes while backing up MariaDB 10.2.x with ftwrl-* options". I've reported it yesterday. Probably some unique background thread of MariaDB 10.2.x is not taken into account and some calculations are performed for it that lead to crash. Sorry, but there is no way to use --ftwrl-* options of XtraBackup with MariaDB 10.2.x.
  • lp:1418150 - "Provide a way for xtrabackup to communicate with storage managers via XBSA API". This is a feature request created based on customer's request to store backups in Tivoli Storage Manager. Similar request was recently noted from MariaDB customer as well. MySQL's Enterprise Backup allows to work with storage managers, even if not in the most direct way. I've used TSM a lot in the past in production as Informix DBA, so I care about this feature more than about many others...
Let's continue with another unique software package from Percona, Percona Toolkit. I still use some of the tools every day (and do not hesitate to write about this), and would like to see more attention to the toolkit in areas other than MongoDB. I've reported the following two bugs in the past that are still active:
  • lp:1566939 - "pt-stalk should not call mysqladmin debug by default". Having in mind documentation Bug #71346 that Oracle just refuses to fix, I'd say there may be a dozen of people in the world who can make a good use of the output even when it's correct, useful and really needed. For the rest of us it just pollutes error logs and pt-stalk data collections.
  • lp:1533271 - "pt-online-schema-change does not report anything when it waits for slave to replicate new table creation". The problem is clear enough from description. There is nothing useful to find out the reason for the hang in this case without PTDEBUG=1.
Finally, let's check several Percona Server bugs. Note that in many cases i had to file bug for Percona Server even if upstream MySQL was also affected, in a hope that Percona may fix the problem faster (and this happened sometimes). The following bugs were not inherited from MySQL directly:
  • lp:1455432 - "Audit log plugin should allow to include or exclude logging for specific users". This was request for the same functionality that was originally provided in MariaDB's implementation. Actually the bug was fixed in Percona Server 5.6.32-78.0 and 5.7.14-7, but remains in the list as it is still just "Triaged" for 5.5.x branch. Time to close it completely, I'd say.
  • lp:1408971 - "Manual does not describe Innodb_current_row_locks in any useful way". Nothing changed, check the manual.
  • lp:1401528 - "Add option to switch to "old" get_lock() behavior". It's hard to introduce new features in minor release during GA period, and do it right. Somebody had to think about backward compatibility, upgrades and downgrades, and about the completeness of implementation (see lp:1396336 about that). In this specific case Percona failed to do it right.
  • lp:1396035 - "Percona-Server-test-56 RPM package does NOT check /usr/share/percona-server directory and thus is unusable "as is". Packaging is another thing that is hard to do right. I was not lazy today, so I re-check this on CentOS 6.x. with 5.7.18, and ./mtr is somewhat useful there:

    [root@centos mysql-test]# ls -l ./mtr
    lrwxrwxrwx. 1 root root 19 Jun  7 18:20 ./mtr -> ./mysql-test-run.pl
    [root@centos mysql-test]# perl ./mtr analyze
    Logging: ./mtr  analyze
    MySQL Version 5.7.18
    Checking supported features...
     - SSL connections supported
    Collecting tests...
    Checking leftover processes...
    Removing old var directory...
    Creating var directory '/usr/share/mysql-test/var'...
    Installing system database...
    Using parallel: 1

    ==============================================================================

    TEST                                      RESULT   TIME (ms) or COMMENT
    --------------------------------------------------------------------------

    worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
    worker[1] mysql-test-run: WARNING: running this script as _root_ will cause some tests to be skipped
    main.analyze                             [ pass ]  10920
    --------------------------------------------------------------------------
    The servers were restarted 0 times
    Spent 10.920 of 58 seconds executing testcases

    Completed: All 1 tests were successful.
  • lp:1263968 - "DOCS: http://www.percona.com/doc/percona-server/5.6/installation.html does not provide enough details on further steps after installing .deb". I'd still say that some details requested are missing. Writing proper manuals is hard.
I had surely reported a lot more bugs, but many of them have related upstream MySQL ones, so there are no reasons to discuss them separately now. Also, note that in Percona I was involved in setting priorities for the bugs and mostly reported them based on customer issues or complains. So, no wonder that my bug reports were fixed fast, and not so many active ones left by now.

There was yet another Percona tool that I spent as lot of time on, both on building it from current sources, using it and  reporting problems with it. It's Percona Playback, Based on this, they seem to work on other but similar tool now. Still, I consider this request useful for any tool that allows to replay specific load: lp:1365538.

Sunday, July 9, 2017

MySQL Support Engineer's Chronicles, Issue #7

This week in Support was busy enough for me. Among other things I had to study all possible reasons (other than obvious query cache impact) for queries hanging in "query end" status and noted Bug #80652 related to binlog group commit and fixed in MySQL 5.7.17+ and 8.0.1+ only. The case I had to review was related to Galera though, and I suggest you to note that "query end" may be related to Galera replication stall. Studying this path further soon brought lp:1197771 - "Cluster stalls while distributing transaction" to my attention again, so I asked about proper status for it on Facebook. As it happens way too often recently, I've got few 'Likes" but no further comments, neither from Percona nor from Codership. It seems good old times of my efficient bugs escalations via Facebook had gone...

Does it mean I am going to stop writing and sharing posts about bugs on Facebook? Definitely it does NOT! One of the recent reasons is this post about Bug #86902 that, after being shared by me, got a quick comment and clarification by Sveta Smirnova that the problem (performance problem when binary logging is turned OFF and you use more than one transactional storage engine concurrently) is actually know, - it was  reported as Bug #80870 (that is still "Open", what a shame).

So, even if some Oracle or Percona employees will continue to ignore new bugs reports or my Facebook (re-)posts about MySQL bugs, at least bug reporter gets more chances to get help from Community, additional references and clarifications. So, stay tuned - I'll keep writing about MySQL bugs in this blog and, on a more regular basis and probably for a wider audience, - on Facebook. As popular Mahatma Gandhi's quote says:
"First they ignore you, then they laugh at you, then they fight you, then you win."
In my case with public MySQL bugs and the way they are processed by Oracle it was not strictly in this order, and I am yet to see people laughing at me for this reason, but I think I already won as since September 2012 public MySQL bugs database still exists and several Oracle engineers still pay attention to community bug reports there.

I had reported one bug myself this week, Bug #86930 - "Wrong results for queries with row constructors and information_schema". It was quickly verified by Miguel Solorzano. It's a funny bug (with workarounds), but not as scary as few other bugs recently reported and described in this post by Jean-François Gagné.Note, that as he stated, there are more than just that 2 bugs he mentioned (Bug #86926 and Bug #86927). Everybody in Oracle who should know that already knows it, and I know it. I keep stating that the implementation of InnoDB persistent statistics gives us more troubles than benefits. Last time I cared and complained that much it was about Bug #70629 - "InnoDB updated rec_per_key[] statistics not published to the optimizer enough often". But, trust me, this recent set of problems Jean-François found is way more serious.

I'd like to share few references I found extremely useful this week while working on Support issues this week. First of all, if you set up SSL connections for MySQL, take time to review this old post by my former colleague from Percona Roman Vynar, and another one by Matthew Boehm. This may save you some time and debugging efforts. If you care about MariaDB ColumnStore performance (I have to), check this KB article. If you want to replicate data to the table with somewhat different column types on slave, make sure to read the manual. Finally, if you care enough to keep SELinux enabled and use Galera clusters with any non-default ports or directories, make sure to read this blog post.

I've noted a couple of serious decisions made by Oracle in regard to MySQL. First of the is to discontinue work on MySQL Internals manual and integrate it some how into the main manual. You can find related text added recently to some bug reports, like Bug #67989 - "MySQL Internals documentation missing 5.6 binlog protocol parts" by my colleague Andrew Hutchings:
"No more updates are made to the MySQL Internals documentation, because it's in the process of being replaced by https://dev.mysql.com/doc/dev/mysql-server/latest/."
Sounds promising, isn't it?

The other one, that is going to be accepted way better, IMHO, is to discontinue work on query cache. Check Bug #86046 - "FROM sub query is cached by mistake" by Meiji Kimura, for example:
"[7 Jul 9:09] Erlend Dahl

MySQL will no longer invest in the query cache, see:

So, that were some of the links I've noted during this week. Some of the topics mentioned above may be continued and presented in more details in my further posts.