Thursday, January 7, 2016

Exploring Metadata Locks with gdb - How One Can Use This?

In the previous post in this series I've concluded that metadata locks are acquired in "batches" and the function that implements this is called MDL_context::acquire_locks. Let's check quickly what it does to confirm where wait for metadata lock really happens. We need this to proceed finally from studying what locks are set and when (this is a long and complicated topic to spend time on out of general interest) to more practical topic: how to find the session that holds the blocking metadata lock in MySQL versions before 5.7.x.

I'll continue to use Percona Server 5.6.27 for now, just because I have it installed and have a source code at hand. So, MDL_context class is defined in sql/mdl.h file as follows:

/**
  Context of the owner of metadata locks. I.e. each server
  connection has such a context.
*/

class MDL_context
{
public:
  typedef I_P_List<MDL_ticket,
                   I_P_List_adapter<MDL_ticket,
                                    &MDL_ticket::next_in_context,
                                    &MDL_ticket::prev_in_context> >
          Ticket_list;

  typedef Ticket_list::Iterator Ticket_iterator;

  MDL_context();
...

  bool try_acquire_lock(MDL_request *mdl_request);
  bool acquire_lock(MDL_request *mdl_request, ulong lock_wait_timeout);
  bool acquire_locks(MDL_request_list *requests, ulong lock_wait_timeout);
...

  unsigned long get_lock_owner(MDL_key *mdl_key);...
private:
  THD *get_thd() const { return m_owner->get_thd(); }...

};

I've highlighted some of the functions that we may use later. Now, this is how MDL_context::aquire_locks() implementation looks like in mdl.cc:

/**
  Acquire exclusive locks. There must be no granted locks in the
  context.

  This is a replacement of lock_table_names(). It is used in
  RENAME, DROP and other DDL SQL statements.

  @param  mdl_requests  List of requests for locks to be acquired.

  @param lock_wait_timeout  Seconds to wait before timeout.

  @note The list of requests should not contain non-exclusive lock requests.
        There should not be any acquired locks in the context.

  @note Assumes that one already owns scoped intention exclusive lock.

  @retval FALSE  Success
  @retval TRUE   Failure
*/

bool MDL_context::acquire_locks(MDL_request_list *mdl_requests,
                                ulong lock_wait_timeout)
{

...
  for (p_req= sort_buf; p_req < sort_buf + req_count; p_req++)
  {
    if (acquire_lock(*p_req, lock_wait_timeout))
      goto err;
  }
  my_free(sort_buf);
  return FALSE;

err:
  /*
    Release locks we have managed to acquire so far.
    Use rollback_to_savepoint() since there may be duplicate
    requests that got assigned the same ticket.
  */
  rollback_to_savepoint(mdl_svp);
  /* Reset lock requests back to its initial state. */
...

  my_free(sort_buf);
  return TRUE;
}


Comments make me think that for some metadata locks we may see some different function called, but let's deal with exclusive ones for now, those that can be blocked. Now, the MDL_context::aquire_lock() we eventually call for each request:

/**
  Acquire one lock with waiting for conflicting locks to go away if needed.

  @param mdl_request [in/out] Lock request object for lock to be acquired

  @param lock_wait_timeout [in] Seconds to wait before timeout.

  @retval  FALSE   Success. MDL_request::ticket points to the ticket
                   for the lock.
  @retval  TRUE    Failure (Out of resources or waiting is aborted),
*/

bool
MDL_context::acquire_lock(MDL_request *mdl_request, ulong lock_wait_timeout)
{
  MDL_lock *lock;
  MDL_ticket *ticket= NULL;
  struct timespec abs_timeout;
  MDL_wait::enum_wait_status wait_status;
...

  /*
    Our attempt to acquire lock without waiting has failed.
    As a result of this attempt we got MDL_ticket with m_lock
    member pointing to the corresponding MDL_lock object which
    has MDL_lock::m_rwlock write-locked.
  */
  lock= ticket->m_lock;

  lock->m_waiting.add_ticket(ticket);

  /*
    Once we added a pending ticket to the waiting queue,
    we must ensure that our wait slot is empty, so
    that our lock request can be scheduled. Do that in the
    critical section formed by the acquired write lock on MDL_lock.
  */
  m_wait.reset_status();

  /*
    Don't break conflicting locks if timeout is 0 as 0 is used
    To check if there is any conflicting locks...
  */
  if (lock->needs_notification(ticket) && lock_wait_timeout)
    lock->notify_conflicting_locks(this);

  mysql_prlock_unlock(&lock->m_rwlock);

  will_wait_for(ticket);

  /* There is a shared or exclusive lock on the object. */
  DEBUG_SYNC(get_thd(), "mdl_acquire_lock_wait");

  find_deadlock();

  struct timespec abs_shortwait;
  set_timespec(abs_shortwait, 1);
  wait_status= MDL_wait::EMPTY;

  while (cmp_timespec(abs_shortwait, abs_timeout) <= 0)
  {
    /* abs_timeout is far away. Wait a short while and notify locks. */
    wait_status= m_wait.timed_wait(m_owner, &abs_shortwait, FALSE,
                                   mdl_request->key.get_wait_state_name());

    if (wait_status != MDL_wait::EMPTY)
      break;
    /* Check if the client is gone while we were waiting. */
    if (! m_owner->is_connected())
    {
      /*
       * The client is disconnected. Don't wait forever:
       * assume it's the same as a wait timeout, this
       * ensures all error handling is correct.
       */
      wait_status= MDL_wait::TIMEOUT;
     break;
    }

    mysql_prlock_wrlock(&lock->m_rwlock);
    if (lock->needs_notification(ticket))
      lock->notify_conflicting_locks(this);
    mysql_prlock_unlock(&lock->m_rwlock);
    set_timespec(abs_shortwait, 1);
  }
...


So, here we check if our request is deadlocking and then we wait. It's clear that any session that is hanging while waiting for the metadata lock will have MDL_context::aquire_lock() in the backtrace.

To check this assumption let me set up a usual test with SELECT * FROM t1 executed in active transaction and TRUNCATE TABLE blocked. This is how it may look like in the SHOW PROCESSLIST:

mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info              | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
|  2 | root | localhost | test | Query   |  121 | Waiting for table metadata lock | truncate table t1 |         0 |             0 |
|  3 | root | localhost | test | Sleep   |    2 |                                 | NULL              |         1 |             0 |
|  4 | root | localhost | test | Query   |    0 | init                            | show processlist  |         0 |             0 |
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
3 rows in set (0.00 sec)

In general (and in this case) we may NOT be able to find a thread that is executing some statement for longer than out blocked session is waiting for the metadata lock. So, which one is the blocking one?

Let's try to find a thread that mentions  MDL_context::aquire_lock() in the backtrace:

[root@centos percona-server]# gdb -p `pidof mysqld` -ex "set pagination 0" -ex "thread apply all bt" -batch 2>/dev/null | grep 'MDL_context::acquire_lock'
#3  0x000000000064a74b in MDL_context::acquire_lock (this=0x7fc005ff5140, mdl_request=0x7fc00614e488, lock_wait_timeout=) at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:2380
#4  0x000000000064ae1b in MDL_context::acquire_locks (this=0x7fc005ff5140, mdl_requests=) at /usr/src/debug/percona-server-5.6.27-76.0/sql/mdl.cc:2500
[root@centos percona-server]#

Note that I've executed the command as root, because mysqld process is owned by root in this case.

So, we see exactly one backtrace with this call:

MDL_context::acquire_lock (this=0x7fc005ff5140, mdl_request=0x7fc00614e488, lock_wait_timeout=)

and from the code review we know that 0x7fc00614e488 is of type MDL_request *. We also know that 0x7fc005ff5140 is of type MDL_context * (this pointer in the method of that class). Now we can attach gdb to the mysqld running and try to see what we can do with that pointers:

[root@centos percona-server]# gdb -p `pidof mysqld`
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
...

(gdb) set $pmdlr=(MDL_request *)0x7fc00614e488
(gdb) p $pmdlr
$1 = (MDL_request *) 0x7fc00614e488
(gdb) p $pmdlr->key
$2 = {m_length = 9, m_db_name_length = 4,
  m_ptr = "\003test\000t1", '\000' <repeats 378 times>,
  static m_namespace_to_wait_state_name = {{m_key = 102,
      m_name = 0xb9eb8c "Waiting for global read lock", m_flags = 0}, {
      m_key = 103, m_name = 0xb9eba9 "Waiting for backup lock", m_flags = 0}, {
      m_key = 104, m_name = 0xb9ed68 "Waiting for schema metadata lock",
      m_flags = 0}, {m_key = 105,
      m_name = 0xb9ed90 "Waiting for table metadata lock", m_flags = 0}, {
      m_key = 106,
      m_name = 0xb9edb0 "Waiting for stored function metadata lock",
      m_flags = 0}, {m_key = 107,
      m_name = 0xb9ede0 "Waiting for stored procedure metadata lock",
      m_flags = 0}, {m_key = 108,
      m_name = 0xb9ee10 "Waiting for trigger metadata lock", m_flags = 0}, {
      m_key = 109, m_name = 0xb9ee38 "Waiting for event metadata lock",
      m_flags = 0}, {m_key = 110, m_name = 0xb9ebc1 "Waiting for commit lock",
      m_flags = 0}, {m_key = 79, m_name = 0xb9295a "User lock", m_flags = 0}, {
      m_key = 111, m_name = 0xb9ebd9 "Waiting for binlog lock", m_flags = 0}}}
(gdb) p &($pmdlr->key)
$3 = (MDL_key *) 0x7fc00614e4a8


So, we can get a pointer to the MDL_key. Now, in MDL_context class we had seen the following method:

  unsigned long get_lock_owner(MDL_key *mdl_key);

that returns the id of the owning thread, and lucky we are, we have a pointer to MDL_context, so we can refer to this function and call it if needed:

(gdb) set $pmdlc=(MDL_context *)0x7fc005ff5140
(gdb) p $pmdlc
$4 = (MDL_context *) 0x7fc005ff5140
(gdb) p $pmdlc->get_lock_owner(&($pmdlr->key))
$5 = 3


Now, what's that value printed, 3? It's the id of thread in the SHOW PROCESSLIST that holds the blocking metadata lock. So, we quit from gdb:

(gdb) q
A debugging session is active.

        Inferior 1 [process 3003] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/mysqld, process 3003


and then we know what thread to kill to release the blocking lock and let TRUNCATE to proceed:

mysql> show processlist;
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State                           | Info              | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
|  2 | root | localhost | test | Query   | 1291 | Waiting for table metadata lock | truncate table t1 |         0 |             0 |
|  3 | root | localhost | test | Sleep   | 1172 |                                 | NULL              |         1 |             0 |
|  4 | root | localhost | test | Query   |    0 | init                            | show processlist  |         0 |             0 |
+----+------+-----------+------+---------+------+---------------------------------+-------------------+-----------+---------------+
3 rows in set (0.00 sec)

mysql> kill 3;
Query OK, 0 rows affected (0.05 sec)

mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+
| Id | User | Host      | db   | Command | Time | State | Info             | Rows_sent | Rows_examined |
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+
|  2 | root | localhost | test | Sleep   | 1305 |       | NULL             |         0 |             0 |
|  4 | root | localhost | test | Query   |    0 | init  | show processlist |         0 |             0 |
+----+------+-----------+------+---------+------+-------+------------------+-----------+---------------+
2 rows in set (0.00 sec)


So, from now on you know how to find out in MySQL 5.6 the blocking session/thread for any session that is waiting on metadata with just gdb and some grep.

This is not the only way, you can get the same result from different functions in different MDL-related classes. Your findings may depend on what part of the code you traced or read first, but essentially that's it: you can easily find a blocking thread for any waiting MDL lock request, using just a couple of gdb commands! No need to kill them one by one in a hope.

For any MySQL DBA gdb can be the tool useful in their routine work, NOT only while studying core dumps.

In the following blog posts in this series I'll get back to studying source code and will try to work with MySQL 5.7 in a similar way. Stay tuned!

No comments:

Post a Comment