Tuesday, March 31, 2015

Using gdb to understand what locks (and when) are really set by InnoDB. Part I.

There is a story behind this post. To make it short, the detailed study presented here started when I noted old Bug #72748 in "No feedback" status few days ago. I considered this status totally wrong and started a discussion on Facebook about it (yes, this is what I usually do, I discuss handling of MySQL bugs in public).

The technical topic of this discussion (blaming aside) was the way AUTO-INC locks are set by InnoDB with default innodb_autoinc_lock_mode=1. Manual defines "bulk inserts" concept:
  • Bulk inserts
    Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB will assign new values for the AUTO_INCREMENT column one at a time as each row is processed.
 and then says:
  • innodb_autoinc_lock_mode = 1 (consecutive lock mode)
    This is the default lock mode. In this mode, bulk inserts use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements. Only one statement holding the AUTO-INC lock can execute at a time.
I suspected (based on test case presented by my colleague Peiran last year) that this may not be case in reality, or at least the process is a bit more complicated. In the process of technical discussion, to make my points clear and highlight the underlying problem, I've reported another bug, Bug #76533, with simple test case and some speculations (that were not entirely correct as I soon found out).

After re-reading my report and re-checking results I was really surprised to see the result that can not be explained by the manual page mentioned above (yes, after all my bug reports for the manual I still consider it correct until the opposite is clearly proved, but that can be reproduced on MySQL 5.5, 5.6 and 5.7.

Once when I was in doubts I've got a nice advice from my dear friend Sinisa, "Read the Source, Luke!". But one has probably to spend few days at least to understand enough details to say for sure what happens when one executes INSERT INTO T ... SELECT FROM T (same table) comparing to INSERT INTO T ... SELECT FROM S (other table). So, I gave good advice myself (that I actually used to give others at FOSDEM 2015): "Use gdb, Luke!". It was even easier to follow that way after reading this great post by Annamalai.

So, I set up simple test, like this:

[openxs@centos ~]$ mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.23-72.1-log Percona Server (GPL), Release 72.1, Revision 0503478

Copyright (c) 2009-2015 Percona LLC and/or its affiliates
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@innodb_autoinc_lock_mode, @@log_bin, @@binlog_format;
+----------------------------+-----------+-----------------+
| @@innodb_autoinc_lock_mode | @@log_bin | @@binlog_format |
+----------------------------+-----------+-----------------+
|                          1 |         1 | STATEMENT       |
+----------------------------+-----------+-----------------+
1 row in set (0.01 sec)

mysql> select @@innodb_locks_unsafe_for_binlog;
+----------------------------------+
| @@innodb_locks_unsafe_for_binlog |
+----------------------------------+
|                                0 |
+----------------------------------+
1 row in set (0.00 sec)
 

mysql> create table t(id int auto_increment primary key, val int);
Query OK, 0 rows affected (0.21 sec)

mysql> create table tt(id int auto_increment primary key, val int);
Query OK, 0 rows affected (0.14 sec)

mysql> insert into t(val) values (1), (2), (3), (4);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into tt(val) values (1), (2), (3), (4);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from t;
+----+------+
| id | val  |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

mysql> select * from tt;
+----+------+
| id | val  |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.01 sec)

That is, on Percona Server 5.6.23 I've created two InnoDB tables with auto_increment PRIMARY key and one other integer column. I've inserted the same 4 rows into both tables. Now, I wonder what locks are set while executing each the following statements, in the same single session, and in what order they are set:

insert into t(val) select 100 from tt;
insert into t(val) select 100 from t;

as I suspect there is a difference that is not clearly documented. I decided to use gdb to set breakpoints on the following functions:

lock_table()
lock_rec_lock()
row_lock_table_autoinc_for_mysql()

Don't ask me why these, it's topic for a separate discussion (but you can guess based on the names and/or code review and/or this great post I've already mentioned).

I've made sure there is debug info for the official RPM binaries I've used:

[root@centos ~]# rpm -qa | grep -i percona
...

Percona-Server-server-56-5.6.23-rel72.1.el6.x86_64
Percona-Server-56-debuginfo-5.6.23-rel72.1.el6.x86_64
...


So, I've attached gdb to mysql process running and set breakpoints:

[root@centos ~]# gdb -p `pidof mysqld`
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
...

 Loaded symbols for /usr/lib64/mysql/plugin/ha_tokudb.so
0x00007fd74b0970d3 in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64 jemalloc-3.6.0-1.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 nss-softokn-freebl-3.14.3-22.el6_6.x86_64 openssl-1.0.1e-30.el6_6.7.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) set pagination 0
(gdb) b lock_table
Breakpoint 1 at 0x91c940: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc, line 4426.
(gdb) b lock_rec_lock
Breakpoint 2 at 0x91de04: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc, line 2329.
(gdb) b row_lock_table_autoinc_for_mysql
Breakpoint 3 at 0x984504: file /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc, line 1146.
(gdb) c
Continuing.


Then in the session where I created my tables I've started transaction to check the first statement:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t(val) select 100 from tt;


and immediately hit one of breakpoints in gdb:

[Switching to Thread 0x7fd74cf79700 (LWP 1866)]

Breakpoint 1, lock_table (flags=0, table=0x7fd7233f7de8, mode=LOCK_IS, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426
4426            if (flags & BTR_NO_LOCKING_FLAG) {
(gdb) p table->name
$1 = 0x7fd7383feae8 "test/tt"

So, we had a request for IS lock for the tt table.  I had source code at hand to check for some details that may be useful while interpreting data:

-- this is from  storage/innobase/include/lock0types.h
/* Basic lock modes */
enum lock_mode {
        LOCK_IS = 0,    /* intention shared */
        LOCK_IX,        /* intention exclusive */
        LOCK_S,         /* shared */
        LOCK_X,         /* exclusive */
        LOCK_AUTO_INC,  /* locks the auto-inc counter of a table
                        in an exclusive mode */
        LOCK_NONE,      /* this is used elsewhere to note consistent read */
        LOCK_NUM = LOCK_NONE, /* number of lock modes */
        LOCK_NONE_UNSET = 255
};


-- this is from storage/innobase/include/lock0lock.h
#define LOCK_WAIT       256     /*!< Waiting lock flag; when set, it
                                means that the lock has not yet been
                                granted, it is just waiting for its
                                turn in the wait queue */
/* Precise modes */
#define LOCK_ORDINARY   0       /*!< this flag denotes an ordinary
                                next-key lock in contrast to LOCK_GAP
                                or LOCK_REC_NOT_GAP */
#define LOCK_GAP        512     /*!< when this bit is set, it means that the
                                lock holds only on the gap before the record;
                                for instance, an x-lock on the gap does not
                                give permission to modify the record on which
                                the bit is set; locks of this type are created
                                when records are removed from the index chain
                                of records */
#define LOCK_REC_NOT_GAP 1024   /*!< this bit means that the lock is only on
                                the index record and does NOT block inserts
                                to the gap before the index record; this is
                                used in the case when we retrieve a record
                                with a unique key, and is also used in
                                locking plain SELECTs (not part of UPDATE
                                or DELETE) when the user has set the READ
                                COMMITTED isolation level */
#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting
                                gap type record lock request in order to let
                                an insert of an index record to wait until
                                there are no conflicting locks by other
                                transactions on the gap; note that this flag
                                remains set when the waiting lock is granted,
                                or if the lock is inherited to a neighboring
                                record */

if gdb would not be kind enough to do this for me (it 's, for enum, as you can see).

Now, let's check how we ended up here:

(gdb) bt
#0  lock_table (flags=0, table=0x7fd7233f7de8, mode=LOCK_IS, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426
#1  0x0000000000999bdd in row_search_for_mysql (buf=0x7fd71bfed410 "\375\004", mode=1, prebuilt=0x7fd7233e2068, match_mode=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0sel.cc:4083
#2  0x00000000008e8b41 in ha_innobase::index_read (this=0x7fd71bf9d610, buf=0x7fd71bfed410 "\375\004", key_ptr=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:8232
#3  0x00000000008dc090 in ha_innobase::index_first (this=0x7fd71bf9d610, buf=0x7fd71bfed410 "\375\004") at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:8629
#4  0x000000000059a3ee in handler::ha_index_first (this=0x7fd71bf9d610, buf=0x7fd71bfed410 "\375\004") at /usr/src/debug/percona-server-5.6.23-72.1/sql/handler.cc:3057
#5  0x00000000006b491d in join_read_first (tab=0x7fd72332dc60) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:2540
#6  0x00000000006b3c0d in sub_select (join=0x7fd72332cf40, join_tab=0x7fd72332dc60, end_of_records=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:1256
#7  0x00000000006b2ea8 in do_select (this=0x7fd72332cf40) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:933
#8  JOIN::exec (this=0x7fd72332cf40) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_executor.cc:194
#9  0x00000000006fc3c5 in mysql_execute_select (thd=0x7fd71c354000, tables=0x7fd72332c888, wild_num=0, fields=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:1100
#10 mysql_select (thd=0x7fd71c354000, tables=0x7fd72332c888, wild_num=0, fields=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:1221
#11 0x00000000006fcc25 in handle_select (thd=0x7fd71c354000, result=0x7fd72332ce00, setup_tables_done_option=1073741824) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_select.cc:110
#12 0x00000000006d977b in mysql_execute_command (thd=0x7fd71c354000) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:3965
#13 0x00000000006db7e8 in mysql_parse (thd=0x7fd71c354000, rawbuf=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:6922
#14 0x00000000006dcf59 in dispatch_command (command=<value optimized out>, thd=0x7fd71c354000, packet=0x7fd72330f001 "insert into t(val) select 100 from tt", packet_length=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:1434
#15 0x00000000006aa772 in do_handle_one_connection (thd_arg=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_connect.cc:1532
#16 0x00000000006aa860 in handle_one_connection (arg=0x7fd71c354000) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_connect.cc:1443
#17 0x0000000000adc113 in pfs_spawn_thread (arg=0x7fd71c3443e0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/perfschema/pfs.cc:1860
#18 0x00007fd74c9869d1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fd74b0a08fd in clone () from /lib64/libc.so.6


So, we started with SELECT part and set IS lock on the table we read from tt. This is expected, as manual says:
"INSERT INTO T SELECT ... FROM S WHERE ... sets an exclusive index record lock (without a gap lock) on each row inserted into T. If the transaction isolation level is READ COMMITTED, or innodb_locks_unsafe_for_binlog is enabled and the transaction isolation level is not SERIALIZABLE, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S. InnoDB has to set locks in the latter case:"
Our case is REPEATABLE READ and innodb_locks_unsafe_for_binlog is not enabled, so some kind of S locks will be set on rows, and it means we need IS lock on the table (again, by the manual). Let's continue:

(gdb) c
Continuing.

Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=2, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:2329
2329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
(gdb) p index->table_name
$2 = 0x7fd7383feae8 "test/tt"
(gdb) p index->name
$3 = 0x7fd7233f8480 "PRIMARY"


So, we clearly see an attempt to set a record lock on some row (heap_no=2) in the PRIMARY key of the table tt (this is a row of data, surely). mode=2 means LOCK_S in the enum. So this is an attempt to set S lock on the row ("shared next-key lock" as manual explained above, one day I'll explain how to distinguish it from other kinds of shared locks if needed). Let's continue:

(gdb) c
Continuing.

Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:1146
1146            if (trx == table->autoinc_trx) {
(gdb) p trx
$4 = <value optimized out>
(gdb) p table->name
value has been optimized out

(gdb) p table
$5 = <value optimized out>

(gdb) bt
#0  row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:1146
#1  0x00000000008e8dcc in ha_innobase::innobase_lock_autoinc (this=0x7fd71bf9cc10) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:7084
#2  0x00000000008e9128 in innobase_get_autoinc (this=0x7fd71bf9cc10, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd74cf767c0, nb_reserved_values=0x7fd74cf767d0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:13890
#3  ha_innobase::get_auto_increment (this=0x7fd71bf9cc10, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd74cf767c0, nb_reserved_values=0x7fd74cf767d0) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:13961
#4  0x000000000059ad18 in handler::update_auto_increment (this=0x7fd71bf9cc10) at /usr/src/debug/percona-server-5.6.23-72.1/sql/handler.cc:3472
#5  0x00000000008f06e9 in ha_innobase::write_row (this=0x7fd71bf9cc10, record=0x7fd71bfec810  <incomplete sequence \375>) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/handler/ha_innodb.cc:7271

...
#19 0x00000000006dcf59 in dispatch_command (command=<value optimized out>, thd=0x7fd71c354000, packet=0x7fd72330f001 "insert into t(val) select 100 from tt", packet_length=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/percona-server-5.6.23-72.1/sql/sql_parse.cc:1434
...

I had not checked the code in time, so attempts to print something I see had not helped, but top lines of backtrace clearly show: we want to get auto_increment value for the row we are planning to write somewhere (to the table t). So, we set IS lock on table read from, we set S lock on the first row we tried to read from it via SELECT and then we try to insert the row into the target table t, and we need auto_increment value for it. Let's continue:

(gdb) c
Continuing.


Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_AUTO_INC, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426
4426            if (flags & BTR_NO_LOCKING_FLAG) {
(gdb) p table->name
$6 = 0x7fd7383feac0 "test/t"
(gdb) p thr
$7 = (que_thr_t *) 0x7fd7233e3780
(gdb) p *thr
$8 = {common = {type = 9, parent = 0x7fd7233e36b8, brother = 0x0, val = {data = 0x0, ext = 0, len = 0, type = {prtype = 0, mtype = 0, len = 0, mbminmaxlen = 0}}, val_buf_size = 0}, magic_n = 8476583, child = 0x7fd7233e3548, graph = 0x7fd7233e36b8, state = 1, is_active = 1, run_node = 0x7fd7233e3548, prev_node = 0x7fd7233e3548, resource = 0, lock_state = 0, slot = 0x0, thrs = {prev = 0x0, next = 0x0}, trx_thrs = {prev = 0x0, next = 0x0}, queue = {prev = 0x0, next = 0x0}, fk_cascade_depth = 0}

So,  we clearly see the attempt to set LOCK_AUTO_INC on the table t. This happens only after we read some row from the table tt and now inserting it into the target table, t.
 

(gdb) c
Continuing.

Breakpoint 1, lock_table (flags=0, table=0x7fd7233f69e8, mode=LOCK_IX, thr=0x7fd7233e3780) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426
4426            if (flags & BTR_NO_LOCKING_FLAG) {
(gdb) p table->name
$9 = 0x7fd7383feac0 "test/t"
(gdb) p *table
$10 = {id = 65, heap = 0x7fd71c3514a0, name = 0x7fd7383feac0 "test/t", dir_path_of_temp_table = 0x0, data_dir_path = 0x0, space = 51, flags = 1, flags2 = 80, ibd_file_missing = 0, cached = 1, to_be_dropped = 0, n_def = 5, n_cols = 5, can_be_evicted = 1, corrupted = 0, drop_aborted = 0, cols = 0x7fd71c371868, col_names = 0x7fd71c3718f0 "id", name_hash = 0x0, id_hash = 0x0, indexes = {count = 1, start = 0x7fd7233f7b68, end = 0x7fd7233f7b68}, foreign_set = std::set with 0 elements, referenced_set = std::set with 0 elements, table_LRU = {prev = 0x7fd74902cb68, next = 0x7fd7233f7de8}, fk_max_recusive_level = 0, n_foreign_key_checks_running = 0, def_trx_id = 84487, query_cache_inv_trx_id = 84497, big_rows = 0, stats_latch_created = 2, stats_latch = 0x7fd71c3efb80, stat_initialized = 1, memcached_sync_count = 0, stats_last_recalc = 1427781732, stat_persistent = 0, stats_auto_recalc = 0, stats_sample_pages = 0, stat_n_rows = 4, stat_clustered_index_size = 1, stat_sum_of_other_index_sizes = 0, stat_modified_counter = 0, stats_bg_flag = 0 '\000', autoinc_lock = 0x7fd71c3718a8, autoinc_mutex = {event = 0x7fd71c3ef600, lock_word = 0 '\000', waiters = 0, list = {prev = 0x7fd7233f7f50, next = 0x7fd71c370f28}, count_os_wait = 0, cmutex_name = 0xc3cb37 "&table->autoinc_mutex", pfs_psi = 0x0}, autoinc = 6, n_waiting_or_granted_auto_inc_locks = 1, autoinc_trx = 0x7fd71c370c68, fts = 0x0, quiesce = QUIESCE_NONE, n_rec_locks = 0, n_ref_count = 1, locks = {count = 1, start = 0x7fd71c3718a8, end = 0x7fd71c3718a8}, is_corrupt = 0}
(gdb) p table->autoinc_lock
$11 = (ib_lock_t *) 0x7fd71c3718a8
(gdb) p *(table->autoinc_lock)
$12 = {trx = 0x7fd71c370c68, trx_locks = {prev = 0x7fd71c373530, next = 0x0}, type_mode = 20, hash = 0x0, index = 0x0, un_member = {tab_lock = {table = 0x7fd7233f69e8, locks = {prev = 0x0, next = 0x0}}, rec_lock = {space = 140561986054632, page_no = 0, n_bits = 0}}}


 Now, we set IX lock on table T (no wonder, we need to inserts rows there). According to the comments in the code:

-- this is from storage/innobase/include/lock0priv.h 
static const byte lock_compatibility_matrix[5][5] = {
 /**         IS     IX       S     X       AI */
 /* IS */ {  TRUE,  TRUE,  TRUE,  FALSE,  TRUE},
 /* IX */ {  TRUE,  TRUE,  FALSE, FALSE,  TRUE},
 /* S  */ {  TRUE,  FALSE, TRUE,  FALSE,  FALSE},
 /* X  */ {  FALSE, FALSE, FALSE, FALSE,  FALSE},
 /* AI */ {  TRUE,  TRUE,  FALSE, FALSE,  FALSE}
};

...

/* STRONGER-OR-EQUAL RELATION (mode1=row, mode2=column)
 *    IS IX S  X  AI
 * IS +  -  -  -  -
 * IX +  +  -  -  -
 * S  +  -  +  -  -
 * X  +  +  +  +  +
 * AI -  -  -  -  +
 * See lock_mode_stronger_or_eq().
 */
static const byte lock_strength_matrix[5][5] = {
 /**         IS     IX       S     X       AI */
 /* IS */ {  TRUE,  FALSE, FALSE,  FALSE, FALSE},
 /* IX */ {  TRUE,  TRUE,  FALSE, FALSE,  FALSE},
 /* S  */ {  TRUE,  FALSE, TRUE,  FALSE,  FALSE},
 /* X  */ {  TRUE,  TRUE,  TRUE,  TRUE,   TRUE},
 /* AI */ {  FALSE, FALSE, FALSE, FALSE,  TRUE}
};


IX lock is compatible with AI (and AI is hold by this same transaction anyway), so let's assume we got it and continue:

(gdb) c
Continuing.

Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=3, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:2329
2329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
(gdb) p index->table_name
$13 = 0x7fd7383feae8 "test/tt"
(gdb) p index->name
$14 = 0x7fd7233f8480 "PRIMARY"


Now we set S lock (mode=2) on the next row (heap_no=3) we read in SELECT. We continue reading...

(gdb) c
Continuing.

Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:1146
1146            if (trx == table->autoinc_trx) {
(gdb) c
Continuing.

So, request for auto_increment value again (we ask for the lock that we already have), to get it for the row we plan to insert. All clear, so we continued and got this:

[Switching to Thread 0x7fd716bf7700 (LWP 1718)]

Breakpoint 1, lock_table (flags=0, table=0x7fd74902cb68, mode=LOCK_IX, thr=0x7fd71c1db638) at /usr/src/debug/percona-serverinnodb_table-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:4426
4426            if (flags & BTR_NO_LOCKING_FLAG) {
(gdb) p table->name
$15 = 0x7fd730396c60 "mysql/innodb_table_stats"


So, other thread stepped in to set IX lock on innodb_table_stats table. This is a background update of table statistics and there will be many hits for our breakpoints because of this. I'll skip them all from now on, to keep concentrated on one main topic. For our tables in question we later had already well know sequences:

...
Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=5, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:2329
2329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
(gdb) p index->table_name
$17 = 0x7fd7383feae8 "test/tt"
(gdb) c
Continuing.

Breakpoint 3, row_lock_table_autoinc_for_mysql (prebuilt=0x7fd7233e3068) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/row/row0mysql.cc:1146
1146            if (trx == table->autoinc_trx) {
(gdb) c
Continuing.

...

that is, requests for S lock on yet another row in the source table (tt) and then request for the auto_increment. It ends up like this (changes to persistent statistics tables skipped):

Breakpoint 2, lock_rec_lock (impl=0, mode=2, block=0x7fd7249ae700, heap_no=1, index=0x7fd7233f82e8, thr=0x7fd7233e29e8) at /usr/src/debug/percona-server-5.6.23-72.1/storage/innobase/lock/lock0lock.cc:2329
2329            switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {
(gdb) p index->table_name
$20 = 0x7fd7383feae8 "test/tt"
(gdb) p block
$21 = (const buf_block_t *) 0x7fd7249ae700
(gdb) p *block
$22 = {page = {space = 52, offset = 3, buf_fix_count = 1, io_fix = 0 '\000', state = 5 '\005', flush_type = 1, buf_pool_index = 0, zip = {data = 0x0, m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 0}, hash = 0x0, list = {prev = 0x7fd7249cf280, next = 0x0}, newest_modification = 129192199, oldest_modification = 0, LRU = {prev = 0x7fd7249ae580, next = 0x7fd7249ae880}, old = 0, freed_page_clock = 0, access_time = 1852500389, is_corrupt = 0}, frame = 0x7fd72c438000 "\333s\257\247", unzip_LRU = {prev = 0x0, next = 0x0}, mutex = {event = 0x7fd72400e900, lock_word = 0 '\000', waiters = 0, list = {prev = 0x7fd7249ae900, next = 0x7fd7249ae600}, count_os_wait = 0, cmutex_name = 0xc375cd "&block->mutex", pfs_psi = 0x0}, lock = {lock_word = 1048575, waiters = 0, recursive = 0, writer_thread = 140561776342784, event = 0x7fd72400e980, wait_ex_event = 0x7fd72400ea00, list = {prev = 0x7fd7249ae940, next = 0x7fd7249ae640}, pfs_psi = 0x0, count_os_wait = 0, lock_name = 0xc375db "&block->lock", last_s_file_name = 0xc281a0 "/mnt/workspace/percona-server-5.6-redhat-binary/label_exp/centos6-64/rpmbuild/BUILD/percona-server-5.6.23-72.1/storage/innobase/row/row0sel.cc", last_x_file_name = 0xc34650 "/mnt/workspace/percona-server-5.6-redhat-binary/label_exp/centos6-64/rpmbuild/BUILD/percona-server-5.6.23-72.1/storage/innobase/btr/btr0btr.cc", writer_is_wait_ex = 0, last_s_line = 3226, last_x_line = 741}, lock_hash_val = 39242, check_index_page_at_flush = 1, modify_clock = 1, n_hash_helps = 0, n_fields = 1, n_bytes = 0, left_side = 1, curr_n_fields = 0, curr_n_bytes = 0, curr_left_side = 0, index = 0x0}
(gdb) c
Continuing.



Note heap_no=1 above. This is a supremum record. I've also checked some details you can get for the block (page in the buffer pool) in the process. At this moment we've finally got rows inserted:

mysql> insert into t(val) select 100 from tt;
Query OK, 4 rows affected, 1 warning (31 min 9.51 sec)
Records: 4  Duplicates: 0  Warnings: 1
mysql> show warnings\G

*************************** 1. row ***************************
  Level: Note
   Code: 1592
Message: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave.
1 row in set (34.72 sec)

As you can assume for the above (time to execute show warnings)  we hit some breakpoints as InnoDB continues to update persistent statistics for the table t. I've later executed the following:

mysql> show engine innodb status\G
...

------------
TRANSACTIONS
------------
Trx id counter 84515
Purge done for trx's n:o < 84514 undo n:o < 0 state: running
History list length 371
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 84514, ACTIVE 223 sec
2 lock struct(s), heap size 360, 2 row lock(s), undo log entries 2---TRANSACTION 84511, ACTIVE 2394 sec
3 lock struct(s), heap size 360, 5 row lock(s), undo log entries 4
MySQL thread id 1, OS thread handle 0x7fd74cf79700, query id 18 localhost root init
show engine innodb status
--------
...
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
2 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
2 out of 1000 descriptors used
Main thread process no. 1681, id 140561818302208, state: sleeping
Number of rows inserted 12, updated 0, deleted 0, read 12
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s


So, you see transaction that made 2 changes in the background thead (this is reated to persitent statistics table), and our transaction that made 4 changes (inserted 4 rows). You see 5 row locks (and surely will NOT see AUTO-INC table one no matter how you try, as statement is completed, but it was there). These are 5 S locks set on the tt table we read (4 rows read and supremum record). But where our X locks on rows inserted into the t table (transaction is still active)? Here they are:

mysql> select * from t;
+----+------+
| id | val  |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |  100 |
|  6 |  100 |
|  7 |  100 |
|  8 |  100 |
+----+------+
8 rows in set (0.00 sec)


They exist, but they are implicit. You've probably noted that there was no call for X record lock mentioned, and they had NOT happened. Implicit locks do not have accociated lock object with them, they are present just because the trx_id for the transaction that changed (inserted, in this case) the row. Annamalai explained them well here.

Probably it's time to stop, as this post is already big enough. To summarize, the following locks are set (and in the following order) while executing INSERT INTO t ... SELECT FROM tt where there is an auto_increment column for which the value is generated in the destination table with default innodb_aunoinc_lock_mode=1 according to our tracing in gdb:
  1. IS lock on the source table (tt)
  2. S lock on the first row in tt we read with SELECT
  3. AUTO-INC lock on the destination table (t)
  4. IX lock on the destination table (t)
  5. Probably implicit X lock on the row inserted (we had not seen this in gdb clearly)
  6. S lock on the next row in tt we read with SELECT
  7. Probably implicit X lock on the row inserted (we had not seen this in gdb clearly)
  8. ...
  9. S lock on the supremum in tt we read with SELECT
I see three potential problems here.

There is a small chance to run fast concurrent INSERT (that uses AUTO-INC lock) into the destination table after we read the first row from the source table, but before we request AUTO-INC lock on the destination.

Also, try to find explanation for this case in the manual. It says nothing on when AUTO-INC lock is set.

The last but not the least, I do not see any explanation of implicit X locks in the manual. I had not try hard yet.

For these (and more) I'll report bugs. The story will be continued in the next blog post(s) in this series. As for that "day" when I plan ot tell you more about InnoDB locks, it's predefined. Please, visit this talk at Percona Live 2015. There Nilnanadan and me (let's hope) will tell you notably more based on our findings and experience.

3 comments:

  1. Hi Valeriy -

    Still reading this post, but, which "great post by Annamalai"?

    Thanks,

    Ben

    ReplyDelete
  2. Click on "this" in " this great post by Annamalai." in the text :)

    https://blogs.oracle.com/mysqlinnodb/entry/introduction_to_transaction_locks_in

    ReplyDelete
    Replies
    1. Thanks Valeriy - don't know how I missed that before!

      Delete