Saturday, December 19, 2015

Tricking the Optimizer, or How Checking Bug Reports Helps to Solve Real Problems

I've developed several useful habits over my years of work in MySQL Support. One of them is to start working on every problem with a search for known MySQL bugs related to the problem at hand. I'd like to share a recent case where this habit helped me to get a solution for a customer almost instantly.

It was one of those rare cases when a customer opened a support request with a very clear question and even a test case. The problem was described very precisely, more or less as follows (with table and column names, and data, changed for this blog post, of course).

Let's assume we have two tables created like this:

mysql> create table t1(id int auto_increment primary key, c1 varchar(2), c2 varchar(100));
Query OK, 0 rows affected (0.27 sec)

mysql> create table t2(id int auto_increment primary key, t1_id int, ctime datetime, cvalue decimal(10,2), key(t1_id, ctime));
Query OK, 0 rows affected (0.15 sec)


So, we have a couple of tables with an assumed (even if not formally declared) relation of a "master-details" kind. We have a few rows in the "master" table:

mysql>  insert into t1(c1,c2) values ('UA','Val'),('US','Tom'),('UK','Jerry');
Query OK, 3 rows affected (0.06 sec)
Records: 3  Duplicates: 0  Warnings: 0


and usually several (or very many) rows in the "details" table for each row in the "master" one:

mysql>  insert into t2(t1_id, ctime, cvalue) values(1,NOW(),10),(1,NOW(),20),(1,NOW(),30),(2,NOW(),10),(2,NOW(),50),(3,NOW(),100);
Query OK, 6 rows affected (0.07 sec)
Records: 6  Duplicates: 0  Warnings: 0


Just to double check, this is what we have to begin with in our tables:

mysql> select * from t1;
+----+------+-------+
| id | c1   | c2    |
+----+------+-------+
|  1 | UA   | Val   |
|  2 | US   | Tom   |
|  3 | UK   | Jerry |
+----+------+-------+
3 rows in set (0.01 sec)

mysql> select * from t2;
+----+-------+---------------------+--------+
| id | t1_id | ctime               | cvalue |
+----+-------+---------------------+--------+
|  1 |     1 | 2015-12-19 17:30:59 |  10.00 |
|  2 |     1 | 2015-12-19 17:30:59 |  20.00 |
|  3 |     1 | 2015-12-19 17:30:59 |  30.00 |
|  4 |     2 | 2015-12-19 17:30:59 |  10.00 |
|  5 |     2 | 2015-12-19 17:30:59 |  50.00 |
|  6 |     3 | 2015-12-19 17:30:59 | 100.00 |
+----+-------+---------------------+--------+
6 rows in set (0.00 sec)


Now, let's get many more rows into the "details" table (do not mind the warnings, I did that on a server with the binary log enabled and binlog_format = STATEMENT):

mysql>  insert into t2(t1_id, ctime, cvalue) select t1_id, NOW(), cvalue from t2;
Query OK, 6 rows affected, 1 warning (0.04 sec)
Records: 6  Duplicates: 0  Warnings: 1


...

mysql>  insert into t2(t1_id, ctime, cvalue) select t1_id, NOW(), cvalue from t2;
Query OK, 196608 rows affected, 1 warning (25.63 sec)
Records: 196608  Duplicates: 0  Warnings: 1
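
Before moving on, it may be worth double checking what we ended up with. A quick sanity check could look like this (the exact counts depend on how many times you repeated the INSERT ... SELECT above, so I skip the output). Note also that the index we created as key(t1_id, ctime) got the name of its first column, t1_id, and this is how EXPLAIN refers to it below:

mysql> select t1_id, count(*) from t2 group by t1_id;
mysql> show create table t2\G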


Now, as we have an index on the (t1_id, ctime) columns of the t2 table, queries like the following one are executed in a fast and efficient way:

mysql> explain select * from t1 join t2 on t1.id = t2.t1_id where t1.id = 3 and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: range
possible_keys: t1_id
          key: t1_id
      key_len: 11
          ref: NULL
         rows: 1
        Extra: Using index condition
2 rows in set (0.01 sec)


mysql> select sql_no_cache * from t1 join t2 on t1.id = t2.t1_id where t1.id = 3 and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10';
+----+------+-------+----+-------+---------------------+--------+
| id | c1   | c2    | id | t1_id | ctime               | cvalue |
+----+------+-------+----+-------+---------------------+--------+
|  3 | UK   | Jerry |  6 |     3 | 2015-12-19 17:30:59 | 100.00 |
+----+------+-------+----+-------+---------------------+--------+
1 row in set (0.00 sec)


Note the range access and key_len: 11, meaning that both columns of the t1_id index of the t2 table are used. But what if we specify the rows from the "master" (t1) table indirectly? Like this:

mysql> explain select * from t1 join t2 on t1.id = t2.t1_id where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: t1_id
          key: t1_id
      key_len: 5
          ref: test.t1.id
         rows: 1
        Extra: Using index condition
2 rows in set (0.00 sec)

mysql> select sql_no_cache * from t1 join t2 on t1.id = t2.t1_id where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10';
+----+------+-------+----+-------+---------------------+--------+
| id | c1   | c2    | id | t1_id | ctime               | cvalue |
+----+------+-------+----+-------+---------------------+--------+
|  3 | UK   | Jerry |  6 |     3 | 2015-12-19 17:30:59 | 100.00 |
+----+------+-------+----+-------+---------------------+--------+
1 row in set (0.22 sec)


We see that all rows from the "master" table are checked (this is expected, and we have only 3 of them anyway), but then we see "ref" access via the t1_id index of the "details" (t2) table with key_len: 5, that is, only the first keypart is used, and, what's more important, we get the result notably slower than before.

Now the question: why are we only using the t1_id part of the compound key? This becomes an issue when the leading column of the key is not highly selective.

Another question: is there any way to prompt the optimizer to make better use of the index, so that once it has evaluated the t1_id values it scans the t2 table using the full depth of the index?

So, this was the customer request I got. Following my good habit, I started to search for known optimizer bugs, using the following search string in Google:

site:bugs.mysql.com join range compound index

On the very first page of results I got a link to the following bug (it was reported by my former colleague Justin Swanhart, probably for some other customer issue, and maybe we had even discussed it in the past, but still, I just googled for the string above, without any specific details in mind):
  • Bug #70002 - "very low performance join - eq_ref and ref access broken for compound indexes"
The bug was declared a "Duplicate" of an older Bug #52030, "WHERE clause with != primary key check slows a query from 0.06 sec to 4 min", and their similarity was far from clear to me. That older bug was reported for 5.1.x and is still "Verified". But there was no need to understand the relation, as the last comment (by Jørgen Løland) in the bug I had found stated:
"Thanks for the bug report. This looks like a duplicate of BUG#52030 but with a much less obscure reproducible. I'll link the two bugs together and request that the implementor verifies your test case as well when the issue has been fixed.

A short summary of what happens: MySQL can only do '[eq_]ref' access if the comparison operator is =. That happens to be the case for the first keypart. If MySQL is to make use of the BETWEEN predicate when looking up rows in the index, 'dynamic range' access has to be used. This is the kind that says "Range checked for each record (index 0x...)" in EXPLAIN. 'Dynamic range' is more costly to use than 'ref' because parts of the optimizer has to be invoked for every row in the referred-to table (cust_car in this case). Because of this higher cost, 'dynamic range' will not be chosen by MySQL if 'ref' access is possible. Admittedly, this heuristic sometimes fail like in this bug report."
This gives the answer to the customer's first question: we are using only the first column of the compound index because there is a heuristic inside the optimizer that prefers "ref" access, and this can be considered a bug or known limitation of the current MySQL optimizer.
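
By the way, if you'd like to see the optimizer's reasoning on your own data, the optimizer trace available since MySQL 5.6 shows the access paths considered and the one chosen. This is just a sketch of how to get it (the trace itself is verbose and version-dependent, so I do not quote it here):

mysql> set optimizer_trace='enabled=on';
mysql> explain select * from t1 join t2 on t1.id = t2.t1_id where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10'\G
mysql> select trace from information_schema.optimizer_trace\G
mysql> set optimizer_trace='enabled=off';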

Moreover, I immediately tried to trick the optimizer into using the "dynamic range" access path:

mysql> explain select * from t1 join t2 on (t1.id >= t2.t1_id and t1.id <= t2.t1_id) where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: t1_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 392850
        Extra: Range checked for each record (index map: 0x2)
2 rows in set (0.00 sec)

mysql> select * from t1 join t2 on (t1.id >= t2.t1_id and t1.id <= t2.t1_id) where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10';
+----+------+-------+----+-------+---------------------+--------+
| id | c1   | c2    | id | t1_id | ctime               | cvalue |
+----+------+-------+----+-------+---------------------+--------+
|  3 | UK   | Jerry |  6 |     3 | 2015-12-19 17:30:59 | 100.00 |
+----+------+-------+----+-------+---------------------+--------+
1 row in set (0.01 sec)

and, as you can see, I got both the desired plan (on MySQL 5.6.27) and fast execution.

So, I immediately suggested that the customer use this trick of replacing the join condition with one that prevents "ref" access, and the workaround helped to get fast execution for the real case as well! This was the answer to the second question: replace the '=' in the join condition with a logically equivalent condition that bypasses the heuristic and lets the entire index be used for each row selected from the outer table.
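
For the real query the pattern is the same; the table and column names below are made up just to illustrate it, and the key point is that the pair of inequalities is logically equivalent to the original equality join but prevents "ref" access:

select m.*, d.*
  from master m
  join details d
    on d.master_id >= m.id and d.master_id <= m.id  /* same as d.master_id = m.id */
 where m.country_code = 'UK'
   and d.ctime between '2015-12-19 00:00:00' and '2015-12-19 23:59:59';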

The problem was solved for the customer some 20 minutes after he had opened the support request! This is the power of a good habit, and the result of the Community's and Oracle engineers' work on maintaining the public bugs database! We do not know if the bug will ever be fixed, but we are able to use the results of its discussion to solve problems.

A side note: the same trick applied on an older version, without the dynamic range optimization, does not give any real benefit. The resulting query on 5.5.x, for example, is notably slower. So, another good habit to mention: check the suggested solution on the exact version the customer uses before recommending it!
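
The check itself is trivial: confirm the exact server version you are testing on, then re-run the EXPLAIN from above there and compare the plans, something along these lines on a test instance of the customer's version:

mysql> select version();
mysql> explain select * from t1 join t2 on (t1.id >= t2.t1_id and t1.id <= t2.t1_id) where t1.c1 = 'UK' and ctime between '2015-12-19 17:30:59' and '2015-12-19 17:31:10'\G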
