Saturday, January 8, 2022

Visualizing MySQL Plan Execution Time With Flame Graphs

In the previous post I've already shown one non-classical but useful application of flame graphs as a visualization of something interesting for MySQL DBAs (besides usual stack traces from profilers), time spent on waits or even statements execution by stages with related waits as reported by performance_schema. Today I am going to abuse the tools even more and try to show the real impact of each step in the query execution as reported by the EXPLAIN ANALYZE statement that is supported since MySQL 8.0.18.

 The idea to use flame graphs for SQL plans visualization was (to the best of my knowledge) first suggested by Tanel Poder in this article, in context of Oracle RDBMS. There are several implementations of the idea in a free software here and there, for example.

Unfortunately I had not found any similar posts for MySQL and tools mentioned above do not work with MySQL. It made me wondering why? 

For quite a some time we have a way to get several metrics for each execution step of the query plan in MySQL. Let's start with some stupid query (used to prove the concept and test possible steps to implement it):

openxs@ao756:~/dbs/8.0$ bin/mysql -uroot --socket=/tmp/mysql8.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 33
Server version: 8.0.27 Source distribution

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> explain analyze select user, host from mysql.user u1 where u1.user not in (select distinct user from mysql.user) order by host desc\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop antijoin  (cost=3.75 rows=25) (actual time=0.139..0.139 rows=0 loops=1)
    -> Covering index scan on u1 using PRIMARY (reverse)  (cost=0.75 rows=5) (actual time=0.058..0.064 rows=5 loops=1)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`)  (actual time=0.003..0.003 rows=1 loops=5)
        -> Materialize with deduplication  (cost=1.25..1.25 rows=5) (actual time=0.071..0.071 rows=4 loops=1)
            -> Filter: (mysql.`user`.`User` is not null)  (cost=0.75 rows=5) (actual time=0.022..0.035 rows=5 loops=1)
                -> Covering index scan on user using PRIMARY  (cost=0.75 rows=5) (actual time=0.020..0.031 rows=5 loops=1)

1 row in set (0,02 sec)

We see some hierarchical representation of query execution steps (it's Oracle's famous TREE format of EXPLAIN output, the only one supported for EXPLAIN ANALYZE). Each row represents some step, explain what the step was doing, quite verbosely, and then provides several useful metrics like cost, estimated number of rows and then actual time to return the first and all rows for this step etc. The actual time to return all rows at this step is what DBAs are usually interested in The steps form a tree, but the way it's represented is a bit unusual - we do not see JSON or any other structured format as some other RDBMSes provide. Instead each row has some level in the hierarchy represented by the number of spaces before the very informative "->" prompt. The first row with its additional "EXPLAIN: " prompts aside, each nesting level adds 4 spaces, as highlighted by the light green background above.

Maybe this unusual representation (not in the table with plan steps like in Oracle and not as JSON) prevented people from quickly implementing flame graphs-based query "profiler" for MySQL. But this definitely is not going to stop me from hacking something that may even work. I am not really a developer during recent 16+ years already, so I will not use Node.js or Python or anything fancy - just old Unix text processing tools like sed and awk, and surely the power of SQL (that small part I managed to master). 

So, with the output like the above saved in a file:

openxs@ao756:~/dbs/8.0$ cat /tmp/explain.txt
*************************** 1. row ***************************
EXPLAIN: -> Nested loop antijoin  (cost=3.75 rows=25) (actual time=0.115..0.115 rows=0 loops=1)
    -> Covering index scan on u1 using PRIMARY (reverse)  (cost=0.75 rows=5) (actual time=0.045..0.049 rows=5 loops=1)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`)  (actual time=0.002..0.002 rows=1 loops=5)
        -> Materialize with deduplication  (cost=1.25..1.25 rows=5) (actual time=0.063..0.063 rows=4 loops=1)
            -> Filter: (mysql.`user`.`User` is not null)  (cost=0.75 rows=5) (actual time=0.025..0.034 rows=5 loops=1)
                -> Covering index scan on user using PRIMARY  (cost=0.75 rows=5) (actual time=0.024..0.032 rows=5 loops=1)

openxs@ao756:~/dbs/8.0$

The first stage is simple and represented by the following command line I came up with quite fast:

openxs@ao756:~/dbs/8.0$  cat /tmp/explain.txt | awk 'NR > 1' | sed 's/EXPLAIN: //' | sed 's/(cost[^)][^)]*)//' | sed 's/(actual time=//' | sed 's/\(..*\)\.\...*/\1/' | sed 's/  \([^ ][^ ]*\)$/; \1/'
-> Nested loop antijoin ; 0.115
    -> Covering index scan on u1 using PRIMARY (reverse) ; 0.045
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`); 0.002
        -> Materialize with deduplication ; 0.063
            -> Filter: (mysql.`user`.`User` is not null) ; 0.025
                -> Covering index scan on user using PRIMARY ; 0.024

openxs@ao756:~/dbs/8.0$

At each step sequentially I removed the first line of the output, removed that stupid "EXPLAIN :" "prompt", removed cost and number of rows estimations and then extracted just the actual time to return all rows from the step (separated by ";" as flamegraph.pl wants) , and kept the step detailed description with the initial spaces.

The next stage of processing took me a lot of time and efforts, as I had to convert the output above into a different format that is suitable for loading into the database:

openxs@ao756:~/dbs/8.0$ cat /tmp/plan.txt
1;0;0; Nested loop antijoin ; 0.115
2;1;1; Covering index scan on u1 using PRIMARY (reverse) ; 0.045
3;1;1; Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`); 0.002
4;3;2; Materialize with deduplication ; 0.063
5;4;3; Filter: (mysql.`user`.`User` is not null) ; 0.025
6;5;4; Covering index scan on user using PRIMARY ; 0.024

The format is simple: semicolon separated row number (step in the plan), row number of the "parent" step in the hierarchy (with 0 used for the very first step, obviously), the level of the step in the hierarchy, just in case, that same detailed step description and the value of the metric for the step (time to return all rows in milliseconds). This file is then loaded into the following MySQL table:

mysql> desc plan;
+--------+---------------+------+-----+---------+-------+
| Field  | Type          | Null | Key | Default | Extra |
+--------+---------------+------+-----+---------+-------+
| seq    | int           | YES  |     | NULL    |       |
| parent | int           | YES  |     | NULL    |       |
| level  | int           | YES  |     | NULL    |       |
| step   | varchar(1024) | YES  |     | NULL    |       |
| val    | decimal(10,3) | YES  |     | NULL    |       |
+--------+---------------+------+-----+---------+-------+
5 rows in set (0,00 sec)

mysql> truncate plan;
Query OK, 0 rows affected (1,77 sec)

mysql> load data infile '/tmp/plan.txt' into table plan fields terminated by ';';
Query OK, 6 rows affected (0,13 sec)
Records: 7  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from plan;
+------+--------+-------+------------------------------------------------------------------------------------------+-------+
| seq  | parent | level | step                                                                                     | val   |
+------+--------+-------+------------------------------------------------------------------------------------------+-------+
|    1 |      0 |     0 |  Nested loop antijoin                                                                    | 0.115 |
|    2 |      1 |     1 |  Covering index scan on u1 using PRIMARY (reverse)                                       | 0.045 |
|    3 |      1 |     1 |  Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`) | 0.002 |
|    4 |      3 |     2 |  Materialize with deduplication                                                          | 0.063 |
|    5 |      4 |     3 |  Filter: (mysql.`user`.`User` is not null)                                               | 0.025 |
|    6 |      5 |     4 |  Covering index scan on user using PRIMARY                                               | 0.024 |
+------+--------+-------+------------------------------------------------------------------------------------------+-------+
6 rows in set (0,00 sec)

Now, how to end up with that format? I spent a lot of time with awk and created the following script:

awk '
function st_push(val) {
st_array[st_pos++] = val;
}

function st_pop() {
return (st_size() > 0) ? st_array[--st_pos] : "ERROR";
}

function st_size() {
return st_pos;
}

BEGIN { p[0] = 0; level[0] = 0; parent[0] = 0; }

NF > 0 {
  # print "> " NR ";" $1;
  split($0,a,"->");
  lvl = length(a[1])/4;

  level[NR] = lvl;
  parent[NR] = NR-1;

  if (level[NR] > level[NR-1]) {
    st_push(parent[NR]);
    p[level[NR]] = NR-1;
  }
  else if (level[NR] < level[NR-1]) {
    for (i=0; i<=(level[NR-1]-level[NR]); i++) {
      parent[NR] = st_pop();
      # print "poped " parent[NR];
    }
  }
  else {
    parent[NR]=parent[NR-1]
  }
  print NR ";" parent[NR] ";" lvl ";" a[2]
}'

that starts with 3 functions implementing a stack and then 3 associative arrays: p[] to store current parent row number for the specific nesting level, level[] to store hierarchy level of specific row and parent[] to store that number of row that is a parent of a given row in the plan.

To find the level of the row I split() the input row with at least one default filed separator ';' inside, using "->" as a field separator (assuming it is not used in the step description, ever) and then end up with initial spaces in a[1] and the rest (the step description and metrics value) in a[2]. I divide the length of a[1] by 4 to find the level. 

This was a simple part. Then I try to build a value of a parent row using a stack to save previous parent row number for the level. It took some testing with a more primitive input format to come up with "tree traversal" code for such a weird tree representation format that seem to work on few different nested structures tested (correct me if it fails for more complex plans). I am not a developer any more so the code above may be wrong.

So, with the plan in the table, where each line has a seq number of seq number of the parent line as parent, one needs a simple enough recursive CTE to sum up all possible paths and summarizing time spent in the path:

mysql> with recursive cte_plan as (select seq, parent, level, concat(seq, concat(' -', step)) as step, round(val*1000) as val from plan where level = 0 union all select p.seq, p.parent, p.level, concat(c.step, concat(';', concat(p.seq, concat(' -',p.step)))), round(p.val*1000) as val from plan p join cte_plan c on p.parent = c.seq) select step, val from cte_plan\G
*************************** 1. row ***************************
step: 1 - Nested loop antijoin
 val: 115
*************************** 2. row ***************************
step: 1 - Nested loop antijoin ;2 - Covering index scan on u1 using PRIMARY (reverse)
 val: 45
*************************** 3. row ***************************
step: 1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`)
 val: 2
*************************** 4. row ***************************
step: 1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication
 val: 63
*************************** 5. row ***************************
step: 1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication ;5 - Filter: (mysql.`user`.`User` is not null)
 val: 25
*************************** 6. row ***************************
step: 1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication ;5 - Filter: (mysql.`user`.`User` is not null) ;6 - Covering index scan on user using PRIMARY
 val: 24
6 rows in set (0,01 sec)

Note that I've multiplied the metrics value by 1000 as flamegraph.pl script that expects metrics to be integer numbers. So they are actually in microseconds. I also added step number to each step, as actually we may have exact same step in different places in the hierarchy if wee read from the same table more than once in the query.

The final code to produce the input for flamegraph.pl is like this:

openxs@ao756:~/dbs/8.0$ ~/dbs/8.0/bin/mysql -uroot --socket=/tmp/mysql8.sock test -B -e"with recursive cte_plan as (select seq, parent, level, concat(seq, concat(' -', step)) as step, round(val*1000) as val from plan where level = 0 union all select p.seq, p.parent, p.level, concat(c.step, concat(';', concat(p.seq, concat(' -',p.step)))), round(p.val*1000) as val from plan p join cte_plan c on p.parent = c.seq) select step, val from cte_plan;" | awk ' NR > 1' > /tmp/processed_plan.txt
openxs@ao756:~/dbs/8.0$ cat /tmp/processed_plan.txt
1 - Nested loop antijoin        115
1 - Nested loop antijoin ;2 - Covering index scan on u1 using PRIMARY (reverse) 45
1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`)   2
1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication       63
1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication ;5 - Filter: (mysql.`user`.`User` is not null)         25
1 - Nested loop antijoin ;3 - Single-row index lookup on <subquery2> using <auto_distinct_key> (user=mysql.u1.`User`);4 - Materialize with deduplication ;5 - Filter: (mysql.`user`.`User` is not null) ;6 - Covering index scan on user using PRIMARY  24

This is how I create the final flame graph representing the plan steps):

openxs@ao756:~/dbs/8.0$ cat /tmp/processed_plan.txt | ~/git/FlameGraph/flamegraph.pl  --title "EXPLAIN Steps" --inverted --countname miliseconds > /tmp/explain2.svg

Surely these steps can be simplified, put into a script etc. I checked intermediate stages one by one, left some debugging details and so on. You can do better, but the final result is this nice icicle graph:

Time spent on each step of query execution represented as an icicle graph

where  I highlighted the impact of the "Covering index scan on u1 using PRIMARY (reverse)" step. As usual with flame graphs, in the original .svg file you can do search, zoom in an out etc. For complex queries it may be really useful.

As a DBA I'd ask for something better than these weird stages (even if scripted). I wish we had all execution steps in tables or view, like in Oracle's v$sql_plan and v$sql_plan_statistics_all, or at least in JSON format maybe (like EXPLAIN FORMAT=JSON  produces for the plan itself). One day, maybe (I am going to file a feature request for this later). Yet another day I plan to use the output of ANALYZE FORMAT=JSON in MariaDB to get similar information :)

No comments:

Post a Comment