Sunday, November 10, 2019

Time in Performance Schema

I've seen questions like this:
"Is there a way to know when (date and time) the last statement captured in ... was actually ran?"
more than once in some discussions (and customer issues) related to Performance Schema. I've seen answers provided by my colleagues and me after some limited testing. I've also noticed statements that it may not be possible.

Indeed, examples of wall clock date and time in the output of queries from the performance_schema are rare (and usually come from tables in the information_schemasys.format_time() function converts time to a nice, human readable format, but it still remains relative - it is not a date and time when something recorded in performance_schema happened.

In this post I'd like to document the answer I've seen and have in mind (and steps to get it) here, to save time for readers and myself faced with similar questions in the future. I'll also show the problem with this answer that I've noticed after testing for more than few minutes.

Let's start with simple setup of testing environment. In my case it is good old MariaDB 10.3.7 running on this netbook under Windows. First, let's check if Performance Schema is enabled:
 MariaDB [test]> select version(), @@performance_schema;
+--------------------+----------------------+
| version()          | @@performance_schema |
+--------------------+----------------------+
| 10.3.7-MariaDB-log |                    1 |
+--------------------+----------------------+
1 row in set (0.233 sec)
Then let's enable recording of time for everything and enable all consumers:
MariaDB [test]> update performance_schema.setup_instruments set enabled='yes', timed='yes';
Query OK, 459 rows affected (0.440 sec)
Rows matched: 707  Changed: 459  Warnings: 0

MariaDB [test]> update performance_schema.setup_consumers set enabled='yes';
Query OK, 8 rows affected (0.027 sec)
Rows matched: 12  Changed: 8  Warnings: 0
Now we can expect recently executed statements to be recorded, like this:
MariaDB [test]> select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current\G*************************** 1. row ***************************
      now(): 2019-11-03 17:42:51
   event_id: 46
timer_start: 22468653162059216
  timer_end: 22468697203533224
   sql_text: select now(), event_id, timer_start, timer_end, sql_text from performance_schema.events_statements_current
1 row in set (0.045 sec)
Good, but how we can get a real time when the statement was executed (like now() reports)? We all know from the fine manual that timer_start and timer_end values are in "picoseconds". So we can easily convert them into seconds (or whatever units we prefer):
MariaDB [test]> select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current\G
*************************** 1. row ***************************
                    now(): 2019-11-03 17:54:02
                 event_id: 69
timer_start/1000000000000: 23138.8159
                 sql_text: select now(), event_id, timer_start/1000000000000, sql_text from performance_schema.events_statements_current
1 row in set (0.049 sec)
This value is related startup time one might assume, and we indeed can expect that timer in Performance Schema is initialized at some very early stage of startup. But how to get date and time of server startup with SQL statement?

This also seems to be easy, as we have a global status variable called Uptime measured in seconds. Depending on fork and version used we can get the value of Uptime either from the Performance Schema (in MySQL 5.7+) or from the Information Schema (in MariaDB and older MySQL versions). For example:
MariaDB [test]> select variable_value from information_schema.global_status where variable_name = 'Uptime';
+----------------+
| variable_value |
+----------------+
| 23801          |
+----------------+
1 row in set (0.006 sec)
So, server startup time is easy to get with a date_sub() function:
MariaDB [test]> select @start := date_sub(now(), interval (select variable_value from information_schema.global_status where variable_name = 'Uptime') second) as start;
+----------------------------+
| start                      |
+----------------------------+
| 2019-11-03 11:28:18.000000 |
+----------------------------+
1 row in set (0.007 sec)
In the error log of MariaDB server I see:
2019-11-03 11:28:18 0 [Note] mysqld (mysqld 10.3.7-MariaDB-log) starting as process 5636 ...
So, I am sure the result is correct. Now, if we use date_add() to add timer value converted to seconds, for example to the server startup time, we can get the desired answer, date and time when the statement recorded in performance_schema was really executed:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 657
                  ts: 2019-11-03 18:24:00.501654
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:05
timediff(now(), @ts): 00:00:04.498346
1 row in set (0.002 sec)
I was almost ready to publish this blog post a week ago, before paying more attention to the result (that used to be perfectly correct in earlier simple tests) and executing a variation of statement presented above. The problem I noticed is that when Uptime of the server is not just few minutes (as it often happens in quick test environments), but hours or days, timestamp that we get for a recent event from the performance_schema using the approach suggested may differ from current timestamp notably (we see 4.5+ seconds difference highlighted above). Moreover, this difference seem to fluctuate:
MariaDB [test]> select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current\G
*************************** 1. row ***************************
            event_id: 682
                  ts: 2019-11-03 18:24:01.877763
            sql_text: select event_id, @ts := date_add(@start, interval timer_start/1000000000000 second) as ts, sql_text, now(), timediff(now(), @ts) from performance_schema.events_statements_current
               now(): 2019-11-03 18:24:07
timediff(now(), @ts): 00:00:05.122237
1 row in set (0.002 sec)
and tend to grow with Uptime. This make the entire idea of converting timer_start and timer_end Performance Schema "counters" in "picoseconds" questionable and unreliable for the precise real timestamp matching and comparing with other timestamp information sources in production.
Same as with this photo of sunset at Brighton taken with my Nokia dumb phone back in June, I do not see a clear picture of time measurement in Performance Schema...

After spending some more time thinking about this I decided to involve MySQL team somehow and created the feature request, Bug #97558 - "Add function (like sys.format_time) to convert TIMER_START to timestamp", that ended up quickly "Verified" (so I have small hope that I had not missed anything really obvious - correct me if I am wrong). I'd be happy to see further comments there and, surely, the function I asked about implemented. But I feel there is some internal problem with this and some new feature at server side may be needed to take the "drift" of time in Performance Schema into account.

There is also a list of currently open questions that I may try to answer in followup posts:
  1. Is the problem of time drift I noticed a MariaDB 10.3.7-specific, or recent MySQL 5..x and 8.0.x are also affected?
  2. Is this difference in time growing monotonically with time or really fluctuating?
  3. When exactly Performance Schema time "counter" starts, where is it in the code?
  4. Are there any other, better or at least more precise and reliable ways to get timestamps of some specific events that happen during MySQL server work? I truly suspect that gdb and especially dynamic tracing on Linux with tools like bpftrace may give us more reliable results...

2 comments:

  1. If I understand correctly TSC are based on the CPU clock cycle. Modern CPU have dynamic clock cycle this is why you see the difference. This also means the time measured is not as exact as "picoseconds" would indicate.

    ReplyDelete
    Replies
    1. Even more reasons then to ask for some feature from server to match time counters to real time, once in a wahile, and/or calculate current difference. It would be really useful to have a way to find out from Performance Schema data only when exactly some long wait (or long running SQL statement) happened, to correlate with performance data collected from OS or other sources in MySQL server.

      Delete