Saturday, March 11, 2017

Fun with Bugs #49 - Applying PMP to MySQL

As you may have noticed, in several recent posts I've provided some additional details for the slides used during my FOSDEM talk on profiling MySQL. The only part not covered yet is related to using Poor Man's Profiler (and the pt-pmp version of it). I see no reason to explain once again what it does and how to use it, but I would like to show several fairly recent MySQL bug reports where this tool was essential to find, explain or demonstrate the problem.

A quick search for active bugs with "pt-pmp" in the MySQL bugs database currently produces 8 hits:
  • Bug #85304 - "Reduce mutex contention of fil_system->mutex". It was reported by Zhai Weixiang a few days ago, and pt-pmp output was used as a starting point for the analysis that ended with a suggested patch.
  • Bug #85191 - "performance regression with HANDLER READ syntax", from the same reporter. In this report pt-pmp was used to prove the point and show exactly what the threads were doing.
  • Bug #80979 - "Doublewrite buffer event waits should be annotated for Performance Schema", by Laurynas Biveinis. One more case where PMP shows where threads spend their time in a specific situation, while there is no instrumentation (yet) for the related wait in Performance Schema.
  • Bug #77827 - "reduce cpu time costs of creating dummy index while change buffer is enabled", again by Zhai Weixiang. In this bug report he used both perf, to show that notable time was spent on the operation, and pt-pmp, to show the related backtraces.
  • Bug #73803 - "don't acquire global_sid_lock->rdlock if gtid is disabled". Once again, Zhai Weixiang used pt-pmp output as a starting point for further code analysis. I wonder why this bug is still "Open", by the way...
  • Bug #70237 - "the mysqladmin shutdown hangs". Guess who reported it after applying PMP when something hung. As I stated in all three of my recent FOSDEM talks, this is exactly what you have to do before killing and restarting the MySQL server in production: get backtraces of all threads, raw or at least aggregated with pt-pmp... I am not sure why the bug was not noticed in time; there are even ideas for patches shared in it. Time for somebody to process it formally.
  • Bug #69812 - "the server stalls at function row_vers_build_for_consistent_read". Same reporter, same tool used, same result - the bug report is still "Open". Looks like I know what my next post(s) in this "Fun with Bugs" series will be devoted to...
  • Bug #62018 - "innodb adaptive hash index mutex contention". It was reported by Mark Callaghan, and PMP outputs were used as part of the evidence. The bug is "Verified" and even got a patch suggested for 5.7.x by Percona engineers, but it still has not received any proper attention from Oracle. I may have some more results related to the "cost" and "benefits" of adaptive hash indexing to share soon, so stay tuned...
Surely, there are many more bugs where PMP was used. Let me share one more that I noted while working on my recent talk on profiling (but had not found time to put it on the slides and comment on it):

  • Bug #78277 - "InnoDB deadlock, thread stuck on kernel calls from transparent page compression", by Mark Callaghan. Again, PMP outputs were provided to prove the point and show where threads are stuck. The bug is "Open".

For many performance-related cases, applying pt-pmp and sharing the results has become a de facto community requirement, as you can see, for example, in Bug #84025. Note that Umesh Shastry, who verified the bug, provided pt-pmp outputs in his testing results. I'd suggest having gdb and pt-pmp installed and ready to use on any production system running any version or fork of MySQL. Even if your bug is ignored by Oracle, these outputs are useful for other community members who may hit similar cases or who are willing to check the code and work on a patch.
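To illustrate what "backtraces aggregated with pt-pmp" means, here is a minimal Python sketch of the aggregation idea behind Poor Man's Profiler. It is a simplified illustration, not pt-pmp's actual code (pt-pmp does more than this). It assumes the input is text saved from gdb's `thread apply all bt` command (for example, from `gdb -batch -ex "thread apply all bt" -p <mysqld pid>`); the function name `aggregate_stacks` and the sample backtrace below are hypothetical and invented for illustration. Each thread's stack is collapsed to a comma-separated list of function names, and identical stacks are counted, so the most common place where threads sit shows up first.

```python
import re
from collections import Counter

# Matches a gdb frame line such as
#   "#1  0x0000aaa in os_event_wait_low (event=0x1) at os0event.cc:10"
# or a frame printed without an address, like "#0  pthread_cond_wait (...)".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def aggregate_stacks(bt_text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: collapse each thread's backtrace to 'f0,f1,...'
    and count how many threads share each identical stack."""
    counts: Counter[str] = Counter()
    frames: list[str] = []
    for line in bt_text.splitlines():
        if line.startswith("Thread "):
            # A new thread header: store the stack collected so far.
            if frames:
                counts[",".join(frames)] += 1
            frames = []
        else:
            match = FRAME_RE.match(line.strip())
            if match:
                frames.append(match.group(1))
    if frames:
        counts[",".join(frames)] += 1
    # Most frequent stacks first, similar in spirit to "sort | uniq -c | sort -rn".
    return sorted(((n, s) for s, n in counts.items()), key=lambda t: (-t[0], t[1]))


# Invented sample of "thread apply all bt" output, for illustration only.
SAMPLE = """\
Thread 3 (Thread 0x7f0 (LWP 103)):
#0  0x00007f1 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000aaa in os_event_wait_low (event=0x1) at os0event.cc:10
#2  0x0000bbb in fil_io (type=...) at fil0fil.cc:20
Thread 2 (Thread 0x7f2 (LWP 102)):
#0  0x00007f1 in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x0000aaa in os_event_wait_low (event=0x2) at os0event.cc:10
#2  0x0000bbb in fil_io (type=...) at fil0fil.cc:20
Thread 1 (Thread 0x7f3 (LWP 101)):
#0  0x00007f4 in poll () from /lib64/libc.so.6
#1  0x0000ccc in handle_connections_sockets () at mysqld.cc:30
"""

if __name__ == "__main__":
    for count, stack in aggregate_stacks(SAMPLE):
        print(count, stack)
```

With the sample above, the two threads waiting in `fil_io` collapse into one line with a count of 2, which is the kind of summary that makes contention points stand out when there are hundreds of threads.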

1 comment:

  1. Not sure it is still relevant but
    fil_system->mutex contention limits block read rate with fast storage