Tuesday, March 20, 2018

Fun With Bugs #62 - On Bugs Related to JSON Support in MySQL

Comparing to "online ALTER" or FULLTEXT indexes in InnoDB, I am not that much exposed to using JSON in MySQL. OK, there is EXPLAIN ... FORMAT=JSON statement that is quite useful, optimizer trace in MySQL is also in JSON format, but otherwise I rarely have to even read this kind of data since my lame attempt to pass one introductory MongoDB training couse back in 2015 (when I mostly enjoyed mongo shell). Even less I care about storing JSON in MySQL.

But it seems community users do use it, and there is a book for them... So, out of pure curiosity last week I decided to check known bugs reported by MySQL Community users that are related to this feature (it has a separate category "MySQL Server: JSON" in public bugs database. This is a (short enough) list of my findings:
  • Bug #89543 - "stack-overflow on gis.geojson_functions / json.json_functions_innodb". This is a recent bug report from Laurynas Biveinis. Based on last comment, it may have something to do with specific GCC compiler version and exact build option (WITH_UBSAN=ON) used. But for me it's yet another indication that community (Percona's in this case) extra QA still matters for MySQL quality. 
  • Bug #88033 - "Generated INT column with value from JSON document". As Geert Vanderkelen says: "would be great if JSON null is NULL".
  • Bug #87440 - "use union index,the value of key_len different with used_key_parts’ value." This is actually a bug in optimizer (or EXPLAIN ... FORMAT=JSON), but as it is still "Open" and nobody cared to check it, it still stays in the wrong category.
  • Bug #85755 - "JSON containing null value is extracted as a string "null"". Similar to Geert's case above, Dave Pullin seems to need an easy way to get real NULL in SQL when the value is null in JSON. It is stated in comments that now everything works as designed, and it works exactly the same in MariaDB 10.2.13. Some users are not happy though and the bug remains still "Verified".
  • Bug #84082 - "JSON_OBJECT("k", dateTime) adds milliseconds to time". In MariaDB 10.2.13 the result is different:
    MariaDB [(none)]> SELECT version(), JSON_OBJECT("k", cast("2016-11-19 17:46:31" as datetime(0)));
    | version()       | JSON_OBJECT("k", cast("2016-11-19 17:46:31" as datetime(0))) |
    | 10.2.13-MariaDB | {"k": "2016-11-19 17:46:31"}                                 |
    1 row in set (0.00 sec)
  • Bug #83954 - "JSON handeling of DECIMAL values, JSON from JSON string". Take care when you cast values to JSON type.
  • Bug #81677 - "Allows to force JSON columns encoding into pure UTF-8". This report is still "Open" and probably everything works as designed.
  • Bug #79813 - "Boolean values are returned inconsistently with JSON_OBJECT". Take care while working with boolean properties.
  • Bug #79053 - "Second argument "one" to JSON_CONTAINS_PATH is a misnomer". Nice request from Roland Bouman to use "some" instead of "one" in this case. Probably too late to implement it...
There also several related feature requests, of them the following are still "Open":
  • Bug #84167 - "Managing deeply nested JSON documents with JSON_SET, JSON_MERGE". As a comment says, "It would by nice if MySQL came up with a solution for this in a more constructive way".
  • Bug #82311 - "Retrieve the last value of an array (JSON_POP)". I am not 100% sure what the user wants here, but even I can get last array element in MongoDB using $slice, for example:
    > db.text.find();
    { "_id" : ObjectId("5aafe41c6bd09c09625aa3d1"), "item" : "test" }
    { "_id" : ObjectId("5ab0e8c208d6fe081ebb4f40"), "item" : null }
    { "_id" : ObjectId("5ab0f0f6cc695c62b4b0a301"), "item" : [ { "val" : 10 }, { "val" : 11 } ] }
    > db.text.find({}, { item: { $slice: -1 }, _id: 0});
    { "item" : "test" }
    { "item" : null }
    { "item" : [ { "val" : 11 } ] }
  • Bug #80545 - "JSON field should be allow INDEX by specifying table_flag". It seems some 3rd party storage engine developers want to have a way to crate and use some nice indexes of their engines to get efficient index-based search in JSON documents.
  • Bug #80349 - "MySQL 5.7 JSON: improve documentation and possible improvements". Simon Mudd asked for more documentation on the way feature is designed (for those interested in the details of JSON data type implementation in modern MySQL versions there is a detailed enough public worklog, WL#8132) and asked some more questions. Morgan Tocker answered at least some of them, but this report still stays "Open" since that times...
It would be great for remaining few "Open" reports from community users to be properly processed, and a couple of real bugs identified fixed. There are also features to work on, and in the meantime workarounds and best practices should be shared. Maybe there are enough of them for the entire book.

That's almost all I have to say about JSON in MySQL at the moment.

Sunday, March 11, 2018

Checking User Threads With gdb in MySQL 5.7+

In one of my gdb-related posts last year I noted that there is no more simple global list of user threads in MySQL 5.7+:
"I had highlighted Global_THD_manager singleton also as during my next gdb sessions I had found out that simple global list of threads is also gone and in 5.7 everything is done via that Global_THD_manager. This is a topic for some other post, though."
In that post and many times later when I had to deal with MySQL 5.7+ I just checked OS threads one by one in gdb using thread  1 ... thread N commands. This is not efficient at best, as I also hit numerous background threads that I often do not care about. So, a couple of weeks ago I finally decided to get back to this topic and find out how to check just user threads one by one in recent MySQL versions. I had a nice hint by Shane Bester on how to get information about $i-th thread (that he shared in one of his comments to my Facebook post):
set $value = (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**) * $i))
I've attached gdb to an instance of Percona Server 5.7.x that I had running in my CentOS 6.9 VM and tried few commands to check types and content of the Global_THD_manager elements:
(gdb) p Global_THD_manager::thd_manager
$1 = (Global_THD_manager *) 0x7fab087fd000
(gdb) p Global_THD_manager::thd_manager->thd_list
$2 = {m_size = 2, m_capacity = 500, m_buff = {{
      data = "\000\060b\344\252\177\000\000\000\220i\344\252\177\000\000\000\200x\344\252\177", '\000' <repeats 3977 times>, align = {<No data fields>}}},
  m_array_ptr = 0x7fab087fd010, m_psi_key = 0}

So, we see that internally there is some array of elements thd_list with m_size items (2 in my case) probably stored in some pre-allocated buffer of m_capacity (500) elements, stored in m_buff.data. The type of elements is not clear, but we can try Shane's hint and assume that they are of type THD**. Let's try to check what we see there after type castings:
(gdb) p (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$4 = (THD **) 0x7fab087fd010
(gdb) p  *(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$5 = (THD *) 0x7faae4623000
(gdb) p  **(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
$6 = {<MDL_context_owner> = {
    _vptr.MDL_context_owner = 0x1c51f50}, <Query_arena> = {
So, we get reasonable addresses and when we dereference the resulting THD** pointer twice we indeed get a structure that looks like THD of MySQL 5.7+ (it's very different, say, in MariaDB 10.1.x), with reasonable content (that is huge and skipped above).

I've tried to get processlist id of thread based on findings of that post using intermediate gdb variables:

(gdb) set $ppthd = (THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)
(gdb) p *($ppthd)
$7 = (THD *) 0x7faae4623000

(gdb) set $pthd = *($ppthd)
(gdb) p $pthd->m_thread_id
$10 = 5
and then directly, using offsets and checking for security contexts of threads:
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data)).m_main_security_ctx.m_user
$14 = {m_ptr = 0x7faae463b060 "myuser", m_length = 6, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**)))).m_main_security_ctx.m_user
$15 = {m_ptr = 0x7faae46b1090 "root", m_length = 4, m_charset = 0x1d21760,
  m_alloced_length = 8, m_is_alloced = true}
(gdb) p  (**(THD**)(Global_THD_manager::thd_manager->thd_list.m_buff.data + (sizeof(THD**)))).m_thread_id
$16 = 9
to confirm that I correctly get user names and thread ids for both 2 user threads I had in that "list". As usual Shane Bester was right!

Now, if you want to get more details about Global_THD_manager, you can just check the sql/mysqld_thd_manager.h file. I was interested mostly in the following:
  int get_num_thread_running() const { return num_thread_running; }
  uint get_thd_count() const { return global_thd_count; }

  static Global_THD_manager *thd_manager;

  // Array of current THDs. Protected by LOCK_thd_list.
  typedef Prealloced_array<THD*, 500, true> THD_array;
  THD_array thd_list;

  // Array of thread ID in current use. Protected by LOCK_thread_ids.
  typedef Prealloced_array<my_thread_id, 1000, true> Thread_id_array;
  Thread_id_array thread_ids;
First of all, how consistent it is to use both int and uint data types for values that are always >=0... The fact that our thd_list elements is actually some template-based container, Prealloced_array, it also interesting, as it would be useful to find out how it is implemented. We can find all relevant details in the include/prealloced_array.h file. I'd like to highlight the following here:
"The interface is chosen to be similar to std::vector."

  size_t         m_size;
  size_t         m_capacity;
  // This buffer must be properly aligned.
  my_aligned_storage<Prealloc * sizeof(Element_type), MY_ALIGNOF(double)>m_buff;
Element_type *m_array_ptr;
To summarize, MySQL 5.7+ uses more C++ now, with templates, singletons, iterators and more, but still Oracle prefers to implement their own container types instead of using some standard ones. One of these generic types, Prealloced_array, is widely used and is easy to deal with in gdb, as long as you know the element type.

Sunday, March 4, 2018

On InnoDB's FULLTEXT Indexes

I had recently written about InnoDB features that I try to avoid by all means if not hate: "online" DDL and persistent optimizer statistics. Time to add one more to the list - FULLTEXT indexes.

This feature had a lot of problems when initially introduced in MySQL 5.6. There was a nice series of blog posts about the initial experience with it by my colleague from Percona (at that times) : part I, part II, and part III. Many of the problems mentioned there were resolved or properly documented since that times, but even more were discovered. So, InnoDB FULLTEXT indexes may be used, with care, when MyISAM or other engines/means to add fulltext search is not an option. The list of bugs that are still important and must be taken into account is presented below.

What forced me to get back to this feature recently and hate it sincerely is one customer issue that led to this bug report: MDEV-14773  - "ALTER TABLE ... MODIFY COLUMN ... hangs for InnoDB table with FULLTEXT index". Note that I have to refer to MariaDB bug report here, as related upstream Bug #88844 is hidden from community (probably considered a shame, if not a security problem)! The bug is simple: if one applies any ALTER to the InnoDB table with FULLTEXT index, even not related that index and columns in in any way, chances are high that this ALTER may cause a kind of hang/infinite loop/conflict of the thread that tries to drop temporary table used by ALTER, as one of last steps, and FTS background optimize thread. Similar to other two problematic features, new background threads were introduced and their cooperation with other threads in InnoDB seems to be not that well designed/implemented.

There are many other bugs to take into account if you ever plan to add any single FULLTEXT index to your InnoDB table. Here is the list of the most important ones, mostly still "Verified" or open and ignored, that I collected during one of calm night shifts this week:
  • Bug #78048 - "INNODB Full text Case sensitive not working". This bug was fixed only recently, in MySQL 5.6.39, 5.7.21, and 8.0.4.
  • Bug #83776 - "InnoDB FULLTEXT search returns incorrect result for operators on ignored words". Still "Verified" on all GA versions and 8.0.x.
  • Bug #76210 - "InnoDB FULLTEXT index returns wrong results for key/value pair documents". This bug was reported by Justin Swanhart 3 years ago, quickly verified and then seems to be ignored.
  • Bug #86036 - "InnoDB FULLTEXT index has too strict innodb_ft_result_cache_limit max limit". I reported this bug 10 months ago, and it was immediately "Verified". It seems FULLTEXT indexes are hardly useful in general for large InnoDB tables because of this limitation.
  • Bug #78977 - "Enable InnoDB fulltext index to use generated FTS_DOC_ID column". This is a feature request (still "Open") to get rid of this well known limitation/specific column.
  • Bug #86460 - "Deleted DOCID are not maintained during OPTIMIZE of InnoDB FULLTEXT tables". If you want to get rid of deleted DOC_IDs in the INNODB_FT_DELETED, better just run ALTER TABLE ... ENGINE=InnoDB.
  • Bug #75763 - "InnoDB FULLTEXT index reduces insert performance by up to 6x on JSON docs". yet another verified bug report by Justin Swanhart.
  • Bug #69762 - "InnoDB fulltext match against in boolean mode misses results on join". Let me quote last comment there:
    "Since innodb doesn't support fulltext search on columns without fulltext index, and it is very complicated to support search on columns in multiple fulltext indexes in optimizer, it won't be fixed.

    We admit it's a point innodb fulltext is not compatible with myisam."
  • Bug #85880 - "Fulltext query is too slow when each ngram token match a lot of documents". This bug is still "Open".
  • Bug #78485 - "Fulltext search with char * produces a syntax error with InnoDB". Yet another verified regression comparing to MyISAM FULLTEXT indexes. Nobody cares for 2.5 years.
  • Bug #80432 - "No results in fulltext search for top level domain in domain part of email ". It ended up as "Won't fix", but at least a workaround was provided by Oracle developer.
  • Bug #81819 - "ALTER TABLE...LOCK=NONE is not allowed when FULLTEXT INDEX exists". Online ALTER just does not work for tables with FULLTEXT indexes. This is a serious limitation.
  • Bug #72132 - "Auxiliary tables for InnoDB FTS indexes are always created in shared tablespace". This my bug report was fixed in .5.6.20+ and 5.7.5+, but the fact that this regression was not noted for a long time internally says a lot about the way the feature was developed and maintained.
  • Bug #83560  - "InnoDB FTS - output from mysqldump extremely slow and blocks unrelated inserts". I have yet to check the metadata locks set when the table with FULLTEXT index is used in various SQL statements, but from this "Verified" report it is clear that just lading a dump of a table with FULLTEXT indexes may work too slow for any large table.
  • Bug #71551 - "ft_boolean_syntax has no impact on InnoDB FTS". yet another inconsistency with MyISAM FULLTEXT indexes that was reported 4 years ago and "Verified", but still ignored after that.
  • Bug #83741 - "InnoDB: Failing assertion: lock->magic_n == 22643". Surely, debug assertions can be ignored, but in most cases they are in the code for a good reason. This failure was reported by Roel Van de Paar from Percona.
  • Bug #83397 - "INSERT INTO ... SELECT FROM ... fails if source has > 65535 rows on FTS". This "Verified" bug alone, reported by Daniël van Eeden, makes InnoDB FULLTEXT indexes hardly usable in production for large tables.
  • Bug #80296 - "FTS query exceeds result cache limit". The bug is "Closed" silently (by the bug reporter maybe, Monty Solomon?), but users report that recent enough versions like 5.6.35 and 5.7.17 are still affected. See also Bug #82971 (no fix for MySQL 5.6.x for sure).
  • Bug #85876 - "Fulltext search can not find word which contains "," or ".".  Still "Verified" for 1 months.
  • Bug #68987 - "MySQL crash with InnoDB assertion failure in file pars0pars.cc". Crash was reported in MySQL 5.6.10, not repeatable. Then (different?) assertion failure was reported in debug builds only in MySQL 5.6.21+, and verified. Not sure what's going on with this bug report...
  • Bug #83398 - "Slow and unexpected explain output on FTS". The fact that EXPLAIN may be slow when the table with FULLTEXT index is involved is now documented, so this report by Daniël van Eeden is closed.
  • Bug #81930 - "incorrect result with InnoDB FTS and subquery". This bug report about wrong results by Sergei Golubchik from MariaDB was immediately "Verified", but ignored since that time.
  • Bug #80347 - "mysqldump backup restore fails due to invalid FTS_DOC_ID (Error 182 and 1030)". There is a workaround based on mydumper/myloader at least...
To summarize, InnoDB FULLTEXT indexes is one of the most problematic InnoDB features for any production use because:
  • There are all kinds of serious bugs, from wrong results to hangs, debug assertions and crashes, that do not seem to get any internal priority and stay "Verified" for years.
  • There are performance regressions and missing features comparing to MyISAM FULLTEXT indexes, so migration may cause problems.
  • InnoDB FULLTEXT indexes are not designed to work with really large tables/result sets.
  • You should expect problems during routine DBA activities, like ALTERing tables or dumps and restores when any table with InnoDB FULLTEXT index is involved. 
If you still plan/have to use it, please, make sure to use the latest MySQL version, check the list above carefully and test/check the results of fulltext searches and routine DBA operations like altering the table. You may get a lot of surprises. Consider alternatives like Sphinx seriously.