Blog of (former?) MySQL Entomologist: More on Studying MySQL Hashes in gdb, and How P

I have to get back to the topic of checking user variables in gdb to clarify few more details. In his comment Shane Bester kindly noted that calling functions defined in MySQL code is not going to work when core dump is studied. So, I ended up with a need to check what does the my_hash_element() function I've used really do, to be ready to repeat that step by step manually. Surely I could skip that and use Python and Shane himself did, but structures of HASH type are widely used in MySQL, so I'd better know how to investigate them manually than blindly use existing code.

Quick search with grep for my_hash_element shows:

[root@centos mysql-server]# grep -rn my_hash_element *
include/hash.h:94:uchar *my_hash_element(HASH *hash, ulong idx);
mysys/hash.c:734:uchar *my_hash_element(HASH *hash, ulong idx)plugin/keyring/hash_to_buffer_serializer.cc:34:      if(store_key_in_buffer(reinterpret_cast<const IKey *>(my_hash_element(keys_hash, i)),
plugin/version_token/version_token.cc:135: while ((token_obj= (version_token_st *) my_hash_element(&version_tokens_hash, i)))
plugin/version_token/version_token.cc:879:    while ((token_obj= (version_token_st *) my_hash_element(&version_tokens_hash, i)))
sql/sql_base.cc:1051:    TABLE_SHARE *share= (TABLE_SHARE *)my_hash_element(&table_def_cache, idx);
sql/sql_base.cc:1262:        share= (TABLE_SHARE*) my_hash_element(&table_def_cache, idx);
sql/sql_udf.cc:277:    udf_func *udf=(udf_func*) my_hash_element(&udf_hash,idx);
...
sql/table_cache.cc:180:      (Table_cache_element*) my_hash_element(&m_cache, idx);
sql/rpl_gtid.h:2123:        Node *node= (Node *)my_hash_element(hash, i);
sql/rpl_gtid.h:2274:          node= (Node *)my_hash_element(hash, node_index);
sql/rpl_tblmap.cc:168:    entry *e= (entry *)my_hash_element(&m_table_ids, i);
sql/rpl_master.cc:238:    SLAVE_INFO* si = (SLAVE_INFO*) my_hash_element(&slave_list, i);
sql-common/client.c:3245:        LEX_STRING *attr= (LEX_STRING *) my_hash_element(attrs, idx);
storage/perfschema/table_uvar_by_thread.cc:76:    sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));storage/ndb/include/util/HashMap.hpp:155:    Entry* entry = (Entry*)my_hash_element(&m_hash, (ulong)i);
storage/ndb/include/util/HashMap.hpp:169:    Entry* entry = (Entry*)my_hash_element((HASH*)&m_hash, (ulong)i);
[root@centos mysql-server]#

That is, HASH structure is used everywhere in MySQL, from keyring to UDFs and table cache, to replication and NDB Cluster, with everything in between. If I can navigate to each HASH element and dump/print it, I can better understand a lot of code, if needed. If anyone cares, HASH is defined in a very simple way in include/hash.h:

typedef struct st_hash {
size_t key_offset,key_length; /* Length of key if const length */
size_t blength;
ulong records;
uint flags;
DYNAMIC_ARRAY array; /* Place for hash_keys */
my_hash_get_key get_key;
void (*free)(void *);
CHARSET_INFO *charset;
my_hash_function hash_function;
PSI_memory_key m_psi_key;
} HASH;

It relies on DYNAMIC_ARRAY to store keys.

The code of the my_hash_element function in mysys/hash.c is very simple:

uchar *my_hash_element(HASH *hash, ulong idx)
{
if (idx < hash->records)
return dynamic_element(&hash->array,idx,HASH_LINK*)->data;
return 0;
}

Quick search for dynamic_element shows that it's actually a macro:

[root@centos mysql-server]# grep -rn dynamic_element *
client/mysqldump.c:1608:    my_err= dynamic_element(&ignore_error, i, uint *);
extra/comp_err.c:471:      tmp= dynamic_element(&tmp_error->msg, i, struct message*);
extra/comp_err.c:692:    tmp= dynamic_element(&err->msg, i, struct message*);
extra/comp_err.c:803: first= dynamic_element(&err->msg, 0, struct message*);
include/my_sys.h:769:#define dynamic_element(array,array_index,type) \mysys/hash.c:126: HASH_LINK *data= dynamic_element(&hash->array, 0, HASH_LINK*);
...

that is defined in include/my_sys.h as follows:

#define dynamic_element(array,array_index,type) \
((type)((array)->buffer) +(array_index))

So, now it's clear what to do in gdb, having in mind what array do we use. Let me start the session, find a thread I am interested in and try to check elements one by one:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fc3a037b700 (LWP 3061))]#0 0x00007fc3d2cb3383 in poll () from /lib64/libc.so.6
(gdb) p do_command::thd->m_thread_id
$1 = 5
(gdb) p do_command::thd->user_vars
$2 = {key_offset = 0, key_length = 0, blength = 4, records = 3, flags = 0,
array = {buffer = 0x7fc3ba2b3560 "\377\377\377\377", elements = 3,
    max_element = 16, alloc_increment = 32, size_of_element = 16,
    m_psi_key = 38},
get_key = 0xc63630 <get_var_key(user_var_entry*, size_t*, my_bool)>,
free = 0xc636c0 <free_user_var(user_var_entry*)>, charset = 0x1ded740,
hash_function = 0xeb6990 <cset_hash_sort_adapter>, m_psi_key = 38}
(gdb) set $uvars=&(do_command::thd->user_vars)
(gdb) p $uvars
$3 = (HASH *) 0x7fc3b9fad280
...
(gdb) p &($uvars->array)
$5 = (DYNAMIC_ARRAY *) 0x7fc3b9fad2a8
(gdb) p ((HASH_LINK*)((&($uvars->array))->buffer) + (0))
$6 = (HASH_LINK *) 0x7fc3ba2b3560
(gdb) p ((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data
$7 = (uchar *) 0x7fc3b9fc80e0 "H\201\374\271\303\177"
(gdb) p (user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data)
$8 = (user_var_entry *) 0x7fc3b9fc80e0
(gdb) p *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (0))->data)
$9 = {static extra_size = 8, m_ptr = 0x7fc3b9fc8148 "bbb", m_length = 3,
m_type = STRING_RESULT, m_owner = 0x7fc3b9fad000, m_catalog = {
    str = 0x100000000 <Address 0x100000000 out of bounds>,
    length = 416611827727}, entry_name = {m_str = 0x7fc3b9fc8150 "a",
    m_length = 1}, collation = {collation = 0x1ded740,
    derivation = DERIVATION_IMPLICIT, repertoire = 3}, update_query_id = 25,
used_query_id = 25, unsigned_flag = false}
...
(gdb) p *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (2))->data)
$11 = {static extra_size = 8, m_ptr = 0x7fc3b9e6e220 "\002", m_length = 64,
m_type = DECIMAL_RESULT, m_owner = 0x7fc3b9fad000, m_catalog = {str = 0x0,
    length = 0}, entry_name = {m_str = 0x7fc3b9fc8290 "c", m_length = 1},
collation = {collation = 0x1ded740, derivation = DERIVATION_IMPLICIT,
    repertoire = 3}, update_query_id = 25, used_query_id = 25,
unsigned_flag = false}
(gdb)

I tried to highlight important details above. With gdb variables it's a matter of proper type casts and dereferencing. In general, I was printing content of item (user variable in this case) with index N as *(user_var_entry *)(((HASH_LINK*)((&($uvars->array))->buffer) + (N))->data).

Now, back to printing the variables. Let's see how this is done in performance_schema, in storage/perfschema/table_uvar_by_thread.cc:

     74   for (;;)
     75   {
     76     sql_uvar= reinterpret_cast<user_var_entry*> (my_hash_element(& thd->user_vars, index));
     77     if (sql_uvar == NULL)
     78       break;...
     98     /* Copy VARIABLE_NAME */
     99     const char *name= sql_uvar->entry_name.ptr();
    100     size_t name_length= sql_uvar->entry_name.length();
    101     DBUG_ASSERT(name_length <= sizeof(pfs_uvar.m_name));
    102     pfs_uvar.m_name.make_row(name, name_length);
    103
    104     /* Copy VARIABLE_VALUE */
    105     my_bool null_value;
    106     String *str_value;
    107     String str_buffer;
    108     uint decimals= 0;
    109     str_value= sql_uvar->val_str(& null_value, & str_buffer, decimals);    110     if (str_value != NULL)
    111     {
    112       pfs_uvar.m_value.make_row(str_value->ptr(), str_value->length());
    113     }
    114     else
    115     {
    116       pfs_uvar.m_value.make_row(NULL, 0);
    117     }
    118
    119     index++;    120   }

So, there we check elements by index until there is no element with such index, and apply the val_str() function of the class. While debugging live server we can do the same, but if we care to see how it works step by step, here is the code from sql/item_func.cc:

String *user_var_entry::val_str(my_bool *null_value, String *str,
                                uint decimals) const
{
if ((*null_value= (m_ptr == 0)))
    return (String*) 0;
switch (m_type) {
case REAL_RESULT:
    str->set_real(*(double*) m_ptr, decimals, collation.collation);    break;
case INT_RESULT:
    if (!unsigned_flag)
      str->set(*(longlong*) m_ptr, collation.collation);    else
      str->set(*(ulonglong*) m_ptr, collation.collation);    break;
case DECIMAL_RESULT:
    str_set_decimal((my_decimal *) m_ptr, str, collation.collation);
    break;
case STRING_RESULT:
    if (str->copy(m_ptr, m_length, collation.collation))
      str= 0;                                   // EOM error case ROW_RESULT:
    DBUG_ASSERT(1);                             // Impossible
    break;
}
return(str);
}

For INT_RESULT and REAL_RESULT it's all clear, and Shane did essentially the same in his Python code. For strings we have to copy proper items into a zero terminated string or use methods of String class if we debug on a live server to get the entire string data. For DECIMAL_RESULT I checked the implementation of str_set_decimal() that relies on decimal2string() eventually that... looks somewhat complicated (check yourself in strings/decimal.c). So, I'd better to just print my_decimal structure in gdb, for any practical purposes, instead of re-implementing this function in Python.

To summarize, HASH structure is widely used in MySQL and it is easy to dump any of these hashes in gdb, item by item, in the same way as, for example, Performance Schema in MySQL 5.7 does this of user variables. Getting string representation of my_decimal "manually" is complicated.

Blog of (former?) MySQL Entomologist

Saturday, August 12, 2017

More on Studying MySQL Hashes in gdb, and How P_S Code May Help

No comments:

Post a Comment