Sunday, February 17, 2013

When kill flag is checked for SELECT? Part II

In the previous part I've stopped at the moment when we entered JOIN:exec() - most checks for kill flag happen somewhere there, during query execution. We know the list of functions that checks this flag during query execution:

sub_select_cache()
evaluate_join_record()
flush_cached_records()
end_write()
end_update()
end_unique_update()
end_write_group()
remove_dup_with_compare()
remove_dup_with_hash_index()


but we do not know when exactly each of them is called. So, let me try to show what happens inside JOIN::exec (some code paths and checks are not considered for simplicity, we care about SELECT, but not EXPLAIN SELECT etc). I've included statements that change thread status and highlighted parts of code with the same thread status with different background colors and, as usual, functions that eventually may check kill flag are highlighted with bold:

JOIN::exec()
  thd_proc_info(thd, "executing");
  get_schema_tables_result()
  /* Create a tmp table if distinct or if the sort is too complicated */
  if (need_tmp)
  {
    thd_proc_info(thd, "Copying to tmp table");
    do_select()
    change_to_use_tpm_fileds()
    change_refs_to_tmp_fields()
    JOIN::make_simple_join()
    create_tmp_table()
    thd_proc_info(thd, "Creating sort index");
    create_sort_index()
    thd_proc_info(thd, "Copying to group table");
    JOIN::make_sum_func_list()
    setup_sum_funcs()
    do_select()
    end_read_record()
    change_to_use_tpm_fileds()
    if (curr_join->select_distinct && ! curr_join->group_list)
    {
      thd_proc_info(thd, "Removing duplicates");
      remove_duplicates()
    }
    calc_group_buffer()
    count_filed_types()
  } 
  /* let's ignore case of procedure entirely */
  /* simplification, few ifs ignored */
  make_group_fileds()
  init_items_ref_array()
  setup_copy_fields()
  JOIN::set_itmes_ref_array()
  JOIN::make_sum_func_list()
  prepare_sum_aggregators()
  setup_sum_funcs()
  if (curr_join->group_list || curr_join->order)
  {
     thd_proc_info(thd, "Sorting result");
     make_cond_for_table()
     create_sort_index()
  }
  send_result_set_metadata()
  thd_proc_info(thd, "Sending data");
  do_select()

As you can see, do_select() function may be called in different places and more than once if temporary table is used. Let's check what this function does:

do_select()
  /* Set up select_end */
  end_select=setup_end_select_func()
  if (join->tables)
  {
    join->join_tab[join->tables-1].next_select= end_select;
    join_tab=join->join_tab+join->const_tables;
  }
  ...
  sub_select()
  ...
  join->result->send_eof()

 So, basically do_select() determines  how to do next select step and then calls sub_select():
 
sub_select()
  if (end_of_records)
     return (*join_tab->next_select)(join,join_tab+1,end_of_records);
  ...
  rc= evaluate_join_record(join, join_tab, error);
  while (rc == NESTED_LOOP_OK)
  {
    error= info->read_record(info);
    rc= evaluate_join_record(join, join_tab, error);
  }
  if (rc == NESTED_LOOP_NO_MORE_ROWS &&
      join_tab->last_inner && !join_tab->found)
    rc= evaluate_null_complemented_join_record(join, join_tab);

Here we can call one of functions for the next select step or call evaluate_join_record() in a loop. evaluate_join_record() is one of the functions that checks kill flag before doing read work.

Most of other functions that check kill flag are called when we are done with nested loop join and already found a dataset for GROUP BY and ORDER BY processing (in a temporary table). This is how end_select function is determined:

/*
Rows produced by a join sweep may end up in a temporary table or be
sent to a client. Setup the function of the nested loop join algorithm
which handles final fully constructed and matched records.
*/

11382 Next_select_func setup_end_select_func(JOIN *join)
11383 {
11384  TABLE *table= join->tmp_table;
11385  TMP_TABLE_PARAM *tmp_tbl= &join->tmp_table_param;
11386  Next_select_func end_select;
11387
11388  /* Set up select_end */
11389  if (table)
11390  {
11391    if (table->group && tmp_tbl->sum_func_count &&
11392    !tmp_tbl->precomputed_group_by)
11393    {
11394      if (table->s->keys)
11395      {
11396        DBUG_PRINT("info",("Using end_update"));
11397        end_select=end_update;
11398      }
11399      else
11400      {
11401        DBUG_PRINT("info",("Using end_unique_update"));
11402        end_select=end_unique_update;
11403      }
11404    }
11405    else if (join->sort_and_group && !tmp_tbl->precomputed_group_by)
11406    {
11407      DBUG_PRINT("info",("Using end_write_group"));
11408      end_select=end_write_group;
11409    }
11410    else
11411    {
11412      DBUG_PRINT("info",("Using end_write"));
11413      end_select=end_write;
11414      if (tmp_tbl->precomputed_group_by)
11415      {
11416        /*
11417        A preceding call to create_tmp_table in the case when loose
11418        index scan is used guarantees that
11419        TMP_TABLE_PARAM::items_to_copy has enough space for the group
11420        by functions. It is OK here to use memcpy since we copy
11421        Item_sum pointers into an array of Item pointers.
11422        */
11423        memcpy(tmp_tbl->items_to_copy + tmp_tbl->func_count,
11424          join->sum_funcs,
11425          sizeof(Item*)*tmp_tbl->sum_func_count);
11426        tmp_tbl->items_to_copy[tmp_tbl->func_count+tmp_tbl->sum_func_count]= 0;
11427      }
11428    }
11429  }
11430  else
11431  {
11432    /*
11433    Choose method for presenting result to user. Use end_send_group
11434    if the query requires grouping (has a GROUP BY clause and/or one or
11435    more aggregate functions). Use end_send if the query should not
11436    be grouped.
11437    */
11438    if ((join->sort_and_group ||
11439      (join->procedure && join->procedure->flags & PROC_GROUP)) &&
11440      !tmp_tbl->precomputed_group_by)
11441      end_select= end_send_group;
11442    else
11443      end_select= end_send;
11444  }
11445  return end_select;
11446 } 

So, one of the functions that check for kill flag is used whenever we need to use temporary table to process GROUP BY and/or ORDER BY:

end_write()
end_update()
end_unique_update()
end_write_group()

Only 3 functions remain. Where sub_select_cache() is used (by the way, it calls sub_select() also, so may lead to more checks of kill flag in the process)?  It is used whenever you see "Using join buffer" in the EXPLAIN results for the query. This is determined at the optimization stage:

JOIN::optimize()
  make_join_readinfo()
...
 6868  for (i=join->const_tables ; i < join->tables ; i++)
 6869  {
...
 6875    tab->next_select=sub_select; /* normal select */
 6876
 6877    /*
 6878    Determine if the set is already ordered for ORDER BY, so it can
 6879    disable join cache because it will change the ordering of the results.
 6880    Code handles sort table that is at any location (not only first after
 6881    the const tables) despite the fact that it's currently prohibited.
 6882    We must disable join cache if the first non-const table alone is
 6883    ordered. If there is a temp table the ordering is done as a last
 6884    operation and doesn't prevent join cache usage.
 6885    */
...
 6896    switch (tab->type) {
...
 6915    case JT_ALL:
 6916    /*
 6917    If previous table use cache
 6918    If the incoming data set is already sorted don't use cache.
 6919    */
 6920    if (i != join->const_tables && !(options & SELECT_NO_JOIN_CACHE) &&
 6921    tab->use_quick != 2 && !tab->first_inner && !ordered_set)
 6922    {
 6923      if ((options & SELECT_DESCRIBE) ||
 6924        !join_init_cache(join->thd,join->join_tab+join->const_tables,
 6925          i-join->const_tables))
 6926      {
 6927        tab[-1].next_select=sub_select_cache; /* Patch previous */
 6928      }
 6929    }
...

Read the manual for more details of join buffer usage. Basically it is used to reduce number of reads of inner tables in joins that are scanned entirely (see JT_ALL above).

Remaining two functions that check kill flag during query execution are called in remove_duplicates():

14327 static int
14328 remove_duplicates(JOIN *join, TABLE *entry,List<Item> &fields, Item *having)
14329 {
...
14360  entry->file->info(HA_STATUS_VARIABLE);
14361  if (entry->s->db_type() == heap_hton ||
14362      (!entry->s->blob_fields &&
14363       ((ALIGN_SIZE(reclength) + HASH_OVERHEAD) * entry->file->stats.records <
14364    thd->variables.sortbuff_size)))
14365    error=remove_dup_with_hash_index(join->thd, entry,
14366                     field_count, first_field,
14367                     reclength, having);
14368  else
14369    error=remove_dup_with_compare(join->thd, entry, first_field, offset,
14370                  having);
14371 
14372  free_blobs(first_field);
14373  DBUG_RETURN(error);
14374 }

The remove_duplicates() itself may be called in JOIN::exec() and it was highlighted in the beginning of this post.

Now the picture is almost complete, we know when kill flag is checked during query execution also. What I still miss is some nice text (or simple pseudo code that fits into one screen) for the manual to replace that oversimplified statement that we had started with. I have to think about this and present in the final Part III, so... to be continued.

Saturday, February 16, 2013

When kill flag is checked for SELECT? Part I

Manual describes this briefly:

In SELECT, ORDER BY and GROUP BY loops, the flag is checked after reading a block of rows. If the kill flag is set, the statement is aborted.

Complete, correct and useful answer is more complex though. Here is correct answer, but not very useful. So, kill flag is checked in the following functions related to SELECT statement processing:

make_join_statistics()
best_extension_by_limited_search()
find_best()
sub_select_cache()
evaluate_join_record()
flush_cached_records()
end_write()
end_update()
end_unique_update()
end_write_group()
remove_dup_with_compare()
remove_dup_with_hash_index() 


Continue reading if you are interested in the process of getting correct answer and would like to know how to make it also complete and useful eventually.

Let's just use grep to search for thd->killed usage in source code (of version 5.5.30) that implements SELECT and then check functions with these lines:

[openxs@chief mysql-5.5]$ grep -n 'thd->kill' sql/sql_select.cc
3136:  DBUG_RETURN(join->thd->killed || get_best_combination(join));
5463:  if (thd->killed)  // Abort
5602:  if (thd->killed)
11604:  if (join->thd->killed)          // If aborted by user
11807:  if (join->thd->killed)                  // Aborted by user
12052:    if (join->thd->killed)
12915:  if (join->thd->killed)                  // Aborted by user
12968:  if (join->thd->killed)                  // Aborted by user
13048:  if (join->thd->killed)                  // Aborted by user
13095:  if (join->thd->killed)
14394:    if (thd->killed)
14523:    if (thd->killed)


Line 3136 is in the make_join_statistics() function, that is, flag is checked in the process of query optimization also:

3122  /* Find an optimal join order of the non-constant tables. */
3123  if (join->const_tables != join->tables)
3124  {
3125  optimize_keyuse(join, keyuse_array);
3126  if (choose_plan(join, all_table_map & ~join->const_table_map))
3127  goto error;
3128  }
3129  else
3130  {
3131  memcpy((uchar*) join->best_positions,(uchar*) join->positions,
3132  sizeof(POSITION)*join->const_tables);
3133  join->best_read=1.0;
3134  }
3135  /* Generate an execution plan from the found optimal join order. */
3136  DBUG_RETURN(join->thd->killed || get_best_combination(join));
  
Line 5463 is in the best_extension_by_limited_search() function, so again flag is checked in the process of query optimization:

5451 static bool
5452 best_extension_by_limited_search(JOIN *join,
5453  table_map remaining_tables,
5454  uint idx,
5455  double record_count,
5456  double read_time,
5457  uint search_depth,
5458  uint prune_level)
5459 {
5460  DBUG_ENTER("best_extension_by_limited_search");
5461 
5462  THD *thd= join->thd;
5463  if (thd->killed) // Abort
5464  DBUG_RETURN(TRUE);
5465 
5466  DBUG_EXECUTE("opt", print_plan(join, idx, read_time, record_count, idx,
5467  "SOFAR:"););
5468 
5469  /*
5470  'join' is a partial plan with lower cost than the best plan so far,
5471  so continue expanding it further with the tables in 'remaining_tables'.
5472  */
5473  JOIN_TAB *s;
 
Line 5602 is in the find_best() function:
 
5596 static bool
5597 find_best(JOIN *join,table_map rest_tables,uint idx,double record_count,
5598  double read_time)
5599 {
5600  DBUG_ENTER("find_best");
5601  THD *thd= join->thd;
5602  if (thd->killed)
5603  DBUG_RETURN(TRUE);
5604  if (!rest_tables)
5605  {
5606  DBUG_PRINT("best",("read_time: %g record_count: %g",read_time,
5607  record_count));
5608 
5609  read_time+=record_count/(double) TIME_FOR_COMPARE;
...
 
So, again it is related to the process of finding optimal join order at the query optimization stage. All these cases are not mentioned in the manual.

Line 11604 is finally related to query execution (reading rows). It is in the sub_select_cache() function:

11592 enum_nested_loop_state
11593 sub_select_cache(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
11594 {
11596 
11597  if (end_of_records)
11598  {
11599  rc= flush_cached_records(join,join_tab,FALSE);
11600  if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS)
11601  rc= sub_select(join,join_tab,end_of_records);
11602  return rc;
11603  }
11604  if (join->thd->killed) // If aborted by user
11605  {
11606  join->thd->send_kill_message();
11607  return NESTED_LOOP_KILLED; /* purecov: inspected */
11608  }
...

Line 11807 is at the beginning of the evaluate_join_record() function:

11794 static enum_nested_loop_state
11795 evaluate_join_record(JOIN *join, JOIN_TAB *join_tab,
11796  int error)
11797 {
11798  bool not_used_in_distinct=join_tab->not_used_in_distinct;
11799  ha_rows found_records=join->found_records;
11800  COND *select_cond= join_tab->select_cond;
11801  bool select_cond_result= TRUE;
11802 
11803  if (error > 0 || (join->thd->is_error())) // Fatal error
11804  return NESTED_LOOP_ERROR;
11805  if (error < 0)
11806  return NESTED_LOOP_NO_MORE_ROWS;
11807  if (join->thd->killed) // Aborted by user
11808  {
11809  join->thd->send_kill_message();
11810  return NESTED_LOOP_KILLED; /* purecov: inspected */
11811  }
 
Line 12052 is in the records reading loop (finally) in the flush_cached_records() function (that you see called above on line 11599 in sub_select_cache()):

12036  /* read through all records */
12037  if ((error=join_init_read_record(join_tab)))
12038  {
12039  reset_cache_write(&join_tab->cache);
12040  return error < 0 ? NESTED_LOOP_NO_MORE_ROWS: NESTED_LOOP_ERROR;
12041  }
12042 
12043  for (JOIN_TAB *tmp=join->join_tab; tmp != join_tab ; tmp++)
12044  {
12045  tmp->status=tmp->table->status;
12046  tmp->table->status=0;
12047  }
12048 
12049  info= &join_tab->read_record;
12050  do
12051  {
12052  if (join->thd->killed)
12053  {
12054  join->thd->send_kill_message();
12055  return NESTED_LOOP_KILLED; // Aborted by user /* purecov: inspected */
12056  }
...
12093  } while (!(error=info->read_record(info)));

Line 12915 is in the end_write() function:

12908 static enum_nested_loop_state
12909 end_write(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
12910  bool end_of_records)
12911 {
12912  TABLE *table=join->tmp_table;
12913  DBUG_ENTER("end_write");
12914 
12915  if (join->thd->killed) // Aborted by user
12916  {
12917  join->thd->send_kill_message();
12918  DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
12919  }
 
Line 12968 is in the end_update():  
 
12957 static enum_nested_loop_state
12958 end_update(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
12959  bool end_of_records)
12960 {
12961  TABLE *table=join->tmp_table;
12962  ORDER *group;
12963  int error;
12964  DBUG_ENTER("end_update");
12965 
12966  if (end_of_records)
12968  if (join->thd->killed) // Aborted by user
12969  {
12970  join->thd->send_kill_message();
12971  DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
12972  }
 
Line 13048 is in the end_unique_update() (it's getting boring, isn't it, a lot of similar looking code is similar named functions):

13038 static enum_nested_loop_state
13039 end_unique_update(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
13040  bool end_of_records)
13041 {
13042  TABLE *table=join->tmp_table;
13043  int error;
13044  DBUG_ENTER("end_unique_update");
13045 
13046  if (end_of_records)
13048  if (join->thd->killed) // Aborted by user
13049  {
13050  join->thd->send_kill_message();
13051  DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
13052  }

Line 13095 is from something similar also, end_write_group() function (note that comment is placed differently though):

13087 static enum_nested_loop_state
13088 end_write_group(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
13089  bool end_of_records)
13090 {
13091  TABLE *table=join->tmp_table;
13092  int idx= -1;
13093  DBUG_ENTER("end_write_group");
13094 
13095  if (join->thd->killed)
13096  { // Aborted by user
13097  join->thd->send_kill_message();
13098  DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
13099  }

We are almost done. Line 14394 is finally closely related to something manual described, "GROUP BY loop" in the remove_dup_with_compare() function:

14377 static int remove_dup_with_compare(THD *thd, TABLE *table, Field **first_field,
14378  ulong offset, Item *having)
14379 {
14380  handler *file=table->file;
14381  char *org_record,*new_record;
14382  uchar *record;
14383  int error;
14384  ulong reclength= table->s->reclength-offset;
14385  DBUG_ENTER("remove_dup_with_compare");
14386 
14387  org_record=(char*) (record=table->record[0])+offset;
14388  new_record=(char*) table->record[1]+offset;
14389 
14390  file->ha_rnd_init(1);
14391  error=file->rnd_next(record);
14392  for (;;)
14393  {
14394  if (thd->killed)
14395  {
14396  thd->send_kill_message();
14397  error=0;
14398  goto err;
14399  }
...
14446  file->position(record); // Remember position
14447  }
14448  }
14449  if (!found)
14450  break; // End of file
14451  /* Restart search on next row */
14452  error=file->restart_rnd_next(record,file->ref);
14453  }
14454 
14455  file->extra(HA_EXTRA_NO_CACHE);
14456  DBUG_RETURN(0);
14457 err:
14458  file->extra(HA_EXTRA_NO_CACHE);
14459  if (error)
14460  file->print_error(error,MYF(0));
14461  DBUG_RETURN(1);
14462 }
 
Finally, line 14523 is in the loop in the remove_dup_with_hash_index() function:

14518  file->ha_rnd_init(1);
14519  key_pos=key_buffer;
14520  for (;;)
14521  {
14522  uchar *org_key_pos;
14523  if (thd->killed)
14524  {
14525  thd->send_kill_message();
14526  error=0;
14527  goto err;
14528  }
...
 
That's all great, but how these functions are related to each other? MySQL Internals manual will help us to get the general picture of how SELECT is processed:
 
handle_select()
   mysql_select()
     JOIN::prepare()
       setup_fields()
     JOIN::optimize()            /* optimizer is from here ... */
       optimize_cond()
       opt_sum_query()
       make_join_statistics()
         get_quick_record_count()
         choose_plan()
           /* Find the best way to access tables */
           /* as specified by the user.          */
           optimize_straight_join()
             best_access_path()
           /* Find a (sub-)optimal plan among all or subset */
           /* of all possible query plans where the user    */
           /* controls the exhaustiveness of the search.   */
           greedy_search()
             best_extension_by_limited_search()
               best_access_path()
           /* Perform an exhaustive search for an optimal plan */
           find_best()
       make_join_select()        /* ... to here */
     JOIN::exec()
 
In the diagram above I've highlighted optimizer part with different background and functions where kill flag is checked with bold.
 
It would be nice to see/describe JOIN::exec() in a way similar to above. I plan to do this in the second part of this post. To be continued...