--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
...
I tried to search the manual and Web in general and found no useful explanation (these verbose values should be self explanatory by design it seems). As question was asked, probably it's time to try to answer it...
These states for I/O threads are set by srv_set_io_thread_op_info() function. Quick search with grep on MySQL 5.5.29 source code tree gives us some hints, the rest we can try to find out via code review based on these hits:
[openxs@chief mysql-5.5]$ grep -rn srv_set_io_thread_op_info * 2>/dev/null
storage/innobase/srv/srv0srv.c:772:srv_set_io_thread_op_info(
storage/innobase/os/os0file.c:3494: srv_set_io_thread_op_info(i, "not started yet");
storage/innobase/os/os0file.c:4370: srv_set_io_thread_op_info(orig_seg, "wait Windows aio");
storage/innobase/os/os0file.c:4394: srv_set_io_thread_op_info(orig_seg,
storage/innobase/os/os0file.c:4681: srv_set_io_thread_op_info(global_seg,
storage/innobase/os/os0file.c:4691: srv_set_io_thread_op_info(global_seg,
storage/innobase/os/os0file.c:4787: srv_set_io_thread_op_info(global_segment,
storage/innobase/os/os0file.c:4805: srv_set_io_thread_op_info(global_segment,
storage/innobase/os/os0file.c:4946: srv_set_io_thread_op_info(global_segment, "consecutive i/o requests");
storage/innobase/os/os0file.c:4989: srv_set_io_thread_op_info(global_segment, "doing file i/o");
storage/innobase/os/os0file.c:5010: srv_set_io_thread_op_info(global_segment, "file i/o done");
storage/innobase/os/os0file.c:5062: srv_set_io_thread_op_info(global_segment, "resetting wait event");
storage/innobase/os/os0file.c:5072: srv_set_io_thread_op_info(global_segment, "waiting for i/o request");
storage/innobase/include/srv0srv.h:499:srv_set_io_thread_op_info(
storage/innobase/include/srv0srv.h:766:# define srv_set_io_thread_op_info(t,info) ((void) 0)
storage/innobase/fil/fil0fil.c:4581: srv_set_io_thread_op_info(segment, "native aio handle");
storage/innobase/fil/fil0fil.c:4593: srv_set_io_thread_op_info(segment, "simulated aio handle");
storage/innobase/fil/fil0fil.c:4605: srv_set_io_thread_op_info(segment, "complete io for fil node");
storage/innobase/fil/fil0fil.c:4622: srv_set_io_thread_op_info(segment, "complete io for buf page");
storage/innobase/fil/fil0fil.c:4625: srv_set_io_thread_op_info(segment, "complete io for log");
Those who are too lazy to pull source code can just use this beautiful online tool to review few files listed above and make their own conclusions.
Unfortunately code formatting is different for different calls, so we still miss some states above as strings. Let's review the files then. It all starts with os_aio_init() function on line of 3494 of os0file.c:
3493 for (i = 0; i < n_segments; i++) {
3494 srv_set_io_thread_op_info(i, "not started yet");
It is clear what's going on here, you hardly will be able to catch this status in the output as it is set even before I/O threads are actually started.
Our next stop is line 4370, in os_aio_windows_handle() that, obviously, works only on Windows:
4356 /* NOTE! We only access constant fields in os_aio_array. Therefore
4359 ut_ad(os_aio_validate_skip());
4360 ut_ad(segment < array->n_segments);
4362 n = array->n_slots / array->n_segments;
4366 os_aio_array_get_nth_slot(array, pos)->handle,
4370 srv_set_io_thread_op_info(orig_seg, "wait Windows aio");
4373 FALSE,
So, "wait" here means that we do wait for completion. Later in the same function we are checking result:
That's all states that may be set in os_aio_windows_handle(). Next stop is line 4681 in an infinite loop in the os_aio_linux_handle() function:
4648 /* Loop until we have found a completed request. */
4650 ibool any_reserved = FALSE;
4651 os_mutex_enter(array->mutex);
4655 if (!slot->reserved) {
4657 } else if (slot->io_already_done) {
4661 any_reserved = TRUE;
...
4678 /* Wait for some request. Note that we return
4681 srv_set_io_thread_op_info(global_seg,
So state "waiting for completed aio requests" means that there is no pending request and no completed request, so we have nothing to do. When at least some AIO request is completed., state will be "processing completed aio requests". All these for real AIO on Linux.
In case of simulated AIO we have os_aio_simulated_handle() function, where state is set on line 4787:
4783 restart:
By the way, we end up at that wait_for_io label when there is no I/O to do by simulated AIO, so "resetting wait event" state is set for a very short period.
4787 srv_set_io_thread_op_info(global_segment,
Later in the same function:
4794 /* Look through n slots after the segment * n'th slot */
4805 srv_set_io_thread_op_info(global_segment,
5061 wait_for_io:
4805 srv_set_io_thread_op_info(global_segment,
So, here we have stages (a) and (b), before and after sleep, and here is the recommended_sleep label in the same function (before we do sleep state "waiting for i/o request" is set):
5061 wait_for_io:
5062 srv_set_io_thread_op_info(global_segment, "resetting wait event");
5067 os_event_reset(os_aio_segment_wait_events[global_segment]);
5069 os_mutex_exit(array->mutex);
5071 recommended_sleep:
5072 srv_set_io_thread_op_info(global_segment, "waiting for i/o request");
5074 os_event_wait(os_aio_segment_wait_events[global_segment]);
5076 if (os_aio_print_debug) {
5080 (ulong) global_segment);
By the way, we end up at that wait_for_io label when there is no I/O to do by simulated AIO, so "resetting wait event" state is set for a very short period.
Let's continue with os_aio_simulated_handle() back from line 4946:
4946 srv_set_io_thread_op_info(global_segment, "consecutive i/o requests");
4946 srv_set_io_thread_op_info(global_segment, "consecutive i/o requests");
This state is set if we picked up some request and found there are consecutive requests that can be processed as an array. This state is set only while we are allocating the buffer, for real I/O there is another state, "doing file i/o":
4989 srv_set_io_thread_op_info(global_segment, "doing file i/o");
4989 srv_set_io_thread_op_info(global_segment, "doing file i/o");
4991 if (os_aio_print_debug) {
4995 (ulong) slot->type, (ulong) slot->offset_high,
5000 if (slot->type == OS_FILE_WRITE) {
5001 ret = os_file_write(slot->name, slot->file, combined_buf,
5002 slot->offset, slot->offset_high,
5005 ret = os_file_read(slot->file, combined_buf,
5006 slot->offset, slot->offset_high, total_len);
5009 ut_a(ret);
5010 srv_set_io_thread_op_info(global_segment, "file i/o done");
When sync I/O is completed, state "file i/o done" is set, as you can see above.After we've done I/O and before next sleep and restart we copy data back to original several buffers, set/reset everything properly and call to os_aio_simulated_handle() is completed:
5042 /* We return the messages for the first slot now, and if there were
5050 *message1 = slot->message1;
5051 *message2 = slot->message2;
5053 *type = slot->type;
5055 os_mutex_exit(array->mutex);
That's all, so it is either Windows native AIO (a couple of states only) or Linux native AIO (a couple of states only), or simulated AIO (and in this case many states are possible). Probably that's why nobody cared to explain details - in real production situations with native AIO there are few states to consider anyway.
Now, there are some lines to check in fil0fil.c also, fil_aio_wait() function:
4580 if (srv_use_native_aio) {
4581 srv_set_io_thread_op_info(segment, "native aio handle");
4589 ut_error;
4593 srv_set_io_thread_op_info(segment, "simulated aio handle");
4595 ret = os_aio_simulated_handle(segment, &fil_node,
So, here decision is made (at compile time) on what functions to call from three described above and, for some short time, state reflects type of AIO that is going to be used.
When whatever AIO function called completes, we have this code running:
4605 srv_set_io_thread_op_info(segment, "complete io for fil node");
4607 mutex_enter(&fil_system->mutex);
4611 mutex_exit(&fil_system->mutex);
4613 ut_ad(fil_validate_skip());
4621 if (fil_node->space->purpose == FIL_TABLESPACE) {
4622 srv_set_io_thread_op_info(segment, "complete io for buf page");
4623 buf_page_io_complete(message);
4625 srv_set_io_thread_op_info(segment, "complete io for log");
4626 log_io_complete(message);
Here state reflects type of I/O completed, data page or redo log.
That's all. I have to think what is the better way to summarize the states and possible transitions for each state. Maybe just a table, or something like a state machine represented somehow. I should think about this. So, to be continued...
It would be great to have a table in I_S about how much time is spend in each state.
ReplyDelete