So, I decided to check for any ways to use bpftrace as a quick profiler, maybe "embed" it into pt-pmp one day and, surely, get what MariaDB developers ask me most often while working on performance problems or hangs - stack traces. This blog post describes my "achievements" so far on this way.
I had the same version of bpftrace coming from Fedora RPM (that had not updated in the meantime):
[openxs@fc29 server]$ rpm -q -a | grep bpftraceSo, after checking this fine reference guide quickly, I came up with a simple one-liner:
bpftrace-0.9-0.fc29.x86_64
[openxs@fc29 ~]$ sudo bpftrace -e 'profile:hz:99 { @[ustack] = count(); }'Ups! I tried to do what error message suggested:
Error creating map: '@'
Error creating stack id map
This might have happened because kernel.perf_event_max_stackis smaller than 127. Try to tweak this value with sysctl kernel.perf_event_max_stack=<new value>
Error creating perf event map (-1)
Attaching 1 probe...
Segmentation fault
[openxs@fc29 ~]$ sudo sysctl -w kernel.perf_event_max_stack=1000But it had not helped. This is how I hit my first problem with bpftrace. Fedora had not cared to update it when it stopped matching "something" in a new kernel, for whatever reason. This is actually a real problem for eBPF programs - what they assume may not be "there" in a kernel, so they have to be checked carefully. There is a separate "BPF Type Format (BTF)" for encoding the debug info related to BPF program (or map) that may help to find out if the program is "compatible" without trying it. In my case incompatibility was obvious and I needed some fast way to proceed with bpftrace, in a hope to come up with some useful examples for my talk.
kernel.perf_event_max_stack = 1000
So, i decided to build new version of bpftrace from recent GitHub source (against bcc tools I had installed and supposedly up to date). This ended up with errors no matter what I tried:
[openxs@fc29 build]$ makeQuick search led me to the idea that it's actually a know and recently reported problem, see Issue #1014. As a solution I decided to rebuild bcc tools from GitHub source as well (the alternative to to change bpftrace source) and removed Fedora packages. To make a long story short, this is what I did eventually (from fc -l output):
[ 3%] Built target parser
[ 4%] Built target arch
[ 5%] Building CXX object src/ast/CMakeFiles/ast.dir/irbuilderbpf.cpp.o
/mnt/home/openxs/git/bpftrace/src/ast/irbuilderbpf.cpp: In member function 'void bpftrace::ast::IRBuilderBPF::CreateSignal(llvm::Value*)':
/mnt/home/openxs/git/bpftrace/src/ast/irbuilderbpf.cpp:669:16: error: 'BPF_FUNC_send_signal' was not declared in this scope
getInt64(BPF_FUNC_send_signal),
^~~~~~~~~~~~~~~~~~~~
...
1051 rm -rf bcc/Basically I did it all as documented in related projects for my version of Fedora. Step highlighted (1056 above) is essential though, as it was missed here in any obvious way as a mandatory (while it is in recent versions). Step 1076 was successful:
1052 rm -rf bpftrace/
1053 sudo dnf remove bcc
1054 git clone https://github.com/iovisor/bcc.git
1055 cd bcc/
1056 git submodule update --init --recursive
1057 sudo dnf install -y bison cmake ethtool flex git iperf libstdc++-static python-netaddr python-pip gcc gcc-c++ make zlib-devel elfutils-libelf-devel
1058 sudo dnf install -y luajit luajit-devel
1059 sudo dnf install -y http://repo.iovisor.org/yum/extra/mageia/cauldron/x86_64/netperf-2.7.0-1.mga6.x86_64.rpm
1060 sudo pip install pyroute2
1061 sudo dnf install -y clang clang-devel llvm llvm-devel llvm-static ncurses-devel
1062 mkdir build
1063 cd build/
1064 cmake .. -DCMAKE_INSTALL_PREFIX=/usr
1065 make
1066 make test
1067 sudo make install
1068 cd ../..
1069 git clone https://github.com/iovisor/bpftrace.git
1070 cd bpftrace/
1071 git submodule update --init --recursive
1072 mkdir build
1073 cd build/
1074 cmake -DCMAKE_BUILD_TYPE=Release ../
1075 make
1076 ./tests/bpftrace_test
1077 fc -l
...while make test shown some failures (same as make test for bcc tools), but form previous experience this had not looked like a big deal. I continued basic testing that the resulting bpftrace now works (I tried to add kernel probe, guess what software runs on this workstation):
[----------] Global test environment tear-down
[==========] 350 tests from 9 test cases ran. (2590 ms total)
[ PASSED ] 350 tests.
[openxs@fc29 build]$ make testRunning tests...
Test project /mnt/home/openxs/git/bpftrace/build
Start 1: bpftrace_test
1/3 Test #1: bpftrace_test .................... Passed 2.42 sec
Start 2: runtime_test
2/3 Test #2: runtime_test .....................***Failed 21.10 sec
Start 3: tools-parsing-test
3/3 Test #3: tools-parsing-test ...............***Failed 5.88 sec
33% tests passed, 2 tests failed out of 3
Total Test time (real) = 29.41 sec
The following tests FAILED:
2 - runtime_test (Failed)
3 - tools-parsing-test (Failed)
Errors while running CTest
make: *** [Makefile:128: test] Error 8
[openxs@fc29 build]$ sudo ./src/bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s\n", comm); }'So, with some confidence built up, I've installed my new and very recent bpftrace tool I built myself:
Attaching 1 probe...
sleep by percona-qan-age
sleep by percona-qan-age
sleep by skypeforlinux
sleep by mysqld
...
^C
[openxs@fc29 build]$ sudo make installQuick test involved starting MariaDB 10.3 with thread pool enabled (more on "why thread pool?" in some next blog post):
[ 3%] Built target parser
...
-- Installing: /usr/local/bin/bpftrace
...
-- Installing: /usr/local/man/man8/xfsdist.8.gz
[openxs@fc29 build]$ which bpftrace
/usr/local/bin/bpftrace
[openxs@fc29 build]$ cd ..
[openxs@fc29 bpftrace]$ git log -1
commit afafbf561522dd33fa316be3e33375bc662399ac (HEAD -> master, origin/master, origin/HEAD)
Author: bas smit <bas@baslab.org>
Date: Fri Jan 17 12:33:43 2020 +0100
Add `--info` flag
This prints the supported kernel features
[openxs@fc29 maria10.3]$ ./bin/mysqld_safe --no-defaults --thread_handling=pool-of-threads --thread_pool_idle_timeout=10 &and running sysbench:
[1] 32643
[openxs@fc29 maria10.3]$ sysbench /usr/local/share/sysbench/oltp_update_index.lua --mysql-host=127.0.0.1 --mysql-user=root --mysql-port=3306 --threads=10 --tables=4 --table-size=1000000 --time=600 --report-interval=5 runand adding the probe like this in another shell:
sysbench 1.1.0-174f3aa (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 10
Report intermediate results every 5 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 5s ] thds: 10 tps: 54.37 qps: 54.37 (r/w/o: 0.00/54.37/0.00) lat (ms,95%): 467.30 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 10 tps: 40.60 qps: 40.60 (r/w/o: 0.00/40.60/0.00) lat (ms,95%): 530.08 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 10 tps: 40.00 qps: 40.00 (r/w/o: 0.00/40.00/0.00) lat (ms,95%): 520.62 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 10 tps: 38.00 qps: 38.00 (r/w/o: 0.00/38.00/0.00) lat (ms,95%): 623.33 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 10 tps: 34.00 qps: 34.00 (r/w/o: 0.00/34.00/0.00) lat (ms,95%): 746.32 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 10 tps: 36.40 qps: 36.40 (r/w/o: 0.00/36.40/0.00) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 10 tps: 37.80 qps: 37.80 (r/w/o: 0.00/37.80/0.00) lat (ms,95%): 601.29 err/s: 0.00 reconn/s: 0.00
...
[openxs@fc29 ~]$ sudo bpftrace -e 'profile:hz:99 { @[ustack] = count(); }' > /tmp/bpf.stackWhen I stopped collecting with Ctrl-C, bpftrace printed the content of associative arrays collected (with unique user space stack 9for all processes!) as index and count of time this stack was as a value). No segmentation fault, unlike with the version from package! In the resulting file we can see (among other things):
^C
[openxs@fc29 bpftrace]$ cat /tmp/bpf.stack | moreSo, this is the default format to output stacks (one line per call, no address, frame number or anything, symbols properly resolved for my MariaDB built from GitHub source etc). But I had something like this mind eventually (from pt-pmp):
Attaching 1 probe...
...
@[
rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, bool, unsigned long, mem_block_info_t**)+31
page_cur_search_with_match_bytes(buf_block_t const*, dict_index_t const*, dtuple_t const*, page_cur_mode_t, unsigned long*, unsigned long*, unsigned long*, unsigned long*, page_cur_t*)+1773
btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)+8316
btr_pcur_open_low(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_pcur_t*, char const*, unsigned int, unsigned long, mtr_t*) [clone .constprop.33]+146
row_search_on_row_ref(btr_pcur_t*, unsigned long, dict_table_t const*, dtuple_t const*, mtr_t*)+85
row_purge_poss_sec(purge_node_t*, dict_index_t*, dtuple_t const*, btr_pcur_t*, mtr_t*, bool)+503
row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*)+1293
row_purge_record_func(purge_node_t*, unsigned char*, que_thr_t const*, bool)+2271
row_purge_step(que_thr_t*)+1057
que_run_threads(que_thr_t*)+2280
srv_worker_thread+956
start_thread+234
]: 1
...
@[
rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, bool, unsigned long, mem_block_info_t**)+31
btr_search_build_page_hash_index(dict_index_t*, buf_block_t*, rw_lock_t*, unsigned long, unsigned long, unsigned long)+2781
btr_search_info_update_slow(btr_search_t*, btr_cur_t*)+622
btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)+10130
row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+4823
ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+338
handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+280
handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool)+98
handler::multi_range_read_next(void**)+183
Mrr_simple_index_reader::get_next(void**)+74
DsMrr_impl::dsmrr_next(void**)+66
QUICK_RANGE_SELECT::get_next()+39
rr_quick(READ_RECORD*)+25
mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*)+3560
mysql_execute_command(THD*)+8446
Prepared_statement::execute(String*, bool)+979
Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*)+154
mysql_stmt_execute_common(THD*, unsigned long, unsigned char*, unsigned char*, unsigned long, bool, bool)+349
mysqld_stmt_execute(THD*, char*, unsigned int)+37
dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+5123
do_command(THD*)+368
tp_callback(TP_connection*)+314
worker_main(void*)+160
start_thread+234
]: 1
...
@[
trx_purge(unsigned long, bool)+1054
srv_purge_coordinator_thread+1872
start_thread+234
]: 2
@[
ut_crc32_sw(unsigned char const*, unsigned long)+1611
buf_calc_page_crc32(unsigned char const*)+61
buf_flush_init_for_writing(buf_block_t const*, unsigned char*, void*, unsigned long)+872
buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool)+149
buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool)+607
buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*)+1627
buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long)+392
pc_flush_slot()+2617
buf_flush_page_cleaner_coordinator.cold.151+1235
start_thread+234
]: 2
...
@[]: 6474
[openxs@fc29 bpftrace]$ pt-pmpThat is, number of times collapsed stack was noted and then the stack itself, without function arguments if possible. This kind of stack is generated from gdb outputs that looks as follows:
Tue Jan 21 13:52:12 EET 2020
...
10 syscall(libc.so.6),__io_getevents_0_4(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:1932),LinuxAIOHandler::poll(os0file.cc:2093),os_aio_linux_handler(stl_vector.h:805),os_aio_handler(stl_vector.h:805),fil_aio_wait(fil0fil.cc:4435),io_handler_thread(srv0start.cc:326),start_thread(libpthread.so.0),clone(libc.so.6)
6 pthread_cond_wait,os_event::wait(sync0types.h:476),os_event::wait_low(sync0types.h:476),log_write_up_to(log0log.cc:929),trx_flush_log_if_needed_low(trx0trx.cc:1218),trx_flush_log_if_needed(trx0trx.cc:1218),trx_commit_complete_for_mysql(trx0trx.cc:1218),innobase_commit(ha_innodb.cc:4637),commit_one_phase_2(handler.cc:1642),ha_commit_one_phase(handler.cc:1622),ha_commit_trans(handler.cc:1484),trans_commit_stmt(transaction.cc:505),mysql_execute_command(sql_parse.cc:6106),Prepared_statement::execute(sql_prepare.cc:4807),Prepared_statement::execute_loop(sql_prepare.cc:4235),Prepared_statement::execute_loop(sql_prepare.cc:4235),mysql_stmt_execute_common(sql_prepare.cc:3235),mysqld_stmt_execute(sql_prepare.cc:3132),dispatch_command(sql_parse.cc:1798),do_command(sql_parse.cc:1401),threadpool_process_request(threadpool_common.cc:359),tp_callback(threadpool_common.cc:359),worker_main(threadpool_generic.cc:1606),start_thread(libpthread.so.0),clone(libc.so.6)
3 pthread_cond_wait,os_event::wait(sync0types.h:476),os_event::wait_low(sync0types.h:476),srv_resume_thread(srv0srv.cc:2533),srv_worker_thread(srv0srv.cc:2533),start_thread(libpthread.so.0),clone(libc.so.6)
3 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1215),get_event(mysql_thread.h:1215),worker_main(threadpool_generic.cc:1602),start_thread(libpthread.so.0),clone(libc.so.6)
...
Thread 1 (Thread 0x7f29eee66900 (LWP 32728)):So, each tread stack starts from 'Thread..." line, each frame is numbered and function call is actually the fourth word in the output (in awk terms).
#0 0x00007f29ef25f431 in poll () from /lib64/libc.so.6
#1 0x000055cc4ece656f in poll (__timeout=-1, __nfds=2, __fds=0x7ffe174108e0) at /usr/include/bits/poll2.h:41
#2 handle_connections_sockets () at /mnt/home/openxs/git/server/sql/mysqld.cc:6753
#3 0x000055cc4ece7a7c in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /mnt/home/openxs/git/server/sql/mysqld.cc:6222
#4 0x00007f29ef190413 in __libc_start_main () from /lib64/libc.so.6
#5 0x000055cc4ecd9ece in _start () at /mnt/home/openxs/git/server/sql/sql_list.h:158
Based on that I tried add probe to produce stack in perf format for mysqld process, with some decoration to have a clear marker of when new capture starts:
[openxs@fc29 bpftrace]$ sudo bpftrace -e 'profile:hz:99 /comm == "mysqld"/ {printf("# %s\n", ustack(perf));}' > /tmp/ustack.txtIf you wonder about the details of command format, functions used etc, check the Reference Guide. The result looks as follows in hexdump -C output:
00000000 41 74 74 61 63 68 69 6e 67 20 31 20 70 72 6f 62 |Attaching 1 prob|This is how you can see where are the new lines, how many etc, for proper parsing. Next step was to check pt-pmp source code and http://poormansprofiler.org/, review and adapt some awk magic used there to collapse stacks, get rid of function arguments etc, and I've ended up with this draft version:
00000010 65 2e 2e 2e 0a 23 20 0a 09 37 66 32 39 65 66 62 |e....# ..7f29efb|
00000020 38 34 33 30 30 20 6e 61 6e 6f 73 6c 65 65 70 2b |84300 nanosleep+|
00000030 36 34 20 28 2f 75 73 72 2f 6c 69 62 36 34 2f 6c |64 (/usr/lib64/l|
00000040 69 62 70 74 68 72 65 61 64 2d 32 2e 32 38 2e 73 |ibpthread-2.28.s|
00000050 6f 29 0a 0a 23 20 0a 09 35 35 63 63 34 66 32 30 |o)..# ..55cc4f20|
...
[openxs@fc29 maria10.3]$ cat /tmp/ustack.txt | awk 'The above looks reasonable (even if not yet a direct match to what pt-pmp produces, I am too lazy to do much awk these days).
BEGIN { s = ""; }
/^\#/ { print s; s = ""; }
/^\t/ { if (index($2, "(") > 0) {targ = substr($2, 1, index($2, "(") - 1)} else {targ = substr($2, 1, index($2, "+") - 1)} ; if (s != "") { s = s "," targ } else { s = targ } }
END { print s }' | sort | uniq -c | sort -r -n -k 1,1 | more
199 __sched_yield,srv_purge_coordinator_thread,start_thread
16 srv_get_task_queue_length,trx_purge,srv_purge_coordinator_thread,start_thread
13 trx_purge,srv_purge_coordinator_thread,start_thread
4 srv_purge_coordinator_thread,start_thread
4 srv_get_task_queue_length,srv_purge_coordinator_thread,start_thread
2 ut_crc32_sw,buf_calc_page_crc32,buf_page_is_corrupted,buf_page_io_complete,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level_func,row_search_mvcc,ha_innobase::index_read,handler::ha_index_read_map,handler::read_range_first,handler::multi_range_read_next,Mrr_simple_index_reader::get_next,DsMrr_impl::dsmrr_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_update,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
2 syscall,
2 __pwrite,
2 mtr_t::commit,row_ins_sec_index_entry_low,row_ins_sec_index_entry,row_upd_sec_index_entry,row_upd_step,row_update_for_mysql,ha_innobase::update_row,handler::ha_update_row,mysql_update,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
...
What's more interesting is that with bpftrace we can also get kernel stack for thread while profiling. I took a quick look at what we may get as a result of combining both:
[openxs@fc29 bpftrace]$ sudo bpftrace -e 'profile:hz:99 /comm == "mysqld"/ {printf("# %s\n%s\n", kstack(perf), ustack(perf));}' > /tmp/kstack.txtI just pring one after the other without any separator in perf format, so I can apply the same trick as above to the result:
[sudo] password for openxs:
^C
openxs@fc29 maria10.3]$ cat /tmp/kstack.txt | awk ' BEGIN { s = ""; }Looks cool to me, we see where we spend time in kernel while on-CPU, not only in user space code.
/^\#/ { print s; s = ""; }
/^\t/ { if (index($2, "(") > 0) {targ = substr($2, 1, index($2, "(") - 1)} else {targ = substr($2, 1, index($2, "+") - 1)} ; if (s != "") { s = s "," targ } else { s = targ } }
END { print s }' | sort | uniq -c | sort -r -n -k 1,1 | more
93 __sched_text_start,schedule,__x64_sys_sched_yield,do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
49 __sched_yield,srv_purge_coordinator_thread,start_thread
42 do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
20 srv_get_task_queue_length,trx_purge,srv_purge_coordinator_thread,start_thread
11 trx_purge,srv_purge_coordinator_thread,start_thread
2 srv_get_task_queue_length,srv_purge_coordinator_thread,start_thread
1 __x86_indirect_thunk_rax,do_syscall_64,entry_SYSCALL_64_after_hwframe,__sched_yield,srv_purge_coordinator_thread,start_thread
1 ut_crc32_sw,log_block_calc_checksum_crc32,log_write_up_to,trx_commit_complete_for_mysql,innobase_commit,commit_one_phase_2,ha_commit_trans,trans_commit_stmt,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysql_stmt_execute_common,mysqld_stmt_execute,dispatch_command,do_command,tp_callback,worker_main,start_thread
...
1 _raw_spin_unlock_irqrestore,ata_scsi_queuecmd,scsi_queue_rq,blk_mq_dispatch_rq_list,blk_mq_do_dispatch_sched,blk_mq_sched_dispatch_requests,__blk_mq_run_hw_queue,__blk_mq_delay_run_hw_queue,blk_mq_run_hw_queue,blk_mq_sched_insert_requests,blk_mq_flush_plug_list,blk_flush_plug_list,blk_finish_plug,ext4_writepages,do_writepages,__filemap_fdatawrite_range,file_write_and_wait_range,ext4_sync_file,do_fsync,__x64_sys_fsync,do_syscall_64,entry_SYSCALL_64_after_hwframe,fsync,fil_flush_low,fil_flush_file_spaces,buf_dblwr_update,buf_page_io_complete,fil_aio_wait,io_handler_thread,start_thread
1 page_simple_validate_new,buf_dblwr_flush_buffered_writes,pc_flush_slot,buf_flush_page_cleaner_coordinator.cold.151,start_thread
1 page_cur_insert_rec_write_log,page_cur_insert_rec_low,page_copy_rec_list_end_no_locks,btr_page_reorganize_low,btr_can_merge_with_page,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_tree,row_purge_record_func,row_purge_step,que_run_threads,srv_worker_thread,start_thread
1 os_thread_yield,srv_purge_coordinator_thread,start_thread
1 os_thread_get_curr_id,fseg_free_page_func,btr_page_free,btr_compress,btr_cur_compress_if_useful,btr_cur_pessimistic_delete,row_purge_remove_sec_if_poss_tree,row_purge_record_func,row_purge_step,que_run_threads,srv_worker_thread,start_thread
...
It's sometimes scary and not that easy to use bpftrace, but you'll get there eventually, same as we got to North Greenwich via that Emirates Air Line back in 2019. The only limit is the sky! |
* * *
To summarize:- bpftrace and bcc tools you get from vendor packages may not work as expected, so be ready to get nightly builds etc or, as I prefer, build from source. If this is a problem better use ftrace :)
- It seems getting stack traces with bpftrace for pt-pmp is an easy next stap to make. I may be integrated as one of tracers supported here.
- Now we can also add kernel stack to the picure.
- "JUST DO BPF"!