Saturday, January 9, 2016

"I'm Winston Wolf, I solve problems."

My (few) readers are probably somewhat tired of the boring topics of metadata locks and gdb breakpoints that I have discussed so much recently, so for this weekend I decided to concentrate on something less technical but still important to me: the approach I prefer to follow while providing support for MySQL.

Before I continue, it's time to add an explicit disclaimer: the views expressed below on how a support engineer should work are mine alone and not those of my current (or any previous) employer. The specific case I describe may be entirely fictional and has nothing to do with any real-life customer. I love thy customers in reality...

One of my favorite movies of all time is Pulp Fiction. Coincidentally, it was released in 1994, more or less at the same time when providing technical support started to become one of my regular job duties, not just a hobby or a boring part of the sysadmin role I had to play even when I did not want to. I had to provide support for the software I had written as soon as it got its first customer, then moved on to helping my colleagues (with whom I developed software) with all kinds of technical problems they had, from proper coding style to linking Pro*C programs to, well, getting more disk space on the NFS server that was actually my workstation. In 10 years or so this ended up with me sending my CV to MySQL AB and getting a Support Engineer job there (instead of the Developer one I applied for).

So, with this "support" role that I was already playing when I watched "Pulp Fiction" for the first time, it was natural for me to find that my favorite character there is "The Wolf". As soon as I started to provide MySQL Support, I added the quote that is used as the title of this post to my LJ blog, as a motto. This character and his approach to the "customers" of his very specific "service" (resolving all kinds of weird problems) looked ideal to me, and over the years this has not changed. Let me remind you of this great dialog with Vincent Vega (see the script for the context and details if you do not remember them by heart; I've fixed one typo there while quoting, bugs are everywhere, you know...):

               The Wolf and Jimmie turn, heading for the bedroom, leaving 
               Vincent and Jules standing in the kitchen.

                                      VINCENT
                              (calling after him)
                         A "please" would be nice.

               The Wolf stops and turns around.

                                      THE WOLF
                         Come again?

                                      VINCENT
                         I said a "please" would be nice.

               The Wolf takes a step toward him.

                                      THE WOLF
                         Set it straight, Buster. I'm not 
                         here to say "please." I'm here to 
                         tell you what to do. And if self-
                         preservation is an instinct you 
                         possess, you better fuckin' do it 
                         and do it quick. I'm here to help.
                         If my help's not appreciated, lotsa 
                         luck gentlemen.

                                      JULES
                         It ain't that way, Mr. Wolf. Your 
                         help is definitely appreciated.

                                      VINCENT
                         I don't mean any disrespect. I just 
                         don't like people barkin' orders at 
                         me.

                                     THE WOLF
                         If I'm curt with you, it's because 
                         time is a factor. I think fast, I 
                         talk fast, and I need you guys to 
                         act fast if you want to get out of 
                         this. So pretty please, with sugar 
                         on top, clean the fuckin' car.

Over the years of doing support I've found out how important it is to tell it straight and honestly from the very beginning: "I'm not here to say 'please.' I'm here to tell you what to do."

In services the approach is often the following: "The customer is always right". In reality, at least for customers of technical services, this may NOT be so. A customer may be wrong about the root cause of the problem, and (as the wiki page linked above says) can be dishonest, have unrealistic expectations, and/or try to misuse the software. What's even more important, customers are rarely right when they say how services should be provided to them.

All requests like these:
"Can I speak on Skype with Valerii? ... Can I chat with him on Skype please? It's a lot easier ... and I have questions"
may end up with a chat, or may end up with an email (already sent while the chat with the on-call engineer was going on) explaining both possible root causes of the problem, asking follow-up questions to define the real root cause, and listing the next steps to pinpoint or work around the problem.

As this is still a technical blog: the problem in the case I had in mind while writing the above was very slow query execution on MariaDB 10.x, where the query had 80K IDs in the IN list and had previously executed fast on Percona XtraDB Cluster 5.5.x. The range of root causes to suspect was initially wide enough: from Bug #20932 to Bug #76030 and maybe some MariaDB-specific bugs to search for, to disk I/O problems, to lock waits (what if the SELECT query was executed from a SERIALIZABLE transaction?). So, I kept asking for more diagnostic outputs and insisting on getting them in emails, and yes, long outputs are better shared via emails! Sharing them in chat or, even worse, discussing them over the phone, or looking at them over a shared desktop session (the approaches many customers insist on) is neither faster nor more convenient. I say what to execute, you do that, copy/paste the output and send an email (or reply in the issue via the web interface, if you prefer to paste there), "pretty please, with sugar on top".
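The exact diagnostic outputs requested in this case are not listed here. Purely as a hypothetical illustration, the lock-wait and SERIALIZABLE suspicions above could be checked with standard statements like these, run in the mysql command-line client against MariaDB 10.x (the selection of statements is an assumption, not the actual request list):

```sql
-- Hypothetical diagnostic requests; not the actual list from the case.

-- Isolation level of the session that runs the slow SELECT
-- (run it in that same session; tx_isolation is the MariaDB 10.x name).
SELECT @@SESSION.tx_isolation, @@GLOBAL.tx_isolation;

-- What every thread is doing right now, with the full statement text.
SHOW FULL PROCESSLIST;

-- Current InnoDB lock waits (an empty result means nobody is waiting).
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

-- Overall InnoDB state: transactions, semaphores, I/O activity.
-- \G is the mysql client's vertical-output terminator.
SHOW ENGINE INNODB STATUS\G
```

Outputs like these are long, which is exactly why they are easier to read in an email than in a chat window.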

As soon as the evidence provided showed there was no locking or disk I/O problem, my investigation concentrated on this important fact: the query was fast on 5.5.x and is slow on MariaDB 10.x with the same data. What is the difference between these versions that matters? Most likely it's in the optimizer and the new optimizations MariaDB has!

Do I know all the MariaDB optimizations by heart? No, surely not; I have to go and read about them, check how the corresponding optimizer switches are named and make up my mind about suggestions. Is it OK to do all this while hanging on the phone with the customer or chatting with him? Well, maybe, if the customer prefers to listen to my loud typing and the sounds of my neighborhood... I'd prefer NOT to listen to any of that, ever, and not to hang on a chat with 10 minutes between messages. So, the only chat reply the customer got was: "Please reply to his request and I'll have him follow up."

My last request at that moment was simple:
"I do not see optimizer_switch set explicitly in your my.cnf, so I assume the defaults apply there. Can you, please, check whether setting this:

set optimizer_switch='extended_keys=off';

before running the problematic query allows it to run faster?"
Even though the bug that led me to this idea (after reading the details of what optimizations MariaDB provides and how they are controlled) is still "Open", and I have never been able to create a test case that does not depend on customer data, I've seen the problem more than once. I think fast, and I type fast, so I shared this suggestion immediately (1 hour 20 minutes after the customer started to describe the problem in chat, 43 minutes after the customer provided all the required outputs in email). And you know what happened? The chat continued more or less like this:
"wow... optimizer_switch with extended_keys off worked... from 1000k+ seconds to ... 1 second ... rofl ...  so extended_keys=on by default ... it literally halted our database today almost ... thank you for your assistance ... i would have never of thought to change that... and you guys caught it, so props"
So, the immediate problem was resolved and, you know, the resolution started with my explicit refusal to join any chat until I saw the evidence and outputs requested in emails. It also ended with an email suggesting to switch off one specific optimization that is used by default and is known to me, the one pretending to be an expert, from several previous cases (similar or not so much, as I had never seen anything like this happen on MariaDB before). It took 1 hour 20 minutes from the initial problem statement to the problem being resolved, and all this time I worked asynchronously and concurrently on this and a few other issues, and did not say or type a word to the customer in chat.
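For readers who want to try the same experiment, a minimal sketch of a session-only test might look like the following. It assumes MariaDB 10.x, where extended_keys is one of the optimizer_switch flags; the table t1, the column id and the short IN list are hypothetical stand-ins for the customer's table and its 80K IDs.

```sql
-- Hypothetical names: t1, id and the short IN list stand in for the
-- customer's table and its 80K IDs.

-- Returns 1 if extended_keys is on for this session, 0 if it is off.
SELECT @@SESSION.optimizer_switch LIKE '%extended_keys=on%' AS extended_keys_on;

-- Plan with the current (default) flags.
EXPLAIN SELECT * FROM t1 WHERE id IN (1, 2, 3);

-- Turn off only this flag, only for the current session;
-- flags not named here keep their current values.
SET SESSION optimizer_switch = 'extended_keys=off';

-- Plan with the flag off; run the real query as well to compare timing.
EXPLAIN SELECT * FROM t1 WHERE id IN (1, 2, 3);

-- Restore the server-wide value for this session when done.
SET SESSION optimizer_switch = @@GLOBAL.optimizer_switch;
```

Changing only the session value keeps the experiment local to one connection; making it permanent (globally or in my.cnf) is a separate decision, best taken once the effect on other queries is known.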

This is how The Wolf solves problems: 
"I'm here to tell you what to do. And if self-preservation is an instinct you possess, you better fuckin' do it and do it quick. I'm here to help. If my help's not appreciated, lotsa luck gentlemen"
I do it the same way, and, IMHO, any expert should do it this way. If a customer always knows better what to do and how to communicate, why did they end up with a problem that brought them to me?

I try to prove every point I make while working on problems, and I expect the other side to apply the same level of effort: if they think my way is the problem, they have to prove to me that the way they want the issue handled is better than the one I prefer and suggest. After that I'll surely do what's best for them, with both of us knowing why we do it that way.

In reality, the problem usually lies elsewhere... Now, it's time to re-read my New Year wishes post.


  1. Replies
    1. I had to write a post like this 10 years ago... Unfortunately I did not have the habit of discussing my work in public at that time.

  2. Ah, forgot to mention: some customers I have dealt with go a few steps further. Some of them send you half the data, say 2 of the 4 files you had asked for, and then insist that you first analyze those 2 and provide them with the analysis.

    It leaves me confused whether to laugh or to cry!

    1. I always comment on what I see in what was sent so far and remind about the missing items.

      Moreover, if I am on shift and working, I reply to emails as fast as possible (that is, as soon as I have something useful to reply with), no matter if they are from customers or colleagues.

  3. Good one!! Many times they also tell us what to do and how to resolve it.