As part of an upcoming episode of Shift Shift Forward I answered a few questions about incident response. The description of the episode is:
We've all experienced it before - you go to your favorite website, but it's not loading. Or you try clicking on a link to complete a transaction, but your browser times out and you get an error message. It can be frustrating to deal with outages and similar issues from a user perspective, but let's see what it looks like from the other side. What happens when these incidents occur, and what does it take to get everything running smoothly again?
The Shift Shift Forward team interviewed the SREs and our manager at Glitch - as well as a lot of other people - and asked a bunch of great questions.
The question I liked the best was "What are you feeling during incident response?" or something to that effect. I wasn't able to give a good answer in the moment, but I thought it was such a good question that I wrote down some quick notes afterwards and did a quick recording.
Here are the notes on the points I intended to make - it's a bit different from what I ended up saying, but that's usually how it goes for me. I think I like the audio version better, so if you want to hear me clumsily work through the notes, I've included the audio recording as well.
As for the feeling you experience during incident response, for me, I go through almost the full spectrum of feelings, not just the bad ones as you might think.
- When I first get paged there's a short time where I'm feeling dread, or at least a bit afraid - I'm worried it might be an actual incident that will affect our users.
- If it turns out it is an incident, then you go into incident response mode and you're very focused. You assemble a team, which means picking a scribe to take notes and a communicator who's responsible for keeping the rest of the company updated as we work through the incident.
- Once the incident response is going it can be very exciting - it's still extremely stressful - but it can be quite fun. You're trying to figure out what's going wrong, coming up with hypotheses and trying to prove, or disprove, them together with the team. That is extremely challenging and can be very fun. You also learn more about your systems by looking at them when they're broken than when they're happy.
- Finally, once the incident is over and you can mark the incident as resolved, that is extremely fulfilling. At that moment I'm always feeling extremely proud of the team that worked on the incident, and I'm feeling really proud of myself for not breaking down.
I think that's what makes it worthwhile to be part of the on-call rotation: it's not just stress and dread, it can also be very exciting, challenging, fulfilling, and fun.
Eventually, as you get more experience with incident response, the bad feelings take up less space.
The episode airs on May 13 - while this little clip might not be included in it, I know it's going to be a great episode, so head over and subscribe.