We developers tend to think of ourselves as generally competent individuals. We think that even more when it comes to operating or managing software, especially the software we write.
And we would all be wrong.
We are often the worst at managing and operating our own software solutions. But why?
We Know Too Much
Because we wrote the code, we know all the details and the specifics. We know how it all works and how messages traverse the system. In fact, we know too much.
We know the nitty-gritty, so we forget the API definitions.
We know what is supposed to happen, so we make assumptions about what is happening.
We know how “scenario A” will work out in our software, so we fail to do proper contingency planning.
We Focus on the Wrong Problem
I don’t know about you, but I tend to be rather defensive of my code. Even if it’s terrible, when someone says “Your software didn’t work” I feel a mix of anger, shame, and an identity crisis in which I think I’m a terrible engineer (ok, maybe not a crisis). I want to defend my code and the decisions we made for as long as I can.
For this very reason, developers don’t look objectively at an operational problem. Rather, we focus on why the operational problem should never exist, on the people who did their jobs wrong, or on the other developers on our team who did a poor job.
All of these are a waste of time and energy, and they keep us from solving the problem at hand. They keep us from understanding how to change our software rather than changing the environment.
Don’t get me wrong: there are times and places you must have a defined set of constraints on the use of your software — but you should also assume people will break them.
We Are Terrible Planners
Developers tend to be classic examples of the planning fallacy described by Daniel Kahneman. We overestimate how fast we can solve a problem. We assume nothing will go wrong. We downplay any risks as not a big deal because, of course, we are “the best in the biz”.
NASA is so successful because they do the exact opposite: they assume everything will go wrong. And they assume it all goes wrong at once. They spend literally years preparing for a single mission that lasts 6 days.
We plan for 1 hour for the next week’s worth of work. If you like the hard math, that is roughly 1 hour of planning for 40 hours of work. Astronauts spend 1,095 days preparing for roughly 8 days in space.
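For anyone who wants the hard math spelled out, here is a small sketch of the two ratios. The function name `planning_ratio` is made up for illustration; the numbers are the ones quoted above.

```python
def planning_ratio(planning: float, doing: float) -> float:
    """Return units of planning spent per unit of actual work."""
    return planning / doing


# Developers: 1 hour of planning for a 40-hour week.
dev_ratio = planning_ratio(1, 40)

# Astronauts: 1,095 days of preparation for roughly 8 days in space.
nasa_ratio = planning_ratio(1095, 8)

print(f"developers: {dev_ratio:.3f} hours of planning per hour of work")
print(f"astronauts: {nasa_ratio:.1f} days of preparation per day in space")
print(f"gap: astronauts prepare about {nasa_ratio / dev_ratio:.0f}x more per unit of work")
```

Each ratio is dimensionless (hours per hour, days per day), so the two can be compared directly even though the units differ.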
I’m not advocating we move our ratios to be NASA-level, but it is interesting how little time we spend preparing for, documenting, and even simulating error scenarios.
So with all of that, what are we as developers supposed to do to get better at operations? Even if we aren’t the operators directly, what can we do to better support those who are?
Seriously. You aren’t the whiz kid anymore, and neither am I. We allow our ideas and opinions of our own skills to keep us from getting even better. Instead of listening to our operators or end users, we shut them down. Even if our users are wrong and misusing our software, if one person got it wrong, don’t you think others will too? How do we correct that?
Don’t get me wrong, I bet you are a great dev. I bet you can code a dynamic programming problem, BFS, A*, and even Dijkstra’s with a single hand (w/o using Google?). But I also bet the last time you did mutation testing was a while ago. And I bet the last time you just poked your application to see if it broke was a while ago too.
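If mutation testing is a distant memory, here is a hand-rolled sketch of the idea: deliberately introduce a small bug (a “mutant”) and check whether your tests notice. Everything here is hypothetical and for illustration only, including `is_adult`, the two test suites, and the single `>=` to `>` mutation. Real mutation testing tools apply many kinds of mutations automatically.

```python
import ast

# Hypothetical function under test, kept as source so we can mutate it.
SOURCE = """
def is_adult(age):
    return age >= 18
"""


class FlipGtE(ast.NodeTransformer):
    """Mutate every `>=` into `>` to simulate an off-by-one bug."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return its `is_adult` function."""
    namespace = {}
    exec(compile(tree, "<mutation-demo>", "exec"), namespace)
    return namespace["is_adult"]


def weak_suite(fn):
    # Only checks values far away from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Also checks the boundary value itself.
    return weak_suite(fn) and fn(18) is True


original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(FlipGtE().visit(ast.parse(SOURCE))))

# A suite "kills" the mutant if it passes on the original but fails on the mutant.
weak_kills_mutant = weak_suite(original) and not weak_suite(mutant)
strong_kills_mutant = strong_suite(original) and not strong_suite(mutant)

print("weak suite kills mutant:", weak_kills_mutant)
print("strong suite kills mutant:", strong_kills_mutant)
```

A mutant that survives, as it does with the weak suite here, points at behavior your tests never actually check.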
As mentioned earlier, you need to have some good documentation about what your software can and cannot do. And how to do it. And what to do if you experience a problem. How many of you have reviewed the user guide for your users? How many of you have checklists for deployments (complete with rollback steps)?
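A deployment checklist with rollback steps does not have to live only in a wiki. As a minimal sketch, assuming each step can be expressed as a pair of functions, it could look like the following. The `Step` class, `run_checklist`, and the step names are all invented for this example, and the steps themselves are empty placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One checklist item: what to do, and how to undo it."""
    name: str
    apply: Callable[[], None]
    rollback: Callable[[], None]


def run_checklist(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
            log.append(f"ok: {step.name}")
            done.append(step)
        except Exception as exc:
            log.append(f"failed: {step.name} ({exc})")
            for finished in reversed(done):
                finished.rollback()
                log.append(f"rolled back: {finished.name}")
            return False
    return True


def fail_health_check() -> None:
    # Placeholder failure so the rollback path is exercised.
    raise RuntimeError("health check did not pass")


log: List[str] = []
steps = [
    Step("back up database", lambda: None, lambda: None),
    Step("deploy new version", lambda: None, lambda: None),
    Step("run health check", fail_health_check, lambda: None),
]
succeeded = run_checklist(steps, log)
print("\n".join(log))
```

Writing the rollback next to each step forces you to answer “how do we undo this?” before the deployment, not during the outage.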
To all the naysayers quoting the Agile Manifesto’s “working software over comprehensive documentation”: bravo, you have missed the point entirely.
Working software isn’t an excuse to not have documentation. In fact, documentation should be part of your definition of done, and it should be easier to maintain if you build your software in lockstep with your documentation. The whole point of this aspect of the manifesto was that teams had created loads and loads of documents about what their software was going to do. It was out of sync before the project ever even started!
Get Serious About Requirements Gathering
Most of the projects I have worked on that were slow, had a large number of bugs, or both, ended up that way because of bad requirements.
But requirements are hard! And that is why they are worth the extra time and effort to get right. As developers, we should question our user stories and the requirements they contain. We should spend time looking for contradictions in behavior. We should strive for as little ambiguity as possible.
When it comes to managing and using the software you build, remember that you are likely the worst at it.
But hope isn’t lost. Give the benefit of the doubt to your users when you feel they aren’t using your software correctly. Spend some time writing high quality documentation. And before you write a piece of code, get some really good and concise requirements together.
Try doing these three things and see what happens!