Could you be replaced? Could your product get hacked? Could an essential service fail? Could a key co-worker leave? Could a critical dependency arrive too late? People often measure themselves, their peers, and their heroes by how they respond to crises. That’s nice, but it’s dumb.
Some people freeze, run, or hide during crises—they’re pathetic cowards. However, most people band together and face crises with courage and comradery. It’s not about being a hero, especially since crises often are caused by denial, neglect, and a lack of preparation. We should be exalting people who face facts and prepare, instead of “heroes” who save the day from an avoidable issue.
I measure people by how they respond to potential crises in advance. Do they vainly try to be irreplaceable? Do they assume their product is too good to be hacked or their services too robust to fail? Do they cling to team members by generating guilt and promising perks? Do they treat troubled dependencies as their partner team’s problem? In other words, are they afraid to face reality? In reality, people are replaced, products are hacked, services fail, people leave, and dependencies are late. Don’t be a coward. Get real.
Face the facts
You can try to freeze reality, but change comes anyway. You can run from reality, but it catches up to you. You can hide from reality, but it will find you. Instead of being a coward, face reality on its terms. Prepare for it, practice for it, and embrace it.
Oh wait, I hear the crying of the clueless.
- “I just want to keep my job and my good people!” Jobs change and people leave—you’re frightened of dealing with that.
- “We follow the best security and reliability practices, so our products and services are solid!” Products get hacked and services go down—you’re in denial.
- “We’ll finish on time, no matter what it takes!” You’re not a Greek god—you’re a pathetic planner.
It’s good to retain your job and your people, follow the best security and reliability practices, and finish your work on time. It’s cowardly and contemptible to deny reality until you really are in crisis and then play the hero in the calamity you caused.
Let’s go through some examples.
You can be replaced
We all want to retain our jobs and the people we enjoy working with. But artificially postponing inevitable changes leads to broken relationships, stunted growth, and eventual crises.
When you attempt to make yourself irreplaceable, you might keep your job a little longer, but you’ll also generate resentment, be stuck in your role, and rarely be on true vacation. Whenever you’re sick, slacking, or away, your co-workers feel exposed and resent it. Should something break, you’re on the hook, no matter where you are or how you feel. You get to play the hero, but after a couple of crises, your team will plot against you. I’ve seen teams refactor and redesign systems, just to free themselves from depending on a teammate whose code was undocumented and impenetrable.
As a manager, a similar situation occurs when you try to retain people using guilt, perks, and intimidation. You might keep folks a little longer, but many will resent being held hostage. Eventually, they’ll escape in secret (via a “stealth” interview), and you’ll be caught unprepared to deal with the gap.
Don’t avoid change—embrace it. Ensure everyone has a backup, with information spread across your team and documented as needed. Encourage people to take on new challenges and help them grow. When experienced team members are ready to leave, assist them in finding new roles. Recruit new young talent with fresh ideas, creating headroom and mentoring opportunities for previous recruits. Your team is a river, not a pond, and so are you. Keep challenging yourself, keep growing, and keep living.
I write about staying vibrant in “A change would do you good” and finding a new role in “Get a job.” I cover maintaining a healthy team composition in “Go with the flow.”
Failure is an option
We all take pride in writing quality software that is secure and reliable. But assuming your code won’t get hacked or that services won’t fail is naïve at best and negligent at worst (as we and Sony learned again last Christmas). When trouble arrives, you won’t be prepared, you’ll stumble and stress, and your customers will freak and flee.
You still want to write secure and reliable code, but you should assume failure.
- For security, that means having a plan in place for responding within days to an exposed vulnerability and then practicing. (If you’ve been following the news, you know we don’t always meet this standard.) You can practice by responding to a minor vulnerability as if it were major or by treating a known but mitigated vulnerability as a real one. Practice keeps you prepared, calm, and ready.
- For services, that means designing for failure. What customers see should always be available and functional, though perhaps less functional when some services are out. Design graceful degradation and rerouting into your services, and then practice by deliberately taking services down (controlled chaos). Practice helps you spot and correct issues in a controlled fashion and be ready for real problems when they occur. (Our best teams and competitors use controlled chaos as a catalyst for correction.)
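As a rough illustration of the second bullet, here is a minimal Python sketch of graceful degradation plus a deliberate outage drill. Everything in it is hypothetical and invented for the example (the `Dependency`, `HomePage`, and `run_drill` names, and the idea of a recommendations backend); it simulates the services in memory rather than calling anything real.

```python
from dataclasses import dataclass, field


class ServiceDown(Exception):
    """Raised by a dependency that has been taken out of service."""


@dataclass
class Dependency:
    """Hypothetical stand-in for a backend service, with a switch for drills."""
    name: str
    healthy: bool = True

    def call(self, user: str) -> list[str]:
        if not self.healthy:
            raise ServiceDown(self.name)
        return [f"{self.name} result for {user}"]


@dataclass
class HomePage:
    """Customer-facing page that must render even when a dependency is out."""
    recommendations: Dependency
    # Generic content served when personalized results are unavailable.
    fallback: list[str] = field(default_factory=lambda: ["popular items"])

    def render(self, user: str) -> dict:
        try:
            items = self.recommendations.call(user)
            degraded = False
        except ServiceDown:
            # Less functional, but still available.
            items = list(self.fallback)
            degraded = True
        return {"user": user, "items": items, "degraded": degraded}


def run_drill(page: HomePage, user: str = "drill-user") -> bool:
    """Take the dependency down on purpose, check the page, then restore it."""
    page.recommendations.healthy = False
    try:
        result = page.render(user)
        return result["degraded"] and len(result["items"]) > 0
    finally:
        page.recommendations.healthy = True
```

Calling `run_drill(HomePage(Dependency("recs")))` takes the simulated dependency down, confirms the page still renders fallback content, and restores the dependency afterward. A real drill would target real infrastructure under controlled conditions; the point here is only that the fallback path gets exercised on purpose rather than discovered during an outage.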
Don’t assume trouble will overlook you—embrace it. Practice handling difficult situations regularly. Take away the surprise and panic, and replace them with competence. Customers will notice the difference.
I talk more about resiliency in “Crash dummies: Resilience” and about the importance of practice in “It starts with shipping.”
Expect the unexpected
We all desire to deliver value to our customers on time, or better yet, all the time. But if delivering value depends on last-minute cuts and heroics, there will be times you deliver crap or deliver late.
You still want to be bold and deliver breakthrough value that beats the competition to market, but you should do so with open eyes and a firm grip on reality. There are three keys to realistic project management:
- Focus. Know the difference between what you must deliver, what you can deliver, and what is nice but unnecessary. Drive that focus throughout your team, and keep it sharp.
- Togetherness. We win and lose as a team. They aren’t late—we are late. They aren’t in trouble—we are in trouble. They didn’t fail—we failed.
- Transparency. Expose and encourage bad news early, while you have time to adjust. If folks are scared to deliver bad news, we all will suffer.
Don’t fear challenging schedules or bad news—embrace them. A realistic leader knows the difference between the possible and improbable, focuses on the true essentials, unites the team as one, and welcomes bad news early as a means toward eventually winning together.
I talk about proper planning in “You can’t have it all” and realistic breakthrough change in “The value of navigation.”
Embrace your destiny
It’s good to respond well to crises, but it’s better to tame them in advance. Instead of fearing or fighting change, failure, and deadlines, embrace them.
Embrace the flow of people through your team. Embrace security patches and service outages. Embrace difficult tradeoffs and bad news. Practice doing difficult duties regularly. Get good at the hard stuff and everything gets easy.
You can’t freeze, run, or hide from reality. Face it with courage, conviction, and confidence. Don’t be surprised—be prepared. Make your world, and the world, a better place.
Failure is an option: love it. I used a similar line, "Failure is Always an Option," and accompanied it with these stats from the first year of a new Google data center:
1 Power Distribution Unit failure (500-1000 machines)
1 rack-move (500-1000 machines)
1 network rewiring (rolling 5% of machines)
20 rack failures (40-80 machines)
8 network maintenances (~30-min connectivity losses)
12 router reloads
3 router failures
Dozens of minor 30-second blips for DNS
1000 individual machine failures
1000s of hard drive failures
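To put those yearly counts in day-to-day terms, here is a small Python sketch of the arithmetic. It uses only the items above that have an exact count, and it assumes events are spread evenly over a 365-day year, which is a simplification (real failures tend to cluster). The function names are made up for the example.

```python
# Yearly event counts copied from the list above (exact counts only).
EVENTS_PER_YEAR = {
    "rack failures": 20,
    "network maintenances": 8,
    "router reloads": 12,
    "router failures": 3,
    "individual machine failures": 1000,
}

DAYS_PER_YEAR = 365


def mean_days_between(count_per_year: int) -> float:
    """Average gap in days between events, assuming an even spread."""
    return DAYS_PER_YEAR / count_per_year


def events_per_day(count_per_year: int) -> float:
    """Average number of events per day, assuming an even spread."""
    return count_per_year / DAYS_PER_YEAR


for name, count in EVENTS_PER_YEAR.items():
    print(f"{name}: about one every {mean_days_between(count):.1f} days")
```

Under that even-spread assumption, 1000 machine failures a year works out to between two and three machines lost per day, and 20 rack failures to roughly one every 18 days.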
Then I will advise folks to employ some sort of testing for their fault-tolerant design. Test in pre-production first, but then use something like Chaos Monkey to test production.
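A minimal Python sketch of that kind of test, suitable for a pre-production harness, might look like the following. It is not Chaos Monkey and does not use its API; the `Replica`, `Pool`, `chaos_round`, and `survives_chaos` names are hypothetical, and the replicas are simulated in memory.

```python
import random


class Replica:
    """Hypothetical stand-in for one instance of a service."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Pool:
    """Routes each request to the first replica that answers."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def handle(self, request: str) -> str:
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # reroute to the next replica
        raise RuntimeError("no healthy replicas left")


def chaos_round(pool: Pool, rng: random.Random) -> Replica:
    """Kill one randomly chosen live replica and return it."""
    live = [r for r in pool.replicas if r.alive]
    victim = rng.choice(live)
    victim.alive = False
    return victim


def survives_chaos(pool: Pool, kills: int, seed: int = 0) -> bool:
    """Return True if the pool keeps answering after each of `kills` failures."""
    rng = random.Random(seed)
    for i in range(kills):
        chaos_round(pool, rng)
        try:
            pool.handle(f"request-{i}")
        except RuntimeError:
            return False
    return True
```

For example, `survives_chaos(Pool([Replica("a"), Replica("b"), Replica("c")]), kills=2)` checks that a three-replica pool still answers after two random failures. The fixed seed keeps a failing run reproducible, which matters once the same idea is pointed at something less forgiving than a simulation.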