At Microsoft, we can execute, but can we think? When billions of dollars are on the line, you better not be guessing about decisions. A decade ago, our products weren’t guesses; they were enhanced impersonations of our competitors’ successful products. We won by outdoing those ahead of us.
Now we lead in many areas, and without competitive targets, brain-challenged teams rely on guesswork. Their mantra: code cool stuff and hope that customers find something they like. Their results: a disorganized mess with little value or uptake.
Thankfully, enlightened teams don’t guess. They rely on data Microsoft gathers from research and directly from customers to determine real points of customer pain and delight, then enhance our products accordingly. Without that data and feedback, we’d be hopeless. Yet just mention using data to drive how we build our products and you’ll be lucky to leave the room with your body intact.
There is no try
If great teams use data to remove guesswork from what we build, why does guesswork dominate how we build it? Today’s software development process is all about guts and glory. “Best practices” are conventional wisdom, processes are tribal knowledge, and many self-righteous Agile methods are loaded with dogma instead of data. Why?
Don’t tell me there is data out there that proves certain methods are effective. I’m not talking about someone else’s data, I’m talking about yours. How do you know your team is using the right methods with the right results, before all the bugs arrive? How do you know if you’re any better than yesterday? Why aren’t you using data throughout to find out?
Maybe it’s because software development is a creative process or a craft that can’t be measured. Maybe measurements are faulty or easily gamed. Maybe there’s so much data that you could use it to justify anything. Or maybe frightened fanatics are foolishly focused on regressive rationalizations of their suspect superstitions. They are too scared to measure and too ignorant to know how.
Well, giving it your best shot on tried-and-true methods isn’t good enough. Not with this much money and this many people’s lives and livelihoods on the line. Being clueless about using the right metrics the right way is no way to go through life, buddy. Fortunately, you don’t have to be in pre-med to understand it.
Is there a problem here?
I hear you saying, “You sick man, don’t you know metrics are evil? Don’t you know hollow-headed managers will use them to pit you against your peers, and your team against other teams doing different work? Don’t you know they just get gamed, while real progress and real customers suffer?” Yeah, I know. We’ve already established you don’t know anything about using measures properly. But since you brought it up, let’s break down your objections:
§ Software is a creative craft that can’t be measured. As I talked about in “A software odyssey – from craft to engineering,” craft is fine for tables and chairs, but not good enough for bridges, pacemakers, and software people depend upon. Regardless, you’ve missed the point. Lesson 1: Don’t measure how, measure what.
§ Measurements are faulty and easily gamed. Others put this as, “You get what you measure.” If you measure lines of code, people write lots of bad code. If you measure bugs fixed, they create more bugs to fix. Lesson 2: Don’t measure intermediate outcomes, measure desired end results.
§ There’s enough data to justify anything. Computers produce lots of data and software development happens on computers. However, all that data is useless if it presents more questions than it answers, regardless of how pretty the graphs might look. Lesson 3: Don’t just collect data, use measures to answer key questions.
§ Managers will use data against you. Managers are notoriously lazy. Why apply thought if numbers tell you what to do? Good measures don’t tell you what to do, because good measures don’t measure how (remember lesson 1?). Lesson 4: Don’t use measures that make your decisions, use measures that tell you a decision is needed.
§ Managers will make unfair comparisons. Managers are notoriously clueless. From their perch at 10,000 feet, software is software and bugs are bugs; all subtlety is lost. Focusing on desired end results helps, but it’s not enough to avoid improper comparison. Lesson 5: Don’t compare raw measures, use baselines and exemplars that provide needed context.
Now let’s follow the lesson plan.
What’s going on?
Lesson 1: Don’t measure how, measure what.
People hate being forced to work a particular way. Sure, they like pointers and suggestions. They can live with constraints and requirements, but no one wants to be an automaton.
Once anyone starts doing a task, they are sure to find ways to do it better. Forcing people to work your way, instead of their own, is guaranteed to hit a point where your way is worse than theirs. This leads to feelings of frustration on their part, and feelings of resentment, stupidity, and disrespect toward you.
Measuring how you want something done is equivalent to telling people how to do it. That sets you up as an idiot people resent and disrespect. I don’t recommend it.
Instead, measure what you want accomplished and leave the how to the intelligent human beings doing the work. Say you want a scenario to work. Instead of measuring spec completion, function points, or bugs remaining (all hows), break down the scenario into segments and measure how many segments and segment transitions work as desired. Ideally, you’d have a customer be the judge, but an independent tester would suffice.
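Here is a minimal sketch of that kind of measure. It assumes a scenario has already been split into named segments, and that a customer or independent tester has recorded a pass/fail verdict for each segment and each transition between neighboring segments. The "share a photo" scenario, its segment names, and the `scenario_score` helper are all hypothetical, invented only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Pass/fail verdicts from a customer or independent tester."""
    segments: dict[str, bool]                 # segment name -> works as desired
    transitions: dict[tuple[str, str], bool]  # (from, to) -> hand-off works as desired


def scenario_score(result: ScenarioResult) -> float:
    """Fraction of segments and segment transitions that work as desired."""
    verdicts = list(result.segments.values()) + list(result.transitions.values())
    if not verdicts:
        raise ValueError("scenario has no segments or transitions to judge")
    return sum(verdicts) / len(verdicts)


# Hypothetical scenario with made-up verdicts.
share_photo = ScenarioResult(
    segments={
        "pick photo": True,
        "edit": True,
        "choose recipient": False,
        "send": True,
    },
    transitions={
        ("pick photo", "edit"): True,
        ("edit", "choose recipient"): True,
        ("choose recipient", "send"): False,
    },
)

score = scenario_score(share_photo)
print(f"{score:.0%} of the scenario works as desired")
```

Note what the score leaves out: nothing in it says how the team got a segment working, only whether the judge saw it work.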
In the end you’ll thank me
Lesson 2: Don’t measure intermediate outcomes, measure desired end results.
Metrics get gamed. We all know it; most of us have done it. Why? Because managers manage metrics. If you don’t hit your metrics your manager is going to emerge from his or her cave and annoy you. That makes the goal “hitting your metrics” not “hitting your goals.”
How do you avoid the trap of hitting metrics instead of goals? Two ways:
§ Don’t use metrics; be dumb and happy.
§ Make hitting your goals and your metrics equivalent.
Think about your team’s goals. What are they really? What outcomes do you want as a team? What are you trying to accomplish? Measure that desired end result. Then it won’t matter how the team hits those metrics (within reason), because hitting them is just what you wanted.
I want to know right now
Hold on a second, a panicked manager has a question, “If I only measure end results, how will we ever get there?!?” That’s actually a good question. No one successfully develops software without iteration—take a small step forward, check if you’re on the right track, then take another step. How can you check if you’re on track if you’re only measuring the end result?
There are two approaches to iterative feedback that still focus on desired end results:
§ Make every iteration produce end results. This is the best approach and a fundamental concept of Agile methods. By producing customer value each iteration, you can regularly check with customers to see if you’re on track. Your metrics should enhance this effort by measuring end results the customers want (such as usability, completeness, and robustness).
§ Apply predictive measures that tightly correlate to desired end results. This approach isn’t quite as good, because correlation is never perfect. However, some end results can’t be measured accurately till the end, so using predictive measures is necessary (for examples, read my column, “Bold predictions of quality”). If you must use predictive measures, always back them up with measures of the real results to be sure you are getting what you want.
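One way to back up a predictive measure with real results is to check, after the fact, how well the two actually tracked each other. The sketch below computes a plain Pearson correlation between a predictive measure and the end result it is supposed to predict. The choice of code churn and crash counts, the per-component numbers, and the 0.8 cutoff are all assumptions made up for illustration, not data from any real product.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / sqrt(var_x * var_y)


# Made-up numbers, one entry per component of a hypothetical product.
churn_before_release = [120, 450, 80, 300, 610]   # predictive measure
crashes_after_release = [3, 11, 1, 9, 14]         # actual end result

r = pearson(churn_before_release, crashes_after_release)

TRUST_THRESHOLD = 0.8  # arbitrary cutoff chosen for this sketch
if r >= TRUST_THRESHOLD:
    print(f"r = {r:.2f}: churn tracked the real results on this release")
else:
    print(f"r = {r:.2f}: churn is a weak predictor here; rely on the real results")
```

A high correlation on one release is not a guarantee for the next, which is why the real measure stays in place.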
Then make your choice
Lesson 3: Don’t just collect data, use measures to answer key questions.
Software development, by its very nature, produces tons of data: build messages, test results, bug data, usability research, compiler warnings, run-time errors and asserts, scheduling information (including burndown charts), source control statistics, and on and on. Being software engineers, you’ve probably piped this data into various reporting packages and produced endless charts and graphs. Well, good for you.
Actually, is it good for you? No, no it isn’t. Too much information is just as bad as not enough. In a crowded mall, you don’t hear more conversations, you hear nothing at all. Your brain treats everything as noise and blocks it out. The same thing happens with too much data.
Collect all the data you want, but don’t throw it in people’s faces. Instead, think carefully about the desired end results you want. What key attributes do you care about? Some people call these Critical to Quality (CTQ) metrics or Key Performance Indicators (KPIs). You want a small, focused mixture of specific and generic CTQs.
Some CTQs will be specific to your product. Say you’re working on wireless networking. A desired end result is a quick and lasting connection, so your CTQs would be time to connect and average time till connection failure.
Some CTQs will be generic to software development. Say you’re lean minded (and I hope you are). You’d care about minimizing cycle time. Your CTQ would be time to complete a scenario as desired from start to finish. Say you’re engineering-quality minded (and I hope we all are). You’d care about solid, stable code. Your CTQs would be a predictive measure, like code churn or complexity, and an actual measure, like Watson hits.
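To make the wireless networking example concrete, here is a rough sketch of computing its two CTQs from a log of connection sessions. The `Session` record, its field names, and the sample values are hypothetical; a real product would pull these from its own telemetry.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Session:
    seconds_to_connect: float
    # None means the session ended normally, without a connection failure.
    seconds_until_failure: Optional[float]


def time_to_connect(sessions: list[Session]) -> float:
    """CTQ 1: average time to establish a connection."""
    return mean(s.seconds_to_connect for s in sessions)


def time_till_failure(sessions: list[Session]) -> Optional[float]:
    """CTQ 2: average time until connection failure, over sessions that failed."""
    failures = [
        s.seconds_until_failure
        for s in sessions
        if s.seconds_until_failure is not None
    ]
    return mean(failures) if failures else None


# Made-up session log.
sessions = [
    Session(seconds_to_connect=2.1, seconds_until_failure=3600.0),
    Session(seconds_to_connect=1.4, seconds_until_failure=None),
    Session(seconds_to_connect=3.5, seconds_until_failure=1200.0),
    Session(seconds_to_connect=1.0, seconds_until_failure=5400.0),
]

print(f"time to connect: {time_to_connect(sessions):.1f} s")
print(f"average time till failure: {time_till_failure(sessions):.0f} s")
```

Two numbers, each tied to a question the customer cares about: how fast do I get connected, and how long does it last?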
We are in charge
Lesson 4: Don’t use measures that make your decisions, use measures that tell you a decision is needed.
I think the biggest fear any employee has when it comes to metrics is being treated like a number. I wrote about the pitfalls of this in my column, “More than a number.” If your review and rewards come down to a formula, something is seriously wrong.
The same goes for all decisions. If the decision comes down to a formula, then all thinking and consideration are absent and we become servants to our processes and tools. That is backwards. Processes and tools work for us, we don’t work for them.
Luckily, if you follow the previous lessons, measuring only your desired end results, your management can’t use the measures to make decisions. Yes, if you consistently missed your team’s desired results, management could make you suffer, and you’d deserve it. However, because all management has are the end results, they wouldn’t know why you missed the results, or who or what was responsible. They’d have to investigate, understand, and analyze. They’d have to think before coming to a conclusion.
Great metrics tell you you’ve got a problem. They can’t and shouldn’t tell you why. Root cause analysis requires careful study. If people say the easy answer is in a metric, both they and the metric are lying.
A girl’s gotta have her standards
Lesson 5: Don’t compare raw measures, use baselines and exemplars that provide needed context.
Wait, a panicked engineer has a question, “Okay, so we measure a great end result, like completed scenarios, and our feature team has half the number of scenarios completed as another team. Now our manager is demanding we work harder and longer, even though our scenarios were broader and far more complex. Using the ‘right’ metric is only causing us grief!” That’s a fair point. I’ve got good news and bad news.
The good news is that the manager is actually trying to fix a real problem (the right metric helped). The bad news is that the manager didn’t consider what the problem was. Instead of analyzing the root cause of the problem (complex and broad scenarios), the manager is assuming the problem is lazy engineers. You need to help your manager by providing context.
The easiest and best forms of context are baselines and exemplars.
§ Baselines tell you what to expect from a metric. The first time you get an end result, its measure is the baseline. From that point forward, you know if you are getting better or worse by comparing to the baseline. Your manager can’t be surprised your scenarios are broad and complex if your baseline already established that fact. Baselines are extraordinarily handy for tracking improvement.
§ Exemplars tell you how good your results could be. The best result achieved for a measure is the exemplar. It doesn’t matter how it was achieved or which team achieved it. The difference between your results and the exemplar is your opportunity to improve. But what if they cheated to get scenarios done faster? If done meets the quality and compliance bar, then they didn’t cheat. They just found a better way. But what if our scenarios are broader and more complex? Well, you should break them down and simplify. Remember, you are measuring desired end results. If you really care about delivering value to customers in fast, small chunks, you need to keep your chunks fast and small. Exemplars are priceless for spotting your biggest improvement opportunities.
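Baselines and exemplars can be sketched in a few lines. The example below assumes the measure is cycle time in days per completed scenario, so lower is better. The team names, the numbers, and the `context_report` helper are made up for illustration.

```python
def context_report(own_history: list[float], latest_by_team: dict[str, float]) -> dict[str, float]:
    """Put a raw measure in context, for a measure where lower is better.

    own_history: this team's results in chronological order.
    latest_by_team: the most recent result for every team, including this one.
    """
    if not own_history or not latest_by_team:
        raise ValueError("need at least one result for the team and for the comparison")
    baseline = own_history[0]                 # the first result sets the baseline
    current = own_history[-1]
    exemplar = min(latest_by_team.values())   # best result anyone has achieved
    return {
        "baseline": baseline,
        "current": current,
        "improvement_since_baseline": baseline - current,
        "exemplar": exemplar,
        "opportunity": current - exemplar,
    }


# Made-up cycle times, in days per completed scenario.
our_history = [12.0, 10.5, 9.0]
latest = {"Our team": 9.0, "Team A": 6.0, "Team B": 11.0}

report = context_report(our_history, latest)
for name, value in report.items():
    print(f"{name}: {value:.1f} days")
```

The raw number alone (9 days) says little. Next to the baseline it shows the team is improving, and next to the exemplar it shows how much room is left.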
A unique perspective on the world
So, now you know what and how to measure, and the differences between good and bad metrics. You also know ignoring metrics leaves you happy but dumb. Ignoring metrics turns software development into guesswork and leaves your success to chance. I believe the polite word for such irresponsible behavior is “foolishness.”
However, you’ve probably also noticed that good metrics that only measure desired end results are not generic. You can’t simply use the same ones for every project. Sure, there are some engineering quality and efficiency results you always want (like being productive, secure, robust, and responsible), but other results around performance, usability, and overall customer value depend on your desired scenarios and the customer’s needs.
This means that putting the right measures in place isn’t trivial. It requires some thinking as a team to decide what you really care about for this release and how you’ll know you’ve reached your goals. Then you’ll need to make those measurements part of your iteration and feedback process from the beginning to always know you’re moving in the right direction. Heresy, right? Actually working toward known goals? Maybe I’m the fool to think we’d be so sensible.