Call me “old school” but I believe in shipping. Trying isn’t enough. Getting close isn’t enough. Good ideas aren’t enough. You’ve got to ship.
It used to be that interviews started with, “What have you shipped?” If you hadn’t shipped recently, “Why?” Why? Because you can’t deliver customer value if you don’t deliver. You can’t iterate and improve without finishing an iteration. You can’t get customer feedback without customers.
People used to complain that promotions and rewards were disproportionately distributed to those who shipped. I say, “Absolutely, that’s how it should be.” Does this hurt quality? No, you set a high minimum quality bar and ship. Does it hurt innovation? No, innovators have always risked an initial drop in pay to receive a big payoff should they deliver.
It all starts with shipping. This is particularly apt with services, where everything literally starts with shipping, and where I’m focusing the rest of this column. Our critics claim that in the new world of services Microsoft has forgotten how to ship. Perhaps, but Microsoft has forgotten more about shipping than most companies will ever know. We just need some reminders and reeducation, especially when it comes to services.
I offer you my service
How much about shipping services has Microsoft forgotten, or never grasped, according to critics? Not as much as they would have you believe, but enough to make you think. Let’s go over the herrings and the heartaches, mixed with a little happiness.
The red herrings:
§ Services change everything.
§ Services center on data while packaged products center on functionality.
§ Services have greater security concerns than packaged products.
§ Services have serious issues with dependencies.
§ Services demand high availability and run on Internet time.
The heartaches (and happiness):
§ Services are living, changing things.
What is that smell?
The first services red herring is a big one, “Services change everything.” As I addressed in “At your service,” this is total bovine fertilizer. Services start and end with helping customers achieve their goals, just like all products ever. You focus on the customer experience and what they hope to accomplish or you lose. End of story.
The next three red herrings—centering on data, security concerns, and dependency issues—all apply just as well to shipping packaged products, though it may have taken us longer to realize it. You can’t expose data format changes to customers without chasing them away, on the client or the server. There isn’t a computer product or service today that isn’t vulnerable to attack—you must secure them all. Finally, if you think external dependencies aren’t problematic on the client, you clearly don’t use many drivers. I’m not saying these aren’t real issues—I’m saying they aren’t new or specific to services.
The last red herring is among the most common concerns raised about why shipping services differs from shipping packaged products—high availability and Internet time. Look, it’s not okay for packaged products to never work or require a reboot every time you use them; at least it hasn’t been for quite some time. The quality bar is no different for services, though there are plenty of services that fail constantly.
As for Internet time, that hit packaged products a decade ago with the introduction of Windows Update. And if you think that those patches are just security fixes, you haven’t been paying attention. More and more we are fixing all kinds of experience issues shortly after customers report them, for services and packaged products. That’s a great thing for customers.
However, gradually improving the customer experience every month or every day isn’t enough. Both services and packaged products need to ship significant, orchestrated updates to deliver breakthrough customer value. Facebook wasn’t going to gradually update itself into Twitter any more than Vista would gradually update itself into Windows 7. You must focus on what the customer is trying to accomplish, and sometimes that isn’t a quick change.
There are too many of them
However, not everything about shipping packaged products applies to shipping services. There are mental, process, and team adjustments that you need to make.
First and foremost, services run across hundreds or thousands of machines dispersed in multiple data centers worldwide. Sometimes functionality and data are replicated. Sometimes functionality and data are specialized. Usually, it’s a combination of both for scale and reliability. Naturally, this presents design and synchronization problems, but plenty of books have been written about that (read them; don’t rediscover). The less obvious challenges are around debugging and deployment.
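For a concrete picture of “replicated” versus “specialized,” here is a minimal Python sketch, not a recommendation: each key is assigned to exactly one shard (specialization), and each shard is copied to several data centers (replication). The function names, the data center names, and the use of plain modulo hashing are illustrative assumptions; production systems typically use consistent hashing so that adding a shard doesn’t reshuffle most keys, which is exactly the sort of thing those books cover.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Specialization: every key lives on exactly one shard (simple modulo hashing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replicas_for(shard: int, data_centers: list, copies: int = 3) -> list:
    """Replication: place each shard's copies in distinct data centers, round-robin."""
    if copies > len(data_centers):
        raise ValueError("need at least as many data centers as copies")
    return [data_centers[(shard + i) % len(data_centers)] for i in range(copies)]

# Hypothetical layout: 16 shards spread over four data centers.
DATA_CENTERS = ["us-west", "us-east", "eu-north", "asia-east"]
shard = shard_for("user:42", num_shards=16)
placement = replicas_for(shard, DATA_CENTERS)
print(shard, placement)
```

Because the hash is deterministic, any front-end machine can compute where a key lives without asking a central directory.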
Why is debugging a service so tough? Timing issues are killer given multiple threads on multiple processors across multiple machines. Yikes! However, that’s not even the toughest challenge.
What’s the first thing you do when debugging an issue? Analyze the stack, right? With services the stack is split across servers and requests, making it nearly impossible to trace a specific user action. The good news is that there are new tools that help tie user actions together across machines. The bad news is that this isn’t the toughest challenge either. The toughest challenge is that you’re always debugging in the live environment. You don’t get symbols, breakpoints, or the ability to step through code.
So let’s recap. Debugging services means debugging nasty timing issues across multiple machines with no stack, symbols, or breakpoints on live code. There’s only one solution—instrumentation—and lots of it, designed in from the beginning, knowing you’ll soon be debugging across live machines with no stack, symbols, or breakpoints.
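One common way to tie user actions together across machines is to stamp every log line with a request ID that is minted where the user action enters the service and forwarded with each downstream call. This sketch assumes Python’s standard logging and contextvars modules, a hypothetical X-Request-Id header, and made-up host names; it simulates two hops inside one process instead of real network calls.

```python
import contextvars
import io
import json
import logging
import uuid

# Correlation ID for the request currently being handled ("-" when idle).
request_id = contextvars.ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    """One JSON object per line, so events from every machine can be joined on request_id."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": record.created,
            "host": getattr(record, "host", "unknown"),
            "request_id": request_id.get(),
            "event": record.getMessage(),
        })

def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("service")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.propagate = False
    return logger

def back_end(logger: logging.Logger, headers: dict) -> None:
    # Adopt the caller's ID instead of minting a new one.
    token = request_id.set(headers.get("X-Request-Id", "-"))
    try:
        logger.info("queried storage", extra={"host": "be-07"})
    finally:
        request_id.reset(token)

def front_end(logger: logging.Logger, user_action: str) -> str:
    rid = str(uuid.uuid4())  # minted once, at the front door
    token = request_id.set(rid)
    try:
        logger.info("received %s", user_action, extra={"host": "fe-01"})
        back_end(logger, {"X-Request-Id": rid})  # stands in for a real network call
    finally:
        request_id.reset(token)
    return rid

# Usage: reconstruct one user action from the combined log stream.
buf = io.StringIO()
log = make_logger(buf)
rid = front_end(log, "checkout")
events = [json.loads(line) for line in buf.getvalue().splitlines()]
trace = [e for e in events if e["request_id"] == rid]
print([(e["host"], e["event"]) for e in trace])
```

Filtering on one ID stands in for the missing stack: it recovers the path a single user action took across machines, in order.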
They’re multiplying too rapidly!
Solving debugging brings us to the other huge challenge—deployment. Deployment needs to be completely automated and lightning fast. We’re talking file copy installation, with fast file copy. No registry, no custom actions, and no manual anything.
Why does deployment need to be so fast and simple? Two reasons:
§ You’re installing onto hundreds or thousands of machines worldwide while they are live. Installation must work and work fast with zero human intervention ever. The slightest bit of complexity will cause failures. Remember, five minutes times 1,000 machines equals three-and-a-half days. It had better just work.
§ The number of servers needs to grow and shrink dynamically based on load. Otherwise, you are wasting hardware, power, cooling, and bandwidth in order to meet the highest demand. Because your scale depends on load, it can change any time. When it changes, you need to build out more systems automatically and instantly.
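The arithmetic behind both points is easy to check. This Python sketch reproduces the five-minutes-times-1,000-machines figure, shows how installing in parallel waves shrinks it, and computes how many servers a given load needs. The wave size, per-server capacity, and 25 percent headroom are illustrative assumptions, not recommendations.

```python
import math

def serial_rollout_days(minutes_per_machine: float, machines: int) -> float:
    """Wall-clock days when machines are installed one after another."""
    return minutes_per_machine * machines / (60 * 24)

def wave_rollout_minutes(minutes_per_machine: float, machines: int, wave_size: int) -> float:
    """Wall-clock minutes when wave_size machines install in parallel per wave."""
    waves = math.ceil(machines / wave_size)
    return waves * minutes_per_machine

def servers_needed(requests_per_sec: float, capacity_per_server: float,
                   headroom: float = 0.25) -> int:
    """Servers required for the current load plus spare capacity (never fewer than one)."""
    return max(1, math.ceil(requests_per_sec * (1 + headroom) / capacity_per_server))

print(round(serial_rollout_days(5, 1000), 2))  # the column's three-and-a-half days
print(wave_rollout_minutes(5, 1000, wave_size=100))
# Load swings over a day; the fleet size should follow it automatically.
for load in (2_000, 12_000):
    print(load, servers_needed(load, capacity_per_server=500))
```

Even with parallel waves, every minute of per-machine install time is multiplied by the number of waves, which is why the install itself has to be trivial.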
The happiness around deployment is that Azure will do most of the heavy lifting for you (so let it; don’t reinvent). However, you still need to design your services to support file copy installation.
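Here is one way file copy installation might look, as a minimal sketch: each build is copied into its own versioned folder, and a tiny pointer file names the active version. Switching versions is then a single rename with Python’s os.replace (atomic on POSIX when both paths are on the same file system) instead of an installer run, and old versions stay on disk for an instant rollback. The folder layout, the CURRENT file name, and the assumption that the service reads that pointer at startup are all hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path

def install(build_dir: Path, root: Path, version: str) -> Path:
    """Copy a build into releases/<version>. No registry, no custom actions."""
    target = root / "releases" / version
    shutil.copytree(build_dir, target)  # creates missing parents; fails if the version exists
    return target

def activate(root: Path, version: str) -> None:
    """Point the service at an installed version by replacing a small pointer file."""
    if not (root / "releases" / version).is_dir():
        raise FileNotFoundError(f"version {version} is not installed")
    tmp = root / "CURRENT.tmp"
    tmp.write_text(version)
    os.replace(tmp, root / "CURRENT")  # single rename; atomic on POSIX, same file system

def active_version(root: Path) -> str:
    return (root / "CURRENT").read_text()

# Usage: install two builds, go live on the newer one, then roll back.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch) / "svc"
    for version in ("1.0", "1.1"):
        build = Path(scratch) / f"build-{version}"
        build.mkdir()
        (build / "app.py").write_text(f"VERSION = {version!r}\n")
        install(build, root, version)
    activate(root, "1.1")
    live = active_version(root)
    activate(root, "1.0")  # rollback is just re-pointing; nothing is reinstalled
    after_rollback = active_version(root)

print(live, after_rollback)
```

Since installing and activating are separate steps, new bits can be copied to every machine ahead of time and switched on together.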
Life is so uncertain
Enough about the challenges you can predict. What about the unpredictable ones? The services landscape is in constant change. While some services are sticky because they hold your data (like Facebook or eBay), many aren’t sticky at all (like search or news). A few minutes of downtime can cost you thousands of customers. Data compromise or loss can cost you millions of customers. They’ll just switch. Our competitors will be happy to accept them. It cuts both ways, so you need to work hard to both welcome and keep new users.
When you update a service, everyone gets the new version instantly, not over years. If there’s a bug that only one customer in a thousand experiences and you have millions of customers, then that bug will hit thousands of them instantly (the law of truly large numbers). That means you need to resolve the issue quickly or roll back. Either way, it’s a bad idea to update a service on a Friday and a good idea to have an emergency rollback button always at the ready.
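The law of truly large numbers is plain multiplication. In this Python sketch the five million users, the one-in-a-thousand bug rate, and the 1 percent first wave are assumed numbers chosen for illustration. The first wave models a staged rollout, a common companion to the rollback button, which limits how many customers see a bad update before you can react.

```python
def expected_affected(users: int, bug_rate: float) -> float:
    """Expected number of users who hit a bug that affects bug_rate of them."""
    return users * bug_rate

def prob_anyone_hit(users: int, bug_rate: float) -> float:
    """Probability that at least one user hits the bug, assuming independent users."""
    return 1 - (1 - bug_rate) ** users

USERS = 5_000_000
RATE = 1 / 1000

everyone = expected_affected(USERS, RATE)           # everyone updated at once
first_wave = expected_affected(USERS // 100, RATE)  # hypothetical 1% first wave
print(everyone, first_wave, prob_anyone_hit(USERS, RATE))
```

A “rare” bug is a practical certainty at this scale; the only levers are how many people see it and how quickly you can undo it.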
Finally, it’s important to realize that services are living, changing things. You’d think that because the servers are all yours, with your image and your configuration, they would be a controlled environment. They are, until you flip the switch. Once the server goes live, it changes. The memory usage changes, the data and layout on the disks change, the network traffic changes, and the load on the system changes. Services are like rivers, not rocks. You can’t ship and forget services. They need constant attention. To make your life easier, bake in resilience by automating the five Rs: retry, restart, reboot, reimage, and replace (though replace may require human hands at some point).
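Automating the five Rs amounts to an escalation ladder: try the cheapest remedy first and move to a more drastic one only while a health check keeps failing. This Python sketch simulates that ladder; the FakeServer class, the two attempts per step, and the “escalate-to-human” fallback are assumptions standing in for real restart, reboot, and reimage tooling.

```python
from typing import Callable, Sequence, Tuple

# Cheapest remedy first; "replace" may ultimately need human hands.
FIVE_RS = ["retry", "restart", "reboot", "reimage", "replace"]

def recover(steps: Sequence[Tuple[str, Callable[[], bool]]], attempts_per_step: int = 2) -> str:
    """Run each remedy in order until one reports the server healthy again."""
    for name, remedy in steps:
        for _ in range(attempts_per_step):
            if remedy():
                return name
    return "escalate-to-human"

class FakeServer:
    """Simulated machine whose fault is cleared only by one particular remedy."""
    def __init__(self, fixed_by: str) -> None:
        self.fixed_by = fixed_by
        self.log: list = []

    def attempt(self, remedy: str) -> bool:
        self.log.append(remedy)  # instrumentation: record every action taken
        return remedy == self.fixed_by

server = FakeServer(fixed_by="reimage")  # e.g., a disk that drifted from the image
steps = [(name, lambda name=name: server.attempt(name)) for name in FIVE_RS]
outcome = recover(steps)
print(outcome, server.log)
```

Logging each attempt matters as much as the ladder itself: a machine that keeps needing reimages is telling you something about your service.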
The happiness that comes with these heartaches: customers who are willing to switch to you; an ideal idea-testing platform, because you can show customers different ideas and see which they prefer on a daily basis; and the ability to ship now and find the tricky intermittent Heisenbugs later (relying on your five Rs resilience to keep availability up).
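Showing customers different ideas usually starts with assigning each user to a variant in a stable way, so the same person sees the same idea every day. A minimal Python sketch, assuming string user IDs and a hypothetical experiment name; hashing the user ID together with the experiment name is one common approach, not the only one.

```python
import hashlib
from collections import Counter

def variant_for(user_id: str, experiment: str, variants: list) -> str:
    """Deterministically map a user to one variant of an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]

VARIANTS = ["current-home-page", "new-home-page"]
assignments = {
    f"user-{i}": variant_for(f"user-{i}", "home-page-test", VARIANTS)
    for i in range(10_000)
}
print(Counter(assignments.values()))
```

Mixing the experiment name into the hash means a user’s group in one test says nothing about their group in the next, so experiments don’t contaminate each other.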
Back to basics
There you have it. Some food for thought mixed in with the old basics of writing solid code that focuses on customers and their goals.
However, none of this is worth anything without shipping. Make shipping a priority and we all win. Sure, the quality bar has gone up, but we’re not kids selling lemonade anymore. We need to ship quality experiences regularly, on both long and short time scales. We need to ship on the Internet, on the PC, and on the phone. We need to serve our customers well and delight them into sticking with us. It’s a long journey, but it doesn’t start until we ship.