Almost the entire meeting was taken up with interpreting data. The problem was that nobody could quite agree what the data meant. Many custom reports had been created for this meeting, and the data warehouse team was in the meeting, too. The more they were asked to explain the details of each row on the spreadsheet, the more evident it became that nobody understood how those numbers had been derived.
Worse, nobody was quite sure which customers had been exposed to the experiment. Different teams had been responsible for implementing different parts of it, and so different parts of the product had been updated at different times. The whole process had taken months. And by now, the people who had originally conceived the experiment were in a separate division from the people who had executed it.
Listening in, I assumed this would be the end of the meeting. With no agreed-upon facts to inform the decision, I assumed nobody would have any basis for making the case for any particular action. Boy, was I wrong. The meeting was just getting started. Each team simply took whatever interpretation of the data best supported its position and started advocating. Other teams would chime in with alternate interpretations that supported their positions, and so on. In the end, decisions were made – but not based on any actual data. Instead, the executive running the meeting was forced to make decisions based on the best arguments.
The funny thing to me was how much of the meeting had been spent debating the data when, in the end, the arguments that carried the day could have been made right at the start of the meeting. It was as if each advocate sensed they were about to be ambushed: if another team had managed to bring clarity to the situation, that clarity would have benefited that team at everyone else’s expense – so the rational response was to obfuscate as much as possible. What a waste.
Ironically, meetings like this had given data and experimentation a bad name inside this company. And who can blame the teams involved? The data warehousing team was producing classic waste – reports that nobody read (or understood). The project teams felt these experiments were a waste of time, since they involved building features only halfway, which meant the features were never quite any good. And since nobody could agree on what each outcome meant, it seemed like “running an experiment” was just code for postponing a hard decision. Worst of all, the executive team was getting chronic headaches. Their old product prioritization meetings may have been a battle of opinions, but at least they understood what was going on. Now they first had to sit through a ritual that involved complex math and reached no definite outcome – and then they had the battle of opinions anyway!
When a company gets wedged like this, the solution is often surprisingly simple. In fact, I call this class of solutions “too simple to possibly work” because the people inside the situation can’t conceive that their complex problem could have a simple solution. When I’m asked to work with companies like this as a consultant, 99% of my job is getting the team started on a simple – but correct – solution.
Here was my prescription for this situation. I asked the team to consider creating what I call a sandbox for experimentation. The sandbox is an area of the product where the following rules are strictly enforced (a minimal code sketch of how the rules might be enforced follows the list):
- Any team can create a true split-test experiment that affects only the sandboxed parts of the product, however:
- One team must see the whole experiment through end-to-end.
- No experiment can run longer than a specified amount of time (usually a few weeks).
- No experiment can affect more than a specified number of customers (usually expressed as a % of total).
- Every experiment has to be evaluated based on a single standard report of 5-10 (no more) key metrics.
- Any team that creates an experiment must monitor the metrics and customer reactions (support calls, forum threads, etc.) while the experiment is in progress, and abort if something catastrophic happens.
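To make these rules concrete, here is a minimal sketch of what enforcing them could look like in code. Everything in it – the `SandboxExperiment` class, the metric names, the specific caps – is a hypothetical illustration I invented, not the implementation any particular team used.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical caps mirroring the sandbox rules above.
MAX_DURATION = timedelta(weeks=4)       # no experiment runs longer than a few weeks
MAX_CUSTOMER_FRACTION = 0.05            # no experiment touches more than 5% of customers
STANDARD_METRICS = [                    # the single standard report: 5-10 key metrics, no more
    "registration_rate", "activation_rate", "retention_7d",
    "conversion_to_paid", "revenue_per_customer", "support_contacts",
]

@dataclass
class SandboxExperiment:
    name: str
    owner_team: str                     # one team sees the experiment through end-to-end
    customer_fraction: float            # share of customers exposed to the experiment
    started_at: datetime = field(default_factory=datetime.now)
    aborted: bool = False

    def validate(self) -> None:
        """Reject any experiment that violates the sandbox rules."""
        if not 0.0 < self.customer_fraction <= MAX_CUSTOMER_FRACTION:
            raise ValueError(
                f"{self.name}: exposes {self.customer_fraction:.0%} of customers, "
                f"cap is {MAX_CUSTOMER_FRACTION:.0%}"
            )

    def is_expired(self, now: datetime) -> bool:
        """Experiments end automatically when their time limit runs out."""
        return now - self.started_at > MAX_DURATION

    def report(self, raw_metrics: dict) -> dict:
        """Every experiment is judged on the same standard report; nothing else counts."""
        return {metric: raw_metrics.get(metric) for metric in STANDARD_METRICS}

    def monitor(self, support_call_spike: bool) -> None:
        """The owning team watches customer reactions and aborts on catastrophe."""
        if support_call_spike:
            self.aborted = True
```

The point of the sketch is not the specific numbers but that every rule is mechanical: cheap to encode, cheap to enforce, and impossible to argue about after the fact.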
Putting a system like this in place is relatively easy, especially for any kind of online service. I advocate starting small; usually, the parts of the product that start inside the sandbox are low-effort, high-impact aspects like pricing, initial landing pages, or registration flows. These may not sound very exciting, but because they control the product’s positioning for new customers, they often allow minor changes to have a big impact.
Over time, additional parts of the product can be added to the sandbox, until eventually it becomes routine for the company to conduct these rigorous split-tests for even very large new features. But that’s getting ahead of ourselves. The benefits of this approach are evident immediately. Right from the beginning, the sandbox achieves three key goals simultaneously:
- It forces teams to work cross-functionally. The first few changes, like a price change, may not require a lot of engineering effort. But they require coordination across departments – engineering, marketing, customer service. Teams that work this way are more productive, as long as productivity is measured by their ability to create customer value (and not just stay busy).
- Everyone understands the results. True split-test experiments are easy to classify as successes or failures, because top-level metrics either move or they don’t. Either way, the team learns immediately whether their assumptions about how customers would behave were correct. By using the same metrics each time, the team builds literacy across the whole company about those key metrics.
- It promotes rapid iteration. When people have a chance to see a project through end-to-end, the work is done in small batches, and a clear verdict is delivered quickly, they benefit from the power of feedback. Each time they fail to move the numbers, they have a real opportunity for introspection – and, even more important, a chance to act on their findings immediately. Thus, these teams tend to converge on optimal solutions rapidly, even if they start out with really bad ideas.
Putting it all together, let me illustrate with an example from another company. This team had been working for many months in a standard agile configuration: a disciplined engineering team taking direction from a product owner who would prioritize the features they should work on. The team was adept at responding to changes in direction from the product owner, and always delivered quality code.
But there was a problem. The team rarely received any feedback about whether the features they were building actually mattered to customers. Whatever learning took place happened at the level of the product owner; the rest of the team was just heads-down implementing features.
This led to a tremendous amount of waste, of the worst kind: building features nobody wants. We discovered this reality when the team started working inside a sandbox like the one I described above.
When new customers tried this product, they weren’t required to register at first. They could simply come to the website and start using it. Only after they started to have some success would the system prompt them to register – and after that, start offering them premium features to pay for. It was a slick example of lazy registration and a freemium model. The underlying assumption was that making it seamless for customers to ease into the product was optimal. To support that assumption, the team had written a lot of very clever code to create this “tri-mode” experience (every part of the product had to treat guests, registered users, and paying users somewhat differently).
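To see why that assumption was expensive, consider a hypothetical sketch of what “tri-mode” means in code (the names and limits here are mine, invented purely for illustration): every feature, however small, needs three code paths.

```python
from enum import Enum

class UserMode(Enum):
    GUEST = "guest"            # using the product with no account at all
    REGISTERED = "registered"  # has an account, pays nothing
    PAYING = "paying"          # has upgraded to premium

def max_documents(mode: UserMode) -> int:
    """One tiny feature, three code paths; multiply this across the whole product."""
    if mode is UserMode.GUEST:
        return 1      # guests get just a taste
    if mode is UserMode.REGISTERED:
        return 10     # free accounts get more room
    return 1000       # paying customers are effectively unlimited
```

Multiply that branching across every screen and workflow, and you have the “lot of very clever code” the team had invested in.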
One day, the team decided to put that assumption to the test. The experiment was easy to build (although hard to decide to do): simply remove the “guest” experience and make everyone register right at the start. To their surprise, the metrics didn’t improve at all. Customers who were given the guest experience were no more likely to register, and they were actually less likely to pay. In other words, all that tri-mode code was complete waste.
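Expressed in the terms of the sandbox sketch above, the experiment itself could be tiny. What follows is a minimal sketch assuming a hash-based bucketing scheme; the variant names and the exposure fraction are hypothetical, not the team’s actual setup.

```python
import hashlib

def assign_variant(customer_id: str, experiment_name: str,
                   exposure_fraction: float) -> str:
    """Deterministically bucket a customer into control or treatment.

    Hashing (experiment_name, customer_id) gives a stable, roughly uniform
    assignment with no per-customer state to store.
    """
    digest = hashlib.sha256(f"{experiment_name}:{customer_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    if bucket >= exposure_fraction:
        return "not_in_experiment"             # respects the sandbox customer cap
    # Split the exposed population evenly between the two arms.
    return "force_registration" if bucket < exposure_fraction / 2 else "guest_allowed"

# Control keeps the tri-mode guest flow; treatment removes it entirely.
variant = assign_variant(customer_id="c_12345",
                         experiment_name="remove_guest_mode",
                         exposure_fraction=0.05)
show_guest_experience = (variant != "force_registration")
```

Because both arms were scored against the same standard report, the verdict could not be argued away the way the data in that earlier meeting had been.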
By discovering this unpleasant fact, the team had an opportunity to learn. They discovered, as is true of many freemium and lazy registration systems, that easy is not always optimal. When registration is too easy, customers can get confused about what they are registering for. (This is similar to the problem that viral loop companies have with the engagement loop: by making it too easy to join, they actually give away the positioning that allows for longer-term engagement.) More importantly, the experience led to some soul-searching. Why was a team this smart, this disciplined, and this committed to waste-free product development creating so much waste?
That’s the power of the sandbox approach.