An ominously top-lit cloud from a vacation years ago
This week’s topic comes from John. He actually suggested three different ideas, but I’ll just deal with the uncertainty aspect this week. Some people have also sent in questions via DM, and I’m working on those in the coming weeks. As always, I’m happy to take questions and article suggestions via email, Twitter DMs, etc.
Uncertainty is a constant in industry work, and especially product work. There’s always a risk of something bad happening when we make a product change. As data scientists, we’re generally expected to be able to understand the basics of quantifying and explaining a concept of “risk” to people, even though risk management/analysis isn’t in any job description that I’ve ever seen.
The majority of us aren’t trained as actuaries (seriously, the actuarial P exam SYLLABUS already makes my head spin), so we’re nowhere near being masters of probability. But at minimum, we have the ability to measure and quantify a situation well enough to understand and work with risk in the limited situations that exist within our domains.
So when we’re placed on a product team that’s a mix of engineers, managers, designers, writers, etc., we’re the only “quant person” in the room. Our job is to help our teams understand and work with the inherent risk in the work that they do. We may not be the perfect person for the job, but it’s unlikely that there’s a better person available to do a better one.
Helping people handle risk doesn’t mean predict the future
Let’s be honest, if any of us could see into the future significantly better than chance for novel events, we’d quit our jobs and just use our ability to play the stock markets or something. Nothing in the data science toolbox can predict the future to that extent.
Yes, there’s a whole predictive analytics field out there. Most of that stuff is essentially a variation on the “past performance (hopefully) predicts future performance” assumption. Product work, almost by definition, operates outside of that.
Instead we’re trying to help teams who are struggling (and possibly even afraid) to make decisions because there is a worry that they will make the wrong choices that will cause damage in some way. There’s risk there, and they want help to understand what is at stake so that they can move with more confidence.
Data, and us by extension, can help with this problem!
For most situations, risk = probability * consequence
Risk analysis is a mix of science and art. In the context of property/economic risk the most common methodology falls under the general concept of risk = probability * consequence. Essentially, read like a statistics formula, risk is the expected value of loss. This shouldn’t sound particularly controversial.
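As a quick sketch of that expected-value framing (every scenario and number below is made up for illustration, not from any real analysis), the formula is just a multiply-and-compare:

```python
# Risk as expected loss: risk = probability * consequence.
# Scenarios and figures are purely illustrative.
scenarios = [
    # (name, probability of the bad event, estimated loss in dollars)
    ("checkout button hidden on mobile", 0.05, 250_000),
    ("page loads 200ms slower", 0.30, 20_000),
    ("total site outage during launch", 0.01, 1_000_000),
]

for name, p, loss in scenarios:
    risk = p * loss  # expected loss for this scenario
    print(f"{name}: expected loss ~ ${risk:,.0f}")
```

Note that under this framing, a low-probability catastrophe and a high-probability annoyance can come out with similar risk numbers, which is exactly the kind of comparison the formula is for.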
Other fields have other ways of conceptualizing risk that don’t follow that model. For example, biological risk models are a completely different family of beasts. If you’re in a specialized field that has a history of working with risk, figure out what THAT field uses in the relevant literature, I can’t help you there.
Coming down to practical implementation, we’re balancing two different parameters: the probability of an event, and the expected outcome of the event. It’s easy to see why this general framework is useful, but it can also become subjective very quickly.
For example, extremely rare events with little to no prior data make the probability of occurrence hard to estimate. We’re gonna launch a brand new 4d(!?) product picture thing on the home page: what are the chances it blinds all users through sheer ugliness and they’re unable to check out? 8.4%? (I did the design, so it must be pretty bad.)
Similarly, predicting the impact of an event is very complicated. Just try to calculate the value lost if your apartment burnt to the ground, or even just your car. What counts as a loss here? The value of the item itself? What about anything inside? How do you handle depreciation? What about lost wages and time? Any values will be very rough estimates, and different people can model the same concept very differently.
While the general population probably hasn’t thought much about the topic of how to analyze risk, they do intuitively understand the concept that risk is a balance between probability and “how bad the outcome is”. So it’s not an impossible goal to help them understand the situation.
The first goal is communicating the existence of risk to the team
Product work is inherently risky. It involves doing something new, and there’s very little data (if any) to actually model out probabilities. At best, people can leverage the power of previous experiences and “gut feel” to have a mere sense of the likelihood of various scenarios.
What’s our plan then? We don’t have the data and tools to tell people what the probabilities are of winning/losing when they sit at the casino table for a new and untested game (which describes this situation fairly well). But we can tell them the value of the chips being put down on the game.
The field of probability and statistics is full of unintuitive concepts that show how humans have inherent blind spots when processing probability problems. But despite this, humans deal with risk and probabilities enough to have a sense of “big/medium/small” that we use to evaluate probabilities. That intuitive sense is biased in all sorts of ways, so take care before forcing people to reason with raw probabilities. But it’s a useful starting point, because we often don’t HAVE a better estimate of the probability than “big/medium/small” anyway.
We can make sure that the team is aware that a move is potentially unacceptably risky (or conversely, relatively low risk) in terms of cost. It’s important to know whether a proposed change will affect 1%, 10%, or 100% of the user population. What’s the potential blast radius if it explodes? It’s equally important to know what can happen if a change goes horribly wrong. Will people die? Will they get lost in the interface? Or will they just be mildly inconvenienced for a couple of moments?
This is where we can break out our quant skills and count all the things. Count who’s affected. Count revenue risked. Heck, count the number of users who are likely to be around during the expected launch window if you need to. You can definitely do this part.
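A minimal sketch of that counting step, with invented records (the platform split, revenue figures, and the “only mobile checkout is touched” assumption are all hypothetical):

```python
# Sizing the blast radius of a hypothetical change that only touches
# mobile checkout. All records and figures are invented for illustration.
users = [
    {"id": 1, "platform": "mobile", "monthly_revenue": 40.0},
    {"id": 2, "platform": "desktop", "monthly_revenue": 55.0},
    {"id": 3, "platform": "mobile", "monthly_revenue": 12.0},
    {"id": 4, "platform": "mobile", "monthly_revenue": 0.0},
]

# Count who's affected and how much revenue sits inside the blast radius.
affected = [u for u in users if u["platform"] == "mobile"]
share = len(affected) / len(users)
revenue_at_risk = sum(u["monthly_revenue"] for u in affected)

print(f"{share:.0%} of users affected, ${revenue_at_risk:.2f}/month at risk")
```

In practice this would be a SQL query or a dataframe filter over real user data rather than a hardcoded list, but the shape of the work is the same: define who is in the blast radius, then sum up what they represent.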
Once communicated, help the team be intentional about their risk
Once risks are known, we don’t have to accept them at face value. Teams can work to mitigate risks. It costs time and resources to do so, but once the risk is on the table, it’s an intentional choice to address it or not.
In my opinion, the fact that teams are up front and aware that they are accepting certain risks and not others is the most important part of the exercise. It’s not worth mitigating all risks, but we shouldn’t roll the dice blind. Having an honest discussion about what’s acceptable is very useful to the team and also for executives up above.
Help with the risk mitigation
There are tons of different strategies that teams can use to help mitigate risks. While not an exhaustive list, here’s a bunch of very common ones:
Use research to test out ideas and eliminate problems before broader release
Make sure potential changes are tested by all the important segments of users before moving forward
Have staggered, ramping launches to make sure nothing has been overlooked, rolling back if necessary
Instrument things so that the system can be monitored as things go out
Continue monitoring important metrics for a period of time after 100% launch, and revisit to make sure everything is still working as intended
Have a plan for if things go wrong (rollback, etc)
Change the project scope
Abort the whole project and work on something else
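As one concrete example of the staggered-launch idea, a rollout ramp is often just a schedule of exposure percentages with a guardrail check between steps. Everything below (the step sizes, the error-rate metric, the 2% threshold) is a made-up sketch, not a real rollout system:

```python
# Sketch of a staggered rollout: ramp exposure up in steps, checking a
# guardrail metric before each increase and rolling back on regression.
# Step sizes, metric, and threshold are all invented for illustration.
RAMP_STEPS = [0.01, 0.05, 0.25, 1.00]   # fraction of users exposed
ERROR_RATE_THRESHOLD = 0.02             # guardrail: abort above 2% errors

def run_rollout(get_error_rate):
    """Ramp up while the guardrail holds; return the final exposure."""
    for step in RAMP_STEPS:
        if get_error_rate(step) > ERROR_RATE_THRESHOLD:
            return 0.0  # guardrail tripped: roll back to nobody exposed
    return RAMP_STEPS[-1]

# A healthy launch ramps all the way to 100%...
assert run_rollout(lambda exposure: 0.001) == 1.0
# ...while a regression that surfaces at the 25% step triggers a rollback.
assert run_rollout(lambda exposure: 0.05 if exposure >= 0.25 else 0.001) == 0.0
```

The point of the small early steps is exactly the risk formula from above: they shrink the consequence term while you learn about the probability term.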
Most of these methods mitigate risks by lowering the probability of something bad happening. Every issue you fix before launch means one less issue that can bite you, which lowers the overall risk profile. Obviously we can always be surprised, but the more work we put in up front, the less likely that becomes.
When teams have access to these tools, and have used them to mitigate the most worrisome risks, they’re going to naturally have more confidence in their work.
As a quant, and a researcher, you can help with most of these things. This is why it’s so important for us to work closely with teams to help them understand and overcome risks. In fact, since we rely on data to tell stories, we’re even better equipped to monitor and help make decisions as more data gets collected.
There are very few other people with the skills and data access who can help in this way.
Silkworm Updates: If you don’t like fuzzy little insects, bail out now.
For those who aren’t following my constantly shifting projects on Twitter, the background to all this is that a friend was raising some silkworms as a school project for their kids and gave us four tiny ones a month ago. The 1yo at home doesn’t understand things enough to appreciate this, so I’ve been having the most fun with them and my camera gear. Since we have to go out of our way to harvest mulberry leaves from a nearby park to feed them, I needed to get something out of the bargain.
They just became fully developed moths over the weekend, and are quite fuzzy, cute, and unable to fly. They’ll flap their wings, however. The male one especially likes to flap rapidly for some reason.