I very often say that I don’t use much statistics in my job (research for software product development), with the most difficult things being the occasional t-test, a sample size calculator, and calculating confidence intervals. From that alone, it would appear that a single “Research Methods and Stats for Grad Students” class would cover everything you need to do basic product development work.
But… does it?
Aside: Yes, there are other data science jobs that definitely require more stats up front than what I’m looking at here. I’m ignoring those today. We’ll be talking about entry-level positions working on a specific class of fairly simple (but highly in-demand) problems in the product development space. These are not production model-building roles.
In terms of “actual statistics”, the list seems really short and easy. But the more I thought about it, the more I realized that it’s a very incomplete picture. While I’m rarely called upon to use “advanced” statistical tools, I’m constantly being asked to use what I’m calling “statistical decision-making” skills to make decisions about all sorts of things. I’m not sure what a good term for this class of questions is, so that’s the name I’m using for the purposes of this post.
So what are you calling “statistical decision-making” here?
While using stats to make inferences isn’t something I do often, I realized that I use stats in a ton of decision-making situations that involve setting up data collection so that it’s easy (or merely possible) to analyze later down the line. Analysis is not where the hard work should be.
What does “setting up data” mean to me? It covers a whole range of interesting questions, some of which might be familiar to you.
Sample size related stuff — There’s no end to the variations of these.
How many survey recipients/participants/pageviews do we need to get statistically significant results? Heck if I know. How much can you afford to give me?
Ways to pick different, better, bigger samples to answer similar questions — instead of looking at focused group A, we can look at relevant group B that’s bigger
How can we show/use/interpret this stat from this small-sample observation? You don’t. But if you realllllllly need to, here are some minimally-hacky ways…
Making effective use of small samples when that’s all you have — by leveraging confidence intervals, etc
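To make the sample-size question concrete, here’s a rough sketch of the standard two-proportion calculation that sits under the hood of most A/B test sample size calculators. The baseline rate, lift, alpha, and power values below are invented examples, not recommendations.

```python
# Approximate per-group sample size for detecting a difference between two
# conversion rates with a two-sided z-test, using only the standard library.
from statistics import NormalDist
from math import ceil

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Rough per-group n needed to tell rate p1 apart from rate p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power=0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)

# Chasing a small lift is vastly more expensive than a big one:
print(sample_size_two_proportions(0.05, 0.06))  # thousands per group
print(sample_size_two_proportions(0.05, 0.10))  # hundreds per group
```

This is also why “how many do we need?” always turns into a negotiation: the honest answer depends on the smallest effect you care about, and halving that effect roughly quadruples the traffic bill.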
Creating fair experiments — There are so many things that can go wrong, such as…
Making sure “random sampling” is actually random, and verifying that it is
Identifying ways bias can sneak into a sample, e.g. self-selection and friends
Knowing what to do when things aren’t random — either in interpreting, reporting, or finding adjustments — maximum handwaving
Knowing when to call experiments over, and when to resist that urge — because peeking by all sorts of people IS going to happen
Knowing to avoid doing multiple comparisons like t-tests without correcting — and more importantly, stopping others from doing that
Balancing business and practical user needs w/ experiment design — no we can’t wait 9 months to collect data. Yes, we’re willing to be wrong.
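To illustrate the multiple-comparisons point above, here’s a small sketch of the two simplest corrections, Bonferroni and the slightly more powerful Holm step-down. The p-values are invented for illustration.

```python
# Why uncorrected multiple comparisons are dangerous: at alpha=0.05, running
# five tests gives each a 5% false-positive chance, so "winners" pile up.

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down: same family-wise guarantee, a bit more power."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one fails, every larger p-value fails too
    return reject

# Five t-tests against the same control: naively, 0.03 and 0.04 look
# "significant" at alpha=0.05, but neither survives correction here.
p_vals = [0.003, 0.03, 0.04, 0.20, 0.60]
print(bonferroni(p_vals))  # [True, False, False, False, False]
print(holm(p_vals))        # [True, False, False, False, False]
```

In practice you’d reach for a library implementation, but the logic really is this simple, which makes it easy to show to the person insisting their fifth comparison is a real effect.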
Making statements about generalizability and validity
How representative IS that sample anyways? —Ahaha
Did we REALLY control for everything besides the treatment? — Ahahahaha
As you can see, a lot of these do require a certain amount of experience with statistics, but it tends to be of a specific quality. It’s about understanding a small set of methods well enough that you know what effect various parameter changes have on the end result. How would adding a few hundred, or a few thousand, observations change things? Does adding one set of users bias the whole experiment? Etc.
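That “what does more data buy us” intuition can be sketched in a few lines: the margin of error on a proportion shrinks like 1/sqrt(n), so quadrupling the sample only halves the interval. The 10% conversion rate used here is just an example figure.

```python
# Back-of-envelope answer to "what would another thousand observations buy?"
# for a 95% CI on a proportion, using only the standard library.
from statistics import NormalDist
from math import sqrt

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation CI for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

for n in [100, 500, 1000, 5000, 10000]:
    print(f"n={n:>6}: 10% +/- {margin_of_error(0.10, n):.1%}")
```

The diminishing returns show up immediately when you run it, which is usually the fastest way to settle a “can’t we just collect more data?” conversation.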
The models themselves are the same basic things that are taught to every 1st year grad student in social science — everyone has access to these bottom shelf hammers and nails.
Being good with just hammers and nails
What I feel separates an experienced data practitioner from a fresh data person out of school is the level of mastery in wielding those hammers and nails on the fly. Where can we trade time for statistical power, how can we shift uncertainty, where are the rules bendable (and to what extent) and where are they absolutely required? How are those concerns balanced against practical business concerns?
Being able to navigate the complexities of setting up data collection quickly, in the face of uncertainty and inevitable bad incoming data, is hard. It takes practice and lots of seeing things go wrong and being forced to come up with contingency plans to save the day.
Oftentimes, it’s not about the math, or the data, but about convincing other people to do certain things — primarily accepting less “certainty” (which never existed anyway) in exchange for gaining some useful knowledge. Sometimes doing ugly post-hoc analysis is technically bad science, but sometimes it yields the only information we have for making any decisions at all. Other times, we just have to cut our losses and declare that it’s not possible to understand the thing of interest today.
So it appears to me that this squishy, hard-to-pin-down skill is what defines the minimum statistical bar. Not having this sort of comfort level and flexibility with the tools and methods has a ton of bad knock-on effects down the line. Botched experiments (which are inevitable) can’t be salvaged. Decisions can’t be made through the constant fog of uncertainty. It all builds up to lost TIME, which is the most expensive resource for any person or organization.
So how do you learn these skills?
I’m not sure. I think most of us learn through direct experience. An experiment setup fails and suddenly we must find SOME results in a pile of crappy data. Maybe we struggle for a week to do a pre/post analysis because that’s all that’s available. Maybe we cry and throw out most of the bad data, making do with a tiny usable slice.
Each horrible situation forces you to be creative at problem-solving when it’s not even obvious a solution exists. But you soldier on because it’s your job. The pain from those experiences teaches you what to avoid, and gives you confidence that you can find some hacky method that might work in the future.
But, what if it’s not your job?
Give yourself challenges and don’t allow yourself to give up. Like hey, the lunar eclipse prediction challenge is still unsolved. Go at it! I’ll be returning to it myself once life calms down a bit.
Here’s some background reading I used while looking at what’s commonly recommended for statistical literacy for researcher-types: mostly a couple of related intro stats course syllabi, plus a timely list of analysis design patterns that are somewhat more advanced than “the absolute minimum” but are very important backup tools in the toolbox.
Statistical reasoning OLI course syllabus
Causal and statistical reasoning, OLI
https://emilyriederer.netlify.app/post/causal-design-patterns/
About this newsletter
I’m Randy Au, currently a quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects of data science, UX research, and tech, with occasional excursions into other fun topics.
Comments and questions are always welcome; they often give me inspiration for new posts. Tweet me. Always feel free to share these free newsletter posts with others.
All photos/drawings used are taken/created by Randy unless otherwise noted.