Happy Lunar New Year to anyone out there who is celebrating it this week!
Last week, a question came in that, uh, triggered some memories.
In the software engineering field, estimating how long something will take is a known Hard Problem. There’s a near-infinite amount written on the topic within a software engineering context, far more than I could ever hope to summarize decently.
My experience is that data science projects suffer a similar set of time estimation problems to SWE projects, and arguably a superset of them. Which makes sense: there’s a software engineering aspect to much of our work to begin with, and then there’s a broader class of research and business factors layered on top that can make things even more complicated.
It’s hard to plan for insight
Most (but certainly not all) software engineering projects aim to do something that is known to be possible. For example, there’s rarely any doubt that it’s possible to build a phone app that can save data to a database in the cloud, no matter how complicated the intervening steps might be. The expected payoff is essentially guaranteed so long as we put in the effort to get there.
Meanwhile, when we’re looking for an insight, or a statistically significant difference, there’s no guarantee that we’ll find anything in the data even if we do everything right. At best we have a hypothesis that linear regression might turn up something useful, or that there might be an interesting difference between new and older users. But we must go to the data and check whether it’s true.
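To make “go check” concrete, here’s a minimal sketch of what that first pass might look like in Python. Everything here is hypothetical: the file, the column names, and the choice of a plain Welch’s t-test, which I’m using purely as an illustration since the right method depends entirely on what the data turns out to be.

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per user, with a signup date and an
# engagement metric we suspect differs between new and older users.
df = pd.read_csv("users.csv", parse_dates=["signup_date"])

# "New" = signed up in the last 30 days; an arbitrary cutoff for illustration.
cutoff = df["signup_date"].max() - pd.Timedelta(days=30)
new_users = df.loc[df["signup_date"] >= cutoff, "weekly_sessions"]
old_users = df.loc[df["signup_date"] < cutoff, "weekly_sessions"]

# Welch's t-test: doesn't assume equal variances between the two groups.
t_stat, p_value = stats.ttest_ind(new_users, old_users, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The point isn’t this particular test; it’s that until the code actually runs, the result at the end is an honest unknown.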
So perhaps the most important thing to let people, especially managers and stakeholders, know is that data work alone does not guarantee results. Even in the best case scenarios, we can merely use all our experience and prior insight to increase the likelihood that we find something, but there’s always an element of surprise. We’re exploring the unknown in search of knowledge after all.
Exploration is also hard to plan
Since I’m often asked to work on a constantly shifting landscape of projects, exploratory data analysis is a huge part of my workflow. So when someone asks me if I can submit a research plan for my project I give them a funny look, and then this:
Get access to data
See what’s in the data
Decide what to do next from there
It can feel a bit like I’m just being a snarky troll, but that’s the most honest answer I’m able to give. If they push for a list of what things I could do, it’s a laundry list of extremely generic items like “identify and fix data quality bugs”, “see if there are any interesting segments”, “explore the distribution of important dimensions”, etc. Everything depends on what’s in the data. I’m not going to promise to do work that can’t be done.
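To give a flavor of what that laundry list translates to, here’s a hedged first-pass sketch. The dataframe and every column name are made up; the real checks depend on the actual data.

```python
import pandas as pd

df = pd.read_csv("events.csv")  # hypothetical export

# Data quality: missingness, duplicates, obviously broken values.
print(df.isna().mean().sort_values(ascending=False).head(10))
print("duplicate rows:", df.duplicated().sum())

# Distributions of important dimensions.
print(df["country"].value_counts(normalize=True).head(10))
print(df["session_length_sec"].describe())

# Any interesting segments? A cheap first cut is a groupby.
print(df.groupby("platform")["session_length_sec"].median())
```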
I find that it’s helpful to tell people up front that a given project is going to involve a lot of exploration, which may or may not yield results.
This can even be a problem in UX Research
One thing that surprised me when I started working as a quant UX researcher was that occasionally, UX managers who had experience working with standard (qualitative) UX researchers would ask me whether I could submit research plans like the quals typically do.
I had no idea what such research plans looked like, so I took a look at some examples from my colleagues, and they looked something like this:
New signup flow usability test (fictionalized)
Recruit 5 users who have never seen our product and fit [criteria] (1-2wks)
Do 1hr interviews with each (2wks)
Analyze and synthesize results (1wk)
Total research time: 4-5wks
In this kind of design, the UX researcher actually knows they can deliver a result after 5 weeks, barring major scheduling mishaps. This is because in a usability test, finding out that no one had any issues at all would itself be a significant (and honestly surprising) finding. For a quantitative UX researcher, though, null results are generally uninteresting. If anything, it’s entirely possible we get no results at all because the data is either too sparse or too broken to make any sense of.
Luckily for me, it wasn’t hard to convince my managers that a lot of quant work just doesn’t lend itself to detailed project planning. Instead we typically have to adopt a more flexible work-to-a-deadline model.
Working to a deadline by being flexible with output
Since I’m typically unable to plan forward in time to make an estimate, I have to rely on working backwards. By asking stakeholders “when do you want to make a decision on this?” I can then promise to make a best effort to provide an analysis on or before that date. We might not find anything, but we’ll do our best to provide good information by then and/or provide updates as we unearth new things. Instead of having the goal of squeezing some predefined amount of insight out of a dataset, we’re promising to deliver however much juice we can squeeze out by a certain day.
I can hear what you’re thinking — Randy, did you just commit to providing some kind of finding on an arbitrary date? Are you nuts? What if nothing works out? But relax, we’re taking advantage of the nature of quantitative work — we can select different methods based on how we need to balance rigor against time constraints.
First, we usually aren’t doing too much data collection, which takes up a huge part of any research project’s time budget. Next, we can constantly refer back to the amount of time we have left to make judgements about whether we can afford to use a costly but more rigorous method to search for quasi-causality, or whether we’re just going to have to do something quick, dirty, and correlational.
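As a sketch of what that rigor-versus-time dial can look like in code (with hypothetical column names throughout, and regression adjustment standing in as just one possible middle step, not the rigorous method for any given project):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("users.csv")  # hypothetical columns throughout

# Quick and correlational: one line, minutes of work, heavy caveats.
print(df["feature_usage"].corr(df["retention_days"], method="spearman"))

# A notch more rigorous: regression adjustment for a known confounder.
# Still not causal, but it costs an afternoon rather than weeks.
model = smf.ols(
    "retention_days ~ feature_usage + account_age_days", data=df
).fit()
print(model.summary())
```

A matched or instrumented design would sit further up the rigor axis, but each step up costs calendar time, which is exactly the tradeoff being managed.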
The hardest part about this flexible process is that it works best when you can lean heavily on domain knowledge. Domain knowledge is needed both to generate good hypotheses that have a higher chance of yielding interesting results, and to recognize things that could be useful to share back. There are lots of little facts you learn in the process of doing exploratory analysis, such as the distributions of various attributes, that are actually valuable to know from a business standpoint.
Back to planning - resource management
So let’s circle back to the original question of dealing with people asking for impossible plans. The best way to handle it is education: the nature of quantitative research work simply leads to more unpredictable dead ends than most other work does.
Instead, we need to address why people ask for plans and time projections in the first place: the organization as a whole has to understand when projects are going to ship, and whether things are staffed appropriately. Planning is about resource management, and if we express our work process in those terms, it’s more likely to be accepted as-is.
The “work to a deadline” way takes care of the shipping date part pretty clearly. We’ll get you what you want when you want it, maybe even sooner. Easy sell.
Handling the staffing question is a bit trickier. If you’re adopting a flexible methodology that’s trading rigor for time, you can actually put a single quant researcher on lots more projects — they’d just cut back on rigor and do more dirty hacks.
Rampant corner cutting might be fine in certain situations, but there are projects where cutting corners is a terrible idea (e.g. things that involve life and death). So it’s important for the quant to be very cognizant of the tradeoffs they’re making. If something needs more time to be done correctly, it’s our job to make that clear and fight for it.
No one else can do that job for us, especially managers who don’t have the background to recognize the problem.
Stuff data people shared
As promised, here’s some stuff that other data folk have shared with me. I’m still getting a feel for how I’d like to share things going forward, so expect the format to evolve. If you’d like to share something w/ the broader data community or just want private feedback, email or DM me on Twitter. I won’t blindly publish everything, but I will make an effort to review everything that comes through.
Susan shared a blog post about their experience moving from being an entry level engineer to a principal, explaining the concept of leverage and how a more senior engineer gets their broader impact from applying leverage to affect more things in the same amount of time.
Arpit wanted to share something they were building, a kinda StackOverflow for data questions named astorik. From the looks of it, it’s still in its very early stages, but there’s some conversations going on already.
About this newsletter
I’m Randy Au, currently a Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly data/tech newsletter about the less-than-sexy aspects of data science, UX research, and tech, with excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise noted.
A curated archive of evergreen posts can be found at randyau.com
Standing offer: If you created something and would like me to share it w/ the data community, my mailbox and Twitter DMs are open.
Supporting this newsletter:
This newsletter is free, share it with your friends without guilt! But if you like the content and want to send some love, here’s some options:
Tweet me - Comments and questions are always welcome; they often inspire new posts
A small one-time donation at Ko-fi - Thanks to everyone who’s supported me!!! <3
If shirts and swag are more your style there’s some here - There’s a plane w/ dots shirt available!
I was a product manager with a team of 5 product specialists and about 15 developers during a transition to Agile. After we got used to it, we loved Agile, but the biggest headache for me was trying to run interference with upper management, who didn't understand it and wanted detailed Gantt charts. Your data plan strikes me as being quite agile: do small bits of work; if successful, do more small bits; if it isn't working, pull the plug and move on. The key thing for management to understand is that there is always a risk that what they want cannot be done in a reasonable time/budget, and that agile lets you reach that understanding (will it work?) fairly quickly and cheaply. It is largely a risk reduction strategy.