We’re raising 4 little silkworms at home as a kinda science project and it resulted in my first good photo of 2020 because of “what is outside?” I didn’t realize silk comes in various colors (ours are orange-ish and yellow-ish)
Thanks to Benjamin here for giving me this week’s topic. I mostly draw ideas on what to write from either something happening at work, something I saw floating by on Twitter (that’s NOT the goddamn plague. Wear a mask and stay safe.), or questions that readers occasionally ask me (via email, Twitter DMs, etc). Coming up with at least 52 topics a year is tricky, so I’m ALWAYS thankful.
I find this a fun question because my instant reaction was this: “Wait, has there been any?”
Part of this is likely because I have a more limited emotional swing than most, but also collectively, we data scientists seem to roll our eyes and complain about many things. This is also a personal question because my interests are very likely not your data scientist’s interests.
That said, I can think of a couple of general, important things. Here are the two guiding principles I could identify:
Things that allow creativity and learning tend to be fun
Things that involve a lot of painful drudgery tend to be not fun
Potential findings that obviously “matter”
I’m reminded of this tweet. Motivation can help make dull, painful work tolerable.
“Obvious” is somewhat optional here. Sometimes, things that matter might be a bit complicated and take some explaining. So long as you can convince me that it’s going to result in important decisions being made.
Everyone loves knowing that their work will do something useful and positive. If I can run a couple of easy SQL queries, make a chart, answer your question, and make your job 50x easier? Yeah I’m going to do it.
There’s many ways to get someone interested in your question, but one very consistent way is to pose questions where the results inherently matter. If I know that answering a question could potentially lead to a business-changing decision, or a massive chunk of revenue, I’m going to be interested in at least hearing it out.
Potential candidates include “Are there ways to improve this existing process?” “Is anyone using this very expensive-to-maintain feature?” “Can we automate [this inefficient process]?”
The more concrete, the better. Because questions also have to be balanced against the work involved in answering them.
BUT, questions need focus
The problem with “big questions” full of potential impact but also full of ambiguity is that they can rapidly devolve into a fishing expedition.
Yes, MAYBE if we go trawling through our 5 PiB worth of logs, we might unearth a segment of user behavior that can predict if a customer is about to leave and stop paying. But maybe not.
We don’t even have a proper definition of churn yet. It’s at least a 6-month project to wrangle the data to usable form. We have no proof that we can convince people to stay even if we identified them. All of these individual pieces need to be solved, and then they all have to chain together, and finally the ultimate payoff is unknown, even if everything works as planned.
Having gone on many of these fishing trips before, I’ve learned to have a healthy skepticism for them. I’m sure many others have had a similar experience. This is often why unbounded super ambitious questions are met with a certain amount of eye-rolling. They’re very high risk, with unknown reward.
There’s a time and place for fishing trips. Sometimes it’s good to go on one during a slow quarter just to figure out where there are gaps in the data infrastructure. Sometimes we really need that fish and are willing to pay the cost of the expedition. Sometimes, our job is to actually try to hit the moonshot.
Because once in a while, despite the odds, it actually works.
Let’s understand what’s going on with [thing]
As a UX Researcher, I usually find research questions interesting. I’m particularly interested in what users are doing (hence the UX in my title), but for other people it can be interesting to analyze other systems and processes.
This sort of fundamental research can be useful in a business setting because, presumably, if we understand something well enough, we can make improvements to products and processes around that knowledge. It’s a source of inspiration and hypothesis generation.
When these fundamentals aren’t well understood, a lot of tactical questions can’t be answered. “We’re not exactly sure why users are leaving? Is it because of cost, or we’re doing something wrong, or never did it right to begin with?” “Do our users want all the options displayed here because they’re very skilled, or do they only care about defaults?”
These projects tend to also be fun because they stretch various brain muscles. I might need to collect data, I might have to employ mixed methods, or even dust off methods I haven’t used in ages (or ever).
I might even have to talk to other humans????!!!!
Plus, on top of it all, I think it’s fun to just know about stuff.
But not all the time please
Balance is good.
It’s best to remember that many data scientists stepped away from academia precisely because we like having our work lead to actual impact in the real world. If all we’re doing is fundamental research, it might get boring.
How can we measure [this concept]?
I get asked to help set metrics all the time. The problem is that many metrics are very boring to set. Yes we’re going to count users who do this action, compared to users who saw it and didn’t do the thing. Yes, we’re gonna exclude some weird subset. Yes, we’ll track it over time.
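To make the “boring” version concrete, here’s a minimal sketch of that kind of counting metric in plain Python. Everything in it is made up for illustration: the event names (`saw_feature`, `did_action`), the user IDs, the dates, and the excluded “weird subset” (a test account) are all assumptions, not anything from a real system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log rows: (day, user_id, event).
# Event names, users, and dates are invented for this example.
EVENTS = [
    (date(2020, 7, 1), "u1", "saw_feature"),
    (date(2020, 7, 1), "u1", "did_action"),
    (date(2020, 7, 1), "u2", "saw_feature"),
    (date(2020, 7, 1), "test_bot", "saw_feature"),
    (date(2020, 7, 1), "test_bot", "did_action"),
    (date(2020, 7, 2), "u1", "saw_feature"),
    (date(2020, 7, 2), "u3", "saw_feature"),
    (date(2020, 7, 2), "u3", "did_action"),
    (date(2020, 7, 2), "u4", "saw_feature"),
    (date(2020, 7, 2), "u4", "did_action"),
]

# The "weird subset" to exclude (assumed here to be internal test accounts).
EXCLUDED_USERS = {"test_bot"}


def daily_action_rate(events, excluded_users):
    """Return {day: share of users who saw the feature and also did the action}."""
    saw = defaultdict(set)
    did = defaultdict(set)
    for day, user, event in events:
        if user in excluded_users:
            continue  # drop the excluded subset before counting anything
        if event == "saw_feature":
            saw[day].add(user)
        elif event == "did_action":
            did[day].add(user)

    rates = {}
    for day in sorted(saw):  # tracked over time, one value per day
        viewers = saw[day]
        actors = did[day] & viewers  # only count actors who also saw it that day
        rates[day] = len(actors) / len(viewers)
    return rates


RATES = daily_action_rate(EVENTS, EXCLUDED_USERS)
for day, rate in RATES.items():
    print(day.isoformat(), f"{rate:.0%}")
```

Counting distinct users per day (sets, not raw event counts) is a design choice here, so one very enthusiastic clicker doesn’t inflate the rate. A real version would likely be a SQL query over the actual event tables, but the shape is the same.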
I don’t really roll my eyes at this work, because it’s important work. It needs to be done and either I do it, or someone less qualified does it. But it’s not usually fun work, because I’ve done it enough that I know that the simple counting methods are usually the best.
But being asked to step back and figure out how to measure a concept that’s pretty abstract and novel? We don’t see that every day. That could be fun.
As with the infamous Target pregnancy model news report from 2012 (and this interesting complaint about the reporting of the model from 2020), someone had to at least have asked the question “can we measure the ‘is_pregnant’ property using the data we have?” Yes, it’s a fishing expedition, but assuming the data and infrastructure were already in place, and all the modeler had to do was actually develop the model, it’s a pretty fun problem to try.
In theory such a model might be possible, so it’s a fun project with a potentially large payoff (expectant mothers are one of the most sought-after marketing segments; the many thousands of dollars I’ve spent this first year w/ my kid are proof of the money on the line).
What isn’t fun: Combine these 2, 3, 5 databases together
This will never be fun. I can’t think of any situation where I’d get excited to embark on such a project.
It might be NECESSARY to do. It might even be super-duper business critical that it get done. I will grudgingly agree to do such a project if I really believe that there’s a legitimate need to do so.
But I won’t like it, and I definitely will not enjoy it.
These projects always hurt because of the intense drudgery involved. It’s why calling a project “a data warehouse” is almost a kiss of death. It’s like offering to do surgery on yourself. It might be necessary and life-saving, but you damn well want to be sure there are no easier alternatives.
You’re going to spend days, weeks, months trying to make sense of database schemas that were never intended to make sense (I see you, Salesforce). You can get stuck for weeks waiting to obtain proper access credentials. You’ll write and rewrite fragile data pipelines as new rows generate new and exciting crashes. Someone will roll out a major schema change right when you’re almost done. You know there’ll be months of testing and debugging involved. You might even get a beeper to monitor the system after it goes into production (oh no).
It’s a monumental effort, and the people who carry these projects on to fruition (and their continued maintenance) deserve more respect than you’re probably giving them. There’s a reason why most of us mortals flee from the “opportunity” to do such work.
There’s probably other fun things I’ve missed
Everyone’s different after all. I obviously favor the “fun” of creative exploration and busting out of the usual routine. But I recognize that those things feel fun exactly because they’re rare. If every day brought a novel fire to fight, I’d probably yearn for some mundane routine.
So feel free to let me know examples of things I’ve missed that are exciting. Comments to posts are always open.
Thanks to Jeff Hale who recently gave Counting Stuff a call out on Data Awesome! I’m garbage at collecting and summarizing useful links to data stuff, so resources like these are always very useful to me.