Data science has a tool obsession
That we need to balance out
Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
Normally every-other-Thursday posts are for paid subscribers and I write about things on my mind that aren’t necessarily tightly pulled together. But this week the topic is more applicable to a wider audience so it’s free to everyone. If you like this sort of thing, consider subscribing!
Since it’s a new year, I’ve been thinking back and scanning over the past year or so of posts on this newsletter and I’ve noticed that I’ve spent a lot of time writing about process and issues surrounding how work is done. Much of that is probably a reflection of my current work life, in that I have to spend a lot of time identifying flawed (or missing) processes and coming up with solutions. It luckily fits in with the overall theme we have here of “the mundane, less-than-sexy parts of data science”. I suppose it’s also very lucky that not that many people seem to write about it: free niche!
I’ve seen complaints from various people on Twitter about how data science conversation is largely dominated by discussion about two things: tools and hiring. I’m sure you can think of any number of articles and posts, from individuals and vendors, about things like interviewing, tools for the modern data stack, using a given library to do some cool thing, etc. Tool topics are especially easy to write about because it’s relatively novel content that follows a well-established template. There are also plenty of people who are interested in tools, so there’s no end to the engagement it can bring.
Meanwhile, process discussion is often seen as a “human problem”. It’s “soft skills” stuff (I hate that term but don’t want to go off on a tangent about it today). Some might even argue that topics like work prioritization, managing requests, and tracking work execution aren’t “data science problems” but instead “management problems”.
I generally agree with the sentiment that there’s TOO MUCH talk about tools, to the exclusion of almost everything else in the data science discussion. We need many more people talking about how to use our tools to do good work. We need people sharing their processes for handling situations so that good work is possible.
But at the same time, I get a sense that there’s no avoiding the fact that a majority of discussion will inevitably revolve around tooling. Beginners need to learn the fundamental tools to even start doing data science work, and there are many to learn. Experienced practitioners will always be on the lookout for new tools to solve novel problems that they’re facing. Our tools make our work possible, and we can easily fall into the trap of thinking that tools practically define our work.
Thankfully, this obsession about tools isn’t unique to data science; it seems to pop up all over the place in almost every hobby that I’ve seen. I know that in music and photography circles, there’s “Gear Acquisition Syndrome” (GAS), a phenomenon where practitioners obsess over buying more equipment to improve their experience with a hobby/profession. I honestly can’t name a hobby that doesn’t have a component of looking at equipment.
It is often pointed out in forums for those interests that the point of diminishing returns for new gear is very low and most people are better off taking a class to learn how to best wield the cameras and instruments they already own instead of buying more stuff. Good composition skills can take a stunning photo with a cheap phone or disposable camera. A new waterproof camera won’t make you a better photographer, but you can’t take photos underwater without one. All the new tool did was open up a new possibility.
What’s amusing to me is that music and photo gear have giant online communities and industries dedicated to discussing, reviewing, and buying/selling tools. While everyone repeats the wisdom about avoiding GAS and finding ways to become better without forever buying more stuff, there’s a whole machine generating temptation right in plain view. A casual scan of such forums shows that the conversation is very much dominated, much like DS conversations are dominated, by gear talk.
I’m sure that mentally, you can see that data people do the exact same thing with our data tools that photographers do when they talk about buying their 5th camera body and 20th lens. One more technology to add to our resume, one more DAG framework to automate our data pipelines, one more SQL dialect tutorial or Python package. Many of us probably recognize that all this tool talk isn’t really doing anything other than fueling our socializing. We get rewarded with visibility and praise for sharing a bit of knowledge.
But here’s the difference I can spot between the DS conversation and these broader hobbies: we don’t really yet have a contingent of “grumpy veteran folk” who inevitably pop up and remind people that maybe learning better process and technique is the solution to a problem instead of getting more stuff. There are relatively few courses and tutorials around “how to avoid wasting time on ridiculous data requests” or “when to just give an estimate instead of an exact count”.
Where’s our equivalent of a “help critique my photo” forum? Where do data scientists go to ask others about how to improve (like in this reddit thread)? I don’t think we have anything even close to such a thing yet. And that’s sad.
So perhaps the gripe that we should ALL have with how we all talk about data science isn’t that we talk about our tools a lot; the tools are forever going to be a part of this profession. We should be upset that we haven’t even started to talk about the other important stuff, because those topics are much harder to write about, much harder to make space for. Volunteering critique time is probably not going to give us big internet engagement points. Sharing advice about handling messy, complicated, human interactions is really, really hard. Discussing our failures and the lessons we learn from them in public is also hard.
But I think as more and more data scientists gain experience and climb up the career ladder, we’re now getting close to a point where there’s not only pent-up demand for that sort of topic, but also people who have the experience to create meaningful content around it.
I’ll close with this paragraph from a book I found on Gear Acquisition Syndrome. (Yes, I didn’t know there was enough content and research about it to put into a >270 page book, but there is).
Since relating to and being part of music-making, GAS [Gear Acquisition Syndrome] summarises a spectrum of cultural practices. It accompanies musical learning processes, and so, exploring the creative and expressive affordances of gear should be understood as contributing to musical expertise and reflecting it. We took it for granted that GAS-affected musicians would differ significantly from collectors and other groups such as purists and crafters, which, however, proved wrong. The findings revealed only insignificant differences. Music-making is so multifaceted and varies with changing personal circumstances, ambitions and interests that it is challenging to maintain clear classifications. Interest in gear fluctuates throughout a lifetime; it will never diminish completely, at least for musicians with a respective propensity. GAS is a constant companion for many musicians and perhaps an indicator of the great importance that music-making and music-related practices, be it as a hobby or profession, holds for the individual.
— Concluding paragraph of “Gear Acquisition Syndrome” by Herbst, Jan-Peter and Menze, Jonas