Time for another (mostly) bi-weekly paid subscriber post, where the thoughts are a bit more in-the-moment because they’re mostly thorny, or otherwise something I haven’t worked out yet that is bouncing around in my head and needs an outlet.
Today, I was reading Jesse’s latest newsletter post, which is a delightful mishmash of film, animation, data science, and other thoughts. But one part in particular stuck out at me, quoted below.
Learning to animate — to animate 3D characters specifically — continues to be orthogonal to my experiences in learning to do data science and machine learning.
In animation, the process is invisible. We focus solely on the final product - the animation that’s been created - even as students. There’s nowhere to go to look at someone else’s graph editor for a specific shot, let alone copy it into your own environment and tinker around with it to pull it apart and understand it better. I can only know another artist through their product, and by reverse-engineering what they’ve done with what I know how to do.
In data science, the process is part of the product. Think back to your experiences learning to code in R or Python, where it’s considered best practice to put your code up on GitHub either for the world to see, or at least other individuals within an organization to investigate. There is no point in coding where someone says “ah yes, you know enough now that you no longer need to share your code with anyone - all we care about is whether or not it works.”
It made me realize something very interesting. Of all the data-related fields I can think of, primarily the flavors of data/BI analysts, the various flavors of quantitative-focused researchers (including academics), and the different flavors of data scientists (including Data Eng, ML Eng), data science stands out as having a particularly open teaching culture. I feel like we’re more the exception than the rule.
What I mean by this is that, as I wrote a couple of weeks back about “how do we actually pull stories out of data”, I’m honestly not sure if data analysis is “taught” in any intentional way. I only see it as a side effect of learning something else like methods or projects. The expectation is that you have a baseline innate ability to spot patterns and connect stories, and you hone it through lots of exposure while learning how to use various methods.
Similarly, the academic model of “learning to do research” is very much an apprenticeship model, apparently much like learning to do 3D animation. It’s a craft that is best learned by watching (and being advised by) someone who knows how to do research. That lead researcher sets the example and imparts a list of skills that goes so deep, it takes years.
My as-yet half-baked hypothesis for why data science has a much more open culture, even from the start when there wasn’t a financial incentive to churn out beginner-friendly material, is that we inherited a lot of open source culture. Since early data science required a LOT of new tooling to do work, people built their own tools and then released them as open source. In many other industries, these sorts of tools would be considered competitive advantages, trade secrets, or products to monetize, but that’s not what happened.
Then, if you release a tool as open source, there’s a natural tendency to write blog posts and tutorials, or do conference talks about the new tool. That provides an easily accessible topic for many practitioners to gather around and discuss. Books can easily be written about tooling topics. The list goes on, fueled by the desire to share our code and methods with others while learning similar things from those same people.
If not for the culture of open source and sharing experiences, data science might not have exploded into the hugely popular job title that it has become. Things could have easily become this niche job function that focused on making stuff that would be considered “trade secrets” or abstract “research”.
But there are also downsides
Since it’s so easy to talk about tools and methods, data science easily fit into the tech/engineering community and the practice grew. Data scientists continue to talk endlessly about tools while other topics like management, ethics, and process get significantly less attention. If we don’t really discuss these topics, then everyone who has to use those skills, like managers, is left to fumble around on their own.
This also has the side effect that people who are trying to learn data science are presented with a lopsided view of the world where tools take on a greater importance. I’ve seen plenty of resumes from people listing an endless number of tools, many of which aren’t particularly important for any specific job because every organization already knows it has its own unique way of doing things.
I don’t fully see what the implications of this cultural quirk of history are.
The one use I get out of it is to take hints from software engineering culture as a rough roadmap of what to expect, since data science pulled so many cultural foundations from SWE culture. This leads me to expect that the relative dearth of discussion about important non-tool topics will ease over time as we get more and more data scientists into management and strategic positions.
But what about academia?
While I did caricature academia as being an apprenticeship model for learning to do research, I can’t ignore that giving conference talks and sharing work in journals run by peer researchers is very much like the tech conference circuit. Which makes sense. I’m willing to bet that tech conferences trace their roots to academic conferences, since many early data scientists held PhDs and thus would have attended a mix of academic as well as industry events.
I’m not sure if this knowledge is useful
But whatever the long-term implications, I’m actually very grateful that we wound up with a welcoming culture that interested folk from all sorts of backgrounds can pick up and learn from. While it’s a bit boring for people like myself who have seen a dozen too many beginner tutorials in our lives already, the volume of content shows just how much interest there is in making sure the discipline continues to grow.