We should treat data science as a craft

And stop obsessing over surface technical skills

This week’s post comes from a bit of a Twitter discussion on “soft skills”. More specifically, there was a brief discussion about how people who are strong in the “soft” stuff can become good hires (in this context, “soft” covers everything that’s not explicitly technical knowledge, like SQL or a specific programming language). That reminded me of a line of thinking I’ve held for many years but never really put energy into thinking through and articulating clearly.

In the modern discussion of data science, data analysis, and quantitative UX research, everything feels like a checklist. Know these programming languages, learn SQL, learn these methods, make your GitHub, do this Kaggle, check this off, check that off, and then boom! You too can become a data scientist! But what do you do once you become a data scientist, aside from basking in that sweet, sweet salary?

Hopefully people realize that just getting a job title doesn’t mean that’s the end of the journey. This is the part where I want to declare that we need to treat our discipline as a craft.

The word craft in modern usage is primarily associated with the hands and making of physical objects. People who work in the trades can make stuff and display craftsmanship. So I’m sure that without any context, the most common reaction to this is confusion. Data science isn’t manual labor by any stretch, and we already have plenty of jokes about artisanal bespoke charts and powerpoint slides. But hear me out.

My usage of the term is rooted in my old philosophy degree, where there’s a very long thread starting with the ancient Greek philosophers such as Socrates and Plato. For an extended look at how the terms and concepts evolved across time, check out “Episteme and Techne” on the Stanford Encyclopedia of Philosophy.

Depending on which philosopher and which text you’re working with, the concept of “techne”, which is often translated as “art” or “craft”, points to practical knowledge and craft. It’s the root from which we get the word “technology”. It’s important to note here that techne encompasses a ton of skills and knowledge, with geometry, farming, music, carpentry, rhetoric, and medicine all playing a part.

Also, techne usually (because different people used different terms in different ways) stands in contrast to “episteme” (from which we get “epistemology”), which is often translated as “knowledge”. In more modern terms, for certain readings, you can sorta think of it as the difference between theory and practice.

Craft is a complicated web of skills

What’s it mean to be a good woodworker? It’s an endless list of skills about how wood works, how to use certain tools, how to design and put pieces together. For a writer it’s skill with the use of language, research, storytelling, planning. To become good at the craft, you need to build up all those interrelated skills to the point where they function as a distinguishable whole. The web is so intertangled that there’s rarely an obvious starting point other than “do a simple version of the thing”.

So what about data scientists?

During the discussion about soft skills, one thing I noticed was that the divide between “technical” and “everything else” wasn’t really useful. There’s a bunch of soft skills that everyone needs, for example good communication and organization skills. Such universally valued skills are useful no matter what job you have. But there are things that aren’t as universal. Here’s some super broad buckets of things I can think of:

  • Statistical and mathematical reasoning — the ability to reason and know the effects of using math on certain problem spaces

  • Methods — the more general patterns of doing things, like setting up a linear regression, designing a data schema, a specific experimental design for collecting data, etc

  • Technical skills — all the technical details that support programming, SQL, data pipelines

  • Specific domain knowledge

While I made the categories overbroad to save on space, I’m pretty sure that I make use of multiple items under all four categories every day, often within the same project. Someone who isn’t doing data science is unlikely to use all of them in conjunction; it’s a web of skills that seems relatively unique to what we do.

Craft is ultimately about practice

The reason why techne stuck with me for almost two decades is that there are still ostensibly intellectual endeavors that we use the word ‘craft’ for, like writing and translation. What’s important about techne and craft here is that they’re things that can be learned (as in, they’re not innate). Mastery of the craft only comes from practical experience. The only way you can get better at writing, translating, woodworking, or doing data analysis is to actually do it. No amount of reading about data analysis can fully prepare you to do data analysis.

Getting better at data analysis takes practice because there are too many little details to juggle for anyone to teach it effectively. Just imagine explaining everything that needs to go into collecting a good data set: all the considerations about sampling biases, question biases, how metrics are decided and operationalized. And much of that work would only apply to a single specific situation. You could practically write a whole book about one little project if you were to include everything. In fact, even though I tend to get into lots of sordid details, not even I go to that level of detail.

Sometimes when I interview people and we wind up discussing A/B tests, it becomes obvious very quickly who has and hasn’t actually run one before. The experienced people inevitably start discussing how they’d deal with various issues that could come up. Or they’d at least make design choices that are clearly “extra work” but are necessary to handle specific edge cases. The voice of experience speaks very loudly in contrast to people who have never run one before but have studied the topic. That’s not to say studying and knowing the ‘book version’ is bad. It’s definitely better than nothing, but you can clearly see the knowledge gap.

So get lots of practice in, deliberate, mindful practice, where you think critically about what you’ve done right and wrong every step.

Craft is also about community

Over the course of history, practitioners of crafts have grouped together, whether in actual guilds, unions, and professional societies, or more informally in things like writing circles. Putting aside the socio-economic reasons for gathering together, one other function of such groups is to share and spread advice, feedback, and experiences amongst the members. That matters because crafts take practice, and practice is significantly easier when there is outside feedback available.

Community is something that the data science community does a pretty good job of as a whole, just see the Rstats community as an example. We all reap benefits when we collectively welcome in new members and share knowledge to raise the quality of everyone’s work. Every tutorial, blog post, video, public talk, Stack Overflow answer or funny data tweet contributes to our community.

Like snowflakes, those contributions are small but collectively pile up into something huge, and I’d like to see it accelerate. I want to see more people practicing in public, soliciting and offering constructive feedback in public, and more people standing up and sharing intermediate and advanced knowledge with everyone else. Sharing our practice with a group of peers and getting feedback and new ideas used to be a very local and small phenomenon; the internet lets us scale it up and share globally.

Finally, craft has no end

Data science is already too wide a field for any one person to completely master, which is why we see some people arguing that the title be split into subfields such as ML Engineer. But data science is also too deep. Right after you learn about experimental testing, suddenly you learn there are lots of experimental designs. When you learn about those, suddenly you come across the whole field of causal inference and the challenges there. The depth goes on forever in any direction you choose to put effort in.

All we can do is keep practicing, keep getting better, and always know that we’ll never be done or bored. I find that to be super hopeful.


About this newsletter

I’m Randy Au, currently a Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects of data science, UX research and tech, with occasional excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise noted.
