Staying Sharp in Data Science
Can take a lot of unexpected paths
I finished my Normconf lightning talk over the weekend, because I had misremembered the recording submission deadline as Nov 15th when it was actually December 1st. Oops! But hey, it means I won’t be editing video over Thanksgiving! Look forward to it being released in early December!
But since I spent most of the weekend buried in recording and editing, I resorted to soliciting questions on Mastodon for things to write about, and Bea here asked a pretty good question. How does someone “stay sharp” in data science?
This is rather timely because Vicki Boykis had just published a blog post earlier in the week about how she learns machine learning and keeps up with new developments.
Personally speaking, if someone were to ask me out of the blue whether I “did stuff to stay sharp as a data scientist”, my initial reaction would very likely be “no, I don’t do anything special”. If anything, I have so many things going on in life, between family and hobbies that are largely unrelated to data science, that I’d wonder where I would find the time to do extra data science work to stay sharp.
But THAT IS A LIE born from lack of self-reflection.
In reality, I probably do more things to stay up to date in the data science field than the average person. I’m certain I’m not in the top 5%, or even top 10% of people who do extracurricular stuff in data science, but the mere fact that I’ve been writing two years’ worth of weekly data newsletters is an obvious sign that I’m abnormal. My initial reaction stems from how I do a lot of things out of personal interest, so I don’t even realize that I’m doing what amounts to actual work. But I am doing work, and it needs to be recognized as such.
Since data science is a knowledge job, there’s always a need to keep moving with the field. But there is a nuance there. There are many paths and sub-fields within data science that you can choose to develop into. You have to have some understanding of what you want to become, and what the demands of your career might look like, before you pick up anything to study.
For example, if your goal is to become an ML Engineer someday, you’ll wind up focusing on topics in that area. If you want to climb the corporate ladder, maybe become a people manager, there’s a completely different skill branch to explore. Or perhaps infrastructure and data engineering are more your thing? Or you’d prefer to hang out more with the statisticians and handle experimental design? Or maybe you’d like to chat with academics and explore the survey research route more? Those, and many more, are totally options that you can choose to “upskill” into regardless of what your existing data science skill set is.
There’s no right answer to any of these. It’s just a matter of picking a combination of things that you find interesting (and that thus provide motivation to continue) and that have some relevance to your future career. It’ll probably change multiple times over the course of your life as circumstances and interests change.
Pursuing a wide-T-shaped life
As far as I can tell, all data scientists wind up being rather T-shaped, in that they have a small set of skills they’re very strong in, usually along one or more of the “pillars” of DS such as statistics, computer science, or business knowledge. They then have a very wide set of shallow skills (the thin crossbar of the T), where they have a workable amount of knowledge. Maintaining skill relevance essentially boils down to bouncing between two basic strategies: expand your breadth of skills, or increase the depth of certain skills.
Since I’m pretty much a die-hard generalist, I naturally spend a lot of focus on constantly expanding the breadth of my skills.
I enjoy hearing stories about other people’s data problems and how they might have solved them. I’ll then store that information somewhere so that if I come across a similar problem in the future, I can hopefully recall that someone else has encountered the problem and one solution looks a certain way.
This is often why I love hanging out on Data Twitter, or skimming through blog posts and talks about things that aren’t directly related to data but instead go deep into other subject areas. I’m convinced that the bits of knowledge I pick up on how TCP works or how shipping containers by boat works will come in handy someday. So long as I retain enough to be able to go back and find the information later, I’m good. This aligns with my constant information foraging habits anyways, and if I’m honest about it, I adopted this strategy because I can’t stop myself from reading or watching videos about esoteric things.
Nowadays, when I’m trying to dredge up inspiration for the weekly newsletter post, I often scan around to see what’s going on. It’s given me a reason to stay more abreast of “current data events” than I had earlier in my career. I think this is a net positive thing for me, but I’m not 100% certain about it.
I will emphasize one thing here that’s not immediately obvious. When I excitedly learn about methods and domain knowledge from other places, I’m usually learning the well-understood bits of that field — the fundamental stuff that most people in that field know and trust. That stuff has stood the test of time, and most importantly, is widely known by practitioners who I might interact with someday. It’s not going to go out of style anytime soon. I want to be learning the generally accepted ways that accountants record and report revenue, and not the latest international tax-evasion technique.
And finding opportunities to increase depth
Snowballing broad topics in the hope that they become useful in the future works (for me anyways) over the long term, but it isn’t very practical in the near term. How do I stay sharp there?
Probably the most important thing is this: I get to work as a data scientist. I have actual work to practice on. Practitioners don’t have a problem finding new projects because work will constantly provide new projects and challenges. Some work will be repetitive and won’t require any new thinking, but there are often other projects that are going to land you somewhere outside your normal skill repertoire. These stretch projects provide an opportunity to increase the depth of your skills.
Having a practical at-work problem to use a new method on (because it’s the appropriate one for the job and I’m not just forcing it) is, for me, a much more memorable and useful way to learn a method than any amount of toy examples found in tutorials or textbooks. It forces me to engage with the tools, the process, and really learn the ins and outs of the thing because I have to explain it in human terms to my stakeholders.
Stretching the skill depth can also apply to techniques that I’m already familiar with. For example, I’ve run many retention analyses over the years, but if I have to run another one, I can always see if I can make it better somehow for the new project. Maybe there’s some new data available that I can make better slices from, or there’s a way I can tell the story more clearly with a new visualization tool. If I can’t come up with interesting improvements in the analysis, maybe I can instead find a way to do it faster and more efficiently. Either way, using a method I’m already familiar with is also an opportunity to refine it just a little bit more.
This strategy of increasing skill depth via work projects is a pretty conservative one. There are plenty of very useful methods out there that have no relevance to my work, and thus I’ll never get a chance to use them. One example is geospatial stuff, which is completely unnecessary in my line of work. If I develop an interest in that area, I’ll have to just find a side project or something to scratch that itch.
On the bright side, it means that I’m not usually studying methods that I won’t ever use in practice. Considering how long it takes to learn a new method, that’s a pretty significant time savings.
Data Science doesn’t have a set curriculum
Right now I’m sure you’re thinking “Randy, you just told me to go learn whatever random topic I stumble across on the street, and then try to apply things to my work. That’s not helpful!”
But that’s sorta the problem that we’re facing. We don’t have a well-defined job role, let alone a curriculum of skills that we all share. It’s a complete free-for-all and we have no choice but to just take advantage of whatever opportunities appear before us. It’s the classic approach to optimization problems: the greedy, locally optimal solution is probably good enough.
So yes, I am saying with a straight face: go out and learn anything that strikes your interest and that you can even tangentially relate to your work, and then try to apply it to your day job.
It’ll be fine.
No one has the authority or the ability to tell you you’re doing it wrong.
If you’re looking to (re)connect with Data Twitter
Please reference these crowdsourced spreadsheets and feel free to contribute to them.
A list of data hangouts - Mostly Slack and Discord servers where data folk hang out
A crowdsourced list of Mastodon accounts of Data Twitter folk - a big, community-contributed list of data folk who are now on Mastodon, which you can import and auto-follow to reboot your timeline
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
New thing: I’m also considering occasionally hosting guest posts written by other people. If you’re interested in writing a data-related post to either show off work or share an experience, or you need help coming up with a topic, please contact me.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord - where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday. Share it with your friends without guilt! But if you like the content and want to send some love, here are some options: