The only path to be a data scientist is to be human

This job will use everything you are

WHOA, 1000 people signed up for this newsletter. I’m stunned and never expected to get this far. Thanks to everyone for giving me your precious attention. I’ll continue writing as I have been doing all year.

I was going to write a thing about MapReduce since we’ve been on a run of reflective articles lately, but then this more interesting thing came up. MR isn’t going anywhere so it can wait.

For a brief flash in the data Twittersphere, a bunch of data science-y folk started talking about their journeys into becoming a data scientist. It appears that the trigger was this subtweet.

I tried, I really tried to find whatever series of tweets triggered that subtweet. But Twitter’s search functionality is… a nightmare to handle. To make things worse, THERE ARE SO MANY BAD TAKES OUT THERE, many guilty of just confidently declaring ways to become a data scientist. I honestly have no idea which was the actual offender. Maybe it was all of them. I also don’t want to give any of them traffic so, just search for variations of “becoming a data scientist” on the internets and bask in the overconfident rainbow of toxicity.

Enter, all the gloriously quirky data scientists

While every field, including data science, has a fair chunk of gatekeeping asshats who don’t know when to shut up, DS does have an even more vocal community of much more welcoming folk. Many of them started chiming in with their very unique routes to illustrate just how full of hot air the people who prescribe One True Paths are.

I really wanted to collect and highlight all those tweets and put them on the record because it’s important to show that an interdisciplinary field like data science necessarily has multiple entry points. We live at the lively intersection of engineering, statistics, and business. If these three fields are the true foundational pillars (and I believe they are), then everyone needs to have basic proficiency in all three. It must be the case that someone can first specialize in one and pick up the other two later.

In fact, the various specializations that we’re observing in the job market, ML researcher, ML engineer, data engineer, quant UX researcher, are just reflections of the specific balance of skills within the pillars.

Subject Matter Expertise is our strength, our paths provide that

Very often, one of the main differentiators between a good data scientist and a bad one (for a given task) is their mastery of the subject domain, or lack thereof. It helps with communicating results, applying models, avoiding pitfalls and mistakes, or managing teams and stakeholders.

All that squishy experience comes out of our lives and our lived experiences. A lot of my communication and leadership skills came out of running guilds in online MMO games and managing forums and chat rooms. Handling multi-lingual research projects is informed by my game translation work. A stint in interior design taught me lots of data collection methods. There are always interesting parallels that can be drawn from a seemingly irrelevant aspect of your life to other things.

There are so many weird data problems out there that need diverse viewpoints to crack. We should celebrate, and encourage, people from all sorts of backgrounds to join the ranks, because we honestly need more of them.

So with all that lengthy prelude, here’s a sample pulled from when I was watching the thread. The number spiraled out of my ability to curate as it forked into multiple threads. None of these journeys are alike (though there is an amusing micro-cluster of people from political science out of NYU).

Chris Albon @chrisalbon
Want to become a data scientist? Get a political science PhD, study health effects of civil wars, join a humanitarian non-profit, join a Kenyan humanitarian non-profit, found a startup, join a Kenyan startup, join a startup all people who did HealthCare.gov, & ta-da!

Andrew Therriault @therriaultphd
On the one hand, tweeting out confident pronouncements about the secret recipe for being a successful data scientist is great for engagement and followers. On the other hand, the idea that there's one "right" path is bullshit and you should feel bad.

My own quirky journey

As for myself, I hope my own meandering, undirected path that ultimately stumbled into data science gives some hope to people who don’t have a math degree and haven’t taken more than 3 CS classes in their entire life. It’s more than possible.

Despite being a video-game loving nerd all my life, I somehow found myself doing a double major in Philosophy and Business for undergrad. The business part is somewhat relevant today in that I focused on operations management (SIMPLEX and truck routing!) and decision support systems, all done in VBA at the time, making it the first language I used for serious work. Thanks to that, I had to take basic accounting, biz strategy, marketing, and finance courses, knowledge that helps me talk to people from those areas today.

The Philosophy bit was also Continental Philosophy, the sort where you read a lot of books from Dead European Males. That in itself is rare because Analytic Philosophy (aka, logic) departments are more common in the US. I also had some applied math thrown in the background, but most of it came from the business side.

I somehow was under the illusion that I wanted to be an academic at the time, so I applied to, and FAILED to get into, multiple grad programs, including the Information Science program up at Cornell. But in my SoP I mentioned something about being interested in communication, and the chair of the IS program also happened to be the chair of the Communications department, so they offered me a spot in the MS program there, with the lure that I could reapply to IS the next year.

Since that was the only acceptance I got, I went there and paid my way through with loans. Spent the time learning a ton of philosophy of science (because social sciences often feel the need to reinforce that they’re a science and not just making stuff up amongst the students) and psychology research methods, both of which probably proved more useful to my career than anything else.

I also took a 600-level CS course (the only CS course I’ve ever taken at any university) on Information Retrieval and BARELY squeaked by Pass/Fail. I never learned how to prove math well, especially not grad level probability distribution math. But my layman’s-terms work apparently was up to snuff. I also did my thesis on analyzing the speech of murderous psychopaths using statistical linguistics. Wild times. Incidentally, I also flat out failed a 500-level Econ Decision Theory class, again because I suck at proofs.

Two years of grad school convinced me that I wasn’t going to enjoy academia, so I somehow convinced an interior design consultancy to take me on as a “general problem solver”. Learned a ton about surveys, observation methods, and office design. Also coded up unholy chimeras of Python+win32com+Excel/PowerPoint automation programs.

After that, a brief stint at a horrible ad-tech company, where I learned how to write performant MySQL queries against production while a red-faced CEO was screaming at me and various other people. Then did a bunch of years at Meetup learning all sorts of stuff about UX research, product work, customer support, and making snacks with the office toaster oven.

Then there was a stint at Bitly learning how to write MapReduce jobs the hard way on legit Big Data systems. Then a stint at Primary learning about clothing supply chains and physical manufacturing and inventory. By then I’d worn the titles of consultant, data analyst, and data engineer.

Where’d I learn my programming skills? I learned to run a small FreeBSD server as the backup SVN server for my MS thesis. I somehow learned Python to handle random tasks. An engineer friend I talk to regularly pointed me to design patterns for Java ages ago, but they made more sense to me in Python. The rest I picked up on the job, keeping things in a very simple imperative style.

My data skills are grounded in my old Communications/Psychology/Operations Management methodologies, the rest picked up on the job dealing with live data and thinking hard about biases.

All in the background, I used to run a Go Meetup group, translated 8 visual novel games, ran 4 annual gamedev online-only conferences before going online-only was all pandemic-hip, advised a couple of failed startups and a surviving one, and wasted plenty of money on various hobbies that may or may not contribute to my overall effectiveness (anyone want some chocolate-covered gemstones?).

Do I use ALL of my quirky history in my daily work? Probably not, but I do think I find ways to apply experiences from all of it to other things. Nothing is ever completely irrelevant so long as you’re willing to draw parallels.

Celebrate who you are

Life is super weird, but it all comes together into this beautiful mess. I’m sure that you, reader, have all sorts of interesting things in your background, just like everyone else who’s commented on that Twitter blip. It’s a strength, just find a way to apply it.


In other unrelated news, I’m very sad that the iconic Arecibo Observatory must be dismantled due to significant and unresolvable structural issues. You should read the link, the engineering report is fascinating.