Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
There are a gajillion posts on the internet about “doing well in data science interviews”; data scientist being one of the hottest job titles of the past decade has definitely incentivized the content mills to churn out a flood of material over the years. I honestly have no idea to what degree those endless posts are useful.
But as I grew into my career, I ran into a problem that I wasn’t expecting: I started getting asked to give interviews. More importantly, beyond the very basic “don’t do this list of illegal stuff” trainings from HR departments, I found that there was little practical guidance or discussion available on how to actually be a good interviewer. I was pretty much thrown into my first interviews with candidates with a “go in and talk to this person about X, Y, Z, they’re applying to be a $Position.”
If you were lucky and in a larger organization, you probably benefitted from some processes around hiring that put guard rails in place to minimize your clueless impact on the outcome. If you were part of a much smaller organization, at best you got to observe other people interviewing as part of a group panel or some kind of shadowing arrangement. The only real way to “get better” was to do lots of interviews, do lots of self-reflection, and hope to figure it out on your own.
This is probably why there are so many horrible interviews. As data science is entering its second(ish) decade, there’ll be more and more people out there finding themselves on the other side of the interview table.
So let’s talk a bit about interviews.
As usual, these are my personal views on the topic, and I’ve deliberately scrubbed specific details about my current/previous employers from the examples. So don’t expect to get any secret interviewing tips here.
Background setup
Everyone’s experience in this space is unique, so I’m going to lay out my past history to provide some perspective.
I’ve been thrust into doing the occasional interview since as early as 2012-ish while at various startups, but the startups I joined were never in the hyper-growth stage, so the need was pretty rare. It’s only in the past year or so that things have ramped up and I’ve started averaging one interview a week. As an extreme introvert, I find such a pace a bit of an energy sink, but I put in the effort because I want to help with the important process of getting new colleagues.
I’ve never been a hiring manager, nor have I been part of the “hiring committee” that is supposed to make final go/no-go decisions on candidates based on the notes and information that individual interviewers like myself pass along. (Google has published a lot of info on its process at their re:work site; I don’t have the time or energy to comment on that whole process, and plenty of people have analyzed it to death already.)
So I have an okay amount of experience being put in front of candidates and trying to understand what skills they can bring to a job position. The goal is to draw out as much information as possible to get a sense of whether the candidate’s skills line up with the job’s hiring bars or not.
First, make sure you’ve got your objective hiring bars and grading rubrics articulated and agreed upon by everyone
Having multiple people interview one person (as is the norm in tech) to collectively decide whether they “should be hired or not” is, to put it in data science terms, a form of classification problem. The problem of getting all the interviewers to agree on whether to hire someone is very much similar to making sure all text coders agree on labels — there needs to be a coding book that lays out how to distinguish hires from no-hires and where the hiring bars are for the important skills.
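To make the analogy concrete: if interviewers were grading against the same coding book, you could even measure how well they agree, the same way text-labeling projects measure inter-rater agreement. A hypothetical sketch using Cohen’s kappa (the candidate labels and scores here are invented for illustration, not from any real hiring process):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' labels, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of candidates both raters labeled the same.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same label at random,
    # based on how often each rater uses each label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two interviewers' hire/no-hire calls on the same six candidates.
panelist_1 = ["hire", "hire", "no", "no", "hire", "no"]
panelist_2 = ["hire", "no",   "no", "no", "hire", "hire"]
print(round(cohens_kappa(panelist_1, panelist_2), 2))  # → 0.33
```

A kappa of 0.33 here means the two interviewers agree only modestly better than chance, which is exactly the situation a shared coding book is meant to fix.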
Imagine we’re trying to hire a data scientist and everyone agrees they “need to be able to program”. But has anyone stopped to think about what constitutes the bar for “able to program”? Does being able to work through the pseudocode while frequently consulting docs/Stack Overflow for library/function details count? Do they have to speak PyTorch fluently? Does the candidate need to write perfectly organized, compile-ready, performant code while reciting Shakespeare? At what point do you say that someone can’t code to the necessary level?
Same goes for statistical knowledge. Do they just need to know fairly basic stuff like descriptive stats and inference methods like t-tests? Or do they have to be able to rattle off how to build and analyze structural equation models? While you can’t set a checklist that covers whole fields, you can at least set the expectation that “this minimal stuff must be there” beforehand. Yes, it’s difficult. It’s worth the trouble if you hire the same position more than once.
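One lightweight way to make those bars concrete is to write the rubric down as data, so every interviewer grades against the same scale. A hypothetical sketch (the skill names, level descriptions, and bars are all invented for illustration; a real rubric would come out of the agreement exercise above):

```python
# Hypothetical rubric: each skill gets a minimum bar and level descriptions.
RUBRIC = {
    "programming": {
        "bar": 2,
        "levels": {
            1: "can read code but struggles to produce working pseudocode",
            2: "writes working pseudocode, consults docs for library details",
            3: "writes fluent, well-organized code with minimal lookups",
        },
    },
    "statistics": {
        "bar": 2,
        "levels": {
            1: "descriptive stats only",
            2: "comfortable with inference methods like t-tests",
            3: "can build and analyze structural equation models",
        },
    },
}

def skills_below_bar(scores, rubric=RUBRIC):
    """Return the skills where a candidate's score falls below the bar."""
    return [skill for skill, spec in rubric.items()
            if scores.get(skill, 0) < spec["bar"]]

print(skills_below_bar({"programming": 3, "statistics": 1}))  # → ['statistics']
```

The point isn’t the code; it’s that writing the levels down forces the “what counts as able to program?” argument to happen once, before any candidate is in the room.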
Every position needs different levels of skill in all things, and it’s wasteful to reject perfectly capable candidates because someone on the interview panel is a hardass about their esoteric pet peeve for a skill that will never be used.
There also needs to be some leeway for learning and skill substitution. For example, is it bad if someone doesn’t know SQL but knows how to manipulate joinable relational data through experience with something like R’s tidyverse? Basic SQL can be learned within a day, so they might work out fine if hired.
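To illustrate why that substitution is plausible: the basic SQL the role needs maps almost one-to-one onto the tidyverse-style joins the candidate already knows. A toy sketch using Python’s built-in sqlite3 (the tables and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO orders VALUES (1, 9.5), (1, 3.0), (2, 4.0);
""")

# Conceptually the same as dplyr's:
#   orders |> inner_join(users, by = c(user_id = "id")) |>
#     group_by(name) |> summarise(total = sum(amount))
rows = conn.execute("""
    SELECT u.name, SUM(o.amount) AS total
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name ORDER BY u.name
""").fetchall()
print(rows)  # → [('ana', 12.5), ('bo', 4.0)]
```

Someone who already thinks in joins and group-bys mostly needs to learn new spelling, not new concepts.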
It’s important to be clear about this stuff before you even bother interviewing people, since these bars are going to be the primary reason anyone gets rejected.
Remember that interviews are high stress, high stakes experiences
Tech interviews are grueling marathons that can take the better part of a day; people are understandably exhausted and drained by the end. Some people seem to delight in high-stakes interviewing and use it as a gatekeeping mechanism. I highly disapprove of such approaches, since our actual day jobs are NOTHING like an interview situation. It’s not helpful to us to crank up the stress with ridiculous interview styles.
For my part, I try to do everything I can to get candidates to relax and have a better chance to show me their best selves. Some can be a bit stressed from their previous interview session, and it barely takes any time to make sure they have a moment to take a breath, grab a last-second drink, or take a restroom break before we start.
Similarly, I tell candidates early on what to expect so they know what they’re in for. We’ve got 45 minutes together, and I’ll save roughly 5-10 at the end for them to ask any questions they might have. It’s also super OK if they don’t have any questions, since I don’t grade based on what questions anyone may or may not ask.
Not everyone’s done a tech interview before
Five-hour interview marathons, whiteboard/pair coding, hypothetical situations, discussing past experiences, portfolio reviews, research presentations, and discussions about stats are all part of the “tech interview” nonsense many of us are used to in industry. But a fair number of candidates have never been through such a crazy process before. Perhaps it’s their first industry job after a life in academia, or they’re from a different industry or country, or they’re fresh out of school and haven’t read whole books on the topic. Those people might not know that it’s to their advantage to work out loud to show what they’re thinking.
So as a way to make sure everyone’s aware of what they should be doing, I often tell people up front that my job for the day is to report on any skills they demonstrate to me today, and their job is to provide me with as much fodder to write about as possible. Yes, that applies to any interview I’ve ever had anywhere, so I’m not really saying anything new, but it’s a convenient frame to remind people that they’re supposed to show off what they’re capable of in as many ways they can think of.
Take home work is probably biased
Giving candidates a “small problem” like a dataset to analyze at home seems to come into and fall out of favor every couple of years. Proponents like how it’s more “like work”: there’s less time pressure, and you can look things up in a familiar work environment. It’s often proposed as a way to get rid of coding interviews while getting a better read on “ability to do real work”.
Detractors point out that no matter how much you tell candidates “only spend 3 hours on this”, no one will ever listen. They’re incentivized to sink as much time into it as they can to outcompete everyone else applying. So take-home problems inevitably bias towards people who have untold hours to work on them, often younger people with no family or work obligations.
Whiteboard coding (and all its variations) is terrible, but can be made to hurt a little less
Most people agree these things are unrealistic and pretty shoddy. Some people are experimenting with alternative formats like pair coding with candidates, and I think it’s worth a try if you can figure out the methodology. But if you ever find yourself in a situation where you have to do one to see if someone can at least code their way out of a nested list, there are some ways to make it less crappy than it typically is.
On the rare occasions I’m supposed to check for programming skills, I usually tell people up front that I’m not a compiler and neither is the whiteboard, so I do not care if they make little syntax/library-call errors. So long as I can reasonably understand the intent behind the code and it looks about correct, I’m going to assume that minor silly errors like a missing bracket or misspelled function name will be caught when the candidate tests their code. I’m never evaluating software engineering candidates, so I have significantly more leeway to let incidental things slide, since I personally make those silly mistakes ALL THE TIME.
That said, I’ll make a positive note if someone can write code as fluidly as if they were speaking their primary language, but those folks are extremely rare. Most people, as expected, perform competently but are far from perfect. It’s up to you and your rubric to decide where on that spectrum your hiring bar sits.
Basing questions off the real job is often a good idea
If the job is going to be about data cleaning, asking questions or doing exercises around cleaning wonky data are great choices. While it sounds obvious, some people seem to feel the urge to be clever when making questions and come up with contrived things. Having things grounded in the role makes it much easier.
You’ll likely have “true stories” to base your questions on, so you can fill in details as needed. It also helps work around silliness like “how is coding a sorting algorithm relevant to the job? I use .sort().” Finally, it also means your question is less likely to pop up in some online forum where people share interview questions.
But don’t ask people to do free work for you
In case you didn’t know — don’t be an unethical parasite.
There are stories about sketchy companies that give “design interviews” with “design this logo” problems to candidates, then they don’t hire anyone and run off with the completed designs. That’s downright evil on top of being (maybe?) illegal.
Even if you’re not out to be evil, you don’t want even the appearance of something sketchy. Say you don’t work on $Thing, a candidate talks about improving $Thing, and you forget about it; if by sheer coincidence $Thing later releases the very feature the candidate described, it looks bad.
I’m sure HR and corporate lawyers have a lot more to say about this. Just avoid the whole situation altogether.
Please test your questions out on someone first
It’s really easy to create bad interview questions. You don’t want a question that’s easily misinterpreted, nor one that tends to elicit answers you don’t find relevant. I’ve experienced some really weird questions where I just had to take a shot and guess at what the interviewer was trying to elicit using clarification questions.
For practical things like programming or SQL tests, make sure other people from varied backgrounds can actually solve them in a reasonable amount of time. I was once handed a laptop with a PostgreSQL terminal open and asked to run some basic SQL commands on it. Considering I had primarily been using MySQL up to then, it was a miracle I managed to pass (and it involved me being very up front about how I’m not very familiar with PostgreSQL’s interactive interface and commands like \d).
Also, in case it needs to be said, asking near-nonsense in an attempt to elicit clarification questions is really obnoxious; don’t do that. I do understand we sometimes get bonkers requests that require a “so WHAT are you trying to do?”, but now’s not the time.
Please. Do some basic user testing before you use them to determine the fate of some poor candidate.
Finally, you’re going to be bad the first dozen times. Practice.
Being an interviewer is nothing like being an interviewee; the two have almost nothing in common. You’re just going to need to accept that you’ll be horrible at it for a while and keep working at it. Pay attention to how other people conduct their interviews, practice on colleagues, and be humble about the whole thing. You are, in no small part, doing something that affects other people’s lives. Treat the job, and the candidates, with the respect they deserve.
Things to share w/ the data community
Jenna Jordan is running a book club! This time they’re reading “Data & Reality” by Bill Kent. Join in on the fun. You can discuss on Twitter and also on Discord.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.
Curated archive of evergreen posts can be found at randyau.com.
Join the Approaching Significance Discord, where data folk hang out and can talk a bit about data, and a bit about everything else.
All photos/drawings used are taken/created by Randy unless otherwise noted.
Supporting this newsletter:
This newsletter is free, share it with your friends without guilt! But if you like the content and want to send some love, here’s some options:
Tweet me - Comments and questions are always welcome, they often inspire new posts
A small one-time donation at Ko-fi - Thanks to everyone who’s sent a small donation! I read every single note!
If shirts and swag are more your style there’s some here - There’s a plane w/ dots shirt available!