Showing value as a support data scientist

A past experience dump

Perf season is upon the Googleplex right now, meaning a lot of time and energy is being spent to write hundreds, possibly thousands, of words towards documenting our work for performance review and promotion. I’m not about to waste your time writing about whether the whole involved perf process is good, bad, whatever, because plenty of articles talk about those topics if you choose to search for them.

On average I think the process uses a huge amount of people-hours and has plenty of flaws and biases, but then, all the other review processes I’ve ever experienced have tons of issues too. At least the conversations I have with my manager surrounding the process are valuable to have.

So instead of dwelling on that, I wanted to reflect upon a question that pops up fairly often: how do data scientists show management and others the value and impact they’re creating for the business?

Background

Since views about this topic depend a lot on where you’re sitting in the organization (executive, manager, senior DS, junior DS, etc.), I’ll first explain where I’m speaking from.

For my entire career, I’ve always been an individual contributor, the “only data person” in many instances across many startups. Even now I think I’m the only quant UXR in Cloud that’s directly reporting to a product-focused UX org instead of being within a group of quant researchers shared across Cloud.

This means that I’m not speaking from direct experience of being a manager.

However! Due to being the only data person for so long, I’ve had to work very closely with my managers to understand how to showcase data science work. So despite it not being one of my responsibilities, I’ve still had to put some thought into what teams have to do to show their value.

The problem of showing value in DS

Very broadly speaking there are two main types of data science teams: Teams that create tools and/or models for use by end users, and teams that provide a support function to internal teams.

Teams that create tools and models for use by end users usually have a pretty clear way to show how much impact their work has on a business by simple virtue that they can say “X number of people are using our things!”. It’s often even possible to tie that usage to a revenue number, either direct revenue generated, or a “our users represent X% of paying customers and $Y amount of revenue”.

These teams are essentially product development teams, just creating data science products. They have feature launches, usability tests, infrastructure requirements, etc. As such, the measurement of their productivity and value is fairly straightforward.

I’m not going to talk about these teams much.

Instead, the second class of teams, the ones that are providing a support function for other people within an organization, have significantly more problems concretely showing their value.

The core of the issue is that support DS teams usually work on things that are far away from a source of revenue, so they have to fight against the perception of being merely a cost center. These teams are usually supporting product development, or providing research and analysis to make various teams more effective; oftentimes there’s a fair amount of small, disjointed, ad-hoc data work coming in from various corners of the company.

These teams have trouble saying “We built X!” because their actual work was helping engineering, product, and design understand and think about X, then those people put their own unique creative processes into X before end users see it. It’s very squishy and hard to be definitive about “we contributed this specific part”.

So what are some of the things we can do to make things easier?

Groundwork: keep track of what you’ve worked on

It’s hard to show what you’ve done over a period of time if you don’t remember what you’ve done. We all know that human memory is severely flawed and biased in all sorts of ways. So the most important thing is to keep an ongoing record of what’s been done.

If your workday looks anything like mine, on top of any long-term projects I have going on, there’s a constant stream of ad-hoc requests that pop up randomly throughout the week. Sometimes the requests can be finished off in 30 minutes, sometimes they take longer. Either way, these requests can fly through your attention; it’s normal to want to just handle the request and move on. Find a way to integrate some minimal form of recordkeeping into your workflow.

Some teams use formal ticketing systems to keep track. Those are great (some might even argue necessary) for larger teams and/or for people who are much more organized than me. The more a team of data scientists wants to work with a layer of abstraction between them and the world to share workload, the more a ticketing system makes sense.

I’ve tried on multiple occasions to use a ticketing system, but I very quickly fail to form a habit around it and it rots. So as an alternative, I got into the habit of putting a date upon all my file outputs. Every analysis, every spreadsheet, every query file, I prefix it with a date (in ISO-8601 format, because I’m not insane). This way, I can always go to my file folders and easily grep out everything that I’ve done. I might miss out on a bit of context, but those files provide a timestamp reference point to search through my email for any missing context. It also helps in finding things when someone walks up and goes “remember that analysis you did for me a few months ago?” (answer: no).

Recently, thanks to the rigorous nature of Perf, I’ve also started making a somewhat-daily 1-liner journal of things I’ve done during the day/week with links to large outputs like decks. It gives a more complete context in one spot and saves me the trouble of reading a bunch of timestamped files to figure out what’s going on. To be honest, if I didn’t have to do a formal review process every 6/12 months, I wouldn’t be doing it, but it’s very useful for that purpose.

Keep tabs on what’s been influenced

It’s very common for data science to be many steps removed from final product decisions. Variations of the sentence “Provided research that influenced the people who designed and refined the concept that became the product feature” come up pretty regularly. It’s often okay to work with that. But the problem is the “feature” you contributed to is a looooong way from completion.

For example, I’ll often provide some quick analysis to a project manager as they’re thinking about what their next feature should be. That information bounces around in their brain along with other information before a decision is finally made. By the time that decision has been made, my own contribution has probably been long forgotten by everyone, including myself. And people haven’t even started building out the feature yet!

Similar things can happen when you’re giving data to various other members of teams, including designers and engineers. The more often you provide foundational information, the more often this happens. You wind up being so early in the process of decision formation that your contribution gets forgotten.

One way to help counteract this effect is to make sure you’re in the loop on what those decisions eventually become. Build those long-term relationships. It might take a long time, months, years even, before things come to fruition, but if you’re in the loop, there are going to be many more opportunities to contribute and affect things. If you then note those things down over time, you should be able to build a picture of how you’ve influenced what’s going on. You can then gather all of that into a broader, formal update for the team in general, because not everyone on the team is aware of what questions other team members are asking, and it’s a good idea to make sure everyone is on the same page on a regular basis.

The trick is finding good times to put these formal updates together. There are usually places within the development cycle where a large decision is coming up and it’s worth everyone taking a brief pause to get their bearings. I’d aim for those moments where possible.

As the project becomes more concrete, you’ll be able to switch from providing broad strategic-level data and insights to more tactical things. Experiment and user testing results, surveys, metrics collection, etc. Those are all concrete things you can point to as valuable work being done.

But what if you don’t have the time and energy to soft-join yet another project?

Then the only thing you’ll have are those formal documents to point to. You can say “I provided this to help the team figure out something, then they took it and worked on XYZ”.

There’s a risk that what you provided winds up being completely irrelevant to what’s finally worked on, so there are plenty of instances where there’s zero payoff involved. But sometimes you get lucky and you can trace a path from your work to the final product that is clear enough to say “hey, I contributed something”.

Try to trace to $$$ anyway

Oftentimes it’s hard to directly cite a revenue number like “ran an experiment that generated a $5 million revenue boost in 6 months”. How much was it worth to the company for me to give data that ultimately resulted in the removal of the ability to say “Maybe” on an RSVP to a social event? No idea. But it certainly pissed off a few thousand customers.

Despite this hurdle, with some creative stretching, you should be able to give some color on exactly what you did. In my example, I had a few thousand people vocally angry at the decision. We also didn’t damage the ecosystem too badly: by raw counts, most people who used to put “maybe” simply owned up and entered “no” instead, like we had predicted. IIRC, churn also wasn’t noticeably worse. But we believed the overall UX was improved, so it was a net positive.

Other times, you can trace back to cost savings. Headcount is one of the top expenses for any company, so a tool that saves a bunch of people a bunch of time usually adds up to a ton of money. If you’re aiming to make this sort of argument, make sure you have baseline data available to show the improvement.
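The cost-savings argument is really just multiplication. As a sketch, with every input an assumption you’d want to back with baseline measurements (the numbers below are purely illustrative, not from any real tool):

```python
def annual_tool_savings(users, minutes_saved_per_week, loaded_hourly_cost,
                        weeks_per_year=48):
    """Back-of-the-envelope annual value of a time-saving internal tool.

    All three inputs are assumptions to source from baseline data:
    how many people actually use the tool, how much time it saves each
    of them per week, and a fully loaded hourly cost (salary plus
    benefits and overhead).
    """
    hours_saved = users * (minutes_saved_per_week / 60) * weeks_per_year
    return hours_saved * loaded_hourly_cost

# Illustrative numbers: 40 users saving 30 min/week at a $100/hr loaded cost.
estimate = annual_tool_savings(40, 30, 100)  # 40 * 0.5 * 48 * 100 = 96,000
```

Even a rough figure like this is far easier to defend in a review packet when the “minutes saved” term comes from before-and-after measurements rather than a guess, which is why collecting the baseline up front matters.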

And IF you give this method a try and completely fail to come up with anything that sounds reasonable, because tracing it to concrete dollars involves way too many crazy leaps of math, it’s probably a sign that something weird is going on organizationally speaking. Maybe you, or the company, is losing strategic focus. Finding that out itself is going to be valuable work.

If you haven’t noticed, I like working on things where you can find some sort of win even if you fail.

Stay Visible

Review processes have tons of flaws and potential bias within them; even the most well-intended ones can only try to minimize the bias. Things like recency bias and the halo effect are going to be in play. You don’t want to be in a situation where you’re taking credit for some contribution to a thing and no one even remembers you were in the room.

So it means not letting your work fall into obscurity, and taking the time to occasionally step up and formally present to a wider group the many quick questions individuals have been asking you. Just like when you’re maintaining relationships with a team by presenting that stuff to them, you can sometimes find opportunities to share it more broadly, especially if you have interesting insights that may be of use to other teams.

Find reasons to say no to work

There’s always more data work that needs to be done than there are data scientists to do it, so an interesting way to boost your impact is to be proactive about making attempts to say no to work.

I don’t mean say no to work arbitrarily. Instead, if you take the time to look into whether a piece of work is worth picking up to begin with, it will eventually pay off.

Evaluating whether a project is worth doing means scoping out the research question, how much effect it could have, how all that ranks against the current priorities of the company, etc. Just ballparking things will do. If you find that the work doesn’t seem likely to pay off, you not only save yourself time, but could potentially save the time of the requester too.

You’re not likely to have all the answers to all these questions. Sometimes it’s not clear what work to prioritize. That’s okay! It means you get to ask your manager, other managers, team leads, or even an exec. Sometimes all those more senior people will decide amongst themselves where to prioritize things. Unless you’re the manager, at which point, that’s part of your job.

That visibility alone is useful, and it makes sure you’re on the same page as everyone else. While there’s always a chance that you can run off working on something on your own and find unexpected and amazingly powerful insights, you can’t disappear into the void completely pursuing those.

Find work that has a longer shelf life

There’s tons of work that answers pressing in-the-moment questions. How many people are using this feature? Who’s getting stuck in this flow? It’s important work, but there’s usually some opportunity to do work that is useful for more than one project.

Foundational research fits in this category. Something that’s true of the business or users regardless of specific features. In some rare instances, these insights can also affect the whole strategy of a company. Ideally you find these projects by taking an existing question and then modifying it slightly to look into a broader question.

Infrastructure and tooling is the second broad class of projects that have a long shelf life. Creating useful data pipelines and tools that people routinely use is a way to add clear value. The only problem here is that you’re going to wind up maintaining these things until you find a way to hand them off to someone else.

Recognize there are limits

Collecting and preparing data for performance reviews is important for moving up the career ladder, but in my experience it doesn’t really offer too much in the way of protection from getting flung off the ladder entirely for reasons beyond your control.

Will these methods save you if large staffing cuts are coming? Probably not.

They certainly didn’t save me the two times I’ve been hit with the layoff axe over the years, both at places where I had time to build strong relationships and was generally well known for the work I did.

Being let go just meant the organization was willing to fly blind without detailed analytics insights for a period of time. That risk of making bad decisions (that can always be fixed with a patch) wasn’t worth my salary when a manager somewhere needed to hit a cut quota.

When it comes to making hard budget cuts and figuring out who’s left behind, the decision becomes cold pragmatism and retaining the ability to launch any feature at all is usually prioritized over launching good features.

When this happens, the last thing you can do, if you’ve been keeping track of your contributions over time, is take those stories with you to demonstrate your potential value to a new employer.