Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
Happy Pi Day!
I was reading Ergest's post this week about how data science/analyst positions can be taken over by a lot of “glue work” (the often undervalued work that’s needed for success, like documentation, cross-functional alignment, meetings, project management, etc.). One symptom of it often manifests as making endless operational dashboards for other teams, wrangling spreadsheets, and trying to merge data from disparate teams and broken systems.
Note: there’s a broader discussion about how glue work disproportionally falls upon women and hampers their success in the workplace. That’s not quite what I’m discussing here. While I’m borrowing the term here today, you could also possibly mentally replace the term with “low-impact work” or “bullshit work”.
The topic of glue work in relation to data work has popped up occasionally over the years. It summons the now-familiar story of how someone joins some company as a data scientist, expecting to be working on AI, modeling, generating insights, doing algorithmic wizardry, etc., but finding that 99% of their work life is fighting incompatible data systems, cleaning data, and building endless dashboards and spreadsheets. Maybe on a good day they get to stop making pie charts and run a linear regression.
There are plenty of reasons why disillusionment happens. Companies that don’t have existing data teams have no idea whether the underlying infrastructure even supports using data at all, let alone at scale. If there’s no data-savvy person doing the hiring and managing, the role is vague, undefined, and unrealistic. Anyone hired for such a position will have a manager who has no idea how to value or evaluate the work.
On the worker’s side, there are also unrealistic expectations: data cleaning, wrangling, and infrastructure building are often a huge part of the work, while some people “just want to do modeling”. Many school programs and courses don’t help because they put an outsized emphasis on modeling and tools. Models are almost the easy part compared to all the work involved in getting everything ready to shove into a model.
Unhappiness springs from the many gaps in expectations. Obviously that’s a huge expanse of things that can go wrong and contribute to disillusionment. But to lump it all into “gap of expectations” largely shows that I haven’t really put enough energy into thinking deeper about what kinds of things can go wrong.
What caught my interest in Ergest’s post was how it draws a distinction between work that is done for improving business processes, such as figuring out the best shipping strategy, and work that needs to be done to operationalize new processes, like building a dashboard to pull data from different silos so that teams can execute some initiative. Both of them require skills that data scientists have — data skills and programming. The problem is that people do not derive the same amount of satisfaction and meaning from both.
Even if dividing the world into creating/modifying versus executing process doesn’t cover all the possible causes for trouble, asking data scientists to effectively be bridge functions to help other teams accomplish their own new processes oughta be a pretty hefty chunk of them.
Execution is a different beast
In my opinion, a lot of the “sexy” parts of data science derive from the word “discover”. Many of us pull new and interesting insights out of masses of data that just might, if we’re lucky, change the entire business. A few find new patterns and apply them at scale and make magic happen. The value is relatively straightforward, and at the least you can see whether people are using your information to make decisions or not.
Execution is far less exciting, doubly so when it’s execution of someone else’s project that we don’t own and might not get much credit for. Once a new process is created, workers will need data tools to get their job done. Teams need dashboards to keep track of their key metrics. Disparate data systems have to be merged together and no engineer can make sense of them. Someone’s got to track issues, bugs, and project timelines for doing this work. All of the work covers extremely well-worn ground, so there’s a very low novelty factor involved. The appeal of this sort of work is much lower.
So the argument in the posts I’ve mentioned above is that there is room for a sort of “Data Project Manager” who does find all this execution stuff interesting and can handle it, so that all the data scientists can go back to doing what they love, which is not this. On smaller teams where there isn’t room for such a niche role, someone is going to have to step up and wear this hat, even if it’s not the data scientist.
One critter’s unwanted space is another’s ecological niche
While I often joke that my work is all about “counting stuff”, the other joke I often make is that my job is to “make other people smarter”. This often meant that I’d be going between multiple teams acting as liaison and translator, helping teams get what they needed to do their jobs better. In addition to the weird little one-off projects, there’s a bit of project planning and engineering knowledge involved to recognize “hey we need some staffing resources to solve this” and get the ball rolling.
So you could say that I’ve settled into a world where glue-like tasks are a significant component of my contributions. Execution isn’t interesting in the same way that discovering things is interesting, but it’s also not all boring dashboards. Instead, there’s a lot of thought and design going into figuring out robust processes, designing tools that save people time, and connecting teams that wouldn’t have known they could work together otherwise. Plus, there are ways to show definitive impact on teams from those efforts. If you build a tool that saves coworkers hundreds of hours of work, that’s a huge amount of money you can point to.
But before I start advocating my (likely minority) way of life to people, there are some extremely important distinctions between how I work and the disillusionment scenario. I have enough local power, in the form of management support, trust, and past experience, to take at least some level of ownership of the situation and push back on projects that look unreasonable or unsustainable. I can advocate for projects that I do want to work on when I find them. Plus, I make sure teams know they have to maintain their own processes once they’re up and running.
Having all those things is critical to remaining effective in a “come one, come all” type of setup. I dislike working on worthless throwaway dashboarding projects as much as anyone else. The goal is to convince people that those projects are bad ideas to begin with, that they should do something better.
This aspect of work, effectively the benign form of workplace politics involving horse trading and prioritization of work, is very difficult for someone fresh to industry to navigate. It’s a skill that’s often built up from practice, failure, and observing others. And even with those skills, there are always crappy projects that can’t be fought against, like if the CEO decides to haul the whole company down a specific path for a while. It’s not exactly an environment where you can toss a fresh data scientist in without support.
But I think for the small group of people who do find a sense of achievement in helping others do their work better by doing what it takes to herd disparate teams and systems and processes to a new destination with a giant wad of glue, it’s something to think about.
If you’re looking to (re)connect with Data Twitter
Please reference these crowdsourced spreadsheets and feel free to contribute to them.
A list of data hangouts - Mostly Slack and Discord servers where data folk hang out
A crowdsourced list of Mastodon accounts of Data Twitter folk - a big, community-contributed list of data folk who are now on Mastodon, which you can import and auto-follow to reboot your timeline
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or if you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord - where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday. Share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Tweet me with comments and questions
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
Dashboards are the new “user friendly” (a term that came into currency in the 80s as a substitute for thought). They share the same original sin. If you make something so simple that any damned fool can use it, then every damned fool will use it. Besides, you can’t make it foolproof because fools are so ingenious.