Working in data as a "meh programmer"

Working around myself

Dec 12, 2023

Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.

On a scale of terrible to amazing, I firmly believe that I rank as a “meh” programmer: someone who gets jobs done without any notable grace or fuss but is otherwise utterly bland and unremarkable. The vast majority of my code these days involves just linear data processing pipelines where data enters on one end, gets smashed around inside, then plops out the other end like an oddly geometric chicken nugget. Every time I have to touch l33tcode-like coding exams, I can typically get to a “working solution” but nothing I ever write can be described as fast, efficient, or elegant off the bat.

I’ll put Python and SQL on my resume, and can talk about the various atrocities I’ve done over the years… For example, having to read the RFCs on email to figure out how to get my janky py script to send a weekly metrics report via my company email server. It sounds sorta impressive, but mostly just took a couple of days of trial and error learning the bewildering madness surrounding email headers and protocol handshakes.
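For a sense of scale, here is a minimal sketch of what such a report script might look like today using only Python's standard library. It is not the script described above: the function names, addresses, host, and port are all hypothetical, and a real company mail server may want different authentication or TLS settings.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, body_text):
    """Assemble a plain-text report email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body_text)
    return msg


def send_report(msg, host, port=587, username=None, password=None):
    """Send the message over SMTP with STARTTLS.

    The host, port, and login details are assumptions; check what
    your own mail server actually requires.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username:
            server.login(username, password)
        server.send_message(msg)


# Example usage (commented out because it needs a real server):
# report = build_report_email(
#     "metrics-bot@example.com",
#     ["team@example.com"],
#     "Weekly metrics",
#     "signups: 42",
# )
# send_report(report, "smtp.example.com")
```

The standard library handles the header formatting and the handshake, which is most of the bewildering part.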

But as a whole, writing code is not something I find enjoyable, and so I don’t dive too deeply into it. I’ve got plenty of peers who simply care more about being experts at it than I do. You’ll never catch me writing libraries and packages to share, nor reading books about programming languages or being effective or efficient with my code. It’s just the tool that gets me to my actual goals.

And yet, I’m doing pretty decently for myself, despite how plenty of “very confident people on the internet” say that not knowing X, Y, or Z means you’re not a “Real Data Scientist”. The field is full of very diverse people from many backgrounds, and I want to give an example of how it’s possible to create value with tech skills that have nothing to do with raw programming ability.

Everyone who works with data is T-shaped in different ways, and the louder people are about how just frickin’ DIFFERENT we are from each other, the better.

So, what does someone in my shoes do to be effective doing technical stuff while also being exceedingly boring at technical stuff?

Learn design patterns, big and small

The biggest thing that comes to my mind is having an understanding of how software systems are put together — essentially common design patterns and architectures. Ideally you’d want to understand things both at a “within a single program” level of how functions, classes, and abstractions are used to make software relatively easy to understand and maintain, and also at the “how many pieces of systems work together to provide a service” level.

To go with a car analogy, this is knowing what the major parts and systems of a car are for and why they exist, even if you don’t know the specific details about how they work. If you know that your car battery seems to be dying, you can guess that maybe there's a problem with either the battery or alternator and not your radiator. Same goes for knowing that belts in a car tend to connect very important bits. While knowing this stuff might seem elementary and not worth discussing, being able to understand and have conversations about how big complex IT systems function is critical when working with people to debug or improve said systems. It’s usually more important than knowing what specific algorithm is being used during a certain step.

The nice thing about understanding things at this high level of abstraction is that things change much slower. While database technology comes and goes with changing times and needs, the “a thing that stores the user’s account information” job will always need to be done by something. Having that framework in your head gives you scaffolding for learning whatever specific software is used to fulfil that role. A technology is significantly less mysterious if you understand that it’s serving as a user database and so must allow certain basic interactions.
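To make the “role outlives the technology” idea concrete, here is a small sketch. Everything in it is hypothetical: UserStore, DictUserStore, SqliteUserStore, and change_email are names made up for illustration, and a real user database would carry far more than an email address. The point is only that the calling code depends on the role, so the storage behind it can be swapped.

```python
import sqlite3
from typing import Optional, Protocol


class UserStore(Protocol):
    """The role: 'a thing that stores the user's account information'."""

    def save(self, user_id: str, email: str) -> None: ...
    def get_email(self, user_id: str) -> Optional[str]: ...


class DictUserStore:
    """Simplest possible implementation: an in-memory dict."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, email):
        self._rows[user_id] = email

    def get_email(self, user_id):
        return self._rows.get(user_id)


class SqliteUserStore:
    """Same role, different technology: a SQLite table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(user_id TEXT PRIMARY KEY, email TEXT NOT NULL)"
        )

    def save(self, user_id, email):
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT OR REPLACE INTO users (user_id, email) VALUES (?, ?)",
                (user_id, email),
            )

    def get_email(self, user_id):
        row = self._conn.execute(
            "SELECT email FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def change_email(store: UserStore, user_id: str, new_email: str) -> None:
    """Caller code only knows the role, not what is behind it."""
    if store.get_email(user_id) is None:
        raise KeyError(user_id)
    store.save(user_id, new_email)
```

Whichever store gets passed in, change_email reads the same, which is roughly what it feels like to walk up to an unfamiliar technology already knowing what job it has to do.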

Eventually, once you start working long enough and have to get other teams to agree to do work for you instead of doing everything yourself, being able to hold architecture discussions becomes ever more important. Now you can’t (and usually shouldn’t) micromanage what specific algorithm is being used, but need to make sure that you can express requirements clearly.

Being translator and cross-team advocate

Data roles usually interact between teams and have to do some translation between them. Who else has to talk to the customer service team about their data and metrics and then turn right around to the engineering teams to discuss how those metrics can actually be implemented? Having good communication skills is important in general, but we’re forced to become familiar with the technical language of everyone we work with.

While the base need of clear communication is just part of the job, it is possible to go above and beyond with this aspect and help teams advocate for new technical solutions to problems. Leveraging your understanding of how the tech stack works (see above), you have a much better sense of what things are “easy” (or feasible) than most people. Sometimes, to push things along, you need to use your technical knowledge to actually design out a proposed solution to convince people it’s a worthwhile idea.

I once helped a team save multiple hours of manual data reconciliation work every week by suggesting the engineering team spend a bit of time hacking up a simple feature. The team hadn’t thought to ask for help because they assumed the problem would be too difficult to automate. It turned out that 80% of the work could be automated and it only took an engineer half a day to build.
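I won't reproduce the actual feature here, but as a rough illustration of why this kind of work is often more automatable than it looks, here is a hypothetical sketch. It assumes two sets of records keyed by some shared ID (the reconcile function and the invoice data are invented for this example): records that agree are matched automatically, and only the disagreements are left for a human.

```python
def reconcile(records_a, records_b):
    """Split record IDs into auto-matched and needs-human-review.

    Both inputs are dicts mapping a shared ID to a value (hypothetical
    shape; real reconciliation data is usually messier than this).
    """
    matched = []
    needs_review = []
    for key in sorted(set(records_a) | set(records_b)):
        a = records_a.get(key)
        b = records_b.get(key)
        if a is not None and a == b:
            matched.append(key)
        else:
            # Missing on one side, or the two sides disagree.
            needs_review.append((key, a, b))
    return matched, needs_review


billing = {"inv-1": 100, "inv-2": 250, "inv-3": 75}
payments = {"inv-1": 100, "inv-2": 200, "inv-4": 30}

matched, needs_review = reconcile(billing, payments)
print("auto-matched:", matched)
print("for a human:", needs_review)
```

The easy cases fall out mechanically, and people only have to look at the leftovers.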

While it’s hard to “out engineer” the engineering team in their own field, it’s much easier and more fruitful to do so for everyone else.

Willingness to dive into tech as needed

Another underrated tech skill is having the willingness to learn a new technology when we see the need for it. Many people working at the intersection of tech and data are drawn to new shiny tech when it comes out (hence our obsession with discussing tools), so it might feel strange to think of it as a skill, but it really is one.

Learning a new thing involves spending a significant amount of time and resources mostly being unproductive while we figure things out. There’s frustration, lots of memorization, reading, and general bumbling around involved. Many of us don’t think much about it, but to constantly engage with new tools means we have a surprising amount of confidence that we can figure this stuff out in fairly short order. That’s pretty atypical: there are plenty of places where needing to get a job done is more important than thinking about tooling.

For example, Kubernetes is infamous for having a steep learning curve, but plenty of data people pick up the fundamentals of it on the job with some help from their colleagues or with some quick courses. It’s not usually enough for them to spin up their own production-ready cluster, but it gives them enough familiarity to interact with the existing infrastructure at their workplace to deploy models and services. It doesn’t mean we need to learn everything that comes across our vision, but learning new stuff unlocks new opportunities, and there are plenty of people who aren’t even willing or given the chance to take such a risk.

Don’t try to be a unicorn

Every couple of years, we get jokes about there existing “full stack data scientists” who can do everything — write a brand new statistical package in Python, launch brand new reporting infrastructure, then turn right around and present metrics to the board of directors. They’re called unicorns for a reason. I’m not one, and most likely you’re not one. Given how we all have limited time and attention, something has to give. For me, I’m a meh programmer so I can spend time on other things. Conversely you could decide to excel at writing code and be weaker on stats or similar.

Either way, our scope is too broad and you’re going to have to utilize the skills and resources of other people and other teams to get stuff done. Attempting to do it all yourself would just lead to burnout. So team up with people who can complement your existing skills — in my case people who can code better than I can — and make your best impact elsewhere.

Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord —where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.

Support the newsletter:

This newsletter is free and will continue to stay that way every Tuesday, so share it with your friends without guilt! But if you like the content and want to send some love, here are some options:

Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
