Announcement: This June 8-9, 2022 Google is hosting a “Quant UX Con”. A bunch of Quantitative UX Researchers from across the industry, Google, Meta, Uber, etc. will be presenting a bunch of talks. Consider attending if you’re interested in the field of Quant UX Research.
Attendance/registration is FREE. If you’re in Sunnyvale, you can register to attend live (space permitting) before April 7th; after that, people can join virtually. I’ll be on a panel of qUXRs discussing the vastly differing paths we took to get into the field: “Mid-career quant UXers transferred from other disciplines”.
Many years ago, I attended a talk, the details of which I’ve now forgotten, where the speaker was talking about building stuff in the civil engineering “let’s build a bridge or skyscraper” sense. Along the way they made an offhanded comment that stuck with me: the phrase “no one’s done that before” comes up a surprising amount in civil engineering projects. Want a building in some location with weird soil? Want to have it be really tall or have an interesting shape? Want a fancy balcony/bridge thing inside? Very often there’ll be some aspect that hasn’t been done before. That speaker heard the phrase often enough that they wondered if humans had built anything except for tiered rectangles.
At the time, I found the story amusing and it sorta stuck with me. But when it randomly popped into my head this week and I thought about it some more, I realized it provides an interesting view of how we as data scientists can see our work and where we personally stand in the field.
In any field, there are people who are considered “at the forefront” for various reasons. Maybe they’re a leading researcher coming up with new techniques. Maybe they’ve been around a long time and have deep knowledge, insight, and connections. Maybe they created or maintain a famous tool that everyone uses and continues to make contributions. Maybe they have other notable achievements. Or maybe they’re great with memes and are just really fun to talk to.
Naturally, many of us look up to those folk. They’re high status, have some sort of social capital, and are very visible. Very often they’re doing “new and cool things” and their interest provides a rough indication of where the “cutting edge” of data science is, either through the things they’re working on, or projects that they become aware of and share due to their position and status. At least from the broad public’s point of view, these people are “moving the field forward” in some way.
But the problem with comparing ourselves against such folk is that they seem so distant from us. I personally spend much of my time pulling data out of a warehouse in SQL, dumping some analysis with code into a spreadsheet, doing simple tests to generate charts and insights, then shipping out slide decks. I know tons of people who are doing very similar things too. That’s miles and miles away from the apparent “cutting edge”. It’s easy to feel that we’re just followers who aren’t really making contributions to the field. This easily feeds into imposter syndrome.
I’m here to argue that there’s a very neglected side of “cutting edge” work, the applied/execution part, and many more of us are nibbling at the edges of that frontier.
The theory and execution are separate things
Humans are really really good at making stuff up with theory. We can “solve” tons of problems with our minds. I have no idea how to solve a Rubik’s Cube, but I know that methods to generate solutions exist and are fairly easy to pick up, so I could notionally claim that I could solve one if given enough time and resources.
We’re also really good at generalizing our knowledge: once we know how to make a simple soup (meat + veggies + salt in water then boil) we can easily imagine how to make chicken soup, lamb soup, fish soup without having to actually do it.
But just like with making soup, theory and execution are different things. While it’s true you can just throw food in a pot and boil to make a kind of soup, you can make a significantly tastier soup if you adjust your cooking technique and ingredients. Tough cuts need to be handled differently, certain meats can be a bit smelly or oily unless handled in certain ways, some spices don’t go together, and so forth. Getting to that level requires either learning even more cooking theory, or direct practice to build experience (which helps develop some of the theory along the way).
It is extremely difficult to completely “book learn” your way to dinner.
Despite that, there are people who turn up their noses at “solved problems” in the sense that if we know in theory how to do something, it’s not particularly interesting or considered cutting edge any more. For example, optical character recognition (OCR) has been around for decades and is pretty reliable in a lot of contexts, and some would say “just load up XYZ OCR package and it’ll do everything for you! Easy!”.
But as anyone who’s tried to implement even “well understood” solutions realizes, things are never that easy. The execution of the solution within every unique context can be ridiculously challenging. OCR is “easy” until you have to do it within milliseconds, or with bad lighting, or a horrible mix of languages, or scribbled handwriting, or on a tiny battery-powered device with limited CPU. “Analyzing churn” is very often a linear regression problem, but wrangling the data to that point can take years of work.
Those practical challenges and limitations are very real, and rarely addressed in academic settings. The “interesting stuff” is already done, the rest is left as an exercise to the reader. The people who do deal with these problems typically work in industry, and industry is usually not motivated to even share their work in a blog post, let alone write an open source package or publish about their “secret sauce”.
The resulting situation is pretty interesting: the “true” cutting edge of data science is… somewhere out there *vague gesture into the void*, but no one’s seen it. Like in Matt Might’s illustration about an individual PhD’s contribution to knowledge, we’re creating a tiiiiny bump along the frontier.
So instead, you can look at what’s around you and see if you might be close to the edge of knowledge or not.
Signs that you might be on the cutting edge of something
You definitely can’t buy the exact thing as a service
It’s pretty obvious that if someone is making and selling a product for the thing you’re working on, you’re unlikely to be on the cutting edge. It also raises the very good question of why you aren’t buying their solution instead of building your own.
But even in such a situation, there’s always room for innovation, so don’t write yourself completely off.
Lots of people are doing it, but everyone’s struggling with the same issues
If you look around and teams all around the world are struggling against the same problem space as you are, the odds of you being on the cutting edge without even knowing it go up. The space is usually so unsettled that if you manage to come up with a solution that works for your situation, there’s a pretty good chance that others can learn from that experience.
For example, practically every organization that has a notion of repeat customers has questions, and often problems, with “customer churn/retention”. I am not aware of any solution that definitively “solves” the problem despite decades of attention. At best everyone finds broad techniques that sorta work for their situation. If you manage to crack that problem in the general case for everyone, run up on stage and claim your prize and giant bags of money.
You can’t really find any help anywhere
Ever find a bug in a piece of software because no one else has bothered to use that feature? Can’t find any help in Stack Overflow or Google? No talks, no blogs, no examples? Baffle even the creators of the software you’re using? Yup, that sense of being confused, alone, and making things up as you go, that’s the cutting edge.
The majority of the “amazing new stuff” we get in data science comes from people working through such situations. They had to create new tools and techniques because no one else had shared a solution. It happened purely by necessity.
Does being on the cutting edge mean anything?
In the day-to-day? Absolutely nothing at all.
But that’s the whole point. It doesn’t really mean anything.
This should act as a reminder that the attribution of being “on the cutting edge” is largely imposed from the outside. The people in the midst of it are merely doing their best to figure out how to get something to work. Since there’s no pre-made solution, by default they wind up thinking/working on it more than most people and become experts. Only after the fact does the community at large wind up recognizing that the work was valuable and attribute status to the creators.
Many of us are in similar positions of figuring things out as we go along, but what we do just doesn’t happen to go viral for whatever reason. We wind up being experts on some niche.
Through that lens, the “club” of being out in the forefront of the field is hilariously easy to join because the entire surface area of our field is extremely wide and relatively accessible. Many of our tools are open source and free. Much of our work results in knowledge and software — stuff that’s infinitely copyable. The data community loves sharing tools, talks, and blog posts with each other. Stuff can go viral overnight. All it takes is being willing to share your work with the rest of the community. If it somehow resonates with the community in some way, you’ve likely nudged the frontier of data science out just a little further.
I hope this gives everyone some confidence and inspiration to go out there and contribute something. While it’s going to be old hat for you because you’ve already solved it for yourself, it’s guaranteed to be new to most of the rest of us.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.
Curated archive of evergreen posts can be found at randyau.com
All photos/drawings used are taken/created by Randy unless otherwise noted.