Thanks to everyone who sent well wishes. Luckily, thanks to lots of sleeping, looks like the family is mostly over our run-in with COVID, just some pesky residual runny noses and occasional itchy throats left.
To start off this week, I’m going to promote an awesome event that I want every one of you, dear readers, to participate in if at all possible. Caitlin Hudon and Laura Ellis are hosting a second Data Mishaps Night! It’s a place where data practitioners submit brief talks about their biggest, most spectacular data mistakes and present them to the audience. There will NOT be any recordings of the event, so attending live will be your only opportunity to hear a lot of data folk from all sorts of companies tell some hilarious, and sometimes hair-raising, stories about mistakes they’ve made.
I’m going out of my way to push this event because I think it’s a very important thing for us to have in the data community. From what I’ve seen in my career, data teams tend to be very small because they typically scale very well. Unless the data science team is specifically building a production feature, one data person can often support the work of dozens of engineers, designers, etc. So having just one person working across multiple teams solo is extremely common. The organization typically has to scale to hundreds of people before it makes sense to hire a small team of data folk.
One consequence of this tendency to be running nearly solo is that many data practitioners don’t really have much opportunity to work in teams of peers. Of the 15 years I’ve been in industry, I’ve spent over 10 working solo. While I’ve painfully learned how to handle and thrive in such a situation, there’s one thing I miss: being solo means I can’t easily learn from the mistakes of others.
That’s the niche that Data Mishaps Night fills.
While I’d like to believe that the data community generally takes its post-incident processes from modern engineering practice, which increasingly emphasizes blameless post-mortems, the reality is that even if we aren’t usually blamed for our mistakes, there’s rarely anyone to share them with.
It’s also very rare that there’s a reason to publish a public post-mortem of a data-related incident. While a data breach or a retraction would be important enough to warrant some kind of public posting, no one is going to post a public correction to an analysis that was presented completely internally. Why would anyone in the outside world care that I broke the internal analytics pipeline by accidentally introducing a Unicode processing bug?
Different industries, similar mistakes
The previous Data Mishaps Night was a whole year ago, and I’ve pretty much forgotten all the details and had to go search for the #DataMishapsNight hashtag on Twitter to remind myself. But the overwhelming impression that I got was that while everyone’s industries and use cases were completely different from each other, everyone very quickly “got” every story. Even when someone was only partway through their story, many of us would start commenting “oh no…” because it looked very similar to situations that we’ve found ourselves in before.
There were stories about getting too excited about results that were too good to be true, about assumptions behind a model turning out to be invalid, and about how just straight-up counting is hard. I just dug these out of the Twitter search for last year’s event, and you can dig through the search to unearth more.
I’m sure that those topics alone would resonate with just about everyone who works with data. But the specific details are what made the night fascinating.
This sets a good example for people new to the field
Much like introducing a bug that breaks production is a rite of passage for all junior software engineers, making a big dramatic data mistake is inevitable. The key to both is learning that such mistakes aren’t the end of the world and learning how to handle such situations professionally. SWEs have a lot of peers and senior engineers to help them learn the process, while data folk most often rely on their managers (unless they happen to have peers).
Even if the managers are understanding of mistakes and help handle them, it still doesn’t help the junior data person understand that these mistakes are extremely common. The publication bias of social media often means that everyone looks like they’re working on big important data projects with near perfect execution. At most we see complaints about minor bugs and issues AFTER people have figured out a solution. It’s a climate that leaves lots of people feeling a lot of unnecessary imposter syndrome.
That’s why I’m so hyped up about any event that normalizes mistakes. A lot of comments from the event came from people who are earlier in their career and were relieved to see how everyone else was screwing things up even more spectacularly.
Is there a way we can normalize making mistakes more?
While I think this event is a great start, I’d love to see more people publicly admitting to mistakes in a constructive fashion. Data Twitter is a great community, but we’re certainly just a tiny fraction of the broader data community. Spreading a practice of doing data post-mortems in public will take concerted effort from us. We can host events like this, write blog posts and Twitter threads, or just find opportunities to talk about the fact that we all make mistakes all the time. What’s important is how we handle and learn from them.
One other interesting thing happened this week: I had to use a couple of in-home COVID tests, and work provided an in-home molecular (not antigen) test. I eventually gave in to curiosity and opened one up after using it to test my kid. It’s a whole miniature wet chemistry setup running some form of loop-mediated isothermal amplification in a plastic cartridge. I barely have any understanding of the details, but it’s super cool.
About this newsletter
I’m Randy Au, currently a Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects of data science, UX research, and tech, with occasional excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise noted.
Curated archive of evergreen posts can be found at randyau.com
Supporting this newsletter:
This newsletter is free, share it with your friends without guilt! But if you like the content and want to send some love, here’s some options:
Tweet me - Comments and questions are always welcome, they often inspire new posts
A small one-time donation at Ko-fi - Thanks to everyone who’s supported me!!! <3
If shirts and swag are more your style there’s some here - There’s a plane w/ dots shirt available!