Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
I don’t normally do “what’s happening now!” type posts, so this one goes into the paid subscriber bucket.
Data Mishaps Night happened last night and I wanted to scribble down various notes and thoughts from the event since it’s not recorded anywhere. This’ll be a bit rough since I’m writing from hastily jotted notes and memory before it all fades away. Please forgive any errors and misattributions since it all goes by so fast. I’m also obscuring the names of folks in keeping with the precedent of not having recordings of the event.
As always, DaMN was full of great stories and a lot of people saying “oh no” in chat as the speakers presented some of their biggest mistakes. Topics stretched from “small and cute” to “millions of dollars on the line” to “had workers close the store to re-label every single item with a new price sticker due to a mishap… on two separate occasions”.
The keynote was from Benn Stancil, about how he views his biggest mishap as going along with the expected career path of becoming a manager and then an executive before realizing that he wasn’t particularly good at, or happy doing, those things and might have been much happier in an alternate role that wasn’t the obvious one in front of him. Considering how the career path for data science is still quite undefined, it’s a lesson that even those of us who may never have a chance to become an executive should pay attention to.
Probably the most innocent and fun talk was from a speaker who had prepared a big live demo involving calculating carbon footprints… and then, late in the night while checking things over, saw “NAN” in the dataset, freaked out, and went on a giant debugging spree to figure out what was broken about the demo. Truth was, nothing was wrong. NAN is not NaN. It’s a completely valid airport code for an airport in Fiji.
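To see how easy this trap is to fall into: Python’s own `float()` happily parses the string `"NAN"` (any capitalization) as the floating-point not-a-number value. This is a hypothetical sketch of the gotcha, not the speaker’s actual pipeline — `looks_missing` and the sample codes are made up for illustration:

```python
import math

# IATA codes from a hypothetical flight dataset;
# "NAN" is the code for an airport in Fiji.
codes = ["LAX", "NAN", "JFK"]

def looks_missing(cell: str) -> bool:
    """A naive 'is this cell a missing value?' check via float coercion."""
    try:
        return math.isnan(float(cell))  # float("NAN") parses as nan!
    except ValueError:
        return False                    # ordinary strings fail to coerce

print([c for c in codes if looks_missing(c)])  # → ['NAN']
```

Many CSV readers have a similar behavior through their configurable lists of strings treated as missing values, which is why a perfectly valid airport code can silently vanish from a dataset.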
Another talk was from an academic who was analyzing air pollution data, correlating two time series. One of the series was very expensive to collect, so it was only measured every six days, while the other was daily. They decided to drop the missing values, since 80% of the data was missing, but wound up with a very subtle localized nonlinearity in their model terms that threw their results wildly off. They then presented their results at a conference and “got yelled at in various European languages” before they went back and found the issue.
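To make the setup concrete, here’s a minimal sketch of that sampling mismatch, assuming pandas and entirely invented data (the variable names and numbers are mine, not the speaker’s): aligning a daily series with an every-six-days series and dropping the gaps discards roughly five-sixths of the daily observations before any modeling even begins.

```python
import numpy as np
import pandas as pd

# A month of (fake) daily pollutant readings.
days = pd.date_range("2020-01-01", periods=30, freq="D")
daily = pd.Series(np.arange(30.0), index=days, name="daily_pollutant")

# The expensive measurement is only taken every sixth day.
sparse = daily[::6].rename("sparse_pollutant")

# Aligning the two and dropping rows with missing values
# leaves only the days where both measurements exist.
joined = pd.concat([daily, sparse], axis=1).dropna()
print(len(daily), len(joined))  # → 30 5
```

With ~83% of rows gone, the remaining points carry a lot of weight, so any quirk in how they fall can distort fitted model terms in ways that are hard to spot.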