Fighting Confirmation Bias
It's always been important. It's just a bit more important these days.
The US continues to open a firehose of police brutality, authoritarianism, and general doom into my Twitter feed every day, and I’m largely powerless to do anything but hope it ends, positively, sooner rather than later. As an Asian-American, the racism I experience on average is nothing compared to what happens to people with black and brown skin. Black Lives Matter, and I’ve little else to say on this. I just hope people stay safe and healthy through all this.
A flower in the grass on one of the rare trips outside to the park w/ the kid
I struggled to find something to write about that isn’t The Plague or the current protests, and then I recalled this somewhat offhand tweet by Chris Albon from a couple of days ago.
My response to this was a reference to an instinct that many of us in the science and analytics fields eventually develop. It was just a random thing dashed off on my phone while moving around the house.
But as the week went on and more and more data-related posts surrounding BLM and police brutality appeared (for example, this thread dissecting the #8cantwait paper), I was reminded that data is used as a weapon every day and that we need to maintain skepticism and fight against confirmation bias. There are no easy answers when it comes to systemic racism, nor to COVID-19. It’s very easy to get baited into something that sounds good amidst the flood of information.
This is also important at work. I have to remind myself, whenever I find something interesting, to look for bugs. The more excited I am, the more I need to stop and double-check, triple-check, and maybe even triangulate or cross-validate my results. I’ve been bitten by not doing this enough times to know that it’s mandatory.
One memorable (but utterly harmless) instance of this was early in my career, some time in 2010 or so, when I was pulling data from the database through a series of subqueries for some analysis. I honestly don’t even remember the specifics. What I do remember is that it was getting late into the night, I was plotting a distribution of the data, probably users, and the craziest thing happened: the distribution looked approximately normal.
It had a bell-like curve in both directions around the mean and everything! My first reaction was to stop, gasp, then run over to my manager at the time and say “I found something really cool and it’s probably wrong but you should see this.” We both marveled at this bizarre unicorn, since you never see such distributions in the wild, then went to figure out what had gone wrong, because neither of us believed it.
After examining what I actually did with the data, it seemed I had effectively pulled a random sampling of averages generated from samples of data (sorta). Meaning I had accidentally created the conditions for seeing the Central Limit Theorem in action. At least it worked out as mathematically predicted?
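If you’re curious what that accident looks like, here’s a minimal sketch in Python (a toy reconstruction with numpy and matplotlib, not my original queries): average over samples of a skewed distribution and the bell curve shows up right on cue.

```python
# Toy reconstruction of the accident: the underlying data is heavily skewed,
# but each row I actually pulled was already an average over a subquery's
# sample, so the plotted distribution looked approximately normal (CLT).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Heavily skewed per-row data (think per-user activity counts).
raw_values = rng.exponential(scale=5.0, size=100_000)

# The "bug": each value ends up being the mean of a 50-row sample.
group_size = 50
group_means = raw_values.reshape(-1, group_size).mean(axis=1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(raw_values, bins=100)
ax1.set_title("Raw values: skewed, as expected")
ax2.hist(group_means, bins=50)
ax2.set_title("Averages of samples: suspiciously normal")
plt.tight_layout()
plt.show()
```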
Sadly, the way I had pulled the data had nothing to do with the analysis I actually needed to present, so it was all wrong and I had to go back to square one.
A bit more serious misadventure
Many many years ago, I was working on a product w/ a feature where you could respond yes, no, maybe to a request. Everyone within the company hated the feature because a maybe would almost inevitably count as a no. There were also users who complained about people using maybe to effectively say no.
So one day, late in the evening, a bunch of product and eng folks were chatting about the feature, and a project manager goes “how many people are using maybe anyways? Why can’t I just kill it?” Said PM looks at me, and I hop over to my desk to run a very quick query to check. We all suspected the number was small; I just needed to verify it.
I come back in about a minute with the answer: as expected, maybes were a tiny fraction (< 5%? maybe even < 1%) of the total number of responses given within the past month or so. I happily report this back to the group, everyone agrees that getting rid of the maybe feature would be a positive thing for the system as a whole, and the change goes out within a couple of days.
Within an hour of the release, complaints started coming in about how we had “removed a critical feature”. A decent number of people were pretty darned upset about the change. It wasn’t a big enough outcry that we changed direction (to this day, maybes are not a valid response), but it was definitely more than the tiny number of users the data had initially implied.
What went wrong here?
What I hadn’t realized at the time (it only dawned on me weeks later) was that a few months before we eliminated the maybe feature, the same product team had made a change that made responding with a maybe much harder. The UI had gone from a simple three-button yes/no/maybe to primarily yes/no, with maybe still reachable if you went out of your way. Prior to that change, users had been choosing maybe significantly more often, maybe 10-20% of the time. Making it harder cut that down by a factor of 10 or so.
That decision had been made somewhat out of my sight, so I didn’t realize the gravity of what had happened. If I had looked at data going back a couple more months, which is eventually how I pieced this together, I would have seen the whole story. While I still agree that the maybe response was detrimental to the overall UX of the system, we probably should have done more rigorous work to understand what users were trying to accomplish with their maybes, and perhaps provided some way to fill the resulting gap.
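In hindsight, the check was simple: break the same fraction down by month instead of collapsing it into a single recent window, and the cliff right after the earlier UX change becomes obvious. Here’s a rough pandas sketch of that idea; the DataFrame and column names (`response`, `created_at`) are made up, since the real system and query are long gone.

```python
# Sketch of the check I should have run: maybe-share per month, not a single
# point estimate over the last month. Column names are hypothetical.
import pandas as pd

def maybe_share_by_month(responses: pd.DataFrame) -> pd.Series:
    """Fraction of responses that are 'maybe', per calendar month."""
    return (
        responses
        .assign(month=responses["created_at"].dt.to_period("M"),
                is_maybe=responses["response"].eq("maybe"))
        .groupby("month")["is_maybe"]
        .mean()
    )

# Example usage with toy data:
if __name__ == "__main__":
    df = pd.DataFrame({
        "created_at": pd.to_datetime(
            ["2020-01-15", "2020-01-20", "2020-02-10", "2020-03-05", "2020-03-06"]
        ),
        "response": ["maybe", "yes", "maybe", "no", "yes"],
    })
    print(maybe_share_by_month(df))
```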
So I had very clearly fallen for our collective confirmation bias, found the easy piece of data that confirmed “well, no one uses the thing anyways”, and we all went ahead with it. Had we scratched at it a bit more, we might have ended up in the same place, but we probably would have executed things differently, to better effect. It’s a lesson that still bounces around in the back of my mind fairly regularly.
What can we do about this?
Always be wary of excitement
Getting excited over a result generally means we’ve found something we want to believe is true. That’s the time to crank up the skepticism and be very careful. While it’s hard to remember to stay cool in these situations, it’s important to try to do so. But we’re human and we’ll never be perfect at catching our emotions in-flight.
Stop, formalize your findings into a narrative, and try to disprove them
Since you’ll eventually need to put your findings into a digestible format to share with others anyway, another thing you can try is to write up your findings for a skeptical audience. Often, as I attempt to explain what I’m doing in detail, down to how the data was pulled, and imagine the most likely skeptical questions I’d get from my audience, I find holes in my own arguments and setup.
Oh yeah, I forgot to account for internal accounts. Right, there’s an extra step here. Yes, there’s an alternative hypothesis (or four) that we need to rule out with some further work.
Recruit some sounding boards
The next best thing is to have some data-savvy/domain-savvy friends to run your findings by informally before making a big announcement. It could be your fellow researchers, a PM you have a good working relationship with, a friendly manager, etc. Talking things over and getting surprised by their questions is a good stress test.
These people bring a fresh set of eyes and a clean context, so you’ll have to explain everything from scratch, which is often exactly where the holes show up. Plus, you should be able to trust them to give you honest feedback.
Unrelated Other things
I had this in the background while I was writing… a look at USB connectors at a sufficiently nerdy level of detail and historical context. I didn’t realize the A and B type plugs actually carried meaning: A for host devices and B for client devices.