Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
This post fueled by way too much exploding giant robot gaming on my part, as evidenced on the Discord/social media.
After many years of helping people build products with data, I’ve accumulated plenty of things that set off alarm bells in my head. Today, the alarm I’m reminded of is the one that goes off when someone starts asking about a high-level metric for a specific kind of sub-task on a given page.
As a concrete example, let’s say you’re some cloud tech vendor and have a product that lets users create virtual machines. Obviously you make money getting users to create and continue to use these machines. You’ll also definitely want to have a “delete machine” button somewhere in the product, because it’d be a horrible dark pattern to not have one. Users would hate the product and never use it if they’re locked into paying for your thing in perpetuity.
So the delete button exists in a relatively intuitive place on the page, and no one really pays attention to it since it’s obviously not a money-maker. Then someone new comes in and starts asking questions about the metrics of all the clickable buttons on the page. Behold, the delete button is clicked on only 0.01% of all visits to the page. “Wow, that’s low,” says the new product lead.
Cue the alarm bells.
We have come to a crossroads here. Down one path, we start a long discussion about what purpose that delete button has within the product as a whole and how it fits into the overall metrics picture. This is usually the happy path we want to go down.
The other, significantly more dangerous, path we can take is the product person can latch onto the “low conversion” of the button, and decide they want to “optimize” it somehow. Someone who comes from a background where “feature metrics need to go up or not exist at all!” can go down this route.
The reason is that most metrics are conditional. Interpreting things in a vacuum, especially in the context of making “data-driven” product changes, only leads to bad decisions.
There’s a whole bunch of analyses that need to be done to understand whether that delete button is performing well or not. I’d initially hypothesize that only 0.01% of pageviews click on the button because our product is so awesome people don’t want to stop using our service. Optimistic, but worth checking out. Even if that weren’t the case and there’s a more mundane explanation, like “we route everyone to that page when they log in so it just has tons of pageviews”, there’s still an analysis to be done to confirm it. More importantly, we need to check the case of “if someone really wants to delete their machine, how likely are they to succeed?”. The success of the button is conditional upon user intent, not on a random “everyone”.
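To make the conditioning concrete, here’s a minimal sketch with entirely made-up numbers (the event counts and the way “intent” gets identified are hypothetical, not from any real product). The raw click rate looks alarming, but the rate conditional on intent tells a completely different story:

```python
# Toy illustration with hypothetical numbers: raw rate vs. rate given intent.

page_views = 1_000_000        # everyone gets routed through this page on login
delete_clicks = 100           # total clicks on the delete button

# Users we believe actually wanted to delete a machine, identified by some
# hypothetical means (support tickets, in-product search, surveys, etc.)
users_with_delete_intent = 110
intent_users_who_succeeded = 100

raw_rate = delete_clicks / page_views
conditional_rate = intent_users_who_succeeded / users_with_delete_intent

print(f"Raw click rate:            {raw_rate:.4%}")        # 0.0100% — "wow, that's low"
print(f"Success rate given intent: {conditional_rate:.1%}")  # 90.9% — the button works fine
```

Same button, same clicks: the only thing that changed is the denominator. Picking the right denominator is most of the analysis.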
My example takes a really obvious critical function and makes it sound like the hypothetical product owner may decide that since so few people are clicking ‘delete’, we should just get rid of the button or use the screen real estate it takes up to do something more useful. Surely, no one would make such a bad decision, right? That’d be like claiming that since so few people use the neutral gear in their automatic transmission car, we should just get rid of it and save the cost of including that feature. Who would do that?
To my disappointment, I’ve been in exactly those sorts of conversations before, albeit infrequently.
Part of the cause seems to be that some people have a simplistic view that a whole is merely the sum of the parts — if I make this specific handle on the car “objectively better”, then the car as a whole just got better, even if it makes opening the windows more difficult.
I’m not sure what the origins of this way of thinking are. Even engineers, whom you might readily blame for reductive mindsets, aren’t unfamiliar with systems-level interactions and effects. I’ve worked with tons of engineers who understand this and don’t need to be reminded that not everything can be blindly optimized.
Anyways, the main reason we’re having this discussion today is that I came across this post last week, and while “hyper-focusing on optimizing a specific metric without considering the product as a holistic thing” might not be the specific cause of this, it certainly stinks of it.
In this bizarre Twitter story, the metric of “screen space taken up by articles” is being optimized for a fairly inexplicable reason. If I had to guess, they’re trying to get more tweets to show up on the screen, which in theory would provide more ad impression potential. Otherwise it’s being done purely for aesthetic design reasons.
I’m sure you can imagine all sorts of potential research that could be done to understand what kind of value the linked article titles/text brings. While the value seems self-evident, the actual answers might be surprising and unintuitive. I’m sure very little of that research is being done.
If this change ever does roll out, given the pushback from advertisers, we might get a hint of what metrics they’re actually optimizing for when we see what compromises they make. For example, maybe they overlay the text over the image or something. That creates all sorts of potential visual clutter, but would largely achieve the stated goal of reclaiming screen real estate.
Discuss everything in terms of the system
So, as the people who provide the data and the analysis derived from it, we always have to be ready to make sure everyone else views the data at a holistic level. Early in my career, I contributed data to a decision to kill off an important feature that users liked, because a previous product lead had decided to put the feature in a more obscure part of the page. That small change caused usage of the feature to decline over the course of a year. Finally, when they were deciding whether to kill the feature, I was asked to pull the usage numbers, and they were (as expected) low and trending downwards. I was too young and new to realize what had happened until years later. We never did the research on what that feature was being used for and whether that use case should be addressed. Even now, users apparently still request that exact same feature occasionally.
I use that experience to remind myself that product decisions aren’t just about “line goes up/down to the right”. It’s important to fight for making sound decisions with data, not just what an arbitrary percentage or trend tells you.
For situations where I’m in the conversation and can speak up to prevent such incidents, this is a relatively easy task. After all, I now know when to raise objections. It is significantly more difficult when data analysis has been “democratized” and everyone can pull their own numbers for their own justifications. It takes a lot more training and culture-building around working with data and research to make things work. And there’s always going to be some new person who comes in and makes these mistakes. It’s our job to catch them and make sure things don’t go flying off the rails.
Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.
Guest posts: If you’re interested in writing a data-related post to either show off work, share an experience, or get help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday, share it with your friends without guilt! But if you like the content and want to send some love, here’s some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
Off topic, but the image looks like the aptly named "creeping devil" cactus, which is likely why it's kind of disturbing. I think the sign behind it says "opuntia cochenillifera", but that's definitely not what's shown. Opuntias are species of prickly pear or cholla, and cochenillifera is one of the large, mostly spineless cultivated prickly pears [similar to Indian Fig].
You can’t manage what you can merely measure. The same goes for the upwardly aspiring who think that a dashboard is a Harry Potter wand that can be merely waved to achieve magical results without so much as an accompanying abracadabra. This is especially true in the case where fixing the number is synonymous with fixing the problem. The definition and/or interpretation of the inputs to the metric will inevitably be gamed to achieve the demanded upward trend. And, if that doesn’t work, data will be adjusted as needed. As a last resort, an attempt will be made at causal reasoning to trace the bad effect (failure to make your bonus bogey) to some external phenomenon potentially susceptible to influence. Rather than attempting to exert influence through innovation (hard), recourse will be had to salesmanship. If the indicator happens to improve, credit will be taken; otherwise blame will be cast. All of this can be done with equal efficacy at far less cost by AI.