Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
By now, I’ve spent over a decade working with various kinds of software/internet products. One consistent metric that pops up in this space (and almost everything else that is service based) is some measure of “customer satisfaction”. There’s a bunch of ways to measure customer satisfaction (abbreviated CSAT here for my typing convenience), but most are variations on the theme of sending a survey question to a customer, usually after a purchase, and asking them “how satisfied are you with X”. The user can answer on a 5-point scale, thumbs up/down, or some other variation. The metric used is typically “what percent of respondents give a high enough response”.
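(If you want to see that spelled out, here's a minimal sketch of the usual "top box" calculation in Python, assuming a 5-point scale where 4s and 5s count as "satisfied"; the exact cutoff is a convention that varies from team to team.)

```python
# A minimal sketch of a common "top box" CSAT calculation.
# Assumes a 5-point scale where ratings of 4 or 5 count as "satisfied";
# the threshold is a team convention, not a standard.

def csat(responses, satisfied_threshold=4):
    """Percent of respondents whose rating meets the threshold."""
    if not responses:
        return None  # no respondents, no metric
    satisfied = sum(1 for r in responses if r >= satisfied_threshold)
    return 100 * satisfied / len(responses)

print(csat([5, 4, 3, 5, 2, 4, 1, 5]))  # -> 62.5
```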
Most businesses want to increase CSAT, under the intuitive theory that happy, or at least, satisfied customers are generally a good thing to have. It could mean the customers will return, or they might recommend you to other people. Businesses often also want to know why customers are dissatisfied because those places could be improved, again to attract more customers in the future. As a data scientist in the product development space, I’ve worked with tons of teams on as many initiatives to “increase CSAT” or “use CSAT data to improve the product”.
I just want to vent today about how the vast majority of such projects… don’t really seem to have as much impact as anyone would want.
Now, “customer satisfaction” is a very rich field of study in business management and marketing literature. There’s tons of papers about different ways to measure CSAT in various industries and contexts, and tons of discussion about strategies for utilizing CSAT data to improve businesses. There’s plenty of knowledge, theories, models, and discussions to be had over an entire career there.
But today, I just want to say that when it gets down to brass tacks and my product team and I are charged with “improving CSAT by some amount”, I’m almost certain that I will fail at the task.
That’s not to say that I think CSAT is some immutable, near-fixed property of a product. I don’t think that whatever I’m working on is fated by the gods to hover around 70% satisfaction +/- 5%. It just FEELS like it.
Moving things around in the short term is pretty easy
There’s an endless number of shocks that could be given to a system to make CSAT move in a desired direction. You don’t even have to cheat and do something like bias your sampling method.
Probably the most dramatic way is to simply do something that will piss a customer off and then immediately afterwards, send them the CSAT survey. For example, I can guarantee that if you raise prices 50% on users and immediately send a CSAT survey in the same message, you’re going to get a ton of angry responses and an obvious drop. The timing is very important.
Another easy way to crash your CSAT is to have a giant service outage, but have the outage be juuuust small enough to allow users to still access your CSAT survey. A backend-only outage is perfect for this. Then you can enjoy having tons of people on your product, trying (and failing) to use it, with lots of spare time to devote to your conveniently timed survey.
If you want to go the other direction, just love-bomb people like you’re forming a cult or something. If you send your survey out when people get valuable stuff for free, or a price drop, or service upgrades, you’ll see CSAT pop up. Bribes work to a decent extent.
Long term? Not so easy
While short term shocks are pretty effective at noticeably moving the needle for a time, things always wind up reverting to the long term trend once the shock wears off. People angry about an outage get their service back and return to their work, people who have been bribed with a freebie slowly forget about it.
So the theory is that the only way to increase satisfaction over the long term is to improve the fundamental product experience itself. It seems very simple, but I’ve noticed that it’s really easy to run into a wall of diminishing returns.
What I mean by diminishing returns is that, because users are not uniform, the strategy of “fulfilling user needs” falls into the trap of chasing the needs of increasingly niche user segments, and it becomes harder and harder to make everyone more satisfied with any single change.
As an example, imagine we live in a universe where hammers weren’t invented yet, and we’re the proud inventors and sellers of Hammer 1.0. It is a rock. Users of Hammer 1.0 are happy that they can now accomplish a lot of the tasks they need — flattening things, driving nails, general smashing and shaping of materials.
CSAT for Hammer 1.0 is great at 80%! You’ve revolutionized smashing things! But over time, as the new idea becomes commonplace and you’ve got more and more hammer users, people start seeing the flaws of Hammer 1.0, and CSAT drifts downwards into the 70s. It’s still better than nothing.
So you ask your users how we can make a better hammer. Studies are launched! People want better ergonomics; they want to apply more force and smash things more efficiently. Your product team gets together and decides to put out Hammer 2.0, this time with a wooden handle! More leverage, more power, and easier to hold! The handle also means you hurt your hand less because the wood absorbs some of the force of striking. CSAT jumps up again at the obvious improvement — we’re back into the 80s!
But again, time passes, and CSAT slowly settles down as people start applying Hammer 2.0 to their lives and find various shortcomings. The stone head sometimes breaks off when striking. Some people want to work with materials that are tougher than the stones being used. Others need much more precision for their tasks. So again, while people are happy to have a Hammer 2.0 in their home, it’s not ideal.
So, back to the drawing board. Hammer 3.0’s got a metal head now. Now it’s much easier to create different sizes to suit different needs. Plus, metals are more consistent in their operating characteristics than the random stones we’ve been using before. CSAT is up again, though not as high as with previous releases. By now, everyone’s used a hammer and boy do they have opinions about what hammers they need in their lives. The people demolishing buildings want big heavy hammers. The random homeowner wants a small light thing for everyday tasks. The metalsmiths and geologists want these weird shapes to do specialized tasks…
But your Hammer 3.0 factory can’t handle all the specialization required. So you try to satisfy the most users as best you can. You add in a claw to the small hammer design to help with the basic homeowners and carpenters — they’re happier but no one else wants that feature. Any CSAT boost you get from that release is diluted by the proportion of the user segments. Same goes for making a big sledgehammer to satisfy the demolition types. You sell some wonky-shaped metalsmith hammers to make the small metalworking community happy, but CSAT barely moves and everyone else who doesn’t even understand why those shapes exist is laughing at you on the orange Hammer News site.
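To make that dilution concrete, here's a toy weighted-average calculation in Python. The segment shares and satisfaction rates below are completely made up; the point is only that a big win for a small segment barely nudges the aggregate number.

```python
# A toy illustration of the dilution effect, with invented segment shares
# and satisfaction rates. A 25-point CSAT win in a 10% segment only moves
# the aggregate number by about 2.5 points.

segments = {
    # segment: (share of users, CSAT before, CSAT after the niche release)
    "homeowners":  (0.70, 0.72, 0.72),
    "demolition":  (0.20, 0.70, 0.70),
    "metalsmiths": (0.10, 0.55, 0.80),  # the only group the release helps
}

before = sum(share * b for share, b, _ in segments.values())
after = sum(share * a for share, _, a in segments.values())
print(f"{before:.1%} -> {after:.1%}")  # 69.9% -> 72.4%
```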
Essentially, my point is that given enough time, a product that finds a successful customer base can only release features that satisfy ever-smaller segments of users. While the first few generations of improvements had near-universal appeal, Hammer 12.0, after a few millennia of feature improvements, can’t really improve on the basic formula of “heavy thing smashes other objects” any more. All possible hammer innovation winds up addressing some shortcoming that only one segment feels. A broad CSAT measured across all hammer users will never be able to detect any meaningful change. The product has become so specialized that benefits for one group of customers might directly conflict with another group’s desires. Demolition hammer users are never going to use the same ones as jewelers. If we only sold one type of Hammer 12.0, our CSAT would almost never change due to product improvements any more.
At this stage, it’s obvious that our Hammer company can’t satisfy everyone with the same hammer product. It needs to specialize its product lines to meet the needs of specific groups of users. Separating out those groups means there are fewer conflicting use cases and needs, and so product improvements can once again move the CSAT needle… until the cycle repeats itself with sub-specializations. That’s why there are so many different types of hammers in the world.
But in software and tech, specialization along market segment lines is very rare. Companies would much rather “scale” by making one multi-featured product that just keeps bolting on new features over time. It’s cheaper and simpler than building, maintaining, and selling separate code branches for different market segments. And so, products get more and more “advanced options”, longer lists of features that only 10 customers ever bother to click on. And then teams are asked to improve CSAT and find that they can’t really do anything about it.
Segmenting the other way
The clever data scientists out there probably have an answer already. If the problem is that we have lots of conflicting user segments feeding into the same aggregate CSAT value, then the solution is to segment the CSAT itself. Mathematically, it’s the only way to show movement in that metric.
But despite how easy that solution sounds, the implementation can be a pain. After all, CSAT measurements are survey instruments that rely on users being willing to respond. Surveys almost always have low response rates, and CSAT is no exception. It’s very common that you barely have the sample size to report on the aggregate CSAT metric, let alone to report across specific segments of the userbase. Not to mention, your CSAT collection system might be anonymous, or involve some other obfuscation that makes it hard to figure out what user segment a respondent belongs to.
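And even if you can tie respondents back to segments, the sample size problem bites hard. Here's a rough sketch using a plain normal-approximation confidence interval and invented respondent counts, just to show how wide the error bars get once you slice a small survey sample into segments.

```python
# A sketch of why per-segment CSAT gets noisy fast, using a simple
# normal-approximation confidence interval. The respondent counts are
# invented; the point is the width of the interval, not the numbers.
import math

def csat_with_ci(satisfied, respondents, z=1.96):
    """Return (CSAT, margin of error) in percentage points, ~95% CI."""
    p = satisfied / respondents
    margin = z * math.sqrt(p * (1 - p) / respondents)
    return 100 * p, 100 * margin

# 2,000 responses overall vs. 40 from a niche segment
print(csat_with_ci(1400, 2000))  # ~ (70.0, 2.0)  -> 70% +/- 2 points
print(csat_with_ci(28, 40))      # ~ (70.0, 14.2) -> 70% +/- 14 points
```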
All this can be very frustrating work. For teams given the task of “improving CSAT”, it can feel like a life-or-death struggle.
Making customers happier generally does correlate with business success. But the mechanics of measuring and acting upon it are significantly more nuanced than it appears on the surface.
Stuff from the data community
Seth shared their YouTube channel and podcast titled “Learning from Machine Learning”. It’s a bunch of interviews with folk about, surprise, Machine Learning topics!
Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.
Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or if you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord. We keep a chill vibe.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday, so share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Get merch! If shirts and stickers are more your style — there’s a survivorship bias shirt!