Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
Last week the game developer community had a giant drama explosion. Unity, the company that makes a popular game engine, announced a new fee structure and the community did NOT take it well.
Quick background
Unity makes a game engine, the software that games are built on top of. The engine provides a lot of the common primitives and functions like rendering and moving objects on screen, networking, basic physics logic, etc. People can and do build their own engines from scratch, but many people would rather build the unique parts of their game and not worry about the low-level stuff. Unity’s main selling point was that you could develop your game on their engine and build binary releases for a huge number of platforms, from PC to all the major consoles, VR, mobile, web, etc. That cuts development time for small devs who can't maintain separate engine code branches for multiple platforms.
There’s a free/student tier, but the “pro” level pricing was primarily the per-seat subscription fee for the development tools — $2040/seat annually. Not exactly cheap, but at least it’s a predictable cost. Thanks to the relatively low price and convenience, a large ecosystem had grown around the tools, including various paid 3rd party plugins.
Then Unity announced that they would start charging for game client installations. Past a certain threshold of installs and 12 months of revenue, Unity will bill a certain number of cents per install. Most importantly, all historical installations prior to the announcement are going to count towards whether a game meets that threshold (but thankfully won't be charged), so plenty of older, more popular games are instantly going to qualify for billing on new installs come the announced Jan 1 effective date.
This fee can represent a huge amount of money, especially for games released on a freemium model that generate enough revenue to qualify but have tons of free-to-play installs. Developers are rightfully up in arms about it, since installs they get from deeply discounting the game, or giving it away at events or promotions, will count towards the threshold. The blowback and sense of betrayal have been fierce, and as I write this, Unity is trying to “clarify”, if not outright walk back, at least some of the details. By the time this post publishes, it’s hard to tell where things will land, but I imagine they’ll keep updating their FAQ and addressing the many corner cases people are bringing up.
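To make the threshold mechanics concrete, here is a minimal Python sketch of the rule as I read it from the announcement. Everything named here (`GameStats`, `billable_installs`, `install_fee_cents`, and every threshold and rate passed in) is hypothetical and made up for illustration; none of it is Unity's actual logic or pricing.

```python
from dataclasses import dataclass


@dataclass
class GameStats:
    installs_before_cutoff: int   # lifetime installs before the effective date
    installs_after_cutoff: int    # installs on or after the effective date
    trailing_12mo_revenue: float  # revenue over the last 12 months, in dollars


def billable_installs(stats: GameStats, install_threshold: int,
                      revenue_threshold: float) -> int:
    """Installs that would be charged under my reading of the rule.

    Assumption: both thresholds must be met, historical installs count
    toward the install threshold, and only installs after the cutoff
    can ever be billed.
    """
    if stats.trailing_12mo_revenue < revenue_threshold:
        return 0
    total = stats.installs_before_cutoff + stats.installs_after_cutoff
    # Historical installs eat into the threshold but are never billed,
    # so the unbilled allowance is whichever of the two is larger.
    unbilled = max(install_threshold, stats.installs_before_cutoff)
    return max(0, total - unbilled)


def install_fee_cents(stats: GameStats, install_threshold: int,
                      revenue_threshold: float, cents_per_install: int) -> int:
    """Total fee in cents for a flat, hypothetical per-install rate."""
    return billable_installs(stats, install_threshold,
                             revenue_threshold) * cents_per_install
```

Under this reading, an older hit that blew past the install threshold years ago gets billed for every single new install from day one, while a newer game only pays for the installs beyond the threshold.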
But I’m not here to rant about crappy monetization schemes from a public company that’s been running at a net operating loss for multiple quarters running. I’m here to write about the interesting counting problem it represents.
As of Sept 18, 2023, Unity is in the process of walking the policy back in some form, though details are still to be announced. It’s an ongoing dumpster fire and the trust has already been burned, but luckily for me, none of that affects today’s post.
Using our counting skills
I often joke that my job is all about counting stuff. More importantly, it’s about figuring out the right things to count, and the best ways to accurately count them to answer the questions I’m interested in. Unity has defined a clear counting problem — they’re claiming that they can get an accurate enough count of how many game installs have been made, and will start charging for installs made after Jan 1, 2024.
So the big question is… how is Unity going to count and bill for installations? Apparently with a “proprietary data model”.
Let's pretend we're on the data team at Unity given the task of figuring out how to measure installs. We can pretend this model started as some innocuous marketing/business metrics request that started years before the current fiasco. What is within our power to measure? Let’s speculate a bit.
Disclaimer: Just in case this wasn’t clear, I’ve worked with publishers and devs before, but never at any place affiliated with Unity and thus am pulling all this speculation out of you-know-where.
For a first pass, let’s assume maximal laziness on the part of developers — they do everything possible, including publishing to storefronts, through Unity provided tools so that we have metadata about every game possible. We “know” about every single Unity game that’s out there and it’s magically populated into our data systems so we don’t have to worry about identifying games to track. Use of such features isn’t required and I’d bet big developers tend to do things on their own, but everything is complex enough already and those big folk have bespoke licensing contracts anyways.
Obviously, the most direct and accurate method to count all installed copies of a Unity game would be to just put a “call home” function into the base engine runtime. The problem with that strategy is that Unity games are made to run across all sorts of platforms. While most platforms are internet connected nowadays, there's no guarantee that any particular device will have the ability to call back. It’s still possible to have a completely disconnected console that you buy physical games for and never update, but I suppose we can just handwave that detail away as being “small”.
From the scant details we hear about their plans, it sounds like they’re going to bill for installations on different devices by the same customer. For example, if I install a game I bought onto my desktop, my laptop, then again when I upgrade my machine, I’d count for three whole installs. Usually this sort of tracking is done by fingerprinting the user somehow when the software calls back. Web advertisers often do something like this using cookies and arcane JavaScript tricks. But once you use such tracking schemes, you're brushing up against laws like GDPR. Video games are often used by young children, so you also have laws like COPPA to comply with. At the very least, unless Unity has been illegally spying on users for years now behind the scenes, this will require adding all sorts of consent functionality to existing games that obviously does not exist now.
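If we did have call-home events with some kind of device fingerprint, the counting itself would be the easy part. Here's a rough Python sketch, assuming a hypothetical feed of `(game_id, device_fingerprint)` pairs and assuming that repeated calls from the same device collapse into one install. Both the feed and the `count_installs` helper are my inventions for illustration, not anything Unity has described.

```python
from collections import defaultdict


def count_installs(events):
    """Count distinct devices per game.

    events: iterable of (game_id, device_fingerprint) pairs from a
    hypothetical call-home feed. Repeat calls from the same device
    are assumed to be the same install.
    """
    devices_by_game = defaultdict(set)
    for game_id, fingerprint in events:
        devices_by_game[game_id].add(fingerprint)
    return {game: len(devices) for game, devices in devices_by_game.items()}


# One customer, one purchase: desktop, laptop, then an upgraded desktop.
events = [
    ("game-a", "desktop-old"),
    ("game-a", "laptop"),
    ("game-a", "desktop-old"),  # same machine calling home again
    ("game-a", "desktop-new"),
]
print(count_installs(events))  # {'game-a': 3}
```

The hard parts are everything around this function: getting the events at all, getting consent to collect them, and deciding what a stable "fingerprint" even means across consoles, phones, and PCs.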
One potential legal source for installation information is the in-game ad service that Unity provides as an option to developers, since that effectively gives the software permission to call home to request ads and track unique identifying information. Mobile games tend to be OK with this since they run on freemium models, but many games we consider “standalone” aren’t so keen to do so. Any data we get from this source for estimation is probably skewed in interesting ways.
So with some exceptions, direct measurement seems to be off the table for retroactive data. There’s probably enough to get a basic attempt at a model going, but getting a complete picture would be difficult. That would be a design hurdle for future data collection that would require game developers to opt into it. This probably isn’t good enough to attempt billing the entire userbase. They admit as much because they're sourcing things from “aggregated data from various sources”. What the heck is that?
Somehow they are getting download/installation data from somewhere that's not the end user. The obvious source for such information would be either the publishers themselves (which I find unlikely since there are so many), or the major storefronts and distributors. That would be Steam, the major console stores of Nintendo, PlayStation and Xbox, and the Google Play and Apple App stores. But it raises the question of the many, many, many other stores out there that sell games, from Walmart and Amazon to random shops across the globe. While the bulk of the power-law distribution would concentrate in the top handful of storefronts, I haven’t seen much data as to how big the long tail is.
But even if we manage to enumerate all the important distribution points for game purchases and downloads, we would need to sign data exchange deals with all of them to get download information we need to bill people. That’s not easy because I don’t think it’s a very common use case, and certainly not one that a platform would see much economic benefit in implementing.
Steam has an interface for publishers to download their sales data. As far as I know, there is no interface for the individual developers under a publisher to see a private view of their own sales data. It’s up to the publisher to provide it. I don't know of a way for a 3rd party engine vendor to get access to such data, since the APIs for seeing which games users owned were shut down many years ago, which stopped the sampling methods that used to exist for estimating game sales rankings.
Mobile app stores are a pretty giant mess data-wise. The Play store at least lets the app developer/publisher account download CSVs with historic purchase/download stats, but again, this information is typically only available to the account that publishes the title onto the store. Apple’s store is even worse where they don't really provide any useful data at all and everyone seems to rely on a weird cottage industry of data vendors like App Annie to get rough measures of downloads. Everyone seems to agree those estimations aren’t very accurate, but there isn't a better alternative.
The same pattern holds for just about every storefront I can think of. There’s little incentive to make it better since “give the publisher access and let them figure it out” has been working well enough. There’s likely an industry organization that has negotiated many of these arrangements to aggregate sales data for the industry and would be willing to share that data for a fee. I just don’t know who that would be and whether sales data is a close enough proxy for installation data.
When in doubt, handwave. Hard.
Enough discussing failing to even get initial data to analyze. Let’s just wave our magic data science wand and assume that we’ve solved the data collection problem for now. We’ll just pretend we have a great legal team that met with all the big players and negotiated data feeds for every game sold on the planet that we can shove into our data warehouse to analyze and build our installation model. Our problems are still far from over.
A lot of the current walking-back of the proposed monetization scheme is that certain classes of installations “don’t count” for the purposes of billing after developers raised all sorts of well-founded objections.
For example, demos that use the runtime don't count if they’re a cut down build of the actual game. But if the demo can be unlocked (via in-app purchase or similar) to give access to the full game, then they do count. Angry mobs of users who used to review bomb a game but now opt to spin up thousands of VMs to install a game maliciously to hurt a developer (you know this will happen) are also not supposed to count. Charity donations of games aren't supposed to count. The list will probably grow by the time this post publishes.
Those all sound like nice, perhaps even reasonable, exceptions from a business standpoint. But all these activities just appear as normal users doing normal user things. They’d be represented as various forms of “units sold/downloaded” in official reporting. We’re somehow supposed to tell these apart from the aggregate data with no direct telemetry. It’s a familiar episode of data science teams listening to the sales team pitch the moon. I struggle to see any sort of methodology that could distinguish these cases with any confidence, even with an infinite team of human classifiers, let alone an actual budget.
Even if we did some extreme handwaving and pretend all the games resemble the set of games we have good data on, that is, the ones who opted into the ad service, it’s a huge epistemological leap to link them to the games who opt out without supporting evidence. Maybe they made the connection from a data source I haven’t thought about.
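To show why that leap worries me, here's a toy Python sketch of the laziest possible extrapolation: learn an installs-per-unit ratio from the games with telemetry, then multiply it against the reported units of a game without any. All the numbers and both helper names (`installs_per_unit`, `estimate_installs`) are made up for illustration. I have no idea what Unity's model actually does.

```python
def installs_per_unit(opted_in):
    """Pooled installs-per-unit ratio from games that have telemetry.

    opted_in: list of (units_reported, measured_installs) pairs.
    """
    total_units = sum(units for units, _ in opted_in)
    total_installs = sum(installs for _, installs in opted_in)
    return total_installs / total_units


def estimate_installs(units_reported, ratio):
    """Apply the borrowed ratio to a game with no telemetry."""
    return round(units_reported * ratio)


# Made-up opted-in games: ad-supported titles that get installed
# on many devices per reported unit.
opted_in = [(1_000, 3_000), (2_000, 6_000)]
ratio = installs_per_unit(opted_in)            # 3.0

# A made-up standalone game that opted out, with 10,000 units reported.
estimated = estimate_installs(10_000, ratio)   # 30000

# If its real behavior were closer to 1.2 installs per unit, the bill
# would be based on 18,000 installs that never happened.
actual = round(10_000 * 1.2)                   # 12000
overcount = estimated - actual                 # 18000
```

The arithmetic is trivial. The problem is that nothing in the data tells you whether the borrowed ratio applies to the opted-out game, and the opt-out games are exactly the ones you can't check.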
Honestly, if Unity has managed to solve the “detect human intent behind an aggregated log entry” problem, they should stop selling game engines and just conquer all social science and industrial research. I’d gladly yield my day job to them and just use that magical tech.
When stats are vibes
At this point, I’m stumped as to how to measure how often a given game is installed given the little I know about potential data sources. Even if they have lots of information sharing deals with distributors and publishers, the edge cases are effectively indistinguishable from vibes.
As it stands, Unity can declare almost any number they want to bill for, so long as it’s proportional to the number of actual units sold. They can always hide the source of their data behind the magical veil of “proprietary technology”. The last time I worked at a shitty place that had “proprietary anti-fraud technology”, some 14 years ago, some of the ad fraud was found by clever technology and bot filtering, and some of the fraud was “found” by an executive officer pointing to an entry that was paying out more than expected and declaring that X% must be fraud that slipped through the cracks and they’re deducting it. I’ve categorically stopped working in ads since I quit that horrible place.
When data is hidden and impossible to cross validate, it is fertile ground for all sorts of unethical, if not outright fraudulent things to happen. It’s like when Facebook had been caught lying with their video metrics and hurting a lot of smaller news sites. That’s not to say that everyone will always lie when they think they can’t be caught, but without external verification mechanisms, how could you be sure?
All these shenanigans are probably hard for most data people to accept. We’re paid to care about counting things accurately. We’re certainly not supposed to be constructing fictions to be used in games of money and power.
And at the end, what might have once been an innocent (or not!) data science project started long ago has become a vehicle for the exercise of power. In making the announcement, leadership at Unity bet it had the power to push this through and squeeze money from devs. The community backlash and the ongoing scaling back of the messaging showed them what the actual balance of power looked like. It’s hard to see how it all shakes out yet.
This is a reminder that data work isn’t isolated from the messy side of business and politics. If anything, people are drawn to using the magical power of numbers in such situations. Playing with numbers and data doesn’t automatically give us neutrality or innocence. Even projects that start out as innocuous research can be repurposed in ways we never imagined when starting out. It’s easy to think from an armchair that our own work won’t be used in similar ways, but the future is always murky and unpredictable.
There are no easy answers here. But we can pay attention and object if things are going down a dangerous path.
Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.
Guest posts: If you’re interested in writing a data-related post to show off work or share an experience, or you need help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord: where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord. We keep a chill vibe.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday. Share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!