Games, a playground for learning DS fundamentals
I love playing games. Even now when I’ve got a job, a kid, a weekly newsletter, and an ever-growing series of hobbies, I always manage to spend more time than I should on games. This was just as true when I was growing up, and as expected, all that gaming got criticized as a pretty big waste of time.
But like a lot of received common sense from that era, I’m not sure it really was a waste of time. Remember being told not to meet people from the internet, or get in cars with strangers? There are multi-billion dollar apps burning VC and investor money for that. So, instead of thinking of games as just entertaining time sinks, what can we get out of them?
We don’t normally work with data starting from nothing
One of the jobs that data scientists must face is deciding HOW to approach a decision problem that they may not be in the least familiar with.
In a typical setting as a student or a new employee to an existing company, you’re often given guidance. There’s existing data collection infrastructure, existing data sets, management has truths that they know, analyses that they already rely on and understand. While there’s a lot to learn, it’s still a good reference point to start.
There’s no shame in standing on the shoulders of giants who came before us. I’m in no rush to write ASM to do data science, and I don’t want to spend months or years doing basic data infrastructure engineering before I can run queries against a robust analytics data warehouse.
But eventually, you’re going to wind up working on a new thing, something that hasn’t been done before and there’s no existing infrastructure. Maybe you start that side gig you’ve been dreaming about, or get brought on as a founding member, or you just get put on a brand new product development initiative. How do you approach those situations? You need to pretty much build up everything from scratch, which seems quite daunting when you think about the sheer volume of knowledge you’re basing your work on every day.
I think that games are one of the best places to learn these skills, because games offer a very interesting set of conditions that are hard to replicate in the real world, and the “cost” of failure is extremely low.
Games create a disjoint reality you need to figure out
This is the coolest thing about games, whether they’re video games, a sport, a board game, or whatever — they create their own self-contained realities. Most of the things that apply in the real world have nothing to do with the game. This is important because it means analyzing and optimizing decision-making within a game is similarly divorced from our typical everyday experience. We all start on equal footing in terms of subject matter expertise.
Normally a subset of things in the business world follows fairly universal rules — making money is almost always a good thing, growth tends to be good, breaking the law is bad but finding loopholes is often okay, etc. There’s some nuance and caveats involved, but they usually point in the right direction. You can take advantage of these similarities and carry over a lot of assumptions and models. This means that business systems often look somewhat similar in form, tech stacks are organized in familiar ways, etc. With some tweaking for local conditions, you can get up to speed fairly quickly, and it’s one of the advantages of having a longer career.
None of those things really mean anything in a game. So how do you proceed?
Figuring out what you even want to do
Games are played for tons of reasons. Many people play “to win”, but similarly, people also play for enjoyment, for the social aspect, for the challenge. While some games like chess are clearly about one person beating another, other games like Minecraft don’t usually have any goals in mind. What does it mean to optimize decision making against this vast backdrop?
You’d first have to figure out what you want to optimize for. This is exactly like asking a stakeholder “what does success look like?”.
For example, I play a handful of mobile gacha games as a free-to-play player. These games ration your ability to play and entice players to spend money by dangling cosmetics and convenience functions in front of them. They also have a randomized loot-box system that is effectively a very expensive form of gambling for bits of data, typically a pretty png or a widget with improved stats. There’s quite a bit of controversy about these games, with “whales” (a term taken from casinos) spending upwards of hundreds or thousands of dollars a month on these games.
The rest of the population, like myself or many younger people with limited funds, fill internet forums and communities with tips and strategies to maximize their play time and fun for minimal (often zero) cash spent. Entire communities collectively run simple statistical analyses to back out the reward rates of loot boxes (Overwatch) when they’re unpublished. They’ll publish extremely detailed tier lists of characters (Granblue Fantasy). Players will post detailed guides for beginners (Warframe) outlining mechanics and where to invest (and what to avoid buying), often with an eye towards teaching new players how to have fun without wasting money.
On the surface it’s just a bunch of kids and young adults playing games without spending money, but if you look deeper, you’ll find the beginnings of data science even in this simple problem. Players analyze currency conversion rates — I’ve seen players say they play from certain geo regions that get local currency discounts to maximize the currency purchase. They’ll analyze when sales happen so that a dollar goes further if you do choose to spend money. The sheer amount of information sharing, knowledge, and creative experimentation is amazing.
The motivation to have fun without spending cash is very strong.
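As a rough illustration of the loot-box analysis mentioned above, here is a minimal sketch of estimating an unpublished drop rate from pooled player logs. The counts are invented for the example, and it assumes every box is an independent draw with a fixed rate (no "pity" mechanics), which a real game may not honor.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical pooled community log: 1,000 boxes opened, 27 top-rarity drops.
low, high = wilson_interval(successes=27, trials=1000)
print(f"observed rate: {27 / 1000:.3f}, plausible range: {low:.3f} to {high:.3f}")
```

The interval matters more than the point estimate: with only a few hundred logged boxes, the plausible range for a rare drop is wide, which is why these communities pool their data.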
Learning to measure stuff
Once you figure out what sorts of decisions you’re trying to optimize for, you need to actually start measuring metrics to power the optimization process. With games, this isn’t easy because there’s no real a priori intuition as to what should be measured that informs the decision-making.
Take one of the most statistic-heavy sports out there, a game I don’t understand anything about: baseball. Baseball boasts a dizzying number of stats that are compiled and reported on. I have no idea what most of those stats mean, or whether knowing them helps predict anything that happens in-game. But I do know that people have invented these metrics in an effort to better understand the game. Some of those stats are thought to matter, some WERE thought to matter but later proved irrelevant.
In video games, you find similar metrics, like DPS (damage per second) being a very common measure to compare weapons. The community around the classic RTS game Starcraft invented “APM”, actions per minute, as a way to compare players. Apparently top professional players can play at 300-400+ APM (clicks and keypresses/minute) for 20+ minute games. I played with an APM measuring tool recently and I can average around 100-200ish when I’m in the middle of typing out a paragraph of newsletter.
This sort of creativity in inventing potentially useful metrics, and then testing them to see if they’re relevant, is a core part of data science work. It’s all the more fascinating because it is a very serious challenge to come up with something useful and then show that it actually is useful.
Think about what it takes, in a business setting, to have a business system, hypothesize what potential levers are needed to manipulate some aspect of the system (increasing revenue is a common target), then all the experimentation and verification needed to complete that analysis. Sounds very similar to the game context.
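To make the metric examples above concrete, DPS and APM are both simple ratios. This is a minimal sketch: the function names and the sample numbers are made up for illustration and are not taken from any particular game or measuring tool.

```python
def damage_per_second(damage_per_hit: float, hits_per_second: float) -> float:
    """DPS as average damage per hit times attack rate (ignores crits, reloads, etc.)."""
    return damage_per_hit * hits_per_second


def actions_per_minute(action_count: int, duration_seconds: float) -> float:
    """APM as total clicks and keypresses divided by elapsed minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return action_count / (duration_seconds / 60)


# Hypothetical weapons: a slow heavy hitter versus a fast light one.
print(damage_per_second(damage_per_hit=50, hits_per_second=2.0))   # heavy weapon
print(damage_per_second(damage_per_hit=12, hits_per_second=10.0))  # light weapon

# 6,000 recorded actions over a 20-minute (1,200 second) game.
print(actions_per_minute(action_count=6000, duration_seconds=1200))
```

Computing the number is the easy part. The real work is checking whether a higher value actually predicts winning.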
Data collection the hard way
Analyzing games brings about a lot of interesting challenges for data collection.
Since the game is controlled by a third party, a lot of data collection has to be done manually, entering and tabulating numbers on spreadsheets. I remember when I played a MUD (Multi-User Dungeon, the text-only multiplayer precursor to the MMORPG genre), I wrote text scraping scripts to collect damage data to do experiments on how character equipment changed damage. It was a good way to learn regression back in the late 90s. This technique is still used today: for example, a reload time formula for Azur Lane was derived empirically.
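A hedged sketch of the kind of analysis described above: fitting a straight line to logged damage numbers with ordinary least squares. The data points are invented for illustration (a hypothetical weapon bonus versus the average damage observed), and a real game's damage formula may well not be linear.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical log: weapon bonus equipped vs. average damage per hit observed.
weapon_bonus = [0, 1, 2, 3, 4]
avg_damage = [10.2, 12.1, 13.9, 16.0, 18.1]

slope, intercept = fit_line(weapon_bonus, avg_damage)
print(f"each +1 bonus adds about {slope:.2f} damage (base about {intercept:.2f})")
```

In practice you would average many hits per equipment setup first, since individual damage rolls are usually noisy.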
Meanwhile, speedrunners, people who try to run through a game as fast as they can under certain community-defined rulesets like no-glitches-allowed, or 100% completion, etc, face very interesting problems of figuring out how to optimize their gameplay for clearing speed. This includes things like intentionally taking damage to move faster, counting animation frames to figure out that sliding is slightly faster than running in a game, or just finding crazy glitches.
The community of players around the game effectively share notes and videos with each other, competing to get the best times. When someone stumbles upon a better strategy, that information is quickly disseminated and incorporated into competing runs. There’s been at least one paper I'm aware of that looked at speedrunning and found that some aspects of speedrunning are NP-hard. (Lafond, 2018)
In other games, people will datamine the game files, opening them for inspection to derive stats and mechanics information that would normally be obscured from the typical player. This of course requires some pretty advanced technical skills, often involving things like hacking open proprietary file formats to inspect them, or snooping network packets on the wire.
For more serious esports where large sums of money are involved, entire BI analyst roles can be created for professional teams and gaming companies. Microsoft partnered this year with Cloud9, a professional esports group, to provide analytics. They’ll combine data about different players, teams, and strategies used with analysis done by analysts and data scientists to try to give teams an edge in competition.
The community of League of Legends, among the biggest esports games in the world in terms of money, has built up this giant world of analysis around the game, going into so much detail that it overshadows a lot of analysis done at many companies. Over the years, the community has developed its own language for how to understand, analyze, and strategize about the game.
Similarly, analysts of Overwatch started pushing a concept called the “Ultimate Economy”, not a reference to it being the final economy, but the strategic management of “ultimate moves”, which are limited-use abilities that could be considered a critical resource on a team. This concept did not exist early on, but as understanding of the game became more refined, it came to the fore as an important aspect in analyzing the game.
Very frequently, for very grind-heavy games, you’ll find players optimizing for time spent with various forms of resource/unit time… for example, this World of Warcraft grinding guide breaks down expected experience points/hour to identify places that are good for levelling up efficiently.
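The resource-per-unit-time idea above can be sketched in a few lines. This is a minimal example under stated assumptions: the zone names, XP totals, and session lengths are all hypothetical, and `GrindSession` and `rank_zones` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class GrindSession:
    zone: str
    xp_gained: int
    minutes_played: float


def rank_zones(sessions: list[GrindSession]) -> list[tuple[str, float]]:
    """Pool sessions by zone and rank zones by total XP per hour, best first."""
    totals: dict[str, tuple[int, float]] = {}
    for s in sessions:
        xp, minutes = totals.get(s.zone, (0, 0.0))
        totals[s.zone] = (xp + s.xp_gained, minutes + s.minutes_played)
    # Zones with no recorded play time are skipped to avoid dividing by zero.
    rates = {
        zone: xp / (minutes / 60)
        for zone, (xp, minutes) in totals.items()
        if minutes > 0
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical hand-logged sessions.
log = [
    GrindSession("Zone A", xp_gained=12000, minutes_played=30),
    GrindSession("Zone B", xp_gained=20000, minutes_played=60),
    GrindSession("Zone A", xp_gained=9000, minutes_played=30),
]

for zone, rate in rank_zones(log):
    print(f"{zone}: {rate:.0f} XP/hour")
```

Pooling total XP over total time per zone, rather than averaging per-session rates, keeps one short lucky session from dominating the ranking.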
Doing some of this yourself
I know that I’ve just spit-balled a bunch of wild examples about games and the analytical depth that invested players will go to in order to understand a thing that they love. But more practically speaking, how can we use games to learn about data science ourselves?
It all starts with picking a game that you like to play, and loving it just enough to want to improve at some aspect of it. Then, take that motivation and use it to analyze the game and develop metrics and strategies around it. Flex that creativity muscle.
For example, take the giant sandbox game Minecraft. There’s already tons of information on the wiki about the various mechanics of the game. But try to define a goal for yourself, for example finding a good mining strategy for certain materials. Blocks of resources are randomly distributed in the world according to certain parameters like depth and location. How do you go about figuring out a good strategy? For those curious without wanting to do the work, there’s lots of preexisting data for this task, and also whole tutorials for efficient mining, but imagine if you didn’t have access to those. How would you go about this?
There’s also room to flex your existing data science skills in this. Remember how games aren’t usually in your control and often don’t have APIs? Some people have used machine vision/OCR models to pull information out of games to run ML on, or just to automate data collection. There’s plenty of room to play here to apply our skills to understand and play better, without going through the trouble of trying to build an AI that plays the game well (an often-attempted and completely different set of concerns).
The more obscure and less-played a game you work on, the less preexisting data will exist and the more you’ll have to develop things on your own. Maybe there’s a community around the game to ask questions and compare notes, or maybe not. All of this sounds increasingly like what it feels like to work on a completely novel project.
So while you’re having fun, if you suddenly feel the urge to go down a rabbit hole of figuring out a better way of playing, go indulge that feeling a bit.
It’s good for you.
Resources and fascinating rabbit holes I fell into while/before writing this
Many speedrun records, their corresponding verification videos, and leaderboards are viewable at speedrun.com
No discussion about speedrunning can forget about Games Done Quick and the many similar events it’s spawned. Since the players explain what they’re doing to the audience as they play, it lets people unfamiliar with the games follow along. (Youtube channel)
Liquipedia is a central resource for major e-sports tournaments and information. There’s lots of data in the form of tournament win history contained within, and it can probably be used to analyze some aspects of games from that alone.
Dungeons and Dragons (and plenty of other similarly-styled RPG games, tabletop or otherwise) has a phenomenon called min-maxing which is related but isn’t quite the same as the mathematical term. In this context it means optimizing a character’s stats to maximize a desired effect, like putting exactly the minimum number of STR points to allow a certain action and nothing more. It maximizes utility/a benefit, while minimizing waste of resources, based purely on mechanics. Role-players can find this undesirable (imagine a wizard with unjustifiably high dex just to abuse combat mechanics).
About this newsletter
I’m Randy Au, currently a quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects of data science, UX research and tech, with occasional excursions into other fun topics.
Comments and questions are always welcome, they often give me inspiration for new posts. Tweet me. Always feel free to share these free newsletter posts with others.
All photos/drawings used are taken/created by Randy unless otherwise noted.