Dashboards don't break themselves

Humans break them

Jun 28, 2022

I generally don’t comment much on politics and current events. But allow me to express my profound rage at the current US Supreme Court’s utter bullshit, both last week and for the foreseeable future.

In data science, there is at least one law we cannot avoid — dashboards and similar information displaying systems will break. It’s inevitable.

Dashboards sit at the very end of a long chain of systems that are always changing independently of one another. While this always seems like a problem that can be solved “if we only engineered things better”, I remain very skeptical that the problem could be solved in the general case with such simplicity. We’ll get into that.

The impetus for this week’s post came when @grimalkina (Thanks!) brought this thread to my attention and pointed out how this is probably just the tip of a giant iceberg. Someone had found that many tech giants and government web sites had failed to update their information to say that children from 6 months to 5 years old can finally get vaccinated against COVID-19, even though the relevant government agencies had announced the rollout in the US.

melody joy kramer @mkramer

It looks like Facebook changed its COVID dashboard to remove information about finding a COVID vaccine, rather than updating the information about 5 and under. Today:

The TL;DR for the thread is that a bunch of sources for COVID information, from a public dashboard on Facebook, to information boxes in Google’s Search, to the HHS.gov website, had not yet been updated to reflect the newest information about child vaccine availability. More specifically, as of this writing, the CDC recommends children 6mo-5yo get vaccinated. Meanwhile, other sites like Bing and DuckDuckGo were displaying the correct information.

I should note that the thread caught the attention of various people at various organizations and many things may have already been updated to better reflect current guidance by the time this publishes.

What we have here is a massive distributed failure on the part of many different parties to provide the public with the most up-to-date health information. In terms of “important things to get right as soon as possible”, this is pretty high up there. There’s no way to tell how many people saw those messages and came to an outdated conclusion that they won’t revisit for some length of time.

What’s important to note here is that there’s unlikely to be any kind of malice or even incompetence involved in the current situation. Instead, the situation is a visible example of how even extremely critical information tools and dashboards often sit upon a fragile house of cards where a single change can cause the whole thing to crash.

A house of cards gets built on shaky data sources

Like with many dashboard projects, COVID dashboards were an afterthought. Each individual system was engineered to handle certain tasks, and then a surprise “Show COVID info” requirement was bolted on later.

Back at the start of the pandemic in late 2019, early 2020, every individual organization took stock of the systems they had control over, and decided what “made the most sense” for their system to achieve the intended goal of providing relevant information to users.

Given the chaos at the start of the pandemic, everyone just slapped together something that gave the desired end result without an extended development period. Those initial methods then evolved over time from quick and dirty hacks to more refined methods of sharing. Facebook created a dashboard page centralizing information, Google triggers a block of informative text in response to certain search queries, and individual government agencies put up special pages and announcements with their current guidelines. Now, over two years into the pandemic, there has been plenty of time for organizations to run experiments and do research to find a more refined way of showing the information.

But even with the “how do we show this information” part of the dashboard mostly “solved” now, what information do we actually put in the dashboard? And more importantly, how would we keep the information up to date?

For a typical dashboard that data scientists create all the time, our answer is some variation of “just pull the data from the database”. Sales, user counts, almost everything we want to report on is stored in the database and there are usually guarantees around the freshness of that data. Depending on our use case, we can engineer and tap into systems to give us a near-real-time analytics dashboard (never mind that most people will never have a justified use for “real time analytics”). Whatever the use case, we almost never have to worry about our data being particularly stale.

We don’t have this luxury with COVID. The data has always been a mess, and recommendations from places like the CDC or local government health departments are constantly changing.

The now-ended COVID Tracking Project has lots of blog posts about the work needed to compile COVID data together into something usable. There’s a whole post alone on why they don’t fully automate their data collection — humans needed to be in the loop to handle individual quirks that arose in every individual state’s handling and publishing of data.

So imagine if you’re the engineer tasked with building out these features at any of the aforementioned companies from day one. At the start, you’re in a rush and there’s no established source of truth yet, so it’s probably easiest to just hardcode the message and figure things out later.

But once the initial launch urgency has passed, what do you do? Assume you want to follow the latest guidance from the CDC. Announcements from the CDC usually take the form of public statements, updates to their main COVID pages, or entirely new pages dedicated to new guidance. There’s no guarantee of a standardized format, or that a method used previously will be used for any new update. It’s primarily a collection of free-form text.

Do you use NLP tools to try to extract the information you need? Do you use some hacky heuristics that scrape the CDC’s page for updates? Do you try to solve the problem of Artificial General Intelligence to build an automated updating system? We’re probably better off simply keeping the text hardcoded, and then having a human keep up with the news announcements for big policy shifts. Having “a guy with a spreadsheet, refreshing a page” should work well enough, right?
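As a rough illustration of the “human refreshing a page” approach, here is a minimal sketch of a change detector that only decides when a person needs to go re-read the guidance. It makes no attempt to understand the text. The function names and the sample wording are made up for this example, and actually fetching the page is left out entirely.

```python
import hashlib
import re


def fingerprint(page_text: str) -> str:
    """Hash the page text after collapsing whitespace and case, so that
    trivial reformatting does not register as a change."""
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_human_review(previous_fingerprint: str, page_text: str) -> bool:
    """Return True when the page differs from the last copy a human reviewed."""
    return fingerprint(page_text) != previous_fingerprint


# Hypothetical snapshots of a guidance page; the wording is invented.
last_reviewed = "Vaccines are recommended for everyone ages 5 and older."
same_but_reflowed = "Vaccines are recommended   for everyone\nages 5 and older."
updated = "Vaccines are recommended for everyone ages 6 months and older."

baseline = fingerprint(last_reviewed)
print(needs_human_review(baseline, same_but_reflowed))  # False
print(needs_human_review(baseline, updated))            # True
```

A check like this would presumably be noisy on a real page, since any wording tweak trips it, and it does nothing when the new guidance lands on a brand new page. It only tells the human when to look, not what changed.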

So the manually curated hardcoding is to blame?

Not exactly.

There are plenty of systems that rely on manually entered data. Items moving around in a warehouse can often involve just a human entering a record into a system that the object was moved to another location. Cash sales from a cashier’s register in a store need to be closed out and counted at the end of every shift or day. Grades for classes in school need to be entered into records systems.

So the mere fact that a human has to manually update data isn’t enough to cause data dashboard chaos. What seems to be the difference between COVID and these more mundane data entry examples is the lack of process to prevent such highly visible data errors. The COVID dashboards likely don’t have a well-refined process for what to do “when vaccine recommendations change months after the last vaccine policy change”. Processes for making sure that these COVID dashboards are always up to date have not had time to be established and debugged.

Was it someone's job to keep up with the news and announcements and update the hardcoded information? It’s guaranteed that isn’t their only job, nor even a high priority one. What, if anything, is motivating them to stay on top of things?

Are there people who are supposed to check and verify that changes are correct and the text copy is easy to understand? How many people besides “that one engineer who put the text on the page” are involved? What’s their work prioritization?

Since a number of months have passed since we’ve had major updates to the vaccine landscape, it’s entirely possible that whatever people had been responsible for building out the original dashboard features have moved on to some other project or even an entirely different company. If those people moved on but didn’t designate someone else to pick up where they left off, things could easily fall through the cracks.

I’ve seen instances at various places where some obscure system was mostly maintained by one person and they just happened to be on vacation for a week when a change was needed. It was sometimes easier to wait for them to come back than try to understand how to do some obscure task. I don’t think it would apply to something as important as these mass healthcare information updates, but I’ve definitely seen it for smaller non-critical things.

Whatever the reason, the humans involved in the system weren’t on top of the vaccine news to update the dashboards in a timely fashion. It’s a failure of process. It usually takes many iterations of failures like this one to point out the process issues surrounding such systems. Even the best systems fail and are improved with post-mortem analysis.
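One lightweight way to make that kind of process explicit might be to attach an owner, a backup, and a review date to every hardcoded message, then nag when a review is overdue. The sketch below assumes exactly that. The class, field names, team names and dates are all hypothetical and are not how any of the companies mentioned actually work.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HardcodedNotice:
    text: str
    owner: str              # who is expected to keep the text current
    backup_owner: str       # who covers when the owner is away or has moved on
    last_reviewed: date
    review_every: timedelta


def overdue(notice: HardcodedNotice, today: date) -> bool:
    """A notice is overdue once more than `review_every` has passed."""
    return today - notice.last_reviewed > notice.review_every


def who_to_nag(notices: list, today: date) -> list:
    """Return (owner, backup_owner, text) for every overdue notice."""
    return [(n.owner, n.backup_owner, n.text) for n in notices if overdue(n, today)]


# Invented example data.
notices = [
    HardcodedNotice("Vaccine eligibility blurb", "alice", "bob",
                    date(2022, 1, 10), timedelta(days=30)),
    HardcodedNotice("Testing site blurb", "carol", "dave",
                    date(2022, 6, 20), timedelta(days=30)),
]

print(who_to_nag(notices, today=date(2022, 6, 28)))
# [('alice', 'bob', 'Vaccine eligibility blurb')]
```

The code is the easy part. Someone still has to decide who the owner is, keep the backup current when people change teams, and actually act on the nag.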

Distributed systems are going to do distributed things — to our detriment

Finally, the COVID dashboards are an extreme example of a distributed data system. You have different, completely independent groups who are either publishing or consuming information and there is practically zero coordination between them. Health departments don’t really consult with tech companies about what’s the best way to publish data and what guarantees exist. Facebook’s handling of the data and display is completely independent of how Google, or Microsoft, or Apple decide.

All this strict independence means that the meetings between teams that would normally settle how to share data simply don’t happen. The awareness that “someone is relying on our data to be in this specific format, so we shouldn’t change it without warning” is very low. The end result is that building a dashboard that’s supposed to pull and aggregate and synthesize information from all these places is subject to the whims of everyone involved.

Imagine if all these COVID dashboards and official recommendations were being handled internally in some company somewhere. The Search Info Box team would be able to meet with the CDC team and request that they be notified ahead of big policy changes so they can coordinate updates. The various health departments could be mandated to use a centralized data warehouse with a uniform schema to make analysis easier. Alerts could be set up to notify everyone if data quality issues are spotted. If someone’s thinking about changing how they do things, they’re aware that others are relying on their information and can choose to proactively reach out.

Everything would be so much easier!!!

Note that much of the lowering of difficulty isn’t due to technological or engineering concerns. Things are easier because the human communication element is simplified.

If everyone agreed to notify each other if breaking changes were to occur, we wouldn’t have to write so much error detection code to begin with.

If everyone agreed to use the same general data schema and rules to report things, we wouldn’t have to have humans manually monitoring over 50 health department data feeds.

If teams could agree on how vaccine recommendations would be communicated out, a messy process that involves a human manually making changes could actually be automated.
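To make the shared-schema idea a bit more concrete, here is a small sketch of what checking incoming records against an agreed format could look like. The schema, field names and sample records are invented for illustration. No such standard exists between the agencies and companies discussed here, which is the whole point.

```python
from datetime import date

# Hypothetical agreed-upon schema: field name -> required type.
SHARED_SCHEMA = {
    "jurisdiction": str,
    "report_date": date,
    "min_age_months": int,
    "recommended": bool,
}


def schema_problems(record: dict) -> list:
    """List every way a record deviates from the shared schema."""
    problems = []
    for field, expected in SHARED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            got = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {got}")
    # Flag fields nobody agreed on; sorted so the output is stable.
    for field in sorted(record.keys() - SHARED_SCHEMA.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


# Invented records from two made-up feeds.
follows_schema = {"jurisdiction": "State A", "report_date": date(2022, 6, 28),
                  "min_age_months": 6, "recommended": True}
went_rogue = {"jurisdiction": "State B", "report_date": "06/28/2022",
              "min_age": 6, "recommended": True}

print(schema_problems(follows_schema))  # []
print(schema_problems(went_rogue))
# ['report_date: expected date, got str', 'missing field: min_age_months',
#  'unexpected field: min_age']
```

A validator like this only pays off once the humans on both sides have agreed on what goes in `SHARED_SCHEMA`. Without that agreement, it is just more error detection code.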

Just as much as humans are often the source of the problem in most situations, humans are the solution.

Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.

About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.

Curated archive of evergreen posts can be found at randyau.com.
Join the Approaching Significance Discord, where data folk hang out and can talk a bit about data, and a bit about everything else.

All photos/drawings used are taken/created by Randy unless otherwise noted.
