Every Wednesday (around midday East Coast time) I’m making an effort to create a “No stupid data questions” thread (example) where I do my best to seriously answer any data-related questions, or find someone who can better answer them. If you happen to see one, jump in!
Quick association test: what’s the first thing that comes to mind when you hear the words “in production” in a data context (and not one related to making a film, theater, or the like)?
Most likely, one of the first things to come to mind is an image of code and software being pushed to “production systems,” where it will get used by millions of people across the world. It's a collection of systems, a special state, that we're supposed to be careful about touching because we might accidentally break it, with Big Bad Consequences like lost revenue or reputation.
But that notion is really nebulous. Is it just an engineering concern, a place most data people never have to think about touching? What are its boundaries and properties?
As I kept poking at the concept of “production” in my mind, I started hitting semantic satiation and had trouble even understanding the word itself. Why the heck do we say that software in a certain state is “in production” anyway? Some folks on Stack Overflow were debating it with wildly varying theories and zero consensus.
My personal head canon for the etymology, with absolutely no research or proof behind it, is that software engineers borrowed the concept from manufacturing. A design is put into production to make widgets; those widgets are the thing users directly experience, and they're hard to change. The production lines are distinct from everything else and directly tied to the success of the business. Data science, taking much of our terminology from engineering, just inherited the concept wholesale.
But in the modern usage, I think that “production” can take a simpler form.
Production for a system is the state where others (people or systems) are relying on the function or output of the system.
That is to say, there’s a promise of functionality that must be kept, lest negative consequences follow. A website's production environment is the one our users see and rely on. No one really cares in the same way if our dev environment crashes, because there's no promise of, or reliance upon, the dev system.
This seems a simple definition but I'm going to run with it as far as I can and see where it goes for data people.
“Production” for systems with users and clients
Back to the most common example of a production system for a data person: the ML model. The thing the data person builds is exactly the thing that other people will be using. Here is a complex set of code and software that needs to run on a regular basis, and other software relies on its output.
If the ML system goes down, other parts of the business suffer. If the system can't handle the request load, it goes down and again the business suffers. Expectations about performance and reliability are set by others.
This expectation is the source of all the pain and work involved in bringing an ML system into production-ready state. We have to make the system scale to handle the load requirements. We need to account for as many edge cases as possible that could cause the model to generate pathological results — things some might consider malicious attacks. Then there’s all the practical considerations of making sure that the ML system can even successfully interface with the other systems it needs to.
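To make the edge-case concern concrete, here’s a minimal sketch of one common defensive pattern: validating incoming requests before they ever reach the model, and falling back to a safe default when the input can’t be trusted. The feature names, bounds, and fallback value are all illustrative assumptions, not from any particular system.

```python
# Hypothetical sketch: guarding a model endpoint against pathological inputs.
# Feature names, sanity bounds, and the fallback score are made-up examples.

EXPECTED_FEATURES = {
    "age": (0, 120),               # (min, max) sanity bounds
    "sessions_last_7d": (0, 10_000),
}

FALLBACK_SCORE = 0.0  # safe default when input can't be trusted


def validate(features: dict) -> list[str]:
    """Return a list of problems found in the input payload."""
    problems = []
    for name, (lo, hi) in EXPECTED_FEATURES.items():
        if name not in features:
            problems.append(f"missing feature: {name}")
            continue
        value = features[name]
        if not isinstance(value, (int, float)):
            problems.append(f"{name} is not numeric: {value!r}")
        elif not lo <= value <= hi:
            problems.append(f"{name}={value} outside [{lo}, {hi}]")
    return problems


def score(features: dict) -> float:
    """Score a request, falling back to a safe default on bad input."""
    if validate(features):
        return FALLBACK_SCORE  # a real system would also log and alert here
    # stand-in for the actual model call
    return min(1.0, features["sessions_last_7d"] / 100)


print(score({"age": 35, "sessions_last_7d": 42}))  # normal request
print(score({"age": -5, "sessions_last_7d": 42}))  # falls back to 0.0
```

The point isn’t the specific checks; it’s that a production system has to decide, ahead of time, what happens when the outside world hands it garbage.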
Solving the many issues and edge cases around these systems is why “productionizing” an ML system is a special skill set on its own. Not everyone who works with ML can be an expert in every aspect of performance tuning and scaling such a system; it’s often a team sport that involves engineering effort from all sorts of places.
Of all the data subfields, this is probably the one that demands the most sheer engineering ability.
But what is production for an analyst?
But what if your primary job isn’t to make software artifacts like an ML engineer or model builder? What if your role is as an analyst, being the eyes and ears of the organization?
Now, instead of delivering a functioning piece of software, people are relying upon you to bring a steady stream of useful information. Very often this comes in the form of dashboards (gasp) and regular reports like emails or presentations. Just like with the highly engineered data products mentioned above, there is an expectation of correctness, reliability, and timeliness here.
Just like how people need to know that their windshield is showing them what’s actually happening outside of the car when they’re driving, the primary maintenance work of production in this context is making sure that the information presented to people who are using it to make decisions is good enough to be used for that purpose.
Unlike the big complex ML models in the previous section, these problems are usually less complex from an engineering standpoint. Don’t get me wrong, there’s still a ton of very difficult work to be done. Data measurement and quality can always break down and need fixing. Metrics are constantly being swapped in and out as the business changes. There’s always a chance that people will misinterpret what’s being shown despite your best efforts to communicate. But even with all those issues (and more), the problem is better defined. There are lots of products and vendors out there that promise to help with, if not solve, many of those problems.
For handling data processing and repeatable analysis, there are tools like dbt and Airflow. Such software helps you manage the complexity of maintaining the back-end processing pipelines. There are also endless tools for the display layer, like Tableau, Data Studio, etc. You can also opt to build your own stack from off-the-shelf parts. This is currently a very hot space where new things keep popping up, and many places wind up using multiple tools on top of building some of their own to meet all their needs.
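A big part of what orchestration tools like Airflow and dbt handle for you is simply running steps in dependency order, the same way, every time. As a rough illustration (not how those tools are actually implemented), here’s a toy pipeline expressed as a dependency graph and executed with Python’s standard-library topological sorter; the task names are invented.

```python
# Toy sketch of what a pipeline orchestrator manages: executing steps in an
# order that respects their dependencies. Task names are hypothetical.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on
PIPELINE = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_orders_users": {"extract_orders", "extract_users"},
    "daily_revenue_report": {"join_orders_users"},
}


def run_pipeline(tasks: dict) -> list[str]:
    """Execute tasks in a dependency-respecting order; return that order."""
    order = list(TopologicalSorter(tasks).static_order())
    for task in order:
        # a real orchestrator would run the task here, with retries,
        # logging, alerting on failure, backfills, and so on
        pass
    return order


order = run_pipeline(PIPELINE)
print(order)  # extracts first, report last
```

The real tools add scheduling, retries, and observability on top of this core idea, which is exactly the operational burden you'd otherwise carry by hand.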
Okay, now what about insights-focused people?
So far, there’s been a pattern here of “production equals making some code run and then keep it running”. The scope and skills can differ pretty wildly but it’s the same basic idea. Time to go off in another direction.
Right now, I’m a UX researcher. Before that, I was an extremely insight-focused analyst. This meant that half or more of my time was dedicated to understanding a new and unique problem and providing a single answer to it. Days or weeks of work are crystallized into a single speck of knowledge like “people prefer it if there is a human face or name somewhere in support email responses”. This bit of knowledge, once crystallized, changes slowly if at all. It needs no computational infrastructure to support it once the analysis is completed. From a technical engineering standpoint, things look easy.
But, in a very peculiar sense, the production environment has shifted to become… the minds of the people in the organization. Knowledge is only useful when people have access to it when they need it, so they need to have learned it beforehand. It does no one any good if I discover the secret of unlimited money and eternal life, tell it to one PM, who the very next morning leaves the company or gets hit by a speeding car. The knowledge must be spread far and wide to reach its full potential.
This dynamic shifts how everything is done. Deploying our work is effectively all those presentations to important stakeholders, as well as all the encore presentations to people in various places in the organization that should be hearing the message too. Hopefully this highlights the absolute importance of disseminating findings.
I know that many people, myself included, can feel that the hours spent preparing to present and share findings are unproductive compared to “Real Research”. It feels like time wasted fussing over the content of slides and making presentations clear and visually appealing. But considering that we’re not doing our jobs if we can’t get people to actually use the insights we generate, it’s one of the most critical parts of the job.
The same reasoning applies to sharing the knowledge with wider audiences beyond immediate stakeholders. It can feel like even more work “wasted” gathering people from seemingly unrelated parts of the business for a talk that needs to be tuned to an audience without much context. But there’s value here because knowledge tends to yield unexpected benefits when combined with other bits of knowledge. Serendipity is a ridiculously powerful force when it chooses to appear.
So yes we need to share, share, share! Even when it feels like self promotion!
But guess what? That’s not the end of it. Organizations, especially larger ones, are constantly churning people. Individuals and teams are constantly coming and going. We're not only fighting the passive network forces blocking our brilliant works of pure Truth from reaching people, we're fighting to overcome an active and constant organizational brain drain.
Us vs the Tide
Battling both of these forces is the central theme to “production” for knowledge. Just like we must plan for our data pipelines to fail, we must plan for our colleagues to forget (or need to learn for the first time) important insights.
I don't think very many organizations have a unified strategy for this kind of thing. It's solved with a patchwork of solutions, like prepping onboarding materials and codifying much of the knowledge into what I once called our collective organizational literature. It’s just a collection of ad-hoc chaos and unsatisfying fixes.
For some of the teams I've worked on, we've taken things a step further and peppered important reminders in various places, like all-hands presentations. Whenever discussion of a strategic initiative or project comes up, we try to make sure the underlying reason, the nugget of truth that drove the decision, is also mentioned so that people don't completely lose their focus. While this method isn't perfect, and you can't remind everyone of everything forever, it's definitely better than nothing.
So I think the state of the universe is that “prod” is the most fleshed out when it most resembles engineering, because engineers have been doing that exact sort of thing for years and we’ve learned from them. Meanwhile, the “knowledge-prod” part of the world… hasn’t really changed because I don’t think we’ve really looked at it in the same way? I certainly hadn’t until I started writing this.
So yeah, I think a bunch of us have a project on our hands.
About this newsletter
I’m Randy Au, currently a Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects of data science, UX research, and tech. With occasional excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise noted.
Curated archive of evergreen posts can be found at randyau.com
Supporting this newsletter:
This newsletter is free, share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Tweet me - Comments and questions are always welcome, they often inspire new posts
A small one-time donation at Ko-fi - Thanks to everyone who’s supported me!!! <3