More Learning in (Semi)-Public

A 3-month dogfooding update

Aug 11, 2020

Three months ago, I wrote about a process of learning in public that I had been working on, where someone writes a journal describing, step by step, how they learned to do a task, with an emphasis on providing data that someone interested in UX and product work would find useful. A kind of self-narrated usability session.

The inspiration for that post was an educational dogfooding project I was starting at work with a bunch of my coworkers. The task was something completely non-trivial using cloud tech. I currently work on quant UX for a small collection of cloud products, and one thing has been obvious over the past two years: no one who works on “The Cloud” fully knows all of the products that constitute the modern cloud market, and this can cause problems.

Note: Since I don’t really talk about my actual work, I’m generalizing a level up so that statements I’m making should apply more broadly.

The problem to solve

Cloud offerings from all providers have expanded massively over the past 10 years, from infrastructure to managed services to full AI products. There are often many “similar but slightly different” offerings from even the same provider. It’s impossible to know how to use all of them, and few even know what all the products are for.

The actual developers of cloud services themselves need to focus on a single problem space to develop their products, so they won’t have time to learn everything. Super-all-in enterprise users of the Cloud can’t be using everything that comes out because there’s no need to. Hobbyists are constrained by cash and won’t just spin up one of everything all the time.

Software engineers can be expected to have at least some passing familiarity with the command line, the concept of a virtual machine, things like clients and servers and software stacks. Since The Cloud is supposed to provide services that support those activities, that knowledge provides a starting point for any new engineer being placed onto one of the development teams, even if they’re fresh out of school with no industry experience.

Someone joining a cloud services team from the UX field usually doesn’t even have that basic level of knowledge to build on. Many of these talented folk come from a design background, or are entering qualitative UX research from the social sciences. They have had no reason at all to touch and maintain software stacks and servers. Lacking this foundation makes it hard to communicate and to understand the needs of users and what role a product plays in the market, which is important stuff.

How do UXers deal with this knowledge gap? They’re forced to play catch-up. It’s a very long, multi-month process where they ask lots of questions, do lots of reading, and make plenty of mistakes before they “get it”.

Imagine you were dropped into a DNS product with no knowledge of what a DNS server was or why anyone would need the service. Think of the complex web of knowledge you’d have to pull together to get up to speed. It’s like taking someone who’s only used a microwave before, dropping them into a professional kitchen, and telling them to make a meal for 10.

You’d first have to know what DNS is for and why anyone would need it. What do people usually do with DNS? What’s on the cutting edge of DNS tech? How do people usually set up DNS? What are the common failure modes and resolutions? And so on. You need to have a basic grasp of such questions before you can effectively design what the interfaces of the DNS product should look like. Reading documentation and Wikipedia pages only gives you a limited amount of insight; the rest must come from talking to engineers, product leads, and customers.

People put in the effort to understand the thing they’re working on, and after a few months of work, when they finally do get it, their worldview expands a bit more and they find they need to get up to speed on adjacent technologies. Once you understand DNS, questions come up about how it interfaces with networking, firewalls, BGP, etc. Companies like Cisco have a massive array of certifications around these topics alone. Obviously there is no end to this rabbit hole, and how far you go is limited by the time and energy available.

I’d like to note that it’s not unheard of for the designers of a system to not be among its intended end users. Lots of specialized equipment falls under this category. For example, F1 racing steering wheels are ridiculously complex, and I’m sure none of the designers involved are current drivers. The methodologies employed by designers and researchers can apply to these products, and these people were hired for those exact skills.

But all those examples involve working very closely with experts who actually are the intended users, getting detailed feedback to refine the work. That takes a lot of time and coordination, which necessarily slows things down. In a very rapidly moving environment, that winds up being more stress to deal with.

Shrinking the knowledge gap

Seeing that there exists a huge gap in knowledge between new hires and the product teams they’re assigned to, I wanted to help shrink it a little bit. Having previous data engineering experience meant I was rare in my local group in having prior experience with the tech before coming to work on it.

I’ll preface this whole section by saying that I’m not some master of cloud tech either. I’ve spent my data career merely learning how the complex systems I’m analyzing function so that I could measure them correctly. Sometimes I’ll write simple data pipelines and data warehouses. While I’m familiar with how a company could potentially stack nginx, RabbitMQ, Python, S3 and EMR together into an analysis chain, I don’t know the detailed specifics of the full implementation. I’ve always had wonderful devops teams to lean on when I had issues. I’ve never personally administrated systems more complex than hobby servers before.

Things first started out as just fielding random questions, or explaining confusing topics like why two similar-ish things like Redis and memcached existed and what they were for. What was the difference between Hadoop, Hive, and NoSQL, and how did they all relate to each other?

Then things got more complex: stuff that didn’t have straight answers you could just look up, and that I didn’t have direct experience with. How did people move many terabytes, even petabytes, of data? What does it actually mean to have to “manage a data pipeline”? How do people avoid security breaches? How to Compliance?

After a lot of thinking, the only conclusion that made sense was we had to learn to use our own technology. Hands-on. Because it wasn’t clear exactly where our knowledge gaps were, we had to discover the holes for ourselves, before we could get them filled.

Knowing that we had to roll up our sleeves and actually build something using the stuff we were working on, we had to decide what to actually do. Whatever we chose to build couldn’t be something as simple as “create your own WordPress blog on AWS/Azure/HVP”, the sort of typical “intro to the cloud” stuff you’d see on the internet as a blog post. We needed to work with the various products we were actually developing, and that takes some creativity. (That said, I’ll admit that AWS’s post on best practices for hosting WordPress on AWS is… significantly more complex than spinning up a VM.)

Imagine if you were working on Cloud-based AI tools, or a data analytics pipeline product, or a queuing system similar to RabbitMQ. Now sketch out some kind of architecture that makes use of the product like a typical user would. It takes a fair amount of technical understanding to even sketch out something that makes sense.

Then, taking that napkin sketch, actually attempt to build it from the ground up while pretending to be a newbie engineer who has been given a mission by your boss and has few engineering friends to do the work for you. That’s the magnitude of what we were attempting to do.

Building an IoT stack as a project

Given the minimum required complexity constraints we (well, I) forced upon ourselves, I decided that we should attempt to create a kind of toy Internet of Things system. We’d build fake telemetry sensors in software and scatter them around the world, then they’d send their data over into an analytics pipeline, which stores, processes, then displays results.
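To make the shape of that concept concrete, here is a minimal in-memory sketch in Python. It is not our actual project code: the sensor names, the `temperature_c` field, and the use of the standard library's `queue.Queue` as a stand-in for a managed cloud queue are all assumptions made purely for illustration.

```python
import queue
import random
import statistics
from collections import defaultdict


def fake_sensor_reading(sensor_id, rng):
    """Produce one fake telemetry message (field names are hypothetical)."""
    return {
        "sensor_id": sensor_id,
        "temperature_c": round(rng.uniform(15.0, 30.0), 2),
    }


def run_pipeline(sensor_ids, readings_per_sensor, seed=0):
    """Send fake readings through a queue, store them, and average them."""
    rng = random.Random(seed)
    pipeline = queue.Queue()  # stand-in for a managed message queue

    # 1. Sensors send their data into the pipeline.
    for _ in range(readings_per_sensor):
        for sensor_id in sensor_ids:
            pipeline.put(fake_sensor_reading(sensor_id, rng))

    # 2. Store: drain the queue into per-sensor storage.
    storage = defaultdict(list)
    while not pipeline.empty():
        message = pipeline.get()
        storage[message["sensor_id"]].append(message["temperature_c"])

    # 3. Process: compute a simple per-sensor average.
    return {sid: statistics.mean(values) for sid, values in storage.items()}


if __name__ == "__main__":
    # 4. Display the results.
    results = run_pipeline(["tokyo-1", "nyc-1", "berlin-1"], readings_per_sensor=5)
    for sensor_id, mean_temp in results.items():
        print(f"{sensor_id}: {mean_temp:.2f} C average")
```

In a real cloud build, each numbered step would be its own product (VMs, a queue, storage, analytics), which is where the complexity comes from.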

Even the initial concept sketch alone covered 4 major products, since we needed VMs, data storage, a queue, and analytics just to get the basic functionality. As the concept got fleshed out more and more, we found out we needed more services, different flavors of storage, etc. Three months into the adventure, we’re perhaps halfway through building out the design and have already touched 10 different product offerings. Only as of last week can we finally generate fake data and send it out into the pipeline for further processing.

We haven’t even started on the data analytics and display half of the stack yet!

I’m lucky that I’m not on a team that handles something way up in the stack like AI, because we still have a lot of work to do before we can get there. That said, if my goal had been to use AI tools, I probably would’ve picked a different project, one that cut out a lot of the underlying infrastructure work.

I’m honestly not sure what kind of project I would’ve created if I had been tasked with something like networking.

What we learned along the way

Randy Au @Randy_Au

for this toy demo project.. I just wrote a cat-on-furniture simulator... picks a furniture object bed/sofa/table/human randomly sets a cat population according to an uneven distribution random-walks the # of cats on the object >_>;;;
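The tweet describes the simulator in three steps, and a rough reconstruction might look like the sketch below. This is not the original code: the starting counts, their weights, and the one-cat-at-a-time step size are made-up assumptions, since the tweet only says the distribution is "uneven".

```python
import random

FURNITURE = ["bed", "sofa", "table", "human"]

# Hypothetical uneven distribution over starting cat counts (weights are made up).
START_COUNTS = [0, 1, 2, 3, 4]
START_WEIGHTS = [0.40, 0.30, 0.15, 0.10, 0.05]


def simulate_cats(steps, rng):
    """Return (furniture, history of cat counts) for one simulated object."""
    # Step 1: pick a furniture object at random.
    furniture = rng.choice(FURNITURE)
    # Step 2: set the initial cat population from the uneven distribution.
    cats = rng.choices(START_COUNTS, weights=START_WEIGHTS, k=1)[0]
    history = [cats]
    # Step 3: random-walk the number of cats, never dropping below zero.
    for _ in range(steps):
        cats = max(0, cats + rng.choice([-1, 0, 1]))
        history.append(cats)
    return furniture, history


if __name__ == "__main__":
    furniture, history = simulate_cats(steps=10, rng=random.Random(42))
    print(furniture, history)
```

Each entry in `history` could then be sent out as one fake telemetry message.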

Building these systems is… tricky business.

One more constraint that we had placed upon ourselves during this project was that we’d do our best to NOT rely on our internal corporate position to learn. We should pretend to be tiny external users who don’t have privileged access to a sales manager or dev advocate. Most users can’t just pop a quick message to the engineers who wrote the system they’re failing to use to ask questions. Leveraging that perk would make our experience far too different.

Early on, while we were setting up and things were largely doable purely with the web UI, people could join in and actually pick up tasks to do. Someone who wasn’t an expert could definitely decide which datacenter region we wanted to be in. Someone else could volunteer to take up the role of security manager and make an honest attempt to muddle through the mind-blowing complexity that is security and access permissions for any cloud system.

But as the low-hanging technical fruit got picked, fewer and fewer people had the skills to directly contribute. Not everyone can just sit down and write a program that interfaces with a central server and sends it IoT telemetry data. Not everyone can build out a system that can scale that out to hundreds of these fake sensors. For someone who has never even written a program before, that would take potentially weeks of study.

As time went on, our weekly meeting turned into a “So what did Randy figure out how to build and why did he build it” talk + Q&A session. While I’d love it if someone else would take the time to attempt to write some software, or figure out how to connect to a system, I understand that my colleagues have other more important work to take care of. So we use my learning experiences, and my failures, as the jumping-off point for having Q&A about the systems involved.

That said, the sense of imposter syndrome is very real here. I feel like I’m playing data engineer on TV every moment I’m working on this project. I’m positive that some of the architectural and software design choices I’ve made are massively suboptimal. There are even a few things we do that are deliberately bad but allow us to use some products we wanted to try. My code style is obviously garbage and wouldn’t pass muster in the official codebase. There are no unit tests, or ANY tests, to speak of.

I constantly have to mute the internal voices of doubt over all this stuff and more, because the goal isn’t to be technically correct in every way, but to get something working and learn. It’s completely like building an MVP for a startup.

Failing in public definitely feels weird

Since I’m learning many of the things I’m using along the way, I’ve had a bunch of rather spectacular failures that I had to document and then give entire talks about. It feels akin to reliving those embarrassing moments from high school, but then being forced to go on camera to discuss it. In detail.

The most facepalm-worthy incident involved me wasting an entire day trying to get a new VM to execute a new startup script I wrote. I had a separate machine template made already that did this exact thing, so how hard could it be?

Long story short, in my overconfidence I had confused the command for executing a script from a remote URL with the one for a script pasted directly into the configuration. It took about 6 hours of debugging, thinking it was a weird permissions/access/storage/whatever issue and running around in circles, before I figured out I needed to tweak a single creation command by about 4 characters.

I think anyone who’s worked with software will have many similar stories about simple mistakes like this. The notorious semicolon or quote mark that takes hours to debug. Normally that’s the kind of stupid mistake that makes you just throw up your hands, close your laptop, and grab a strong drink to end the day. But I had to come back on Monday and talk about it. I could have cheated and pretended it never happened, but that goes against the promise of learning in public that I had made to the group.

And so I had a good laugh and presented, with a smile, the day I spent going in a big pointless circle.

The UXers found it fascinating that someone could so easily use up so much time on such a little detail. They were aware that such errors could happen, since you hear examples on the internet all the time, but they had never actually encountered the phenomenon in a way where they could ask questions about it. There were lots of questions and discussion about how I might have accidentally gotten the wrong mental model, how I debugged things, and what it took for me to get back on track.

The whole discussion was valuable exactly because this was the sort of thing that gets swept under the rug. It’s a complete positive that this thing happened. I just wish I didn’t feel like I was walking around without a shirt on while explaining it. (I’m from NYC; we don’t normally go around without shirts.)

We’re building empathy here

If you google around, you’ll find that the concept of building empathy with users is an important part of creating good UX. It’s how we can understand what users are trying to achieve, as well as the pain and frustration they feel when things don’t work out.

Everyone’s building lots of that by experiencing it either for themselves, or through me saying things like: “I didn’t have time to figure out security settings yet, so everyone gets admin-like roles until I find time. And yes this is bad practice but we’re doing it anyway.” They also get to see my face on video chat as I groan about how viscerally painful something is to figure out.

It remains to be seen exactly how this empathy will affect things going forward. It’s hard to directly attribute these things. But I’m hopeful that there’s an effect, somewhere, in the future.

As a kind of bonus, we’ll (very rarely) find bugs and issues and get to file them. That’s something a typical end user will never be able to directly do.

Despite it all, things seem to be going well

While it first seemed outlandish to drag 10-15 non-technical people along on a journey of using a highly technical product space, things are working out. There’s a lot of stretching and learning for everyone involved, but we’re not crashing and burning like my worst fears.

So the method seems to be working, and I’d like to encourage people who are thinking “Huh, that sounds like a cool idea” to seriously consider rallying a group of people to give it a shot. A sufficiently motivated group, and possibly a very motivated lead, will be able to plow through the obstacles that pop up.

The one thing I should note is that things took a lot more effort than I had expected, but not more than I had planned for in the worst case. I dedicate roughly a day of work a week to this (COVID-19 is making time a squishy concept). For my schedule that seems to work out right.

If anyone does try it, hit me up, I can probably give some more specific advice. At the least, I just want to hear if other people are doing similar cool things.

Counting Stuff
