Optimizing for personal portability

Instead of mastering a thing

Jul 05, 2022

Shorter post this week due to the big US holiday weekend.

For the first time ever, I went up onto our roof to see all the fireworks exploding within a couple hundred meters of us tonight…

When it comes to how we set up our computer work environments, there seems to be a spectrum that we all fall on.

On one end of the spectrum is the “heavily customize EVERYTHING” group. Their VS Code/Emacs/Preferred Editor is loaded up with a ton of plugins, productivity aids, and custom shortcuts. These folk are often aiming to squeeze as much productivity and convenience as possible out of their setup every day that they’re working on it. The price that these people pay is that they very often forget how to use the stock versions of their preferred software because the muscle memory is completely different.

On the other extreme of the spectrum is the “NEVER customize anything” group. They learn how to use everything in a default state, eschewing adding any plugins and extensions, changing shortcut configs, or even installing extra software into the OS. They get their work done using as much basic tooling as possible because it means that they can use practically any computer system they have access to with very little confusion. The price these folk pay is that some software can be horrible to use without customizations, so they have to balance a desire for portability with having any basic level of productivity in the day-to-day.

Everyone lives somewhere between these two extremes, and I find it pretty fascinating to see where individuals sit.

Personally, I sit towards the “use defaults” side of the spectrum. I generally use vim and the much more limited nvi on BSD systems to do the simple programming tasks that I have to do every so often. I stick to using command line tools that are commonly available to get much of what I need done. But I do maintain a very small GitHub repo with my basic .bashrc, .screenrc and .vimrc files to carry over basic things like my prompt and status line settings. I very often work without those files but it’s convenient to have them around.

The one thing I’ve never seen is anyone explain why people choose the strategy that they do. Maybe it’s personal quirk and preference, or maybe there’s something more to it. If you happen to want to share your own style and reasoning, let me know! I’m genuinely very curious what that perspective is!

Why I stick to defaults

My primary reason for sticking to defaults is that I place a high premium on being able to function when thrown into random systems and environments. As I’ve mentioned before, I’ve been hit with layoffs 3 times already in my career, and each time means having to find a new job, while also not having time to get copies of any simple little utility scripts I’ve written at a position (I’ve written a “send a report email in Python via SMTP” script 3x and it’s an annoyance every time). Each new job means getting thrown into a completely different database, OS, and production environment.
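
For illustration, here is a minimal sketch of that kind of throwaway “send a report email via SMTP” script, using only the Python standard library. It is not the actual script from any past job: the function names, addresses, and the SMTP host and port are all hypothetical placeholders, and a real relay would likely also need TLS and a login.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, body):
    """Assemble a plain-text report email. No network access happens here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_report(msg, host="localhost", port=25):
    """Hand the message to an SMTP relay. Host and port are assumed values."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# Example usage (commented out so nothing is sent by accident):
# report = build_report_email(
#     "reports@example.com",
#     ["team@example.com"],
#     "Weekly report",
#     "Rows processed: 42",
# )
# send_report(report, host="smtp.example.com", port=25)
```

Keeping the message-building step separate from the sending step means the first half can be checked on any machine, even one with no mail relay available.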

So instead of becoming a super master of a single toolchain like a Linux+MySQL+Apache stack, or mastering all the tiny performance minutiae of BigQuery or Redshift, I’d rather be a generalist who can use an ever-growing list of tools to a competent level. By now, I’ve touched enough environments that I’m pretty confident I can be dangerously competent in a couple of days, which definitely helps me work closer with engineering to be at the forefront of data collection. Raw text log dumps tossed into HDFS? I’ll work with that!

Effectively, I’m optimizing for my personal portability. I’m never going to be as good as someone who’s deep into the weeds in their SQL database engine of choice, or a full-time Kubernetes admin, but most jobs won’t need someone with that level of expertise anyway.

This personal portability optimization also extends into how I do things like write SQL queries. I try to avoid using too many macros and saved snippets, at least for the most common handful of tables I interact with. I try to hand-write common queries from scratch if time allows. I find it prevents me from going into auto-pilot mode where I just copy-paste snippets and then forget how to actually do original work on my most used tables. This is especially important for tables that are constantly evolving, or where special combinations of flags need checking depending on the situation. It’s a way to keep forcing myself to actively think about what I’m doing.

Obviously, it’s not practical to write absolutely *everything* from scratch all the time, especially when the situation requires duplicating the exact results of a previous query, or when time and agreement between queries are important. For those situations I’ll actually tap into pre-written queries to match conditions and query strategies.

One unexpected benefit of being a far-ranging generalist is that I’m currently making use of my knowledge of all sorts of systems to help the UX team I’m on understand how different customers wind up using extremely different software stacks to accomplish the same infrastructure task. It’s easier to show how easy/hard it is to set up a given environment than it is to explain it with just words.

Perhaps our preferred strategy depends on our role

First and foremost, I’m an exploratory data analysis expert. That fits strongly with my role as a quantitative UX researcher. I do a lot of one-off, throwaway work poking at data before finally figuring out how best to make sense of it. The data sources themselves might be hacky, unrefined output that’s not “production ready” in quality by any stretch. It’s natural that I place relatively little importance on saving SQL snippets and code for the future.

Things would be very different if I were doing data engineering in production systems. There, it’s a bad idea to just throw things together willy-nilly without proper code versioning and optimized resource usage.

What I don’t know is which way the causality arrow goes. Am I in an EDA-heavy role because I strongly prefer to have a more flexible skillset? Or did EDA nudge me towards the flexible skillset?

I have no idea.

Stuff shared from the data community

Someone linked me to something they wrote a while ago about how data “Cleanliness” is an opinion. They shared it in reaction to my “data cleaning is data analysis” post. The overall sentiments start from the same place, that cleaning data imposes analytical decisions upon a data set. Towards the end, the author goes off in a different direction than I did, discussing how there are different types of data problems that data cleaning is meant to address for different reasons. I find that breakdown pretty interesting.

Will Crowley also linked me to a post they wrote about how nonprofits can use data and analytics to better understand their top donors. The post includes a good sampling of the most useful research questions nonprofits should be asking of their donor data, with code examples showing how to generate the analyses. The example Python notebook code is probably a good reference for nonprofits to use, but honestly I feel that so long as the nonprofit realizes what questions they should be asking, they could answer them in Excel or their own preferred language/toolchain.

Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.

About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech, with excursions into other fun topics.

Curated archive of evergreen posts can be found at randyau.com.
Join the Approaching Significance Discord, where data folk hang out and can talk a bit about data, and a bit about everything else.

All photos/drawings used are taken/created by Randy unless otherwise noted.
