Earlier in the week, at the behest of a friend, my wife signed up for one of those meal kit services where every week a company sends you a box of ingredients and recipes, the idea being that you follow the instructions and after maybe 30-60 minutes of work in the kitchen you wind up with a home-cooked meal. The cost is… marginally… cheaper than eating out.
For people who are too busy to plan meals or grocery shop, or who otherwise aren’t confident in their own cooking skills, I think such services are worth a try. They’re a good gateway to learning to cook things on your own, since they require users to learn the most basic knife and heat control skills to put together a simple dish.
But for someone who enjoys working in the kitchen, completely nerds out over food science and kitchen gadgets, and has made the vast majority of the family’s meals over the past 9 years… the value proposition is much less convincing. It’s an interesting way to try out new recipes on occasion, but the cost per serving is still significantly higher for someone who is used to all the work surrounding making food anyway.
But here’s the thing about the service that inspired this week’s post — the various recipes are labeled with a difficulty: beginner, intermediate, expert. The expert recipe, when you abstract the steps down, had three “big” cooking actions that had to be done separately. It seems that easier recipes would have fewer disjoint steps, going all the way down to simple 1-pot meals where you just throw everything in and heat it up. “Expert” was likely a measure of the time needed and how many things you had to juggle to get it done.
The thing that struck me about the difficulty rating is that the recipes didn’t seem all that involved as I was working on them. Of course you need to sear the chicken before shredding it (raw meat doesn’t shred the same way). Obviously the vegetables need to be sautéed separately from the meat, because you’d otherwise get a soggy mess. Someone who knows all these minutiae would be able to make the dish quickly, probably even doing steps simultaneously because they’re used to juggling two hot pans as they go. A more typical user would work through each step carefully to avoid being overwhelmed.
So an expert would say that the dish is pretty easy because they’ve already internalized the billion minor details to the point where they’ve forgotten they’re even using the knowledge.
I once showed a friend how to make seared scallops — take a hot pan, add butter, plonk the scallops down, brown on both sides, plate, make a sauce with some water and the fond in the pan. Despite demonstrating and explaining, somehow that explanation fell short and they couldn’t do it on their own (and I had already warned them to get dry, untreated scallops). I have no idea what details I’ve failed to communicate to get the recipe across.
No amount of introspection can save any tutorial writer from themselves
As an example, I dug up a tutorial about using R’s glm() to do regressions. It’s not especially bad for the genre; if anything, it’s fairly middle-of-the-road. It even gives a very short explanation of when you’d want to choose the Poisson model versus the binomial one. They also have a much longer post about various forms of linear regression in R (via lm()). Either way, when you read through them, the tutorials make a really uneven set of knowledge assumptions. They’ll bounce between basic fundamental theory, occasionally flashing a fancy-formatted math formula, jump to brief examples of building a model, then jump back to checking the standard error and F-statistics for the model (without explaining how to interpret those values for the example at hand).
If I had no statistical background, I’d be a bit overwhelmed at how I’m supposed to know to check whether my data fits model assumptions like “residuals are distributed normally” (despite how many of us bend normality assumptions quite a bit in practice). I’d also be at a loss reading the summary() output and interpreting the associated p-values. Luckily, the code presented is extremely simple thanks to how lm()/glm() are designed, so there’s not much to say about it besides “yeah, this is just how you execute these operations”.
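To make that concrete, here’s roughly the shape of the code these tutorials present: a minimal sketch using simulated data (the variable names, coefficients, and fake dataset are mine, not the tutorial’s):

```r
# Simulated data, since the tutorial's dataset isn't reproduced here
set.seed(42)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- rpois(100, lambda = exp(0.5 + 0.8 * df$x1))   # count outcome
df$z <- rbinom(100, 1, plogis(-0.2 + 1.2 * df$x2))    # binary outcome
df$w <- 1.5 * df$x1 + rnorm(100)                      # continuous outcome

# Poisson regression for counts
pois_fit <- glm(y ~ x1 + x2, data = df, family = poisson)
summary(pois_fit)   # coefficients, std. errors, z values, Pr(>|z|)

# Binomial (logistic) regression for yes/no outcomes
binom_fit <- glm(z ~ x1 + x2, data = df, family = binomial)
summary(binom_fit)

# The assumption check the tutorials gloss over: a Q-Q plot of residuals
lm_fit <- lm(w ~ x1, data = df)
plot(lm_fit, which = 2)  # points hugging the line suggest roughly normal residuals
```

The one-line model calls really are the easy part; the interpretive work around the summary() output and the assumption checks is where a newcomer gets stranded.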
What frustrates me about these tutorials is that they’re supposed to be for people new to data work, yet they assume a pretty significant depth of understanding of why you’d want to use the various methods and how to interpret the output. Meaning, if the post made sense to you, you very likely didn’t need the post except to perhaps check syntax.
In trying to be concise (and not have to rewrite an entire statistical methods textbook), the authors are having trouble reaching the very people they’re supposed to be reaching. Just like I’m not conscious of the mountain of knowledge I’m relying on to cook a single scallop, these authors aren’t aware of how much distance stands between them and their audience. They needed to have these tutorials tested against real people who are trying to learn data analysis, and then take in feedback about what made sense or not.
If we, collectively as data practitioners, want to welcome more and more people into our ranks, more of us need to do better at creating materials for those newcomers to reference and learn from. We can’t keep churning out the same tired “Become a data scientist by learning XYZ!” posts that barely explain anything and are oversaturating the SEO space. Tutorials of dubious value are also up there on the list of things we need to reconsider.
A better example
Here’s an example of a tutorial meant for “beginners” of a topic that is significantly better — basic image classification with Keras. What impresses me about this tutorial is that it assumes an extremely low amount of prior knowledge while still maintaining brevity. It doesn’t assume readers know any specialized AI/ML terminology at all. It goes so far as to explain what the Fashion MNIST dataset it uses is, how to set up the training/test sets, and how the data arrives and needs to be preprocessed (as well as the reasoning for doing so!), all the way to building out and even evaluating the model. Working through this tutorial, I’d have slightly more confidence that I could try to customize it on my own and things might actually run. You can tell that a significant amount of editing and effort went into it.
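For a sense of the flow that tutorial walks through, here’s a heavily condensed sketch using the keras R package (the linked tutorial itself may well be in Python; the R bindings mirror it closely, and the layer sizes here are typical choices rather than necessarily the tutorial’s exact ones):

```r
library(keras)

# Fashion MNIST: 28x28 grayscale images of clothing items, 10 classes
fashion <- dataset_fashion_mnist()
x_train <- fashion$train$x / 255   # scale pixel values from 0-255 down to 0-1
y_train <- fashion$train$y
x_test  <- fashion$test$x / 255
y_test  <- fashion$test$y

# A small fully connected network: flatten each image, one hidden layer,
# then a 10-way softmax over the clothing classes
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(
  optimizer = "adam",
  loss = "sparse_categorical_crossentropy",
  metrics = "accuracy"
)

model %>% fit(x_train, y_train, epochs = 5)
model %>% evaluate(x_test, y_test)   # accuracy on held-out images
```

Every one of those steps, like why the pixel values get scaled or what the test set is for, gets a plain-language explanation in the tutorial itself, which is exactly what makes it work for beginners.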
At work, I get to collaborate with awesome UX Writers who have long conversations with engineers, researchers, and customers to figure out what’s the best way to document and explain highly technical topics to end users with unknown levels of technical skill. It’s all the more impressive because those writers often don’t have engineering backgrounds and had to carefully learn all the wonky ways tech people name things. If every one of us had access to these folks whenever we hit the big “publish” button, we wouldn’t be awash in “meh” posts.
But the sad truth of the universe is that we, the creators and writers of data and tech content, massively outnumber these skilled writers. We can’t rely on them to do our heavy lifting for us. We must learn to help ourselves by helping each other.
Everyone needs help being clearer; luckily, everyone can help
Since all authors have blind spots surrounding knowledge they take for granted, the only way out is to get someone else to read the thing and point out places that confuse them. This can be done by simply asking for a bit of clarification on a specific point or term, or maybe requesting a link to some kind of reference. Assuming the author incorporates the feedback, the tutorial gets better for everyone. Yay!
In an ideal world without malicious/argumentative people, where everyone is forever willing to maintain a 15-year-old blog post, this would totally work out! As with many things, there are a lot of external issues involved.
My personal feeling is that there is definitely an appetite for good feedback from readers, especially on recently published writing. So as long as things are kept within good social norms of politeness and helpfulness, readers should feel free to reach out to writers with constructive comments.
Yes, this includes my writing here. I’ve got a relatively thick skin thanks to flame wars of the bygone internet, so I’ll gladly take anything y’all throw at me.
Side note: while searching for an example of a bad tutorial, I came across one from a content mill masquerading as an online course site, whose author was completely clueless about everything. It included such gems as “the co-efficient[s for the glm model] are non-significant as their probability is less than 0.5.” I wanted to write about “poor tutorials” and not “tutorials that should have never been written,” so I found another reference. I’m not going to link to it, since that would just give it traffic it doesn’t deserve.
In addition to giving good feedback, we really should find mechanisms for calling out BS on garbage.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.
Curated archive of evergreen posts can be found at randyau.com.
Join the Approaching Significance Discord, where data folk hang out and can talk a bit about data, and a bit about everything else.
All photos/drawings used are taken/created by Randy unless otherwise noted.
Supporting this newsletter:
This newsletter is free; share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Tweet me - Comments and questions are always welcome; they often inspire new posts
A small one-time donation at Ko-fi - Thanks to everyone who’s sent a small donation! I read every single note!
If shirts and swag are more your style there’s some here - There’s a plane w/ dots shirt available!
We need to help each other write better
The curse of knowledge is so real when it comes to writing documentation! I think it is the wrong approach to say "the only way out is to get someone else to read the thing and point out places that confuse them."
You state earlier in the article that we can't expect the comparatively few technical writers in the world to do all the great writing for us, and it's also true that we can't rely on asking whoever is closest to give us helpful, actionable, relevant feedback on our writing.
It's crucial to consider the audience of what you're writing, as well as how representative someone is of that target audience when asking for feedback. Feedback can still be useful when it comes from someone who isn't the target audience of your piece, but it needs to be evaluated in context (just like any data!). Bob Watson has a great blog post on how to evaluate feedback (documentation user research): https://docsbydesign.com/2022/04/04/tips-for-conducting-documentation-research-on-the-cheap/
I also wrote a few blog posts that can be helpful in this context:
- for non-professional writers who want to get better at writing, read this: https://thisisimportant.net/2020/12/23/how-can-i-get-better-at-writing/
- for people who want to document a product from scratch, read this: https://thisisimportant.net/2021/09/21/from-nothing-to-something-with-minimum-viable-documentation/
- for people who want to rethink how they approach documentation and what it's for, read this: https://thisisimportant.net/2022/07/25/write-better-docs-with-a-product-thinking-mindset/
I have a lot of opinions about documentation, and I'm really grateful that you're speaking up about the importance of good documentation and clear writing, and affirming that writing well is a shared and valuable goal!
Documentation is hard. One of my issues with R is that package documentation often does not go very deep. I am currently struggling to find a path to create filled contour maps and overlay them on OpenStreetMap basemaps - and it is super frustrating. Lots of online tutorials, but no one explains the data structures very well, or what the constraints are, or what the commands are really doing under the hood. So I'm having to pick that all apart with lots of failed attempts.

Years ago I participated in the SciPy documentation effort (I got the T-shirt!), and it was a lot of work, but hopefully it has helped people out since. I want to have a model in my head of what a command is doing. Then I can figure out how to use it. If I know that "everything in *nix is a file", then I can usually guess how something will work.
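For what it's worth, here is one hedged sketch of the kind of overlay described above, using the leaflet and raster packages with made-up gridded data (not the commenter's actual workflow); a binned raster overlay on OpenStreetMap tiles approximates a filled contour map:

```r
library(leaflet)
library(raster)

# Fake gridded data over a small lon/lat extent, standing in for real values
r <- raster(nrows = 50, ncols = 50,
            xmn = -74.05, xmx = -73.90, ymn = 40.70, ymx = 40.80,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- rnorm(ncell(r))

# Discrete color bins play the role of filled contour levels
pal <- colorBin("viridis", domain = values(r), bins = 7,
                na.color = "transparent")

leaflet() %>%
  addTiles() %>%                                   # OpenStreetMap basemap
  addRasterImage(r, colors = pal, opacity = 0.6) %>%
  addLegend(pal = pal, values = values(r))
```

Notably, addRasterImage silently reprojects the raster to web Mercator before drawing it, which is exactly the kind of "what the command is really doing under the hood" detail that documentation rarely spells out.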