Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
Data Mishaps Night (Feb 23, free, no recordings) has announced its speaker lineup! I AM SO EXCITED!!! Go see and register!
Typically, this newsletter writes a lot about the craft of doing data science work. But this week I'd like to reflect a bit on the craft of commenting on the work of others.
As data teams grow beyond solo positions and as data scientists themselves get put into various degrees of leadership roles, the work of providing feedback and doing reviews becomes increasingly important. This is true even if you don’t pick up a management role — there will come a time when you’re the more senior person on a team and will be expected to use that experience to help junior team members with their work.
Feedback processes exist all over the place
In the software engineering world, the most concrete form of feedback giving is the code review. There are many guides and resources available now, like this one from Google’s eng practices, with advice on giving good code reviews. The UX design teams around me have a process they call “design crit,” which is effectively a review session where a design is critiqued and feedback is given by peers and more senior folk. Both processes put emphasis on the collaborative nature of modern software engineering and design, and while the terminology and steps behind them differ due to cultural differences within the fields, they are surprisingly more similar than not to my eye.
But for over a decade of my early career, I was on neither of these sorts of teams. I was rarely even on a team with more than one data person. I worked with SWEs and designers only tangentially, wasn’t involved in their nitty-gritty day-to-day work, and thus didn’t have much chance to participate. Instead, I got thrown into learning about doing reviews through another, older, route — editing and translation checking. Plus, I had to reinvent things from scratch there. But then, there are some good things about having to (partially) reinvent the wheel — you get to see a bit of why some things become best practice… by doing them wrong first.
I should note here that “editing” in the “editor of a book” sense is a specific job role in the publishing industry. Unlike what many assume, it’s not the primary job of an editor to go through a text correcting relatively minor typographical and grammatical mistakes; that is primarily left to proofreaders to catch. Instead, editors are tasked with improving the overall structure, clarity, and flow of a work. They can suggest rearranging sections, adding or cutting sections, and propose rewordings and other deep changes. A wizened old book editor I once spoke with cynically described it as taking the “confusing word soup coming from an author and turning it into a clear and easy to read book.” I’ve been told the variation in readability between authors can be massive.
Editing is often a collaborative process, with comments and marked drafts passing back and forth between author and editor. But it’s not a requirement: when an editor is gathering and arranging a collection of papers or poems for a compilation, for example, their work is creating flow and readability through the selection and ordering of texts without touching the specifics within them.
Translation checking is a less well-defined process where a second translator, or sometimes a bilingual non-translator, checks over another’s translation to provide feedback. The main goal is to find mistranslations (like saying a person had eaten when they hadn’t), as well as making sure the necessary nuance and meaning carry over. These people do something a typical editor can’t, which is to bridge two languages together, and so sometimes a TL-checker can act as an editor, or will pass the end result to a separate editor.
Learning to give feedback
When I was first thrown into these editing/TL-checking positions, I didn’t really have much guidance on how to do it other than “go, make stuff better” because of “LOL startups! 🙃 ” energy. Since I have a tendency to overthink things, I wound up reflecting a lot on the whole experience as I went along.
It’s them, not you [doing stuff]
Probably the biggest mental shift I had to make was understanding that even though the work I’m reviewing is work I used to do directly, and I have the ability to do it myself, that’s not what I’m there for. The whole point of the exercise is that someone else is doing the work and I’m there to provide a vital cross-checking function to make sure quality is maintained. Otherwise, we’d just be spending double the man-hours doing the same work.
While there’s tons of other reasons why review processes exist, probably the most central concept is that it’s a review of other people’s work, not an excuse for you to completely re-do the work in compressed time. That means that without a good reason, deference is given to the author, not the editor.
In a typical translation, or even in writing a paragraph of text explaining something, there are multiple ways to render the same idea. Maybe you use more jargon when you have a highly technical audience, like in academic writing, or less because the work is aimed at the general public. Everyone has their own unique ways of phrasing things, and these are more matters of personal preference than violations of writing rules and style guides.
When you’re the reviewer, the hardest part of the job is distinguishing what’s an actual violation of the quality standards you’re supposed to defend (a style guide, rules about design and architecture, etc.) from what’s merely a personal preference of yours. One is valid feedback, while the other is just a nitpick and possible nuisance.
Is a line merely confusing because you can think of a different way to word it that comes more naturally to you, or is it really confusing to anyone and must be changed? Is that code idiom hard to read because you rarely use that pattern, or is it really an issue? Are you just trying to show off to an audience of one? (Hint: don’t do this.)
It doesn’t help that many issues sit in a grey zone between the two extremes. Clarity of text will vary based on the reader and intended audience and there’s a limit to how much it can be measured without a formal test on actual readers.
The same applies to reviewing data science code. You get to have discussions on the merits of various methods and whether it’s worth the time to implement them. You can argue over the quality and style of the code. You can even go down a rabbit hole into how data is presented and whether people understand it. There’s so much room for personal preference that it is guaranteed that no two people will ever agree on every one of them.
You also have to learn to communicate the feedback
There are countless ways to give feedback poorly, and I’m sure we’ve all been in situations, either on the giving or receiving end, where perfectly valid feedback just fell on deaf ears because the delivery made reception a non-starter. Other times, the feedback itself is ambiguous or incoherent, and people are left stumped as to how to even incorporate it. And we can’t ignore the “this comment is completely nuts and is itself wrong” type of comment.
What many people starting out have trouble grasping is that giving feedback isn’t the same as giving orders. Sure, as a more experienced data scientist, you could in theory tell a junior team member to do an analysis a specific way because you know better from previous experience. That can work in situations where you’re expected to give direct orders for specific tasks. But it doesn’t work in a review situation, where the power dynamics are different. Going back to my point above about giving deference to the author, this also means giving the author the space and trust to correct the issues you bring up in their own way.
Now suddenly you have to do awkward things like… articulate what you think is wrong without just blurting out what you think the solution is. Then you might have to watch as the author earnestly tries to incorporate the feedback and struggles with it for a time until they find a way that works. You might have to word your feedback according to how that author understands things, or find ways to say things that encourage them to listen instead of getting defensive.
All this is hard! Especially for people who aren’t good at communicating without lots of thought and consideration! I can’t count the number of times I’ve screwed up parts of this process.
This isn’t management, but it’s a small step towards it
It’s safe to say that doing this work doesn’t make you anywhere near being a manager. But it’s probably not an exaggeration to say that you’re going to need these skills if you plan on sticking around anywhere for a length of time, and you’re definitely going to need these skills if you do get people reporting into you. Being responsible for other people’s career growth and output necessitates the ability to give feedback.
Best to learn it when the stakes are low instead of being thrown into it one day and making a mess.
Stuff shared by others in the data community
A week ago, the author of “Telling Stories with Data”, Rohan Alexander, reached out to me to take a look at the book they recently finished. It is an actual book that you can go (eventually) buy in dead-tree format once it’s published by CRC Press. The online version is free, which is what I linked to.
It is a weighty tome that seems appropriate for use in a class setting (probably? I don’t teach). It has lots of links to articles and papers relevant to all aspects of working with data, from data acquisition, prep, and cleaning, to modeling and communicating results. While I didn’t read the whole thing, the chapters I looked over go deep into the weeds of doing the various steps with reference to real, live data and the foibles therein. I like real-world examples. The code samples are in R, though the concepts apply generally.
There’s even an appendix of cocktails inspired by each chapter.
If you’re looking to (re)connect with Data Twitter
Please reference these crowdsourced spreadsheets and feel free to contribute to them.
A list of data hangouts - Mostly Slack and Discord servers where data folk hang out
A crowdsourced list of Mastodon accounts of Data Twitter folk - a big list, contributed by the community, of data folk who are now on Mastodon that you can import and auto-follow to reboot your timeline
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
New thing: I’m also considering occasionally hosting guest posts written by other people. If you’re interested in writing a data-related post to show off work or share an experience, or if you need help coming up with a topic, please contact me.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.
All photos/drawings used are taken/created by Randy unless otherwise credited.
randyau.com — Curated archive of evergreen posts.
Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the Discord.
Support the newsletter:
This newsletter is free and will continue to stay that way every Tuesday. Share it with your friends without guilt! But if you like the content and want to send some love, here are some options:
Share posts with other people
Consider a paid Substack subscription or a small one-time Ko-fi donation
Tweet me with comments and questions
Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!
I relate to a lot of what you said about the art of giving feedback. I found two things to be helpful:
1) Establishing trust with the author. It’s both more effective and more efficient when I can be quite direct without spending too much time on “how should I word this so it doesn’t come across badly,” and the author can take the guesswork out of the equation as well: “What did they mean by that? Was it me? Was this personal?” Of course, it takes time to foster that kind of trust between people.
2) A culture of reviewing/editing on the team. In my first industry job at a market research firm, we always had two people on a project, one primary and one project lead. The project lead mainly did the review (code, models, and reports). We rotated the primary and project lead roles so everyone more or less got to experience both sides. We produced client-facing work, so reviews were very harsh to ensure quality. I learned a lot from getting and giving feedback in that experience.
I probably edited about 8 books and dozens of technical papers. I remember one paper in particular, written by an Italian scientist, that I had to almost rewrite, the grammar was so terrible. If it hadn’t been such an excellent paper I would have rejected it. But you are absolutely correct, it is a tricky balancing act. My mantra was to always allow the author’s voice to shine through, and to keep my edits focused on improving clarity (and technical correctness!). Easier said than done. We did have some editors who seemed to want to change the voice to theirs, but I always tried to walk that balance. The other aspect that came up a few times (which I am not very good at) was acting as a coach. Especially with books, sometimes the author needs a lot of encouragement and emotional support to keep going. After editing a few books, I am in awe of anyone who can muster the sustained energy needed to write a book.