Attention: As of January 2024, we have moved to counting-stuff.com. Subscribe there, not here on Substack, if you want to receive weekly posts.
World still continues to be madness for the foreseeable future. Be safe everyone.
Many years ago, around 2018ish, I was randomly chosen to be part of the famous(?) Nielsen TV ratings sampling thing. The odds of getting picked are pretty low, so I guess it's a pretty unique experience, especially for us data folk. I figure it'll be fun to describe what the whole thing felt like from the perspective of a data nerd.
Note: I'm telling this story primarily from 4+ year old memories now, so many details have faded, technology has probably changed, and I'm likely to get things slightly off. I'll do my best, but do take things with a grain of salt.
Overview
For those who aren't familiar with Nielsen's TV ratings thing, it's a data collection empire and product designed to answer the question of “who is watching a given TV program”. This process is part of how TV viewership numbers (and the associated demographics) are measured, and the numbers are a very important component in determining the price of advertising slots and, by extension, potentially whether a show is allowed to continue or not. This is how a show knows that “8% of the 20-35 age bracket watch this show”, and how advertisers know where they want to advertise, as well as how much they’re willing to pay to reach that audience.
The overall methodology is relatively straightforward: get a representative sample of the population, measure how they watch TV, and use the power of statistics to figure out what percentage of the population is watching a given show.
Technical note: Nielsen is not the only company that works in this space; there are many companies with competing data products. It appears that all such companies go through auditing/accreditation from an industry group, the Media Rating Council, to make sure they follow some minimum set of guidelines and industry practices. In 2021 Nielsen had their accreditation suspended by the MRC due to claims that they undercounted viewers during the pandemic.
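To make the "power of statistics" part a bit more concrete, here's a minimal sketch of the simplest possible version: treating the panel as a simple random sample and estimating a show's share with a normal-approximation margin of error. The function name and the panel numbers are made up for illustration, and I have no idea what Nielsen's real estimators look like (they almost certainly involve weighting and stratification that this ignores).

```python
import math


def estimate_rating(watching: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Return (estimated share, margin of error) for a simple random sample.

    Assumption: every panel member is equally likely to be sampled and
    no demographic weighting is applied. z=1.96 gives a ~95% interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = watching / sample_size
    # Standard error of a sample proportion, scaled by z
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical panel: 400 of 5,000 sampled people in a demographic watched the show.
share, margin = estimate_rating(400, 5000)
print(f"Estimated share: {share:.1%} +/- {margin:.1%}")
```

The thing to notice is that the margin shrinks with the square root of the sample size, which is part of why keeping a large, representative panel matters so much.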
The problem of course is in the details.
For example, how do you know when someone is watching a specific show? What about demographics? What about multiple people in the house? How do you handle cheating or lying? How have modern streaming and viewing habits changed things?
We'll get to those as we go.
First, getting picked
I think first contact happened from a ring at the doorbell in the early evening by a recruitment representative canvassing my neighborhood (though there might have been letters sent out too). The person identified themselves, and since I knew what the company was about I took the time to talk to them instead of simply saying no.
One thing I do remember about the meeting was the recruitment rep asked whether I had heard of Nielsen and what they did and they were a bit surprised that I said yes. After a brief conversation about the incentives given and what the process involved, they asked how many TVs and what services/devices I had (just one, but also I mentioned a PS4) and we set up an appointment for tech people to come in and install all their measuring equipment.
What were the incentives? Up front, they gave me a prepaid Visa card for about $200 (ish). Plus there might’ve been a check for a smaller amount too, I don’t remember exactly. Every month after that there was a small nominal check (I think in the ballpark of $15ish) for participation (which was roughly a 12 month stint IIRC). There was an option where if I allowed tracking software to be installed on at least one home computer (that was not used for work due to security/NDA concerns), they’d offer a little bit more cash per month. I didn’t opt for that because I’m not letting anyone install stuff on my machines, even a junk old unused laptop.
Then, we had the installation appointment. A team of two people came to the house with a lot of equipment. They then saw my TV setup and immediately said they’d need to get some more gear from the van because I had a LOT of unmentioned stuff connected to the TV — a PS3, a Wii U, an Apple TV, plus an old laptop. Apparently every device hooked up to the TV needed to be hooked into their equipment, even if I (truthfully) claimed that I haven’t used some of those things in years. Because I might use that equipment at some point, they needed to monitor it. They even asked if I had any old consoles and devices in storage and I mentioned I had a Nintendo DS and a PSP somewhere. Luckily they didn’t need to monitor those.
Basic data collection bits
What did ‘monitor’ mean in this instance? Well, the short of it was they needed to put in a device that intercepted the audio/video signal going into the TV.
It’s a box that took in a ton of HDMI feeds and then passed them on to the TV. It had cellular internet to send data back to the home base. I asked how the system knew what I was watching and the rep said that it listens to the audio feed for inaudible signature information about what’s being played, and it uses that to identify the program. Wild stuff. I assume the system just gives up and logs a generic entry when I occasionally play those downloaded new anime episodes from Japan on the TV…
To solve the “who’s watching” thing, they hooked a device called a People Meter up to a sensor package and set it on top of the TV. It had a green segmented LED readout that scrolled basic English prompts (“Who’s watching? *press button* Hi Randy.” type stuff). We had special remotes in the house coded to identify the people living at home. I had a button that represented me, and the system knew (via questionnaire) my gender, age, education, profession, and (self-reported) income. There was another button that represented my wife.
Everyone who’s watching TV was supposed to use the remote to tell the device who was watching at the time, and you were supposed to press the button again to sign out when you stopped. There was even a way to use the remote to enter basic age/gender information into the system for visiting guests, so if people came over I could tell the system that an anonymous 30 year old male was watching TV.
The device would also know to turn on if the TV was turned on (probably through the HDMI connection), so it’d blink and prompt you to enter information whenever the system was powered up.
Every so often (probably between one and two hours) the meter would flash and ask if the same people were still watching, as a reminder to update the data if someone had left and stopped watching. It also guarded against people falling asleep in front of the TV or walking off for too long.
There’s also an infrared lens on the front of the meter, which I assume tried to give some sense of whether there was a warm body in front of the device or not, but I suppose it could’ve just been there to make sure it picked up the remote signals well (to minimize user friction, we’ll get to that later).
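For the nerds who think in state machines, the sign-in/sign-out and "are you still there?" behavior can be sketched as a toy model like the one below. To be clear, this is my own reconstruction from how the thing behaved in my living room, not anything Nielsen documented: the class, the method names, and the 90 minute re-prompt interval are all assumptions.

```python
from dataclasses import dataclass, field

# Assumption: I only remember the re-prompt as "between one and two hours".
REPROMPT_MINUTES = 90


@dataclass
class PeopleMeter:
    """Toy model of the viewer sign-in logic (hypothetical, not Nielsen's design)."""

    signed_in: set[str] = field(default_factory=set)
    minutes_since_confirm: int = 0

    def press_button(self, viewer: str) -> None:
        """Pressing a viewer's button toggles them in or out of the audience."""
        if viewer in self.signed_in:
            self.signed_in.remove(viewer)
        else:
            self.signed_in.add(viewer)
        # Assumption: any button press counts as fresh confirmation.
        self.minutes_since_confirm = 0

    def tick(self, minutes: int) -> bool:
        """Advance time while the TV is on; return True if the meter should prompt."""
        self.minutes_since_confirm += minutes
        nobody_signed_in = not self.signed_in
        return nobody_signed_in or self.minutes_since_confirm >= REPROMPT_MINUTES

    def confirm(self) -> None:
        """Viewer confirms that the same people are still watching."""
        self.minutes_since_confirm = 0


meter = PeopleMeter()
meter.press_button("randy")   # sign in
print(meter.tick(30))         # still within the interval, no prompt needed
print(meter.tick(60))         # interval reached, time to ask who is watching
```

The design point is that the meter never trusts a sign-in forever. Stale data has to be re-confirmed by a human, which is about the only defense against someone dozing off on the couch.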
Regular check-ins
As part of being in the whole program, you agree to have pretty regular check-ins from Nielsen staff. Once a month or so, you’d get a call to go through a standard questionnaire to update any demographic information, as well as report any issues. Once a quarter or so our assigned tech rep would come over and check on the equipment, while also doing the usual questionnaire stuff. Since they visited our home, they could do a visual inspection to make sure I hadn’t secretly bought another game console or TV or something (if I did then they’d have to go hook it up to more tracking hardware).
Amongst the demographic data, I did remember a few questions that stood out as “interesting”… one set of questions involved wanting to know whether I had purchased certain beverages, soda, coffee, tea, in the recent past. Another was whether I had told anyone that I had been participating in the program (it was made clear repeatedly throughout that you’re not supposed to do that since that’d introduce all sorts of potential issues).
Also, you’re asked to let the rep know when you’ll be away from home for an extended period (like, over a week). Once, I was on vacation for two weeks and forgot, and got a text from the company saying that there had been no activity on the system for a while and asking if anything was wrong. They were totally fine with me going on vacation, but wanted to know so that they could presumably exclude me from their sampling pool for that period of time. Once, they caught us in a ridiculously busy period and we literally hadn’t turned on the TV in a week. Another time, the data collector thing had a networking issue and couldn’t call home with data via the cell network, and it needed a part replaced.
This was all a relatively minor inconvenience, but they did mention it was part of the terms of participating.
Minimizing friction
As you could imagine, participating in this thing is a ton of friction for the participant.
My wife frequently griped about the extra remote and buttons added to our existing setup, which already involved a TV remote + complicated stereo receiver remote. Luckily, she spent more time watching overseas dramas on her laptop so she rarely used the TV and thus bypassed the process.
Prior to the modern technology that allowed tracking what shows were being shown, it seemed people had to submit diary studies(!) with journal entries for what they were watching. Just imagine the extra bias and memory errors involved in that process, not to mention the intense manual data entry needed to ingest that data.
So, to the extent possible, Nielsen tried to make things as low friction as possible for us. We were literally the product that they were selling to the media and advertisers after all, and if we got pissed and stopped bothering to give useful data, they’d have no product to sell.
To make things easier, we were assigned a local representative that we could contact at any time if we had issues. If the equipment somehow failed, we could call. If we bought a new device, or TV, or wanted to get rid of a device, we could call and have the setup modified. If we so much as needed an extra battery for the remote or a power strip, we could call and get one ASAP. Appointments could be scheduled around whatever schedules we had, so early evening after work was usually doable. We actually had that rep with us the whole year we were in the program, and since our child was born partway through the process, we got to know each other a bit.
Overall, there wasn’t much that a rep could do to alleviate the toil of having to participate, but I honestly couldn’t see how they could minimize it much further.
Data collection quality
Hopefully, as you’ve been reading this story, you’ve come up with all sorts of questions about data quality. After all, practically everything about this whole thing is self-reported.
There are at most a few technical guardrails, like the audio-detect-the-program thing, as well as any sensors related to checking who’s sitting in front of the TV (if such a feature exists, which I’m not sure about). Everything else, from the demographics to the actual viewing behavior, is largely self-reported survey data. There’s a fair amount of room for lying (though, since I was obviously Asian, male, and early middle aged, the rep taking my answer could easily stop me from saying I was a 78 year old female). Having someone physically there taking the initial survey responses places some basic safeguards on lying.
Another thing was, since this was effectively a kind of longitudinal study, the primary tool that Nielsen would use was simple regular contact and reminders. The regular check-ins and contact if there had been too many days without data would nudge people to continue participating. If we stopped being cooperative (or just said we wanted to stop), they would simply arrange to come collect their equipment, put all my stuff back, and stop sending the monthly participation checks.
The regular question as to whether you shared your participation w/ others was probably a way to remind people of the agreed-upon non-disclosure while also potentially allowing them to remove bad data retroactively. Obviously, there’s no way to outright stop me from lying, but I’m pretty sure it has a social pressure effect to keep more people doing the desired behavior.
Overall, they make the assumption that I’d be more or less truthful because I probably don’t have any incentive to lie. Plus, my data should point in the direction of useful truth when used in aggregate with the rest of the sampling pool, so a couple of bad actors shouldn’t have an outsized impact on their results.
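Here's a quick sketch of why a handful of liars can't move the aggregate much. It simulates a panel where a few members always claim to be watching, then compares the honest and tampered estimates. The panel size, the true share, and the number of liars are all made-up numbers, and the unweighted average is an assumption on my part since real ratings are surely weighted somehow.

```python
import random


def simulate_panel(sample_size: int, true_share: float, liars: int, seed: int = 0) -> tuple[float, float]:
    """Return (honest estimate, tampered estimate) for one simulated panel.

    Assumption: an unweighted panel where the first `liars` members
    always report watching, whatever they actually did.
    """
    rng = random.Random(seed)
    # What each panel member really did
    truth = [rng.random() < true_share for _ in range(sample_size)]
    # What gets reported once the liars mash their buttons
    reported = [True if i < liars else watched for i, watched in enumerate(truth)]
    return sum(truth) / sample_size, sum(reported) / sample_size


# Hypothetical: 5,000 panelists, a show with an 8% true share, 5 liars.
honest, tampered = simulate_panel(5000, 0.08, 5)
print(f"honest={honest:.4f} tampered={tampered:.4f} shift={tampered - honest:.4f}")
```

In this unweighted setup the shift can never exceed liars / sample_size, so five liars in a panel of 5,000 move the number by at most 0.1 percentage points. If panel members carry heavy weights, a single household could matter more, which I assume is one reason they watch for oddities.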
Probably the most damage I could somehow do to the dataset would’ve been to lie about what was being watched by turning on some program, mashing buttons to lie about who was watching, and then maintaining the fiction for whenever the system would ask for an update. But that’s a lot of toil for something I would get no benefit from. It’s not like I somehow know a TV producer or marketer that’d write me a fat check to subtly screw with the numbers. From this perspective, deliberately messing with things over the long haul isn’t worth the effort involved.
I do think that I did confuse their system a bit because my computer workstation is at a 90 degree angle to the TV setup. When I worked from home and wanted background noise, I’d have the TV on for hours but would be sitting off to the side. Someone must’ve noticed that the TV was on for a long period during the day, because someone called mentioning the peculiar behavior and wanted me to explain what was going on. Once they learned that this was actual human behavior, they didn’t seem to mind it happening.
So, what did you learn?
Probably the biggest thing that left an impression was the absolute lengths and expense they had to go to in order to collect their data. There was a whole infrastructure of call centers, technicians, recruiters, and contractors needed to keep such a program going at a national level while maintaining a balanced sample of the population. These things needed to exist across any TV markets they planned on measuring. Between incentives and the infrastructure, I’d guess that maybe a thousand dollars must have been spent just to support my one account, and I live in a very dense city area.
What’s pretty mind blowing is that despite the expense, this data is far from “perfect”. It’s a lot of survey and self report data which have to be verified and/or monitored for weird anomalies. The recent episode where Nielsen had their accreditation suspended due to complaints about potential audience under-counting during the pandemic highlights how much of an inexact process it is.
That said, there’s likely no better way (as in, cheaper, less labor intensive, less susceptible to unknown bias) to do a similar measurement for the question of interest. There are plenty of companies competing in this space, and if someone had developed a better way already, they’d be swimming in money.
You could also see years of industry experience built into the protocols.
They tried to avoid having expensive participants drop out by monitoring telemetry, doing regular check-ins, and making things convenient. They tried to avoid people forgetting they’re not supposed to blab about participating by not-so-subtly reminding you at almost every opportunity. The People Meter was very simple to use; even a kid or the elderly wouldn’t have much trouble with the basics (honestly just press a button, the system auto-detects if the TV is turned on). There were different people doing the occasional check-ups so that local representatives couldn’t game the system to hit their recruitment quotas. I’m sure each protocol came from an expensive lesson in the past.
Was it worth it?
Since I got to completely nerd out on the whole experience, sure. For someone who doesn’t care nearly as much (for example, my wife), it’s somewhat more neutral. In exchange for a few hundred dollars, you do have to commit to being bothered for an extended period of time. We don’t watch much TV at home so it didn’t really affect us as much, but for heavier watchers it might be a bother.
I will say though, the most satisfying moment of the whole experience was when the State of the Union address for the year rolled around. I very deliberately had the whole family sit down… and watch a movie or some other program on the TV. I knew that President Trump cared about TV ratings for his speeches, and I sure as hell made sure that one tiny datapoint in a TV market in NYC would clearly show that a sample of people were most definitely NOT watching his ridiculous speech.
Just my very nerdy and statistical way to express my dissatisfaction at that whole mess of a presidential administration.
Totally worth it.
Standing offer: If you created something and would like me to review or share it w/ the data community — my mailbox and Twitter DMs are open.
About this newsletter
I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With excursions into other fun topics.
Curated archive of evergreen posts can be found at randyau.com
All photos/drawings used are taken/created by Randy unless otherwise noted.