What if you were an evil data scientist?

How disturbing would it be?

Noboribetsu Jigokudani (Hell Valley) - A volcanic valley near Noboribetsu, Hokkaido, Japan. It’s got sulfurous pools, steaming hot streams, and steam vents. This shot was taken in July. Not to be confused with Jigokudani Yaen Koen in Nagano, which has the monkeys that bathe in volcanic springs.

Apparently I’ve got prescient timing, because NYC, my hometown and current city, has been declared an anarchist jurisdiction this week. What a perfect time to talk about being evil!

This past Thursday, Vicki Boykis, in her usual brilliant way, posed this question to Data-Twitter:

To which, I somewhat casually responded with this reflection upon the times I felt I held a ridiculous amount of power relative to my role and seniority simply because I was the lone data guy.

Then it started resonating with people, which planted a seed in my head that others have had similar thoughts. I’ve previously written about the responsibility inherent in being the “eyes of the organization” that gets thrust upon even junior analysts. But a hypothetical question just wouldn’t leave my brain.

What would a *small-time evil* data scientist look like?

What could they get away with? Are there checks and balances? How bad can it get?

Why did I specifically add “small-time evil” to the question? What does “small-time” even mean? Because from my surface-level-only understanding of the topic of “Ethics in Data Science”, the VAST majority of discussion is on “Big Problems” that can affect society as a whole. Things like privacy issues around Big Data, bias and discrimination in algorithms, reinforcing existing human bias, killer AIs, enabling fascism and crimes, and trolley problems. Y’know, big serious questions.

The individual actions can be small, like building an app, writing an algorithm, or launching a product or service, but the impact can be huge. The ethical discussion is over whether those acts should be done, how can unethical behavior be avoided, what does it mean to be ethical, etc. These are hard questions for smart people.

I’m not well read enough to have well-argued opinions on these topics to present to you. There are people infinitely more suited to that discussion than myself. I'm not going to discuss these topics today.

Instead, imagine a data scientist, “Evil-DS”, who is a complete clinical psychopath. Someone with no remorse who is primarily concerned with satisfying their own personal needs and goals, unfettered by norms, ethics or morals. Evil-DS is intelligent, capable, well-spoken, a crystallization of Machiavellian thinking, and only interested in getting ahead in life.

These people are a different form of evil than the kind we worry about in the Big Ethics questions. They don’t care to specifically commit “Big Evil” acts out of ideology, but they also don’t care to NOT commit such acts if it benefits them either.

But for this discussion, they’re not trying to do any kind of super evil at the moment. Right now, they’re just passing through your stepping-stone of a tech company to collect a paycheck and a title until they move on to bigger things. So Evil-DS is currently just sitting on your team, helping you set up the next A/B experiment for your meme web app.

What if such a person were a data scientist? What could they get away with?

Thinking about this makes my skin crawl a little. I don’t pretend to be a master evil person, I’m not even an amateur evil person, so I’m very likely not even scratching the surface of the full extent of what an evil person could do.

But please join me on this little thought experiment. I’m going to start out with relatively small things and we’ll dial up the evil-o-meter as we go.

Oh! Before we start, I do want to emphasize that just about everyone I’ve ever interacted with in the data industry has been generally well-meaning, and takes ethical behavior very seriously. In general the practitioners in the field are great and I have never directly witnessed anyone employing any of the tactics I’m dreaming up below. But I’m absolutely convinced those rare evil individuals are out there.

This is a reflection on trust

Essentially this whole article is a reflection on the trust that others have placed upon me over the years and wondering “if I were evil, how could that trust be abused within the context of being a data scientist?”.

Obviously there are countless other ways to take advantage of trust beyond merely using data unethically to further a career, but I’m going to explore that a bit because I haven’t found anything around that discusses the topic.

The one key theme in all this is that autonomy and limited oversight can be problematic if given to the wrong person. In many of the positions I’ve been in, I was the only person in the room, or company, that specialized in working with data. That meant that my work wasn’t really examined by another person beyond a quick glance.

Since I AM actually ethical, it worked out for my employers because the trust allowed me to be very effective, but on that 0.1% (or whatever the actual percentage of psychopath data scientists is) chance that I was evil... Who knows.

Evil data shenanigans would get exponentially more difficult if I were trying to manipulate data while my work was being closely evaluated. Of course, that’s not to say it’d be impossible. Everyone’s usually so busy with their own work that they’d rarely want to spend the time to do a full audit of my work. So someone who very selectively applies unethical tactics may fly under the radar for a long time.

So remember throughout this piece that sunlight is still the best disinfectant for corruption.

0) The power to play favorites

Data teams in general are often understaffed and therefore have latitude to choose the work that they work on. This is a pretty normal state of the world right now. Since there’s more work than there are people to do it, teams are trusted to varying degrees to prioritize working on some projects over others. I've often been given pretty wide discretion over what I wanted to work on. This by itself isn’t unethical behavior at all.

Things can start creeping into unethical (or just plain abusive) territory when Evil-DS recognizes that the ability to choose what to work on confers a certain amount of power. Since data often brings added value to a project, it could be used as a bargaining chip in exchange for favors of some sort. I’m pretty sure an unethical person could find ways to leverage that power to their advantage.

But wait, managers are supposed to handle prioritization?

Yes. It’s also hard to just say no to a team that’s working on what an entire company believes to be the biggest, most important project. But there’s plenty of shades of grey below that level of scrutiny.

1) Abuse the power to argue almost any position

This is the thing I was thinking about when I tweeted.

One thing I’ve noticed, working at startups and usually being the only data person in the room, is that I have a HUGE amount of trust placed in me to be the expert who handles all aspects surrounding the usage of data. They of course hired me specifically for that expertise and experience, but at the same time, the list gets really long once I think about it.

I had the power to collect and analyze data, access to just about any data I wanted with some justification, could declare if a data set is “good enough to be used”, and define how to measure things. No one would really review my code or SQL queries for accuracy, because no one else had similar skills to do so. At best, people just wanted me to describe in plain terms what went into the calculations.

That huge amount of latitude meant that I could, with some careful planning, argue just about any position I choose to argue for. I’d just have to work backwards and then carefully bias my queries and analysis methodology to favor one predetermined result or the other.

In normal day-to-day operations as an ethical data scientist, I’d spend most of my time trying to make sure my methods were fair and unbiased. There’s always latitude for reasonable assumptions to differ, which can yield slightly different results that others would find reasonable.

Evil-DS could just take this to an extreme and start distorting things in service of a determined narrative. Backed with “data” in the form of charts and tilted insights, Evil-DS would be able to provide strong arguments to convince most decision makers to take certain actions.

Aren’t there checks and balances to this, like reality itself?

Some, of uneven effectiveness. Decision-makers aren’t stupid and have deep domain knowledge. They have their own internal ways of triangulating results against other “known truths” of the system. You can’t just flat out lie and create data to convince them to do something bad. You need some skill and strong numbers to make a convincing case for counter-intuitive decisions.

Similarly, some decision-makers will also want to be walked through an analysis and are sharp enough to spot subtle flaws. You’d have to be pretty careful about any deception employed, but I don’t think it’s impossible. Again, there’s a lot of places where if you take multiple reasonable assumptions, the cumulative effect is biased in a non-obvious way.

So you can’t get away with everything abusing this trust, but someone unethical can pull it out when they’ve built up trust to tip the scales in their favor.

2) Abuse the power to define reality

A lot of data work involves setting metrics to measure whether Decision A has accomplished the goal we set for it. The process is usually very involved: working with stakeholders to figure out what is important to measure for the business, what’s important for the product, etc. There’s a lot of debate over what should be measured.

Normally, in this position, the goal is to make sure we measure things that truly reflect success: people are using our feature, people are paying, people are retaining after purchasing. It’s very easy for teams to focus on vanity metrics and other biased views of the world. This work ultimately defines the landscape upon which a decision is judged. Going through this process is very much defining the reality that the project lives in. Do many of these projects and suddenly you’re defining reality for the whole organization.

All this usually means that the data scientist has a very big influence on how success is judged for teams, sometimes for entire organizations. Someone unethical would be able to decide to measure things more strictly, or loosely, depending on whether they want things to look good or bad. Someone completely evil could use this power as leverage and a bargaining tool, if not outright sabotage.

Combine this problem with the fact that the data scientist is often called upon to do the actual implementation and monitoring of the metrics, and the situation can go undetected for quite a while unless other people actively review and audit the whole process.

Armed with this, you can make just about any project you want look pretty good for a period of time, perfect for someone looking to get ahead. Use the halo of a good project to get a promotion, a bonus, or use it in a resume to jump ship to another position.

Surely people would eventually notice?

Sure, for certain high-stakes metrics this is impossible. It’s hard to fake things like revenue — there’s too many eyes on those numbers for that to fly.

But things like “clicks on a button”? Conversion percentages? An unethical person can manipulate the measurement of the numerators and denominators to suit a desired narrative. A couple of biased filters to exclude/include certain groups, taking advantage of the latitude to define ‘outliers’, and that can usually tip the scales.
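To make the arithmetic concrete, here is a minimal sketch of how much a conversion rate can move when only the filters change. Everything in it is invented for illustration: the `Session` fields, the 5-second "bounce" cutoff, the "core market" definition, and the toy data are all assumptions, not anything from a real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    country: str
    seconds_on_page: float
    converted: bool


def conversion_rate(sessions):
    """Converted sessions divided by all sessions in the list."""
    sessions = list(sessions)
    if not sessions:
        return 0.0
    return sum(s.converted for s in sessions) / len(sessions)


# Invented toy data: 10 sessions, 3 of which converted.
SESSIONS = [
    Session("US", 120, True),
    Session("US", 45, False),
    Session("US", 3, False),
    Session("US", 200, True),
    Session("CA", 2, False),
    Session("CA", 90, False),
    Session("BR", 4, False),
    Session("BR", 60, False),
    Session("US", 30, False),
    Session("US", 150, True),
]

# Filter 1: call very short visits "outliers" and drop them (assumed cutoff).
no_bounces = [s for s in SESSIONS if s.seconds_on_page >= 5]
# Filter 2: additionally restrict to a hypothetical "core market".
core_market = [s for s in no_bounces if s.country == "US"]

raw_rate = conversion_rate(SESSIONS)          # 3 of 10 sessions
no_bounce_rate = conversion_rate(no_bounces)  # 3 of 7 sessions
core_rate = conversion_rate(core_market)      # 3 of 5 sessions

print(f"raw: {raw_rate:.0%}, no bounces: {no_bounce_rate:.0%}, core market: {core_rate:.0%}")
```

Nothing about the users changed here, and the number of conversions stays at 3 throughout. Only the denominator shrinks, and the reported rate goes from 30% to 60%. Each filter sounds defensible in a meeting, which is why a reviewer probably wants to see the unfiltered number next to the filtered one.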

The main defense against this sort of manipulation would be triangulation. Why is this product’s activity going through the roof, but we aren’t seeing a corresponding change in other important metrics like revenue?
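A triangulation check like this can be written down rather than left to memory. The sketch below is one rough way to do it, under assumptions of my own: the function names, the 20% and 2% thresholds, and the example numbers are all hypothetical, and a real check would need to account for noise and seasonality.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (0.25 means +25%)."""
    if before == 0:
        raise ValueError("cannot compute a relative change from a zero baseline")
    return (after - before) / before


def triangulation_flags(headline, companions,
                        min_headline_lift=0.20, min_companion_lift=0.02):
    """Return companion metrics that did not move along with the headline metric.

    `headline` is a (before, after) pair for the metric being celebrated.
    `companions` maps a metric name to its own (before, after) pair.
    Both thresholds are arbitrary placeholders, not recommendations.
    """
    if relative_change(*headline) < min_headline_lift:
        return []  # The headline metric barely moved, nothing to cross-check.
    return [
        name
        for name, (before, after) in companions.items()
        if relative_change(before, after) < min_companion_lift
    ]


# Invented numbers: feature activity is up 80%, revenue is essentially flat.
flags = triangulation_flags(
    headline=(1000, 1800),
    companions={"revenue": (50000, 50200), "paying_users": (400, 430)},
)
print(flags)
```

With these made-up inputs, only `revenue` is flagged, since it rose 0.4% while the headline metric rose 80%. A flag is not proof of manipulation, just a prompt to ask the question on a schedule so it does not depend on anyone remembering.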

But very often, I’ve noticed that people are willing to let mismatches of triangulation slide for a month or two. “Oh, the change is still new, we won’t know the true effect for a while.” Most experiments do take a certain amount of time to settle down, so it’s a reasonable assumption to make. However, by the time “a while” comes along, people will have moved on to other projects and have forgotten, so the check is never really done.

The effects of certain types of changes also naturally shift over time as users become acclimated to a certain user experience, so with all the factors moving around, it’s often not obvious what effects have truly endured. This ultimately just gives more leeway to get away with shoddy analysis.

I don’t think you can get away with everything like this, but someone really bold could probably get away with a ton.

3) Build biased instrumentation and pipelines

Up until now, the assumption was that the data being collected was generally accurate, but the analysis was deliberately manipulated. Let’s ratchet things up a bit.

If Evil-DS had an ongoing agenda for looking good to management, they wouldn’t want to keep doctoring reports and analyses to maintain a fiction of high-performance. That can be prone to issues if a second person were to ever check the SQL and analysis code since the manipulation would be sitting right out in the open. Instead, it would be better to start manipulating things at the source and only collect biased data. Now we upgrade from just lying to stakeholders to lying about the actual fabric of data being collected.

Finding issues with data collection takes significantly more work than finding issues with existing data. Unless you know what to look for, it’s not obvious that data is being manipulated at all. Data collection issues are one of the most dangerous ones around because they’re so subtle unless you’re deliberately looking for those issues. Graphs of distributions that have missing data don’t normally scream “There’s a giant hole here!”, you really have to know where to look.
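One way to go looking for such holes, sketched here under heavy assumptions, is to compare what telemetry recorded against an independent count of the same activity (server logs, billing records, anything the same person does not control). The function name, the segment keys, the 0.5 tolerance, and the counts below are all invented for illustration.

```python
from statistics import median


def coverage_gaps(telemetry_counts, reference_counts, tolerance=0.5):
    """Return segments whose telemetry coverage is far below the typical segment.

    Coverage is telemetry events divided by reference events for a segment.
    A segment is flagged when its coverage falls below `tolerance` times the
    median coverage. The tolerance is an arbitrary placeholder.
    """
    ratios = {
        segment: telemetry_counts.get(segment, 0) / reference
        for segment, reference in reference_counts.items()
        if reference > 0
    }
    if not ratios:
        return []
    typical = median(ratios.values())
    return sorted(seg for seg, ratio in ratios.items() if ratio < typical * tolerance)


# Invented counts: telemetry sees about 95% of events everywhere except "BR".
reference = {"US": 1000, "CA": 200, "BR": 300, "DE": 250}
telemetry = {"US": 950, "CA": 188, "BR": 60, "DE": 240}

gaps = coverage_gaps(telemetry, reference)
print(gaps)
```

Here `BR` is flagged because telemetry captured 20% of its reference events while the other segments sit around 94% to 96%. The chart built from `telemetry` alone would just show a smaller bar for that segment, which is the point: the hole only shows up when there is a second source to compare against.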

The biggest challenge with this is that you’re deliberately breaking telemetry by introducing bias permanently. It’s hard to predict the effects it has downstream. But if you have a long-term agenda, like perhaps an interest in making a certain country look better in the data, you can tilt the data in your favor in very well-defined use cases.

Luckily for the Evil-DS, most issues with broken telemetry can be considered innocent mistakes and bugs, so even if you made a mistake, it’s possible to blame it on something other than malice.

4) Advocate for dark patterns

Sometimes, manipulation of perceptions of reality and office politics, which is the stuff I’ve been describing so far, aren’t enough to get the job done. Sometimes, actual results are needed. Now we’ll have to consider even more evil parts of the toolbox.

First up is dark patterns. There’s some controversy over what types of designs and behaviors constitute dark patterns, but the majority of the techniques are known to be effective. Since dark patterns get the name because they often work, anyone wishing to obtain more success would be tempted to reach for them despite their dubious ethical status.

Many dark patterns require at least the aid of engineers and designers to actually build and launch, so a single evil data scientist can’t just create them. But if the organization is data-driven, and Evil-DS is setting up experiments, managing the testing framework, and in the product brainstorm sessions… There’s nothing stopping them from advocating for dark patterns at the design stage. The evil stuff will often have a better chance of winning out in a straight-up A/B test.

The primary costs to the organization when dark patterns are deployed are hard to pin down. Obviously the product is less ethical for it and users may get upset. There are also intangible effects, like certain users being turned off by the patterns. However, beyond that, there are probably more subtle effects like building up a culture where those patterns are acceptable. This could drive away talented people who are opposed to that sort of design pattern.

5) Use evil from external parties

Let’s crank the evil up further. Imagine Evil-DS has been tasked with growing a metric they helped define and they want a big short term gain (perhaps to pad up their resume before jumping ship). They can start using 3rd party sources of evil to further their goals, things that live outside the organization.

What do I mean by 3rd party sources of evil? The darker underbelly of the internet: data brokers that sell profiles of users based on scant pieces of information, click farms, bots, fake accounts. All that sketchy stuff, much of it you don’t have to surf the dark web and pay bitcoin for either.

Data brokers live in the greyest part of this zone. They can feed data into recommendation systems to segment and increase the targeting of ads and marketing efforts; depending on your view of the world, this could be an unethical practice. Often, using the data implies contributing some data back in an API call, so you're also helping the broker collect data too.

Going completely into evil-land: if Evil-DS can get the success of a project measured off a weak vanity metric like raw registrations, paying for clicks and registrations is cheap. Even just using a platform like Mechanical Turk to “test” out a flow targeted in a specific way could manipulate tons of metrics. Or someone could just write their own cloud-based set of bots to complete certain instances.

Overall it’s really hard to combat this kind of behavior because you’d have to find a way to detect this fraudulent behavior (for example, registrations coming out of strange countries or hours of the day). This type of low-quality traffic looks different from normal traffic (high bounce rate, etc.). The problem is realizing it in time. Since you’d be fighting one of the few people in the organization who is equipped to detect such fraud (and who possibly helped design the fraud detection), it will be difficult.
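As a rough illustration of the "strange countries or hours of the day" idea, the sketch below compares where recent registrations come from against a historical baseline and flags buckets whose share has jumped. The function name, the thresholds, the crude add-one smoothing, and the placeholder country code `XX` are all my own assumptions; real fraud detection is much more involved than this.

```python
from collections import Counter


def unusual_buckets(baseline, recent, min_ratio=3.0, min_count=20):
    """Flag buckets (a country, an hour of day, ...) over-represented in `recent`.

    `baseline` and `recent` are iterables with one bucket label per registration.
    A bucket is flagged when its share of recent registrations is at least
    `min_ratio` times its baseline share. Thresholds are arbitrary placeholders.
    """
    base_counts = Counter(baseline)
    recent_counts = Counter(recent)
    base_total = sum(base_counts.values())
    recent_total = sum(recent_counts.values())

    flagged = []
    for bucket, count in recent_counts.items():
        if count < min_count:
            continue  # Too few registrations to say anything.
        recent_share = count / recent_total
        # Crude add-one smoothing so a never-before-seen bucket cannot divide by zero.
        base_share = (base_counts[bucket] + 1) / (base_total + 1)
        if recent_share / base_share >= min_ratio:
            flagged.append(bucket)
    return sorted(flagged)


# Invented data: "XX" goes from 5% of registrations to about a third of them.
baseline_regs = ["US"] * 800 + ["CA"] * 150 + ["XX"] * 50
recent_regs = ["US"] * 400 + ["CA"] * 70 + ["XX"] * 230

suspicious = unusual_buckets(baseline_regs, recent_regs)
print(suspicious)
```

The same function works for hour-of-day labels instead of countries. The catch from the paragraph above still applies: a check like this only helps if someone other than Evil-DS owns it and looks at the output.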

6) Abuse data access for direct evil

Data scientists and analysts are usually given very wide latitude with their data access. While in most places I didn’t have access to PII that was protected under various compliance/legal protections (so, no social security numbers, no credit card numbers, nothing related to HIPAA, etc.), plenty of stuff was fair game. Names, addresses, purchase history, location, IP, profile information is all just a few taps of a keyboard away. Plus it was all at scale thanks to direct database access, so a quick SQL query can dump it all out. It was also very common to have access to source code to most things to help with product instrumentation.

Take a moment to think what evil could be done with that sort of “basic” information. It’s never been easier to anonymously dox someone on the internet to various hate groups, make false accusations, or just leak things onto the internet. Depending on what information is available, selling the information can be in the cards. It can be used to hurt, embarrass and intimidate people who are considered enemies, especially in politically fractured times like this.

Mature companies, who are aware they may harbor bad actors internally, usually have safeguards around data access, so at least there’s a possibility that such violations can be logged and traced. This layer of security acts as a barrier to entry for an unethical user. Someone dedicated might be able to find a way around those safeguards.

Smaller companies often don’t have any of these safeguards at all because the teams may lack the sophistication and resources to implement them, or just have stronger faith in their ability to hire good people. Either way, data theft is a potential issue for everyone, especially as someone leaves the company.

7) Evil stuff I can’t dream of

Like I mentioned earlier, I’m not evil, nor am I a security researcher. I’m not very good at thinking about things sideways and looking at them from this perspective. I found it quite unsettling to go through this exercise here at my keyboard and just think about what kind of betrayals of trust I could commit from my desk. Even my unimaginative brain could come up with quite a number.

I suspect that you people out there in the big internet might have seen examples of such behavior out in the wild, or can come up with more hypothetical examples. I’d love to hear from you either in comments, here on Twitter, or via Twitter DM if you’d like privacy.

LATE Post-send Update

I probably shouldn’t do post-send updates, but wanted to tack this thought experiment down since it doesn’t feel content-y enough to make a full post out of (yet). Essentially, I run through some thoughts on "what would it take to actively bot an A/B test to deliberately favor A over B"… It’s… surprisingly easy and hard all at the same time.

About this newsletter

I’m Randy Au, currently a quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. The Counting Stuff newsletter is a weekly data/tech blog about the less-than-sexy aspects about data science, UX research and tech. With occasional excursions into other fun topics.

Comments and questions are always welcome. Tweet me. Always feel free to share these free newsletter posts with others.

All photos/drawings used are taken/created by Randy unless otherwise noted.