The idea of ‘big data’ has become ubiquitous, but what is it and how is it changing the way we live? We sat down with Cathy O’Neil, a data scientist, Harvard PhD and National Book Award nominee, to find out.
CT: Let’s start with the basics – what exactly is ‘big data’?
CO: Big data is a new approach to predicting things. More specifically, ‘big data’ is the use of incidentally collected data – like how you search through your browser or what you do on Facebook – to infer things about you, like what you’re going to buy or what your political affiliations are. It’s an indirect way of figuring people out. For example, a camera that’s surveilling us doesn’t ask ‘What are you doing?’ – it just gets to see what we’re doing.
CT: And what’s an algorithm?
CO: Algorithms are computations that [interpret the] data that’s gathered about you in order to create a prediction. Think of it like a mathematical equation that tries to answer a question framed as a prediction, such as: ‘Is this person about to buy something?’ or ‘Is this person about to vote for someone?’
CT: Why am I hearing so much about it right now?
CO: Before ‘big data’, statisticians would do expensive things like polling people to figure out the future. For example, asking people direct questions like: ‘Who are you going to vote for?’ Now, we increasingly rely on ‘data exhaust’, which is what I call the data that’s being collected about you constantly, to infer things about you.
Before ‘big data’, companies could only make wild guesses. Now, we have better than wild guesses. What’s surprising is that most big data algorithms are wildly inaccurate, and there’s no reason to think they’re right. But they are better than wild guesses. And that’s why big data has taken off like it has.
CT: If they’re inaccurate, then what are they reflecting?
CO: The flawed data sets that we feed them. Algorithms don’t know anything beyond what we tell them. So when we feed an algorithm uneven or biased data, it’ll think that’s reality.
CT: What’s a real-world example of that?
CO: An example might be that in the United States, black people are five times more likely to be arrested for smoking pot than white people. This isn’t because black people smoke pot more often; both groups smoke pot at the same rate. Black people are just much more likely to be arrested for it. If you hand that to an algorithm, which we do, it’ll correctly infer that black people are much more likely, in the future, to be arrested for smoking pot. And then it will give black people higher risk scores for criminality, which has an effect on criminal sentencing.
Another example is a thought experiment. I’ll use Fox News, because Fox News has recently had scandals related to an internal culture of sexism. The experiment is: ‘What would happen if Fox News tried to use their own data to build a machine learning algorithm to hire people in the future?’
Say we’re looking for people who were successful at Fox News, for example. It depends how you would define success, but usually you’d look at people who get raises, promotions or stay for a long time. By any of those measures, the data would reflect that women do not succeed at Fox News. If it were used as a hiring algorithm, it would propagate that problem. It would look at a pool of applicants and it would say ‘I don’t want to hire any women, because they’re not successful here. They’re not good hires.’ And it doesn’t just have to be Fox News; every corporate culture has bias. When you feed an algorithm biased data, the algorithm then propagates that bias. It continues to reinforce the biases that already exist in society.
CT: Are the biases intentional?
CO: I don’t think data scientists are trying to make sexist or racist algorithms. But machine learning algorithms are exceptionally good at picking up relatively nuanced patterns, and then propagating them. It’s not something data scientists are intentionally doing, but it’s bias nonetheless.
CT: What role do inaccurate algorithms play in our daily lives?
CO: They’re being used in all sorts of decisions for people’s lives – everything from college admissions to getting a job.
There are algorithms that decide how police will police neighborhoods, as well as algorithms that decide how judges will sentence defendants. There are algorithms that decide how much you’ll pay for insurance, or what kind of APR [interest rate] you get on your credit card. There are algorithms that decide how you’re doing at your job, which are used to determine pay rises. There are algorithms every step of the way, from birth ‘til death.
CT: So where does that leave us?
CO: We’ve jumped into the big data era and have thrown algorithms at every single problem that we have, assuming those algorithms must be fairer than humans, but actually they’re just as unfair as humans. We have to do better.
A second part of our interview with Dr O’Neil is published separately. Her book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, is available now.