Transcript: Big Data Big Trouble


Interview with Cathy O’Neil, author, Weapons of Math Destruction

For podcast release Monday, November 14, 2016

KENNEALLY: Equal parts mathematician and political activist, Cathy O’Neil has calculated the impact of algorithms on society. For the most part, she says, big data adds up to big trouble. Welcome to Copyright Clearance Center’s podcast series. I’m Christopher Kenneally for Beyond the Book. When it comes to human activities, algorithms are expected to be models of objectivity, owing to their basis in mathematical formulae and reliance on enormous quantities of measured facts about a given general population, whether students or teachers, job applicants or criminal defendants.

Cathy O’Neil makes the case that real-world mathematical models are anything but objective. In her new book, Weapons of Math Destruction, she asserts that big data WMDs are opaque, unaccountable and destructive, and that they essentially act as unwritten, unpublished secret laws. Weapons of Math Destruction was long-listed for a National Book Award in nonfiction and was published in September 2016 to enthusiastic reviews from the likes of Clay Shirky and Cory Doctorow. Cathy O’Neil joins me now from New York City. And welcome to Beyond the Book, Cathy O’Neil.

O’NEIL: Thank you so much for having me, Christopher.

KENNEALLY: Well, we’re delighted to have an opportunity to talk to you. And first, congratulations on being on the long list for the National Book Award as well as some of the tremendous reviews that people can find on your blog site, mathbabe.org. And I guess my first question to you is are you surprised that a book about math has been so well received?

O’NEIL: (laughter) Well, first of all, I’m very, very honored by the reception, and I’m so glad that it came out when it did. I think it was very, very good timing. And I know there’s plenty of amazing books that never really get a chance, and I’m so grateful that mine has.

But having said that, it’s really not a book about math. I know a lot of people worry about that, but the way I describe it is it’s a book about power. And it’s a book about, in particular, the way that people with power are building tools of social control and shielding those tools from scrutiny by saying this is mathematics, you’re not an expert in mathematics, so you wouldn’t understand it. So in other words, they’re kind of like flashing the math ID, like you might see a policeman flash their badge, and saying this is something that you can’t ask questions about.

And my book is about looking past that shield and saying, yeah, we actually have every right as human beings to ask questions about things that affect us very deeply that are secret and are possibly quite unfair and destructive.

KENNEALLY: That’s fascinating, because of course, while math may be interesting to many people, power is interesting to absolutely everyone. But I have to tell you, we were drawn to your book because big data has become the big thing in publishing – something that seems to promise to improve business in a number of ways, with benefits for authors and readers as well as editors and executives. So we thought it was important to hear about some of the darker side of all of this. And I guess we should ask you to help us understand it. Would you tell us why, then, these algorithms are not inherently fair but are somehow just human opinions in computer code?

O’NEIL: Sure. So there’s actually a couple of questions in that question, but I’ll answer the last part first. I give an example in my book – and I think it’s a really good example – so that people can understand the extent to which algorithms are just creations of the data scientists who build them, and they have all the biases and opinions and projected values of their creators.

So the example I give in my book is my own algorithm that I use to cook dinner for my family, which is not a formal algorithm – it’s not something I’ve written down in computer code – but it is something that I think of as an algorithm in the sense that I use it every day and I optimize to my definition of success. And I should say that any algorithm has two main ingredients when you’re going to build it. The first ingredient is data – you have to feed it data. And the second ingredient is defining what counts as a success and what counts as a failure.

So for me, cooking a family meal, the data that goes into it are the ingredients I have in my kitchen, the amount of time I have on hand before dinnertime, the amount of ambition I have – and a definition of success for me is whether my kids eat any vegetables during the meal. (laughter) And the reason I choose that is because I am projecting my agenda, which is that my kids eat vegetables, onto the family meal. So if my seven-year-old were in charge of the definition of success for family meals, then he would define a successful meal to be one where he gets to eat a lot of Nutella, because his favorite food is Nutella.

And it really matters because, over time, as I said, we optimize to success. So over time, I tend to make meals where I know in the past my kids have eaten vegetables with that meal. And that informs all sorts of decisions, including what I buy for groceries. If my son were in charge, again, we would have a very different grocery list.

The other thing I want to mention is, when I say that the data includes the ingredients I have available in my kitchen, I curate that data very heavily, right? I don’t include Pop Tarts as an ingredient for dinner, although my teenage sons would absolutely do so. So that’s another way I am projecting my values onto the model I’m building. I define what is relevant, what kind of data I care about, and what kind of data I exclude.
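[Editor’s note: to make those two ingredients concrete, here is a minimal sketch in Python of the informal dinner “algorithm” O’Neil describes. The pantry, menu, and scores are hypothetical illustrations, not anything from the book; the point is the structure – curated input data plus a chosen definition of success to optimize.]

```python
# A minimal sketch of the informal dinner "algorithm" described above.
# Everything here is hypothetical; the point is the two ingredients:
# (1) curated input data, (2) a chosen definition of success.

# Ingredient 1: the data. Curation already encodes values --
# Pop Tarts are deliberately excluded as a dinner ingredient.
pantry = {"broccoli", "pasta", "chicken", "carrots"}

menu = {
    "pasta with broccoli": {"broccoli", "pasta"},
    "roast chicken and carrots": {"chicken", "carrots"},
    "plain pasta": {"pasta"},
}

# Past observations: how often the kids ate vegetables with each meal.
history = {
    "pasta with broccoli": 0.7,
    "roast chicken and carrots": 0.5,
    "plain pasta": 0.0,
}

# Ingredient 2: the definition of success. The parent optimizes for
# vegetables eaten; a seven-year-old in charge would maximize Nutella.
def success(meal):
    return history[meal]

# Optimize: among meals the pantry can support, pick the one that has
# worked best by the chosen definition of success.
feasible = [meal for meal, needs in menu.items() if needs <= pantry]
tonight = max(feasible, key=success)
print(tonight)  # -> "pasta with broccoli"
```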

KENNEALLY: Right. It’s fascinating, because we have this presumption that a mathematical formula – or anything that has to do with math – is somehow apart from, separate from, inconsistent with human subjectivity. And you’re telling us that really they are very closely related, so that algorithm of your young son’s would put dessert first, and it would always come out the right way – the way he wanted it to.

O’NEIL: Right. That’s what we do when we build algorithms. As long as the person in charge has the power, they get to define what we’ll emphasize, what we’ll optimize to. And going back to your comment about how big data is taking over the publishing world – you know, that’s absolutely true. But as in every other domain big data is taking over, there’s always this choice of how you define success.

And probably for publishing – I’m just guessing – success looks like the number of copies sold, right? Or the number of subscribers, something along those lines. But if you’ve defined that as success and you optimize only to that, focusing only on that, then over time you’re probably losing value in other ways. Publishers probably also care about how many awards a book was nominated for, or whether the book was actually high quality. But if you’re focusing only on the number of books sold, then you’re going to be blind to those other things that you actually do find valuable but which are harder to measure.

KENNEALLY: OK. Well, I think we’re getting the idea that these algorithms are everywhere. But really, what are we talking about? Is there an entire arsenal of big data WMDs out there or are they only in a few scattered bunkers around the country?

O’NEIL: Well, I did a lot of research, and I found more WMDs – and again, to remind the listener, WMDs are algorithms that I find particularly horrible: they’re important, they’re secret and they’re destructive – I found them all over the place. I found them in insurance. I found them in lending. I found them when you’re trying to get a job. I found them when you’re on the job. I found them in policing and in sentencing, and of course in education – widely used in education.

And I have a theory about it. It’s not a deep theory, but it comes from the fact that these examples all have common traits. And one of the most obvious common traits, in all these examples across all these industries, is that they are used when people don’t particularly want to take responsibility for complicated decisions.

So if you have an algorithm that’s making those choices for you, when people complain, you can point to that algorithm and say it’s not me – it’s the algorithm. And by the way, the algorithm is complicated and secret, so you have no right to complain. There’s essentially no accountability. And there’s no accountability in a very deep sense – the people who are using it don’t even understand it themselves. So it has somehow become an abstract entity that is accountable to no one at all.

KENNEALLY: Right. And Cathy O’Neil, you’re telling us that this has a real, significant, lasting impact on people’s lives. So tell us who suffers the most from these mathematical WMDs.

O’NEIL: Great question – yeah – because it’s really important to me that people realize this is a class thing, this is a race thing. The truth is that these algorithms, generally speaking, do not affect everyone equally. They affect people who are powerless – because again, it’s about power. So it affects people who need a job. Sixty percent of job applicants – and an even larger proportion of people applying for minimum-wage jobs – have to take a personality test in order to even get an interview. So this is not something you can opt out of.

A lot of people, when they think of algorithms, they think of online. And there certainly are online algorithms in my book. But I just want to make the point that a lot of the algorithms I’m talking about are things that follow you around if you are of a certain class – especially if you’re poor and you’re desperate. And they’re not kind to you. They make these arbitrary judgments, and they are sometimes unfair and punishing – and again, there’s no accountability, no appeals process.

As you mentioned, these are cumulative as well. I make the claim in the book that, if you are, say, a poor black person, then you are probably touched by many, many of these algorithms, and they are making decisions based not on your behavior but just on your zip code and the color of your skin – and the punishment that you’re receiving at the hands of these algorithms is probably adding up over your lifetime, so it’s not just hitting you once but hitting you multiple times, from all different angles.

KENNEALLY: So what needs to be done, Cathy? I mean do we have to give up on data? Is it ever trustworthy? Is there a place for regulation or public policy?

O’NEIL: Well, to be clear, a lot of the processes that are being replaced with algorithms, like hiring processes, are already regulated. The regulations are actually pretty good. But the enforcement is a problem. And the enforcement is a problem, for the most part, because regulators don’t know how to deal with algorithms. So the very first thing I’m asking for is for data scientists like myself to develop the field of auditing algorithms, so that regulators have tools to see whether a given algorithm is doing things that are already deemed illegal.

Right now, the EEOC, which should be looking into unfair hiring practices, doesn’t have the technical tools to look at an algorithm and see whether it’s legal or not, and that’s the kind of thing we need to do.
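[Editor’s note: as an illustration of what one such auditing tool might check, here is a minimal sketch in Python of the EEOC’s “four-fifths” guideline, under which a selection rate for one group below 80% of the highest group’s rate is treated as evidence of adverse impact. The decision log and function names are hypothetical; a real audit would run on a hiring algorithm’s actual decisions.]

```python
# A minimal sketch of one audit a regulator might run on a hiring
# algorithm's output: the EEOC "four-fifths" rule for adverse impact.
# The data below is hypothetical, for illustration only.

from collections import defaultdict

def selection_rates(decisions):
    """decisions: list of (group, selected) pairs -> rate per group."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}

def four_fifths_check(decisions):
    """Flag groups whose selection rate is < 80% of the best rate."""
    rates = selection_rates(decisions)
    best = max(rates.values())
    return {g: (rate, rate / best >= 0.8) for g, rate in rates.items()}

# Hypothetical log of a screening algorithm: (group, got an interview?)
log = [("A", True)] * 40 + [("A", False)] * 60 \
    + [("B", True)] * 20 + [("B", False)] * 80

for group, (rate, passes) in four_fifths_check(log).items():
    status = "ok" if passes else "ADVERSE IMPACT FLAG"
    print(f"group {group}: selection rate {rate:.0%} -> {status}")
# group A: selection rate 40% -> ok
# group B: selection rate 20% -> ADVERSE IMPACT FLAG
```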

KENNEALLY: Right. So what you’re saying is we just need to be more careful, we need to be more aware of the impact of data and these algorithms on our lives, because it all seems to be happening behind the screen. These are the ultimate black boxes.

O’NEIL: The biggest mistake people make is that they just assume that, because it’s algorithmic, it is objective, it is fair – just by dint of its mathematical nature. I hope that’s over. I hope that era of blind faith in big data algorithms has ended. The second thing is, now that we know they’re not inherently fair and perfect, how do we test them? Because what we’ve done is we’ve taken these very critical and fragile decision-making processes and replaced them with algorithms, just assuming it would be a perfect replacement. But now we know that it isn’t. So how do we measure the extent to which they are actually creating a worse world rather than a better world for us?

KENNEALLY: Well, we’re going to need some more data scientists like you to do that kind of measurement. But let’s close with an example of a good algorithm. We don’t want to leave people with the impression this is all bad. There are ways of using algorithms, of using data to achieve positive ends, at least in your view. And you did a blog post recently on one such algorithm at Georgia State University.

O’NEIL: Yeah. So in this example, what they did was they were looking for struggling students. They found all sorts of signals – signs that a student was struggling – and they were afraid that student would eventually drop out, so they were trying to prevent that from happening. And I should mention that the students particularly vulnerable to dropping out of college are kids whose parents didn’t go to college, kids who are poor, kids who are minorities – so that’s a particularly vulnerable population.

So the algorithm itself picked out kids who were at high risk. The critical part of this algorithm, though, was what they did next. What they did next was they hired a ton of advisers – I think they quadrupled the number of meetings between college advisers and students – and they really supported the students who were most at risk of dropping out. It cost them a lot of money to have all those advisers. And it really worked, too.

So I think the takeaway from that is this algorithm wasn’t secret sauce, right? It wasn’t a silver bullet. What it did was help the college create a new policy of advising, and the way it helped was by targeting that advising. It didn’t do very much by itself at all. But in combination with that large investment in their own students, it really did something wonderful.
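[Editor’s note: as a rough illustration of that division of labor – the model only flags students, the humans intervene – here is a minimal sketch in Python. The signals, weights, and threshold are hypothetical and are not Georgia State’s actual system.]

```python
# A rough sketch of the division of labor described above: a model
# flags students who look at risk; the intervention is human advising.
# Signals, weights, and the threshold are all hypothetical.

def risk_score(student):
    """Combine a few hypothetical early-warning signals into a score."""
    score = 0.0
    score += 0.4 if student["first_generation"] else 0.0
    score += 0.3 if student["missed_registration"] else 0.0
    score += 0.3 * max(0.0, (2.5 - student["gpa"]) / 2.5)  # low GPA
    return score

def flag_for_advising(students, threshold=0.5):
    """The algorithm's whole job: produce a list for human advisers."""
    return [s["name"] for s in students if risk_score(s) >= threshold]

students = [
    {"name": "A", "first_generation": True,
     "missed_registration": True, "gpa": 2.1},
    {"name": "B", "first_generation": False,
     "missed_registration": False, "gpa": 3.6},
]

# The flagged list goes to advisers -- the costly human step that
# actually moved the needle -- not to any automated decision.
print(flag_for_advising(students))  # -> ['A']
```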

KENNEALLY: Right. And what I really appreciate about your analysis of that example, as throughout the book, is that human nature plays such an important part in everything we do – it can play a negative role and it can play a positive role. So we need to make the effort to check our work, if you will – to borrow a mathematical idea – and make sure that the goals we imagine this will achieve really are the ones that result.

O’NEIL: Yeah. And again, I don’t think it’s really a math question, right? The question I ask people to ask themselves is: does this punish the unlucky, or does it help the unlucky? Does this punish the poor, or does it help the poor? If it’s punishing the poor, then it’s very likely to be a weapon of math destruction. But if it’s helping people who are struggling, then it is not.

KENNEALLY: All right. Cathy O’Neil, author of Weapons of Math Destruction, thanks so much for joining us on Beyond the Book.

O’NEIL: My pleasure. Thank you for having me.

KENNEALLY: Beyond the Book is produced by Copyright Clearance Center. With its subsidiaries RightsDirect in the Netherlands and Ixxus in the United Kingdom, CCC is a global leader in content workflow, document delivery, text and data mining and rights licensing technology. You can follow Beyond the Book on Twitter, like us on Facebook and subscribe to the free podcast series on iTunes or at our Website, beyondthebook.com. Our engineer and co-producer is Jeremy Brieske of Burst Marketing. I’m Christopher Kenneally. Join us again soon on Beyond the Book.
