Transcript: Mass Digitization – Progress, Goals, and Roadblocks


Recorded at Copyright & Technology NYC 2016 Conference, January 19, 2016


  • Jacqueline Charlesworth, US Copyright Office
  • Devereux Chatillon, Esq.
  • Roy Kaufman, Copyright Clearance Center

For podcast release Monday, February 1, 2016

KENNEALLY: It’s been interesting to listen to the various presentations, and to hear about that interplay of copyright and technology that Bill Rosenblatt set up for us this morning, and in particular to watch as copyright shows its influence upon technology and technology pushes back and has an influence on copyright as well. But one of the things that really doesn’t change and will be, I think, a central point on this particular panel is that copyright remains the exclusive right of creators, and thereby hangs a tale and a number of important cases.

For our program today, Mass Digitization: Progress, Goals and Roadblocks, we’ll be looking at how technology companies are willing and able to digitize copyrighted works on a scale never imagined before. In turn, copyright owners have raised concerns over their right to do so, and the consequences of mass digitization on publishers’ businesses and the accessibility of copyrighted material to the public remain up in the air. What we are seeing is, if you will, a confluence of three trends – licensing, legislation and litigation. And we will look at all of those in their turn.

On this panel particularly, we will discuss recent developments, such as the Second Circuit decision in Authors Guild v. Google and the Copyright Office report on Orphan Works and Mass Digitization.

We’ll also consider how copyright law can accommodate mass digitization in the future. And it occurs to me and to others on this panel that, if we look at the Google case particularly, that was a case that began in 2004. And mass digitization then was the purview of a single company. With the changes in technology in just 10 years, mass digitization may become digitization by the masses, and that may change things in ways we have not anticipated.

So I want to use that as an introduction, welcome you again and welcome our panel. Moving from my right, I want to reintroduce to you all Jacqueline Charlesworth, General Counsel and Associate Register of Copyrights for the U.S. Copyright Office. And Jacqueline, welcome.


KENNEALLY: Also my colleague, Roy Kaufman, he is Copyright Clearance Center’s Managing Director of New Ventures. And prior to joining Copyright Clearance Center, he served as Legal Director at Wiley-Blackwell, John Wiley and Sons. And Roy, very good to see you.

KAUFMAN: Thank you for having me, Chris.

KENNEALLY: And finally, on the end is Devereux Chatillon, a partner and cofounder of the New York-based Chatillon Weiss law firm. She’s very experienced in media and intellectual property as an attorney and spent many years at the highest levels of the corporate world working at Callaway Digital Arts, at Scholastic as a Senior Vice President, Corporate Secretary and General Counsel, as well as at a number of other firms, including The New Yorker and ABC. And Dev, welcome as well.

CHATILLON: Thank you very much.

KENNEALLY: We’re going to begin the discussion with actually a presentation by Dev, so we’ll let you do that and kind of remind us all of the major points in the Authors Guild v. Google case.

CHATILLON: Yes, we’re going to start out doing a very brief overview of Authors Guild v. Google. I’m going to assume most of you are familiar with it but try to highlight the points that we’re going to talk about today on the panel. So it all starts with the Google Library Project. I looked hard to find pictures of the actual Wayback Machines they used, and apparently Google has successfully prevented those from going on the Internet. I couldn’t find any pictures, although I did find this one sign that was put up, apparently, at a library.

Anyway, back in 2004, Google decided in its infinite wisdom to start copying, digitizing, scanning for storage in digital media every book ever published in the United States and elsewhere at major research libraries. Google did this without asking for permission. They went ahead and did some deals, starting with state university libraries, which are protected by the 11th Amendment from any financial copyright judgments (they can be enjoined, but they cannot be made to pay damages), and Stanford, which has a long and close relationship with Google. It’s now expanded to some other libraries, most of them contributing public domain works.

All books from the libraries were scanned, those in and out of copyright. As of the date of the recent Second Circuit decision on this, which was October of 2015, there were over 20 million books that had been scanned by Google.

The books come from all over the world. You can imagine, it’s the University of Michigan research library, so the books are in all languages, they come from everywhere, they start probably back in the 16th century and go all the way up to the most recent bestsellers. And of course they’re subject to many different copyright laws.

The original lawsuit was brought in the Southern District of New York in 2005. There were two lawsuits that were consolidated, one by the Association of American Publishers and some of the major New York-based book publishers and the other by the Authors Guild. There was originally a very elaborate settlement – Jackie and I were just comparing notes on this – that went on for 300-plus pages. That’s a whole other panel, and it’s kind of fun but academic at this point. It was rejected by the district court, appropriately, and the publishers settled with Google. What was left was the Authors Guild lawsuit and Google, and the district court ultimately held that Google’s copying and the very limited uses Google says it’s making from that copying was fair use.

Google’s position essentially is it’s digitizing all these books and it’s using them to inform its search function, that although the entire book is copied, if you search for a term, and the Second Circuit uses the example of sort of when did FDR get polio, you’ll get a small snippet from a book that’s a biography of President Roosevelt that will say he got polio, and I’m going to make up the, you know, in 1937, that’s probably wrong, but that’s an example. So that only snippets are displayed and there are limits, and again I won’t go into all the details on how many snippets from any individual book you can get, even with repeated searches. And there are also limits on the number of pages. There’re some absolutely blacked out sections of each book, according to Google.

Google says that no ads are displayed next to the book, because one of the issues here obviously is that Google is not the Library of Congress, it’s a for-profit company. So they say they’re not benefiting. It’s really for the public good, and that they’re also making copies available to the blind and allowing people to find out, where do I find out this piece of information? There’s clearly a benefit to society from what they’re doing.

The authors, on the other hand, say that the entire book has been copied, and being held by Google, with no restrictions. Google is for-profit, and ads are displayed in connection with certain kinds of search results. And there’s a big question about security going on for the next 50 or 100 years.

And then there is this copy thingy, as I said, there’s a separate related case called Authors Guild v. HathiTrust. One of the things Google did when it was copying was it made a digital copy – made two copies of each book, one it kept and one it gave to the library it copied from. The libraries then all contributed them to a not-for-profit entity set up by 14 of the libraries called HathiTrust, and then they’re all sharing all the – and I think it’s now up to 80 libraries – are now sharing that digital body, which is part of HathiTrust. And there was another litigation related to that. And again the Second Circuit in 2014 held that that was fair use.

So the recent Second Circuit decision, it goes on at great length. Again I’m going to do some highlights, and we can go back to some other parts of it as it’s relevant to the discussion. They went through the four fair use factors. On the purpose of the use, the court, following what the Second Circuit has been focusing on lately, held that the copying was transformative because it allowed people to search and find books that were otherwise not searchable via Google search, and that the fact that Google is a for-profit company did not defeat or even count very much against fair use. They did note again that Google is not running ads next to the search results.

They noted that the books – then the second factor is the nature of the copyrighted work and that usually distinguishes between news reports on the one hand and very creative poetry on the other. And the court said that, because of the kind of use being made, the kind of copying, that really didn’t matter, because again it was still, if you wanted to, for example, find out every time someone used a particular metaphor in the English language, this body of – this database would now allow you to do that, and that really didn’t have to do with the expressive nature of the work, and also the fact that Google isn’t serving up the expressive nature in response to searches, so says the court.

The third factor is the amount and substantiality of the copying. And again the court said, because the purpose of the work was transformative and because you have to copy the entire work to accomplish that purpose, you can’t search an entire book for keywords if you don’t have the entire book, even though they were copying the entirety of millions and millions and millions of books, that was still OK under the third factor.

And then probably the most, I think, troubling at least to me personally – and again I speak for myself and no clients or prior employers – on the effect on the market, the court looked at the loss of sales from search results and snippets, whether that would replace buying the book, and they said it didn’t.

They also did deal with the arguments about security. Google is probably one of the most secure Internet providers on the planet in 2016. But will it be in 2026? Will it be in 2036? And because it’s a private company with no restrictions on its use of this, what happens if it turns around and sells its entire database to somebody who’s not quite so good in security? The court just said, at the moment, they’re safeguarding. On the record before us, we can’t find that that impacts on the fourth factor, and it’s not supported by evidence.

So I think there’s a lot of controversial parts of this. And I think we’ll talk about them next.

KENNEALLY: Indeed we will. And I appreciate your presentation, Dev Chatillon. And knowing as I do some of your strong feelings about it, it was a very nice evenhanded one at that. And it’s not often that a panel presentation with CLE credits has breaking news, but I do understand from Jacqueline Charlesworth, as we look at not only where we stand today with the Authors Guild v. Google case but also with, you’ll tell us more about the recent Copyright Office report on Office (sic) Works and Mass Digitization, you have some news regarding the Authors Guild attempt to appeal to the Supreme Court.

CHARLESWORTH: Oh, yeah. No, basically my understanding is that Google got an extension for their brief, which is now due in March. So I think you’re looking at a court conference date later on in the spring, say April or May. As some of you probably know, there are two routes the court can take, the Supreme Court. They can either grant or deny cert directly or sometimes, when a federal statute is involved, and here you have Section 107, they’ll ask for the views of the solicitor general on whether to take the case.

So that I don’t know. I can’t read the tea leaves on that one. And either way, most likely you would be looking at a fall argument in this case if they end up granting – well, if they end up granting cert in the spring, a fall argument and then a little bit later if they ask for the views of the solicitor general and they end up granting cert, so still not that determinate, but we know a little bit more about the schedule.

So I’m going to pick up, I think there’s a little bit of a parallel story. The Copyright Office, first of all, was involved in some of the – and this predates my time there – in some of the development of the government’s views relating to the Google Books case and the Google Books settlement, including the views of the DOJ antitrust department.

The government basically came in with respect to the proposed settlement and suggested that the court should reject it. And Judge Chin, as Dev mentioned, in fact did that in, I think it was February 2011. So after the parties had worked up a very complex settlement, which had both a backward-looking and a forward-looking component on what’s – the settlement basically would have compensated rights-holders for the past but it also was a going-forward licensing mechanism with established rules and prices.

It was kind of what we would call almost an extended licensing program or ECL, extended collective licensing. But instead of being adopted through congressional or legislative process, it was going to be imposed as a result of this lawsuit. And the Copyright Office, and the government more generally, had a lot of concerns about that, in part because it kind of set up a very nice system but it really only benefited one player in the marketplace, namely Google. If you’re going to go down this route where you’re going to figure out a way to license mass digitization, wouldn’t it be better to have Congress look at this?

So that’s February 2011, the settlement gets rejected. In October 2011, the Copyright Office released a preliminary analysis of the mass-dig issue, kind of raising a lot of questions. You know, this is an important thing. This is an activity that’s going on. How should we approach it? What should Congress be doing? And there the issue sat for some time.

More recently, the Copyright Office took the mass-dig issue up again, this time in conjunction with orphan works, which is a somewhat related problem – I don’t want to dwell on it – but the inability to track down rights-holders and know how to obtain a license, if you want one, on a mass scale. It makes it very difficult to operate a project or something like Google Books because in many cases there may be valuable works or works that you want to make available, but for some reason you’re unable to ascertain their copyright status. So they are related. But mass dig in many cases does involve the use of works or the copying of works and making available of works where you do know the owner.

So the question really being asked is how should we address this? Some people, I think, believe that – or I know, actually, the position of some – that we really should just do this through fair use. So as in Google Books, you have the court take a look at the project, decide whether it’s, say, socially beneficial, make judgment calls, apply the four-factor test, and make a determination that way. That’s a very uncertain way of going about this.

I think the other concern, and even this is if you read the Google Books, the recent opinion, you’ll see the court was careful to say, well this is just limited to the snippet views. It’s not a replacement for the full-text work. But one could imagine – or easily imagine, that you might actually want to look at the full-text works or have access to the text. Wouldn’t that be fabulous? How many of you have been on Google Books, and you’re like, oh, oh, that next page is blacked out. I can’t – I’m just getting near the answer. Often you’ll look something up, and it truly is not a substitute, according to the court, and in my experience, for the full text.

So fair use, there seem to be not as many limits as there were. But still I think, and the Google Books opinion makes clear, that it probably would not cover actually a situation where you not only copied the books but you made them fully available. So the Copyright Office, after studying the mass dig issue more recently, released a second report in June of this year and recommended that Congress consider legislation for a pilot program. And basically the recommendation would be limited to nonprofit and noncommercial uses of literary works and the accompanying illustrations and photographs, so it would be on a very limited basis.

It’s an ECL program. What extended collective licensing is, for those who don’t know it, it basically is a system where you appoint someone, an organization, who can negotiate collectively on behalf of copyright owners and come up with rates, and then grant a blanket license. It usually allows individual copyright owners who aren’t satisfied with that mechanism to opt out. So it kind of is a presumptive licensing mechanism with an opt out, and that’s what’s being suggested by the Copyright Office.

It would be run by a CMO, collective management organization. As I mentioned, it would permit opt-outs. It would preserve fair use rights, because I think it would be very difficult, and particularly in the current environment, to suggest that this would completely supplant fair use rights, nor do we think that’s necessarily a good idea. Then it would sunset in five years. And the idea here is to allow the United States – this is a relatively new idea here, it’s used in other countries – but to allow us to test out this proposition in a fairly limited way and see if it has legs. And it might be one way to solve a problem.

In June we asked for further comments and more ideas about how we might approach this. We did receive more comments. I think, generally speaking, a lot of the user community – libraries and archives – don’t like it because they think it would undermine fair use. I think they think the better gamble is on the courts to continue extending fair use to do all the things that they want to do. Content owners, on the other end of the spectrum, don’t like it. They don’t like the idea that you would have to opt out and that the rights could be exploited if you didn’t. But that is more individuals, I would say. There were other content owner groups, industry groups, that were supportive, and some licensing organizations. I think CCC filed comments. I’ll let you address them.

But the Copyright Office is currently sifting through these comments, and the next step here would be to make a recommendation to Congress about what might be done in this area. So our role there is an advisory one to Congress. Congress, as many people have mentioned throughout the day, is in the midst of a very lengthy review of the Copyright Act. This is one of the many issues it’s looking at. And we’re hopeful that they’ll consider our proposal and maybe move forward to adopt a solution along these lines.

KENNEALLY: All right. Well thank you, Jacqueline. And you were discussing categorizing some of the responses to this and that libraries, on the one hand, are opposed, rights-holders, on the other hand, are opposed. That sounds like a solution that might work, if everybody’s against it. That seems to be, however, not the track record for these kinds of things, and so we’ll look forward to seeing where that progresses.

But Roy, I want to bring you in, Roy Kaufman from Copyright Clearance Center, and talk about a particular aspect of the Authors Guild v. Google case, and that is something which Jacqueline’s raised and Dev as well, and that is just what it makes possible, because that’s really what we’re discussing here with mass digitization – what is possible when you can digitize all of that material? I suppose a very generous reading of the case would be this makes a great deal possible.

KAUFMAN: No, I think no reading says that there’s a great deal possible. I view the case as, you know, someone has to win because that’s the way litigation works. You have to declare a winner and a loser. Look, I was at a plaintiff publisher, one of the first publishers who filed that first litigation. At the time, Google was talking about, well, the head of University of Michigan was going around talking, this is going to be the next Library of Alexandria. And there was all this highfalutin stuff. And we were a Google publisher partner, so we had actually already licensed Google the right to take our most valuable works and display pages of it under a license where we weren’t going to get a whole lot of money but we kind of liked doing it.

Of course what happened is what always happens in a litigation, so Google had this great big plan, and then the lawyers got involved, and they got sued because they didn’t want to engage publishers. Then, all of a sudden, well, we’re not going to scan this type of content and we’re not going to display this much of content. And there was a settlement agreement, which, again, speaking for myself personally, I agree the court was absolutely correct in rejecting it. But nonetheless it indicated what rights-holders and users, or Google and the libraries, could theoretically do with this large corpus of digitized material if they actually sat together and spoke to each other. But that didn’t happen.

So we ended up with a case where I think the judge said, because A, B, C, D and E are true, this is fair use. Now, I’m not saying whether or not I agree with that case and it doesn’t necessarily mean that, if Google did X, Y and Z, they would or wouldn’t be infringing, although the court does definitely put a whole bunch of pins that says, well, if they did this, that would most likely be an infringement.

The problem is, as you said at the beginning, there are sort of three ways to deal with things. You can do it through licensing, through legislation and litigation. Sometimes you have to sue and sometimes you need legislation, but generally what you’re doing is you’re solving yesterday’s problems tomorrow. Licensing is actually how you solve the needs of your users today. And so the whole purpose of mass digitization, with the exception of maybe how some people view it, shouldn’t be about have we extended fair use or contracted fair use, but it should be about what the user needs to do with that content and how the rights-holder and the user can get together.

So I’ll give a couple of examples of mass digitization that don’t involve litigation or don’t require legislation. So today actually, just by coincidence, in the Scholarly Kitchen, which unless you’re in science publishing you’re not reading, Rick Anderson, the associate dean for collections at University of Utah, talked about the New York Public Library, which did a mass digitization project of clearly, clearly public domain stuff – photography from the WPA, special collections, all of these things, and made them perfectly available high res, and because it’s public domain said do whatever you want.

There’s a lot of great public domain stuff out there, and what was really interesting was what Rick was saying, was where are the other libraries? Libraries have these great collections, and they are not necessarily digitizing it. And in fact he accused them of treating these things as though they are protected by copyright and requiring you to go to the library and seek permission even to make copies of public domain materials. Now, that’s perfectly lawful and there are some good reasons why libraries would do that, but he was kind of exhorting the library community to mass digitize stuff where there is no issue. And there’s so much value in that, and I think most people would agree.

Give another example, it’s mass digitization involving text mining that we’re engaged in, and other people are doing similar things where, but we’re doing it in science, where we’re taking digital content, it’s already born digital but then we’re reformatting it and making it compatible across, I don’t know, 35, 36, 40 publishers – I lose track – thousands of journals, so that users can come in, and they can download and mine the full text.

Now, text mining in 2004, when the Google case was filed, was very different from text mining today. And I don’t want to get too sidetracked by text mining, but what I’ll tell you is we’ve got publishers on the one side and users who are buying our license on the other. Our users, they’re not saying, well, does this help me comply with copyright? What they’re saying is will this help me cure cancer, and will this save me time, effort and money? If it does, I’m going to pay a good fair market value to use this. It’s not about copyright – they might be pro-copyright, they might be anti-copyright – but it’s really about meeting a use case.

Tie all this back to the Google case, and as Jackie was saying, there are things that, whatever your view of fair use, no matter how expansive you believe it should be, you’re not able to do. Full-text, in-copyright works with known copyright holders, there’s no court that’s going to say you can make those available to everyone anywhere. There are publishers who would. They want to get compensated for it. They want to be part of that discussion.

We should get back to thinking about mass digitization as what’s the use that we’re trying to solve and how do we get there? We might have to get there through litigation, we might have to get there through legislation, but usually the easiest way is through licensing. And I consider what the Copyright Office is doing, yes, there has to be legislation but really it’s all about licensing. So to me that’s more of a license that happens to be enabled by some legislation.

KENNEALLY: Dev Chatillon, as I said, your presentation that opened us up was a fairly evenhanded one, but I know you do have some strong opinions, so I want to give you a chance to articulate them, because I think they’re important. And sticking with this point regarding fair use, where Judge Leval really landed on was this notion of transformative use, something that, if I have it right, and I’m not an attorney, I’m sort of the odd person out here on this panel, he really pretty much conceived in the Harvard Law Review 25 years ago, but it’s your notion that this very interesting aspect of fair use, transformative use, is something that’s become something of a monster, and that it’s eating up all the other factors.

CHATILLON: Absolutely. And I’m not alone in that. Judge Posner, in the Seventh Circuit actually wrote an opinion saying the Second Circuit’s gone off in this transformative use rabbit hole. We think that’s interesting and great, and picking up from some language from the Supreme Court, to be fair, in Campbell v. Acuff-Rose, but in my personal opinion, I think taking it way too far. This case is one example. There are some earlier cases that again are beyond the scope of the panel that, I think, it goes really too far.

But I think the problem here is illustrated, for example, the copying of the entirety of the work and the delivery of the snippets, what the court is really saying is that delivery of the snippets is transformative. If you look back at Campbell, what they’re really talking about is transforming, they’re talking about a parody. That case is about Pretty Woman by Roy Orbison and a rap song, I’m going to forget who the rapper is who (inaudible),

F: 2 Live Crew.

KENNEALLY: 2 Live Crew, right.

CHATILLON: Thank you. Who could forget the Supreme Court on 2 Live Crew? It’s kind of a clash of cultures. And it’s a parody, which is a common critique and commentary using some of the original to conjure it up. That’s what the Supreme Court was talking about in Campbell. They weren’t talking about changing the medium of an entire industry for a private company who’s doing it not because they’re altruistic, because Google, I think – and again I have no inside knowledge of this, I’m merely an observer – they do things because they think they’re cool, but they also, once they’ve done something they think is cool, figure out a way to make billions of dollars from it, which is fine. That’s cool.

But I don’t know why they should be allowed to do it based on other people’s work and putting at risk, if there’s a breach of Google servers, you lose the entire U.S. publishing industry. They haven’t consented. They haven’t been compensated. Why is that fair use? I don’t think it makes sense here. And my last bit of my rant is suppose Google starts selling ads? Suppose Google starts data mining what people are looking for and serving up ads next to that and makes another gajillion dollars on it?

To sue Google again, once they change their business model, and they will, they should, but they will, using this database among other things, you have to have probably a war chest of, I would conservatively estimate, $10 million, maybe it’s $20 million, to go after them. Who has the money to do that? That was one of the problems both the publishers and the Authors Guild ran into as a practical matter in litigating this case against Google to begin with.

And back to what Roy was saying, what’s interesting is, if you look at mass digitization, who has both the economic wherewithal and the business model in the private sector to make it make sense? And is that really where we want to put the entire body of American creative work, in someone who has that combination of, oh, I think I can make money off of this and spend a few billion dollars copying it? Do we want all our movies on that, all of our photographs, all of our newspapers? I’m not sure we do. Personally, my answer would be no.

KENNEALLY: Well, Jacqueline Charlesworth at the Copyright Office, this point regarding fair use is one that you have concerned yourself with in the report and just other discussions. And I’m curious because, just to sort of respond to the point that’s been made so far, which is that, in the particular case of Authors Guild v. Google, the judge made very specific findings with regard to fair use that seemed to draw a pen around what could be done with this material. But yet fair use is often considered to be open-ended and so forth and not predictive. But that’s something that doesn’t satisfy you. It does satisfy many librarians, though. And so talk about why you feel, why the Office feels, that there’s a need for clarity or even certainty with regard to this.

CHARLESWORTH: Yeah. Well, actually, I mean I can’t speak for libraries. I do think they tend to – say 10 years ago, I think they were much more interested in orphan works legislation and so forth. And now I think that the fair use tide has been in their favor. They have come to, as a policy matter, really support fair use. But I think really what we’re talking about here is whether we want courts to be basically legislating this issue, and mass digitization, and what that future looks like. As Roy said, courts decide particular controversies, and then that may become a precedent. But the next person that comes along may want to use, say, a somewhat more expansive use of the books. Is that going to be fair use?

Google can afford to take the risk that someone will sue them. And so much of policy really is dependent on who has deep pockets and who doesn’t. So if Google can probably say, well – I really don’t know, but I assume that they understood there might be some risk with this project, and they figured, well we can handle that risk, we can afford litigation. But it really shuts out, say, smaller players, who might have a slightly different or a competing model or want to do the same thing, and maybe can’t afford to take that risk.

So partly I don’t think it’s a good way to proceed. I think it tends to favor larger actors, in many cases the people who can afford to defend on fair-use grounds. It’s just it’s not the best way, we think, to solve this problem. We’d rather create a more level playing field, where different actors can get into this game with slightly different models. As Roy was saying, in many cases you would want to make broader uses of the work that pretty certainly wouldn’t be fair use. Why wouldn’t we want to allow that if the content owners are willing to grant licenses?

So for all those – there’s one other piece here too, and you would know more about this, but administrative costs – and this is another argument for ECL – if you handle everything through private licensing, it can work. But also, when you’re licensing lots of smaller orphan works and so forth, it can be very expensive to track down the information. And so having a solution where there’s legislation that would allow, say, an ECL model to come into being would help cut down on those administrative costs, because an opt-out system allows that framework.

So for all those reasons, I think a better solution would be one that really anyone could take advantage of, and it’s not dependent on which particular parties are in court or what judge happens to be looking at the issue and whether they think the activity is socially beneficial.

KENNEALLY: Right. And it’s that notion of the ECL, the extended collective license, is that it allows for a large body of works to be used in standard ways, so it provides that predictability that so many people are concerned with.


KENNEALLY: Absolutely. Roy Kaufman, with regard to the licensing point that you have raised here, talk further about this notion of a missed opportunity, that bringing it into the courts and where that all wound up with Judge Chin, and now on its way to the Supreme Court, has really allowed for a missed opportunity.

KAUFMAN: Yeah, it has. And I think that the Copyright Office did this very admirable thing of saying, OK, there’s this missed opportunity here. Let’s get everyone to give us comments as to whether we can actually fix this. If you think of the science publishers, with whom I’m familiar, and for whom I used to work, it would directly license full text anywhere in the world, large databases. No one has a problem letting users have access to their content. They expect to get paid for it, of course. There needs to be some sustainability.

And the problem is, and I don’t think this is librarians, although I do think it is certain library organizations – librarians, a lot of them, are very risk averse and would much rather have a rule than fair use. Not all of them; I don’t speak for librarians. I don’t think anyone can speak for all librarians any more than anyone can speak for all publishers or all authors. That’s a very dangerous place. But licensing and collective licensing has its value. As Jacqueline mentioned, generally the place for collective licensing is where the transaction costs of direct licensing, writ large, so the friction – identifying the person, doing the transaction, cutting the check – are prohibitively high. So if it’s going to cost you a bunch of people talking for four hours to conclude a license fee for $100, that’s not efficient. So collective licensing, whether it’s voluntary or extended, is really designed to fit that need.

Again, it’s this: why shouldn’t copyright be polarized? Everything in America is polarized today. Why aren’t there user groups that say, without giving up fair use – if you want to say, hey, we keep winning fair use and that’s great – but I want users everywhere to have access to full-text, in-copyright stuff that they can do whatever they want? That opportunity was given to us by the Copyright Office. Unfortunately, I don’t think there was enough vision. There’s so much of this, I’m winning these tiny little battles. Maybe some of the battles are big, and some of them are going the other way, because not all fair-use cases go with the user. We’ve certainly seen some pushback in GSU, Georgia State, and other cases. There’s a continuum here, and there’s an ambiguity.

No one’s going to buy a license if they don’t get the right to do something that they don’t think they have the right to do without that license. And then, the edge of that license, there’s stuff that might be fair use, and you don’t even have to argue about it because you’re paying for that other thing. You’re paying for that use that you don’t have.

And so I view the responses that I’ve seen, and I haven’t read all of them, certainly, in response to the Copyright Office (inaudible) as a missed opportunity. I think, if everyone sits down, people who really want to serve the needs of users, publishers who really want to serve the needs of publishers, authors who really want to serve their needs, people sit down, they can’t reach an agreement, it doesn’t work, that’s fine. The refusal to sit down is just a lost opportunity, and that’s really what I saw in a number of the responses to the Copyright Office request for information.

KENNEALLY: Dev Chatillon, with regards to lost opportunities and how realistic all of that is, one of the responses to the proposed settlement actually came from the Copyright Office. The then Register of Copyrights, Marybeth Peters, spoke about it standing copyright on its head, a phrase that stuck with many people. And you have, in your review, spoken to some of the traditional aspects of licensing, which include, of course, compensation and permission obtained.

But I wonder whether – and this is a question central to this whole conference today – whether, in this digital age, with technology advancing in so many new directions, whether that really is a realistic approach today. Is it not really understandable when technology companies act first and then defend themselves later?

CHATILLON: Well, it’s understandable. It may or may not be lawful. I think that’s the question. I think there are some aspects today where copyright law probably needs to be stood on its head. Jackie referred to this briefly. I haven’t gone into all the details of the ECL so someone will correct me, but orphan works is an example where, because of changes in our copyright law, we now have very long copyright terms and no sort of automatic way to keep track of whoever owns the particular rights. There’s no safe harbor under the copyright laws, although one has been proposed for Congress’s consideration, that says if you try really, really, really hard and you still can’t find who owns a photograph and you go ahead and publish it, you have no protection. If someone comes out of the woodwork after you’ve published and asks for $100,000 and your budget was $10, you’re just out of luck, because it’s going to cost you that much to defend it in court.

So licensing schemes that give both leeway and safe harbors for that kind of problem but also could bring in some of the orphan works so that people aren’t spending ridiculous amounts – back to Roy’s point – ridiculous amounts of time and money trying to track down collections of photographs or poems or novels that are out of print or historical works of various kinds but can still make them available and funds can be segregated to pay for any rights-holders who surface later, I think, really have a place right now, because we have these very long terms and it’s very hard. I’ve experienced this on the publishing side working for Miramax Films and ABC and The New Yorker, it’s very hard to find people and it’s very hard to figure out how to proceed in the face of that risk.

KENNEALLY: Jacqueline Charlesworth, the report from the Copyright Office, the proposal for extended collective licensing, has been, so far as I know, restricted to text right now. And of course the particular case of Authors Guild v. Google is about text works. But as we look to the future, mass digitization will go certainly, and already has begun to, beyond text itself to include other types of work, and that could be broadcast media, film databases and so forth. So what are some of the questions that you would want to see explored as it moves from books and text into other areas?

CHARLESWORTH: Yeah, well, just a quick clarification, so the Copyright Office proposal would include the illustrations that are, say, in books and photographs, but it’s still quite limited.

I think another example of this is the recent TVEyes case, and I don’t know if people have been following that, but that’s a service that digitizes every TV news broadcast and then, for a subscription fee, allows people to search for particular clips. And that was just considered here in the Southern District. And the basic database component was considered, was found to be fair use, at least so far, and it hasn’t gone up on appeal. At the same time, Judge Hellerstein said, well, you can do that and people can subscribe and look for the stuff in your database, but they can’t e-mail links around. That’s too much.

But what you have, when you think about this, you have judges who are really crafting kind of licenses. They’re saying this is OK, this isn’t OK. And again, just to return to my theme, that may not be the ideal way to figure out these issues, because it’s so dependent on the way a particular judge cuts the fair use issue. Just to your question, that’s TV broadcasts. You can imagine movies, music. Why, under the rationale of Google Books, I think it’s conceivable that some might think that it would extend to creating a searchable database of music. Although that, to many maybe music people in the audience, that sounds horrifying, I don’t know that – although the decision’s limited in terms of – it certainly has a philosophy in it about giving information about the works as opposed to the expressive content of the works. You can imagine many different kinds of works that could fall under that rule.

So we’re going to see many more examples of this coming up. And the question is whether it’s something that we again want the courts addressing, or whether Congress should address it.

KENNEALLY: Right. And Dev Chatillon, with regard to the notion of this all encroaching into other areas that haven’t been anticipated or aren’t delineated in the court decisions, you raised some of those concerns. Discuss those a bit further. There’s the matter of what Google might do at some point, should it sell the database, should it decide that the law has opened up a new area, so that it’s not limited to snippets anymore. I mean it’s really, you could enjoy a bit of speculation with this, where this might go, and the concerns that it would raise for rights-holders.

CHATILLON: Sure. One of the interesting things about the litigation, for those of us who’ve followed it pretty much all the way through, is that the current restrictions, air quotes around restrictions, were self-imposed on Google, that were key to the Second Circuit and, before that, the Southern District decision. Given that, in 2004 – and I want to just remind everybody, in 2004, the e-book market was still in its infancy. Publishers were just beginning to digitize books on their own. So one of the things that struck me – this is a bit of a side thing, but I think it’s worth going down for at least a second – is one of the things dismissed out of hand in the Southern District and in the Second Circuit in this decision and in the HathiTrust is the licensing of digital copies of books for text mining and for informational searching. And the courts say, oh, there’s no evidence. Well, there can’t be evidence. Google took the market. They didn’t ask permission. They just went in and took the market.

And once they’d taken it, Microsoft and other people have started to do similar projects with permission from the copyright holders, and they abandoned them because it just didn’t make sense financially for them to proceed in light of this enormous company just sort of taking all of this material for itself. So I think one of the things, and Jackie touched on this, I think, as a matter of public policy is do we allow mammoth corporations to come in and kind of determine what uses are OK and what are not and what can be licensed and what can’t?

One of the things that I look at sort of the marketplace and say facial recognition, visual recognition technology, is getting better and better and better. And I am not, as my partner well knows, a very big Facebook fan. But one of the things I’ve noticed when I’m on there is it says tag this person, tag that person. Isn’t this your daughter? Isn’t this somebody else? It’s one of the reasons I don’t do it, it creeps me out. But as visual recognition software gets better and better and better, wouldn’t it be fun to take all the sort of cameras that are around, all the Facebook photographs, and start to compile those into databases that can be mined for a variety, and monetized for a variety of purposes?

At the moment, the terms of use for those that are uploaded usually would allow that, but what’s to stop someone else from doing a fair use and doing something similar? What’s to stop them from doing movies? You know, I want to go back and do research on movies. Why can’t Google use this to go and copy every movie ever made? They have some circumvention issues, which the Copyright Office might have some views on under the DMCA. But as a matter of simply the copying for purposes of serving up snippets or other things, there’s really no reason why they couldn’t do that under this decision, depending again on exactly how they were going to use them.

So I think it raises a lot of very troubling questions about sort of how we structure consensual use of a lot of our works and whether the government’s a key part of that, which I think, in a democracy, it should be, or if the people with the most money just get to decide.

KENNEALLY: Right. Finally, Roy Kaufman, before we go to questions from the audience, we’ve been talking about some of the really particular aspects of the Google case. And Dev Chatillon referred to a very different Google in 2004 that undertook the library project. At that point it was simply a search engine. Today it’s involved in driverless cars and smartphones and so many other things. With all of that in mind, is it likely we will see another such effort as Google undertook in 2004? Is mass digitization itself now in the rearview?

KAUFMAN: Well, no, and yes. So I view this decision, and it’s interesting, 2004, part of it, the eternal optimist, I say maybe Google has decided it would rather deal with driverless cars than just to keep nitpickingly extending on this project, where I think they got their hand slapped, whether they won or not, they kind of got their hand slapped. That said, I can’t predict where Google’s going, but this whole issue of licensing, first of all, in 2004, the trade publishers weren’t really that advanced on licensing but there was a lot of evidence that wasn’t before the court as to actual digitization of books and licensing of full-text books online. But I’m not going to get there. That’s another issue. The record before the court is the record that you have to deal with.

But where really, where that issue, the rubber really hits the road is, in 2004, let’s take text mining, which Dev mentioned, there was no really good text mining. Someone might create some little piece of software, and it was all home grown. This is a $2 billion-plus industry with a 20% compound annual growth rate, I was going to say CAGR. There’s a lot of money. And in fact there’s a lot of money in licensing right now. So hedge funds license news feeds from newspapers. They pay a lot of money for it. And then they place bets, and sometimes a human never even reads it. The consumptive, the content is read by a computer, and then that computer makes some decisions on trading. And that’s a really nice growth area for newspapers today. But what if Google says, well I’m just going to do that? And I don’t think that would be fair use. I think there’s a good license alternative, I think it’s clearly commercial, and if Google were going to sell it, there’d be a problem. But the TVEyes case, well, that’s the same easy use case for that. Those same hedge funds who are paying money to mine the newspapers also want to mine the news. In fact they want to mine anything they can that’s going to have an impact upon stock prices.

So it’s this backward-looking issue of litigation, which means it doesn’t really solve the question that you can solve with licensing, because you can argue now, well, what is the, what was the text mining? Well, the evidence of text mining in the Google case was a type of text mining where you’re just sort of counting a word and saying it’s social sciences, it’s what I would call metatextuals, about the text, not about the content, so, and that’s maybe all that was going on then. But this is not 2004, and there’s a lot more that goes on now and there’s a lot more licensing. The market has developed, and then you can have these market chilling effects because of the ambiguity that comes from a case, which, even if the evidence was the best it could have been, was the evidence of 2004. So now we know what the law is like based on facts in 2004. We don’t know what the law is based on today.

You want to get there, we can litigate it. People spend a lot of money litigating, and maybe the best financed wins. I agree, in a democracy, maybe this is something that Congress should look at. What the Copyright Office says is, well, let’s create a flexible structure with an extended collective license that actually is adaptive, extended collective license isn’t fixed in time. It is about negotiating a license that works well for users and rights-holders. So it was a really nice way to do it, and I hope it succeeds. I’d like to see more users engaging in this process than we see today.

KENNEALLY: Well, indeed, when it comes to copyright and technology, it does seem that time flies. It’s hard to believe that it’s more than 10 years since the Google case was first enjoined. So I want to thank my panel, Jacqueline Charlesworth, the General Counsel and Associate Register of Copyrights for the U.S. Copyright Office, my colleague from Copyright Clearance Center, Roy Kaufman, Managing Director of New Ventures, and Attorney Devereux Chatillon. I want to thank as well Bill Rosenblatt, the organizers of this conference, and thank you for joining us. My name is Chris Kenneally, for Copyright Clearance Center. Thank you.

