Transcript: The Machine, the Reader and the Community

“New Directions in Publishing Technology: The Machine, the Reader and the Community”

with
Gerry Grenier, IEEE
Ed Pentz, CrossRef
Roy Kaufman, Copyright Clearance Center

recorded January 22, 2014 at British Medical Association headquarters, London

KENNEALLY: Right now, we’re here to talk about the future. It seems to me to be really appropriate to come to London, to the UK, to do that. After all, it was 50 years ago that the US, and indeed the rest of the world, turned to London and to four young men for the future of popular music. How ironic it is or interesting it is that today, we could go down to the British Library, and not too far from the Magna Carta, you’ll see the handwritten lyrics of John Lennon and Paul McCartney.

That was 50 years ago that the UK led us to a future in the music business. Today, it seems to be a good place to go to learn about the future of publishing, just the future generally. If you come to London, as an American anyhow, with the expectations that you’re going to find palaces and pubs and Georgian row houses and gorgeous little pocket parks, you’ll find them. They’re all here. In fact, they’re right outside that door.

But you will also find the tallest building in the EU – 87 stories, just down the road, by the River Thames, the Shard. Quite a beautiful building, quite a remarkable building in the midst of all the other historic buildings. Not far away is another building called the Gherkin, which is, I believe, the second-tallest building in the EU. My only question is since when did buildings get cute names? In the US, we have cute names for snowstorms and that kind of thing – Nemo and Hercules and so forth. But apparently here the trend is to give your buildings cute names.

So we’re going to talk about the future. The future, said William Gibson, the science fiction author, is already here. It’s just not evenly distributed. He also said that the problem with the future is you can’t Google it. I tried to Google the future just yesterday morning. On my Google, the first result page was the bio for a rapper whose name was Future. That helped me to learn everything from how many platinum albums he has to what he pays in child support to his various former paramours. Another result was a program on the BBC called The Future. But unfortunately, The Future is not accessible from the UK. I thought that was pretty funny, because remember, Gibson said that the future was not evenly distributed. So at least in the case of that particular program, it’s only distributed outside of the UK.

Finally, as a way to set the stage for this discussion, I’ll offer a caveat. The problem with predictions is they’re very difficult, especially ones about the future. Niels Bohr, the physicist, said that. That’s a remarkable thought. It wasn’t Woody Allen making a wisecrack. That was the very, I think, reflective physicist Niels Bohr who pointed out just how difficult this whole business is.

To help me out, we have a panel here of distinguished representatives from throughout publishing. To my right is Gerry Grenier. Gerry, welcome.

GRENIER: Thank you.

KENNEALLY: Gerry is senior director of publishing technologies for the Institute of Electrical and Electronics Engineers, easier to say as IEEE. He leads a 46-person electronic publishing team responsible for technologies that are used to create and distribute content, including development and operation of IEEE Xplore, a digital library that contains some 3.8 million journal articles. Prior to joining IEEE, Gerry was director of publishing technologies at John Wiley & Sons. He’s an active member of the scholarly publishing community, I think a familiar face and name to many of you here in the room. He serves on the boards of CrossRef, NISO, and the International STM Association, where he’s currently the chair of STM’s Future Lab Committee. The Future Lab is clearly why we’ve asked you here today.

Then to Gerry’s right is Ed Pentz. Ed, welcome.

PENTZ: Thank you.

KENNEALLY: Ed became executive director of CrossRef in 2000, when the organization was created as a not-for-profit member association of publishers to provide a cross-publisher reference linking service. He’s also chair of the board of ORCID, a registry of unique identifiers for researchers established in 2010. He has a degree in English literature from Princeton University, and he lives in Oxford, England, so at least there are two English degrees up here in the room. I’m going to look to Ed to get me out of some difficulties around science and technology.

Finally, Roy Kaufman is my colleague at Copyright Clearance Center. Roy, welcome.

KAUFMAN: Thank you. Good to be here.

KENNEALLY: Roy is managing director of new ventures at CCC and is responsible for expanding our service capabilities as we move into new markets and services. Again, he’s a face and a name familiar, I think, to many in the room. Prior to joining us, he served as legal director at Wiley-Blackwell, John Wiley & Sons. He’s a member of the STM Copyright Committee, as well as the UK’s Gold Open Access Infrastructure Program, and he formerly chaired the legal working group of CrossRef, which he helped to form, and he’s also worked on the launch of ORCID.

So a great panel, and I look forward to our discussion. We will move into the room for your questions and thoughts a little ways into the program. But I want to start with Gerry Grenier. Gerry, we said you were the chair of STM’s Future Lab, so a good place to start. Just recently, you and that group worked on an effort to not so much predict the future, but to at least anticipate it. You identified some important themes, and I want to talk about those. One of them was something that helped to give a name to this discussion, which is that the machine is the new reader. I like that phrase, because it sounds futuristic. It sounds like there are robots and rayguns involved, right? But what does that mean, really, when we say that the machine is the new reader?

GRENIER: Let me start by just saying thank you for bringing me here today, and thank you for allowing me to share my thoughts. When you mentioned the Beatles at the beginning, I was hoping you could assign us names. I was wondering as you were talking, am I John, Paul, George, or Ringo? But anyway, it’s great to be here.

KENNEALLY: What does it say about me that I always thought I was Ringo? I don’t know why. I never thought I was the cute one. I never thought I was the smart-aleck one or anything like that. I thought I was the dumb one, I guess.

GRENIER: Well, I always thought I was Pete Best, the one who quit before they got big.

KENNEALLY: The one who lost out, yeah.

GRENIER: Anyway, it’s great to be here, and I see a member of the Future Lab. Sam Bruinsma from Brill is back there, and Sam sat in. Sam, you’re my conscience here today. So if I go off on a tangent, just bring me back to reality. What do we mean by the reader is a machine?

First of all, the Future Lab is a great forum and a great service that STM has put together. What we do is we gather a mix of technology as well as publishing business people to sit at least once a year in a roundtable, generally in London the day before our innovation seminar, and we go around the room three times and ask people what they see as the trends for the coming year, and then we boil that down. There’s lots of redundancy, obviously. But then we, at the end of the afternoon, boil that down into themes. This year’s most prominent theme was the machine is the new reader. What do we mean by that?

Well, I think we’re seeing the dawn of the age, or at least the maturation of this whole science of computational linguistics. It’s a science that really began somewhere around the ’70s and ’80s, and is now just getting us to a point, I think driven by the Internet, where people really like to seek deeper meaning in the words, and actually like to see a machine do the thinking for them. Breaking down text into its parts of speech and then being able to discern some themes, and being able to fingerprint articles, I think is something that’s bringing us one step further now, so that we don’t need to spend time to actually read that article. We can have a machine actually summarize it for us and give us some insight into the deeper meaning of it.

KENNEALLY: In essence, the article finds me.

GRENIER: Yeah, I think that was something that someone might’ve said on that Friday afternoon back in December, the article finds you. Exactly.

KENNEALLY: Which addresses one of the key challenges throughout all of publishing, not just in the STM environment, but throughout publishing, which is discovery. There’s such a plethora of content available to people, so many choices. To be able to push material, to push information, at the recipient and the appropriate one is really something to work hard for.

GRENIER: Absolutely. I think the combination of a machine doing some linguistic analysis of an article, but also a machine learning the habits of the user and being able to define my interests by tracking the things that I read – Amazon does this on, I think, a very simple level, with the products that we purchase and the products that we view online at Amazon. But it’s that combination of matching those two things up. What content set is out there, and what content set does Gerry like to read? What has he read in the past? And try to bring those together at a better level, and do a better job of that than we might’ve done in the early part of this century.
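
The matching Grenier describes, a content fingerprint on one side and a reader's history on the other, can be sketched very roughly as follows. This is only an illustration under loud assumptions: the "fingerprint" here is a plain bag of words rather than real linguistic analysis, and the function names, stopword list, and sample titles are all hypothetical, not anything IEEE or another publisher is said to use.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use far more.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "with"}


def fingerprint(text):
    """Crude article 'fingerprint': counts of lowercased non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count fingerprints."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(history, candidates):
    """Rank candidate articles by similarity to what the reader has read.

    history: list of texts the reader has already read
    candidates: dict mapping an article label to its text
    """
    profile = Counter()
    for text in history:
        profile.update(fingerprint(text))
    return sorted(
        candidates,
        key=lambda label: cosine(profile, fingerprint(candidates[label])),
        reverse=True,
    )


history = [
    "Low-power wireless sensor networks",
    "Energy harvesting for wireless sensors",
]
candidates = {
    "A": "Routing in wireless sensor networks",
    "B": "Medieval poetry of England",
}
ranking = recommend(history, candidates)
```

With this sample data, the wireless-sensor article ranks ahead of the unrelated one, because it shares terms with the reading history. As Grenier goes on to say, the concepts in a 10,000-word journal article are far harder to capture than this kind of word overlap suggests.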

KENNEALLY: You brought up Amazon, Gerry. What’s interesting about that, I think, is that the user today – we used to call them the reader – has an experience online that goes far beyond their professional, their work experience. It encompasses their personal lives, their consumer experience. Some of the expectations that are raised by companies like Amazon and Apple and Google sort of bleed back into their professional lives. So it really puts the burden on professional publishing to really come up to those expectations that have been set elsewhere on the Internet.

GRENIER: Oh, absolutely. I think there’s no question. I think it’s the one thing that you have probably all been part of with your own customers – we at IEEE have customer feedback panels and constantly are touring the nation and the world, actually, and getting feedback from user groups. What we hear all the time is yeah, I want it to be like Amazon. I want it to be like Google. I want it to be like XYZ. In some ways, that’s good, because we have these companies pioneering, and it’s easy to follow in someone else’s footsteps.

But on the other hand, I think at least in science publishing, it presents some particular problems, especially in the area of article recommendation, which is a bit more complex than simply recommending a different pair of shoes or recommending a particular appliance. When you think about the descriptive terms that are used in the consumer world, they’re far simpler than the concepts in a 10,000-word journal article.

KENNEALLY: Absolutely. And I can’t imagine, too, that you’re going to be delivering your subscriptions by drones anytime soon.

GRENIER: (laughter) No, no. No, I don’t think so, but we’re working on that one.

KENNEALLY: Well, you’ll work on that, too. I want to bring Ed Pentz into this, because I think it makes a good transition. What Gerry was just saying about helping identify materials and getting them to the right people is really all about shifting to a more responsive technology on behalf of the needs of the researchers, the authors, the people who are actually using the material. Can you talk about the ways that CrossRef sees the future with regard to helping that particular objective or serving those needs, the authors and the researchers?

PENTZ: Yeah, I think looking at it from the point of view of authors and researchers, and touching on what Gerry was saying about the machine, I think Dominic had a really good comment about Google hiding the complexity and making it easy. I think that I would agree with Gerry that scholarly content is much more complicated, and the discovery issues are more complicated. But I think that’s one of the challenges for publishers, is to make it as easy as it is to use Google. Actually, in fact, publishers see a lot of traffic from Google. So researchers are using it, and their use of consumer technology is setting their expectations for how they use scholarly content.

But having said that, I think it’s really important for scholarly publishers to understand the things that they do that are unique. How do publishers go about making sure content is high quality? There’s a whole set of things that publishers do that aren’t apparent to readers of content. We have to move beyond – right now, there’s a lot of reliance on journal brands. This article’s in Nature, this article’s in Cell, so people see that as conveying quality. Often, it does, but there are so many different types of resources now.

Researchers are publishing in many different formats. There’s blog postings, that type of thing. So I think the real challenge for publishers is conveying better information to users, so users can assess quality. You’re not just telling somebody, hey, this is good quality. You have to show them, so that they can assess and make a determination, and I think that’s the biggest challenge.

KENNEALLY: One of the things, I believe, that CrossRef helped to develop was something called CrossMark. That’s also related to the whole notion of quality and trust, knowing that the content you’re looking at is up to date. Tell us about that and place that within that answer you just gave as to why that’s important.

PENTZ: One of the ideas behind CrossMark is conveying extra information to readers about scholarly content. CrossMark is a logo that is on content – PDF, HTML versions – and users can click on the logo, and it then gives them information about the current status of that article. For scholarly content, if the content is corrected, or even retracted or withdrawn, it’s crucial that users know that information.

For instance, somebody may have downloaded a PDF, it’s sitting on their hard drive for six months, the researcher goes to use it, wants to cite it. They may not know that that content’s changed. So the idea behind CrossMark is there could be extra information. They can click on that, it’ll actually check the current status of that content, and then tell the user, hey, wait a minute, something’s changed about this. You need to go look at the publisher-maintained version.

That ties into this issue of trust, in that CrossMark is sort of a platform for publishers to not only provide information about corrections, but information about peer review, what type of peer review that content underwent, so it can again build trust. We see that publishers could actually include a whole lot of information as part of the CrossMark, which could be rights information, copyright information. CrossMark is a vehicle for that information.
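
The status check Pentz describes, where a downloaded PDF asks whether anything has changed since it was saved, might look something like the sketch below. Everything in it is an assumption made for illustration: the in-memory `REGISTRY` stands in for a publisher-maintained status service, the DOIs and dates are made up, and the status values are placeholders. It is not CrossMark's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusRecord:
    doi: str
    status: str   # illustrative values: "current", "corrected", "retracted"
    updated: str  # ISO date (YYYY-MM-DD) of the last status change


# Hypothetical stand-in for a publisher-maintained status service.
REGISTRY = {
    "10.5555/example.1": StatusRecord("10.5555/example.1", "current", "2013-06-01"),
    "10.5555/example.2": StatusRecord("10.5555/example.2", "retracted", "2013-12-15"),
}


def check_status(doi, downloaded_on):
    """Report whether an article changed after the reader's copy was saved.

    downloaded_on is an ISO date string; ISO dates compare correctly as text.
    """
    record = REGISTRY.get(doi)
    if record is None:
        return "unknown: no status record for this DOI"
    if record.status != "current" and record.updated > downloaded_on:
        return f"{record.status} on {record.updated}: see the publisher-maintained version"
    return "no changes since download"


# A PDF saved in July 2013 and opened again six months later:
message = check_status("10.5555/example.2", "2013-07-01")
```

The point of the design is the one Pentz makes: the reader's copy carries only an identifier, and the current status is looked up at the moment of use rather than frozen into the file.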

KENNEALLY: That point about trust is important, because scholarly publishing, scientific publishing, at one point had a fairly limited audience. It was those who wrote the articles and those who read them. But because of the Internet, because of Google, now that community has grown enormously. It’s a global community. It includes absolutely anybody who can get to the Internet and get to that particular article. So these questions of trust and reliability are all important not just to the research community, but to this global audience.

PENTZ: Yeah, that’s right. If somebody just does a Google search, there may be a scholarly journal article in there, but there may be other versions of that same article. There may be the author’s version. There may be a version that’s out of date posted on a different website. So it could be very tricky to get users to understand this issue of quality and trust, but I think publishers have to do that.

They have to make an effort, but do it in such a way, again, not just to say – my colleague Geoff Bilder always says, the worst way to get somebody to trust you is to say, trust me. (laughter) You have to give people reasons why they should trust you, and so I think there’s a whole range of activities that publishers need to do in order to gain that trust from end users, whether it be researchers or more lay readers, which is becoming more important, especially with open access and funders and governments concerned about public access to the outputs of research.

KENNEALLY: That puts the finger on something that really seems to me to be an important challenge when it comes to technology. Obviously, we need to keep up with technology, but that then precipitates its own communications challenges, it seems to me. They kind of go back and forth. We advance the technology, but then we really need to address the need to inform people about what’s changed, how this is better for them. It really means you have to do the work behind the scenes and then up front.

PENTZ: Yeah. You have to hide the complexity. There are a lot of challenges. I think another big challenge is that we’re in a world of imperfect information. I think everybody deals with this when you think about, say, the experience of Amazon and looking at Amazon’s recommendation system. It can be good in some ways, but it can be just awful in some other ways.

When you think about the computer power that an organization like Amazon has and the huge amounts of data that they have, but when you look at their recommendations – I bought a birthday gift for my nine-year-old a couple of years ago, and for months and months after that, all my Amazon recommendations were things to do with nine-year-old children. They were totally useless to me after that one event.

I think that also has an impact on scholarly publishing as well, and that’s where we get to, say, identifying content, but also developing better services for users and collecting better information. This is something we see all the time at CrossRef. Publishers have new requirements for collecting lots of data in a reasonable way that they can distribute that they never did before, things that have to come out of the editorial system, the manuscript tracking systems – information about funding, all this kind of stuff. But the end goal is always to make things easier for the users to establish that trust, to hide the complexity. I think those are some of the challenges that publishers are facing.

KENNEALLY: Roy Kaufman, Ed was just hinting at something that comes up in almost all of these discussions – the secret word, if you will, and that’s metadata. It is an important aspect of all of these points. You’ve been involved working with CrossRef and other organizations in helping develop standards around metadata. Can you talk about that challenge and just how, for this audience, they should view the development of standards? Is it a positive development, driving change that’s really a positive change? I know there are those who worry that standards lag behind change.

KAUFMAN: Yeah. First of all, there are standards and standards. What do I mean by that? There are standards like a DOI, where everyone would say, oh, this is a standard. It’s got an official standard stamp on it. Maybe it even comes from an official standard stamping organization, like NISO or ISO or someone. So this is now the standard. Then there are the de facto standards that come up around normal human interaction. I would say Google search. No one’s said Google search is a standard. It just is what everyone uses. Frankly, there are better search engines out there, and probably Google could make a better search engine, but it probably chooses not to because something has now become the de facto standard.

So when we talk about what’s holding back – and Amazon’s a really good example. The recommendations are a little bit flawed, and they don’t always even know. They might know that someone who buys this book also buys this widget, and there’s no connection between the book and the widget, but they’ve just noticed this pattern, and so they’re just going to keep feeding up the pattern, and they don’t even know. They see a pattern. They don’t know what’s behind it, and they really don’t care. They just want to sell.

But Amazon’s a very closed system. You have many, many publishers, and they want to be everything all at once for everyone. They want to know that if you’re buying What to Expect, you might also buy diapers or something very simple. But they have it all there.

The challenge for publishers, and where I think we need standards – it’s not good enough, though, to have standards. You also have to adopt the standards, and then you have to not just adopt the standards, but then you have to use what you get out of the standards. So you get all this data back. As Ed was saying, publishers want data, everyone wants data. Do they know what to do with the data when they get it? The answer’s probably yes to some degree, no to some degree, what are you going to do with it?

The challenge with content and with STM is the long tail and the open systems. Everyone might have that little piece of data, but imagine you have a recommendation engine. You can be a very large publisher and say more like this, but the more like this is only coming from you, because you don’t have all the other publishers. No matter how big you are – you could be the IEEE, you could be Elsevier, you could be Wiley – your dataset’s only as big as your dataset.

And the users, they care a little bit about the brand, which is the journal name. If you’re a society publisher, they probably also care about the brand, which is the society name. But they care about their brand, which is the author name, and ultimately what they want is what answers their question. They’ll evaluate the quality based on the brand, and the brand is the author, the publisher, the society, whatever. But first, they got to get the data.

That’s one of the things that I think CrossRef and CrossRef search have done to some degree, which is to try to bring all of that together, but it’s a whole lot of data, and then you got to mine all of that data, and you have to mine it intelligently. So yes, we need standards. Yes, we need to develop standards where there’s a gap. We need to adopt existing standards, even if they’re de facto standards, and work together to expand them when there’s that opportunity. But then we really have to share it and figure out what to do with it all. That’s where the opportunity is, but that’s also where the challenge is.

KENNEALLY: Indeed. Gerry, the other challenge that was important for the STM Future Lab was the rising role of the author. This has come up tangentially in the discussion so far, but I wonder if you could explore that point with us a little bit. This author, this researcher, the contributor, has changed. They aren’t simply providing content anymore. They’re also a purchaser. They are fulfilling a transaction, much more than simply submitting a manuscript. Talk about what that’s going to mean for the future.

GRENIER: The concept that we came up with in December was the idea that the author now has the wallet. The author more and more, as we see open access taking hold, is the person with the money. So now we have a new dynamic in our interactions with the author. Making it easy for that author across the industry to engage with us on a business level, I think is a challenge. When we talk about standards, I think standardizing that interaction with the business – authors write not only for IEEE, or the IET, or Wiley, or Elsevier, authors write for themselves. In the process of doing that, they deal with many different companies. It would behoove us to try to standardize those interactions across the companies.

I think the other issue with authors – we coined this term at the Future Lab called developing an author egosystem. It’s not the ecosystem, right, but the egosystem. Talking about standards, Eugene Garfield back 40 years ago came up with the journal citation index and using citations as a way of measuring not only a journal’s value, but an author’s value. Now, we’re seeing this whole new area of alt-metrics. There are so many new metrics out there now, and I think the challenge in the coming couple of years – and I think it would behoove us to move pretty quickly on this – is to try to come to some common understanding of which alt-metrics really matter.

Downloads may, at first glance, seem a no-brainer, but people could be downloading an article because it’s really bad, because it’s a great example of plagiarism. Citations of citations – who cites the papers that you cite, and doing some analysis on that. I think combining the needs of the author and linking that concept with standardization across the industry is an area of opportunity for all of us to work together in the coming 18 months, two years.

KENNEALLY: And the relationship with the author is changing. Not to hit on this note too often in this discussion, but really now that interaction between the publisher and the author resembles much more a B2C, if you will, business to consumer relationship, than it did in the past. That’s going to change the way you do business.

GRENIER: Exactly. All of my time in this business, the entity that we engaged with was the librarian, or the dean of the library, who held the wallet. In that model, it was a B2B transaction. Now it is moving towards B2C. We now need to cater to that author. Not only on the side of making it easy for them to do business with us, but also things like editorial services outside of the Western Hemisphere, providing service to those authors who want to get published in English-speaking journals.

KENNEALLY: Tell me a bit more what you mean by editorial services in that context.

GRENIER: The easy one is just copyediting, helping them before the peer review process, shepherding their articles towards journals that fit their current level in their publishing career.

KENNEALLY: But authors here in London, or back home in Boston, or wherever they may be, they’re going to be looking for additional services too, aren’t they? They’re going to be expecting more than simply, again, that simple process of submission, peer review, acceptance.

GRENIER: Right. The big one, obviously, is rapid publication. I think that rapid publication, as well as getting feedback from the publishing community as to how well their article has been received, getting help from the publisher to promote that article – publishers providing marketing services at that micro-level. Before, we marketed at the journal level, or the program level, even. Now, it’s helping that author promote their own brand.

I think Ed might have mentioned blogs. Go to any university website, and there generally are superstar scientists at different universities that might have a blog. I think helping authors, giving them a platform and an opportunity to blog about their area of expertise, and promising that you’ll shepherd that blog, curate that blog over time, is another opportunity for all of us.

KENNEALLY: Fascinating. Ed Pentz, you were talking before about trust and the role of technology in all of this. It’s about capturing the right information and being able to assess it, isn’t it? Again, back to identifiers, can you talk about how well that’s being done today? You’ve been with CrossRef from day one. You were employee number one. In thinking about the future, every once in a while we have to stop and reflect on the past. How good a job has publishing done in those 14 years since you began working at CrossRef in creating identifiers that do the job that’s expected of them?

PENTZ: I think that I have a bias, having been at CrossRef all that time, but I think the industry did a good thing with setting up CrossRef. They came together and collaborated. Different organizations, society publishers, nonprofit publishers, commercial publishers got together and had a clear idea of what they wanted to do, and it was focused around services. They didn’t just say, oh, we want to identify content. It was reference linking. Journals were going online, and publishers were signing bilateral linking agreements, trying to link with URLs, things were breaking, and basically there was this idea that we want to make reference linking more efficient. Adopting the DOI system solved that problem.

I think a real big issue with identifiers is not to just think, oh, well we need identifiers. It’s to think about the problem you’re trying to solve. So I think that’s been done. There’s still a lot to do. But we have 65 million DOIs now – journal articles, conference proceedings, book chapters, lots of stuff. That’s enabled us now to move on to other things. We’ve also with ORCID, with the Open Researcher and Contributor ID, tried to do the same thing for authors. The ORCID registry launched just over a year ago. Actually, just yesterday, the ORCID registry passed 500,000 researchers who’ve claimed an ID, so they’re now uniquely identified. That’s going to become even more and more important, especially now within the global environment.

KENNEALLY: Let’s put that in perspective. I think I saw a number that there’s something like 1.5 million researchers in the world, so you’re doing pretty well if you’ve got half a million.

PENTZ: I’ve heard numbers higher than that. If you look globally, it’s millions of researchers. So it’s a good start, and it links in with CrossRef, it links in with services like Elsevier’s Scopus, so it’s going to help tie all this together. But what’s interesting is that even though I think scholarly publishers, actually unlike the music industry or the newspaper industry, made a successful transition – kept business models intact, big deals, all this kind of stuff – journals are surprisingly unchanged in many ways. They’re fully online and they’re being successful. But even though you’ve achieved that, it just never stops. Now there’s the next wave. You just can’t rest. Publishers keep having to innovate.

KENNEALLY: That seems to be the dilemma of technology, is just the pace of change. You raised it, there are often solutions that are developed for which there are no problems. It seems to be a kind of a quandary for many people, the overwhelming volume of technology and applications – choosing which ones are appropriate for their business, knowing where to go, which direction to go, is really difficult.

PENTZ: It’s difficult. Getting back to the standards question, the good thing about standards is that there’s so many to choose from. (laughter) I think standards can be helpful, but standards can also definitely be misused, and often I think people come up with a standard and then try to fit it to a problem, or something like that. Whereas I think scholarly publishers can look to what happens on the Internet, where there’s much more of a bottom-up type standardization, that something doesn’t get standardized until it’s actually being used. That presents its own problems – once you’re doing something a certain way, it’s hard to change it. I think for scholarly publishers, standards are important, but I think it has to be done carefully.

For instance, NISO in the US has developed what they call recommended practices, so it’s a way to very quickly agree on simple best practices without going all the way to, say, a formal standard. And then after something gets implemented, after it’s being used, then you can look at actually creating a standard.

Just an example of that – the DOI system is an ISO standard now, but it had been up and running for almost 10 years before it became an ISO standard. Whereas ISTC, the International Standard Text Code, I think was created as a standard in a more abstract way, rather than based on being used. I don’t really think it’s being used much. That’s just an example where they created a standard, and now they’re trying to – there were particular problems it had to solve, but anyway, not to get in the weeds too much.

KENNEALLY: We will talk more this afternoon about open access as a challenge for publishing specifically, but CrossRef and NISO and others are working on a kind of standard for open access metadata. Can you talk about what are going to be the key ingredients for those standards? What are the things you’re going to think about?

PENTZ: I’m on the NISO Open Access Metadata and Indicators group – can’t remember the acronym. A bit of a mouthful. The basic idea there, very simple, was to try to look at a way to identify the status of an article. One of the use cases is you’ve got hybrid journals, a journal where some of the articles are open access and some aren’t, and how do you identify those? The group had some discussions. The draft recommendation is out for review right now, so check the NISO website. Comments can be made until February 4.

Basically, we came up with a very, very practical solution, two pieces of extra metadata that will help. One is what we call the free to read tag. A publisher could identify and it could be dated, so it could be free to read for a period of time and then not free to read. That’s just a very simple indicator to say that anybody on the Internet could get to that full text and read it. It doesn’t say anything about any reuse rights or extra licensing. It’s just a very clear thing. Search engines could use that to direct users to that.

And then in addition, there’s a tag called a license reference tag. It’s a link to a license, which then gives the user the ability to find out more information. So it may say free to read and then there’d be a license, but the user can go and find out more about the status of that content. So it’ll be interesting to see how that might tie with things like the Copyright Hub, but it’s a way for a publisher to make a statement and provide this extra information.

A key aspect is that if the license URI, the URL, is recognized, then that could be something that could be acted on. For instance, in a hybrid journal, if a journal article has a CC-BY license, that’s a statement by the publisher. A machine reading that could actually then say, hey, this is open access, and it could be dated. It may be the open access applies a year after publication for an embargo period, something like that. So with just a couple pieces of metadata, we think it’s a way to actually address a lot of the problems.
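As a rough illustration of the two indicators described above, the sketch below models a free to read flag and a list of dated license references, and shows how a machine might decide whether a recognized license applies on a given day. The class names, field names, the DOI, the publisher license URL, and the set of recognized license URIs are all assumptions made for the example; they are not taken from the draft recommendation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical: the license URIs this particular consumer chooses to recognize.
RECOGNIZED_OPEN_LICENSES = {
    "http://creativecommons.org/licenses/by/3.0/",
}


@dataclass
class FreeToRead:
    """Says only that anyone can read the full text, optionally within a date window."""
    start: Optional[date] = None
    end: Optional[date] = None

    def applies_on(self, day: date) -> bool:
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


@dataclass
class LicenseRef:
    """A link to a license, optionally dated (for example, after an embargo)."""
    uri: str
    start: Optional[date] = None


@dataclass
class ArticleMetadata:
    doi: str
    free_to_read: Optional[FreeToRead] = None
    license_refs: List[LicenseRef] = field(default_factory=list)

    def is_free_to_read(self, day: date) -> bool:
        return self.free_to_read is not None and self.free_to_read.applies_on(day)

    def license_on(self, day: date) -> Optional[str]:
        # Pick the most recently started license that is already in effect.
        active = [l for l in self.license_refs if l.start is None or l.start <= day]
        if not active:
            return None
        return max(active, key=lambda l: l.start or date.min).uri

    def has_recognized_open_license(self, day: date) -> bool:
        return self.license_on(day) in RECOGNIZED_OPEN_LICENSES


# Hypothetical hybrid-journal article: publisher license at publication,
# CC-BY one year later. The DOI and publisher URL are placeholders.
article = ArticleMetadata(
    doi="10.5555/12345678",
    license_refs=[
        LicenseRef("https://publisher.example/license", date(2014, 1, 22)),
        LicenseRef("http://creativecommons.org/licenses/by/3.0/", date(2015, 1, 22)),
    ],
)
```

In this sketch, `article.has_recognized_open_license(date(2014, 6, 1))` is false during the embargo and becomes true for dates on or after January 22, 2015, while `is_free_to_read` stays false because the publisher made no free to read statement.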

KENNEALLY: That seems really important, because one of the problems with open access is what do you mean by open access? There, you’re helping to define that, to give an answer.

PENTZ: Right. We consciously said there’s no way to define open access, because the definitions vary depending on who you are. But again, getting back to this issue of providing information to allow people to assess content, if there’s a statement about the license, if there’s a free to read tag, some of this other metadata, then an organization – say, a research funder – can look at this data and say, does this meet my criteria for open access? That’s the plan. People could use this information. Rather than trying to define what open access is, we’re just saying, hey, here’s a set of metadata. You can make the decision for your own needs about it.
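One way to picture that division of labor is a minimal sketch in which the metadata is just a record and each consumer applies its own test to it. The record keys, the two policy functions, and the accepted license list are hypothetical, chosen only to show two organizations reaching different answers from the same data.

```python
from datetime import date

# Hypothetical metadata record for one article; the keys are illustrative only.
record = {
    "free_to_read_from": date(2014, 1, 22),
    "license_uri": "http://creativecommons.org/licenses/by-nc/3.0/",
    "license_from": date(2014, 1, 22),
}

# Hypothetical: licenses that a stricter funder treats as meeting its criteria.
FUNDER_ACCEPTED_LICENSES = frozenset({
    "http://creativecommons.org/licenses/by/3.0/",
})


def meets_read_only_policy(rec, day):
    """A consumer that only requires the full text to be free to read."""
    start = rec.get("free_to_read_from")
    return start is not None and start <= day


def meets_reuse_policy(rec, day, accepted=FUNDER_ACCEPTED_LICENSES):
    """A consumer that also requires one of its accepted licenses to be in effect."""
    start = rec.get("license_from")
    in_effect = start is not None and start <= day
    return in_effect and rec.get("license_uri") in accepted


today = date(2014, 6, 1)
print(meets_read_only_policy(record, today))
print(meets_reuse_policy(record, today))
```

Here the same record satisfies the read-only policy but not the reuse policy, because this funder's assumed list does not include the non-commercial license. The metadata itself takes no position on which answer is "open access".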

KENNEALLY: Roy Kaufman, in your work with CrossRef and on the STM copyright committee and at Copyright Clearance Center, you’re obviously addressing the needs of a variety of communities. I wonder if you could talk about the challenge of meeting the needs of such a varied group of people and whether technology can help to address that.

KAUFMAN: If technology can’t help to address it, nothing will. So we’ll start there, with the easy answer. It’s interesting. We’re going from in STM – and journals, I don’t want to exclude humanities journals from that either, and maybe even now for open access books – you go from a world where you start with let’s make it easy, and then you have to make it desirable. I think that’s a lot of what Gerry was talking about with the added services and promotion and things like that.

I get involved a lot, both when I was at Wiley and now at CCC, in government and policy issues. You’ve got the Copyright Hub in the UK. Now the US is thinking, do we want a copyright hub, and what will it have? It’s so easy for me to say, well, in STM, we have DOIs. We have CrossRef. We have all of these things. We have standards that are generally accepted.

The other thing about standards – it’s great to have standards, it’s great to choose amongst them, and then the other thing is to apply them in a standard way, which is a whole other problem. You’ll all have the same standard. It’s like using the same word meaning something completely the opposite. Like in the UK, if you table something, it means you’re going to talk about it. In the US, if you table something, it means you’re not going to talk about it. So you can use your standard a little bit in a funny way.

Music, which is not an industry, but many industries – if you think of it from the licensing side, you’ve got performance rights, you’ve got sync rights, you’ve got all these things which I’m not going to get into – but music is actually fairly far along. They have a lot of standards. I think their problem is they’ve got so many to choose from, they don’t know which ones they want to use necessarily yet.

Is there anyone here in this room – I know there are people whose houses do trade publishing. Is there anyone who’s purely a trade publisher in this room? Start having the same kind of conversation with a trade publisher. It’s very, very hard. They’re still thinking of the book as a book, whereas in STM, five years ago, they were thinking of a book as a collection of chapters, and now they might be thinking of a book as a collection of chapters, figures, and words that correlate as a machine reads them in that way. The further you get from STM, but still within publishing, and we are all lumped together as publishers, the less adoption of standards you start to see, the further there is to go.

Outside of publishing, there are some standards developing, for example around images, the PLUS Registry. But I have to tell you, I go to one country, they’re like, we love the PLUS Registry, and then everyone in another country is like, no, no, we don’t like the PLUS Registry. It’s very hard. I do think that the journals business was kind of early to the game. They had CrossRef. They’ve always been international. For all the struggles we may have internally around standards and defining them and implementing new ones, we are generally pretty far ahead of almost every other content industry.

KENNEALLY: When it comes to addressing those needs of the communities, I guess what I hear you say there is that yes, technology has to be the beginning of the answer, but there’s still a role for what I’ll call diplomacy. This gets back to the point Ed raised about the communication challenge. Persuasion, gentle persuasion, is going to be important.

KAUFMAN: To have a standard work, you have to begin with the proposition that we’re all in this together. I think in STM, there’s a long tradition of that. Sorry, STM and humanities and journals, there’s a long tradition of that. In a lot of other media and industries, there isn’t a tradition of that at all. It’s we’re all after each other. Obviously, everyone competes. Within the journal space, you’re competing for dollars, you’re competing for authors. But there’s the precompetitive part, where you say, this is good for all of us if we want to survive the next major disruption. I often think science publishing disrupted itself so many times in migrating online, and that’s why it’s been successful, as a whole, getting where it is.

But yeah, you have to have that subtle persuasion, and I actually think that’s sort of implicit in what Ed was talking about, this notion of you start with something less than a standard. Because once you call something a standard, it’s like calling it a law. All of a sudden, things that people would agree on, that people would adopt, all of a sudden, the positions get, oh, we can’t have this standard, because that as a standard is scary. Whereas if you come up with something that’s a little bit less than a standard, it becomes a de facto standard, people adopt it, and then you move into that. I think Ed’s example is an excellent one.

When you start from the ground up, and this is going to happen in alt-metrics as well – you can’t top-down say this is what you should look at. From now on, all faculty trying to get promoted to tenure, you should look at how many downloads of their article, or how many blog posts there are about the article. It’s going to have to come from the bottom up, and then the publishers are going to have to create a space to allow this to develop and then kind of recognize what it is when that time comes, and then try to move it the next step. It’s not easy.

KENNEALLY: No, absolutely not. Gerry Grenier, the point that Roy just raised for the trade publishing world, where they’re still thinking about the book, the journal publishing world has moved beyond just thinking about the article, hasn’t it? It’s really now about publishing information. That’s itself a real technology challenge.

GRENIER: Yeah, I think there are great opportunities that lie before us that go beyond the article. One technology advance that occurred over the past 20 years is just the low cost of sensors. This is now a data-intensive world because of the proliferation of sensors. We’re collecting, as a civilization, so much more data. So we’re seeing within the IEEE, IEEE members coming to us and saying, hey, look, people at Verizon, for example – we’re collecting data on the pathways of mobile telephone signals through skyscrapers in a metro area, and we have this six-terabyte database with this information. We’d like to see someone curate that or steward that for the future.

KENNEALLY: They went to you instead of the NSA for that?

GRENIER: (laughter) Yeah, right. Right, right. I think that there’s some great opportunities. There are pharmaceutical companies that have been collecting lots of data over the past couple of decades. So I think that there’s much, much more of that, and people will begin to recognize that someone needs to begin to curate and steward that data, and I think it’s a great opportunity for us to go out there. I don’t know how much money there is in that. I think the business model for that will be interesting.

At least at the IEEE, we see ourselves in a sweet spot. Our mission is to further the science of electrical and computer engineering. The quandary that we’re in, in an open access world, if our revenues drop, then it gets interesting, and how do we fund these other services? But I think that overall, there will be some great opportunities for us in the not-for-profit world, as well as the for-profit world.

KENNEALLY: Ed Pentz, I think CrossRef has a text and data mining initiative, isn’t that right?

PENTZ: Yeah, it’s called Prospect. We ran a pilot last year, and it’s just about to roll out. That’s sort of again, getting back to the machine, publishers needing to provide extra services, researchers being much more interested in different types of data. However, I would say that part of the issue with text and data mining is because I would agree with Gerry that there’s great opportunity, but I think publishers are a long way from taking advantage of it. PDF dominates, and PDF is like the horseless carriage to the automobile, although it serves its purpose, researchers tend to like it – it serves its purpose, it’s great. So I think it’ll be interesting to see what the transition is. I think there’ll be increasing pressure, and maybe it’s a generational thing, as science in some ways can change very slowly.

In some ways, the need for text and data mining is because when a researcher does their experiments, they do their research, it gets distilled down into an article, it goes through the publisher process, and basically all this extra information, say the data behind a graph or something like that, it’s all stripped away and it winds up in this PDF. And then the researchers want to text and data mine it to then get all this information almost back out. So I think it’ll be interesting to see over the next few years.

Again, the challenge for publishers is collecting more of that information, having more of that information come through the publishing process, so that you may wind up with a PDF, but there could be lots of extra metadata behind it. Certainly online, publishers are doing it in their online formats, with lots of semantic tagging and semantic indexing. The challenge will be do you do it post-publication, or do you try to do it pre-publication? There’s good and bad in each way.

But I think a huge aspect is some of this has to come from the authors themselves. They’re really just getting their paper published. Asking them to do even more, I think is a real challenge, especially when you’re focused on the author as having the wallet. I think that’s going to be a real challenge over the next few years, about how publishers manage that.

KENNEALLY: With that, we’re going to close out this particular discussion. I want to thank our panelists. Gerry Grenier, he’s the senior director of publishing technologies for the Institute of Electrical and Electronics Engineers. Ed Pentz, executive director of CrossRef. And my colleague Roy Kaufman, managing director of new ventures for Copyright Clearance Center. Thank you all on the panel, and thank you for your attention.
