Transcript: On Double Duty: Open Standards & Open Access

with:

David Baker, Co-founder/Executive Director, CASRAI
Anna Clements, Head of Research Data & Information Services, Univ. of St Andrews
Jennifer Goodrich, Director, Product Management, CCC

KENNEALLY: Welcome, everyone, to this particular program that we call On Double Duty: How Open Standards Enable Open Access. It is part of an ongoing series of Webinars and other presentations that the nonprofit Copyright Clearance Center has produced over the last couple of years, helping to inform publishers, institutions, funders and others involved in developing open access publishing about the issues that they face, and providing them with some of the answers they’re going to need to put their own solutions in place.

Over the next 45 minutes or so, we will take a look at this particular two-headed challenge, as we call it, and we will help you understand better where open standards can fit into the workflow of open access publishing. And at this point, I’d like to welcome to the program David Baker from Ottawa, Canada. David, welcome.

BAKER: Thanks. Welcome, everybody.

KENNEALLY: Yes, indeed. Well, David, we’re very happy you can join us today, and we’ll introduce you in a sort of formal way. David Baker is cofounder and executive director of CASRAI, which is an abbreviation for Consortia Advancing Standards in Research Administration Information.

Based in Ottawa, Canada, David Baker has over 20 years of experience in research administration and management, which has included design and implementation of grants management databases and software systems as well as senior advisory services to national and regional governments and foundation funding organizations. The nonprofit CASRAI is dedicated to reducing administrative burden on researchers and improving business intelligence capacity of research institutions and funders.

And really, that’s a mouthful, David, and very much a challenge. And I know that you like to think of the organization, CASRAI itself, as a kind of a community organizer. You’re trying to bring together this community of various stakeholders to rally around some of the important issues in open access publishing. And you make the point that, while CASRAI is doing this, it’s a fairly thin organizational layer. You really turn to the members for input and for work.

BAKER: That’s right. When you’re trying to solve any kind of problems where coordination is key and agreement is key, we think it’s really important to keep the scope of that agreement as thin as possible to allow progress to be made.

The thicker you make it, the harder it is. You have greater overhead costs, and you also put greater constraints on the latitude of individual organizations to function within that agreement. And so it’s kind of burnt into our DNA to agree on as little as we can and not boil the ocean or overload that agreement by forcing everyone to do too many of the same things.

KENNEALLY: Well, that’s an important message there. And I think the other message, at a very high level, that you have is that this is an audience of many constituencies. And there’s a tendency in publishing, probably really across the board in any kind of organization, to hear the words data and standards and think that that’s a responsibility of IT. But in your view, there are other parts of the organization that really need to hear data and information and think it’s about them.

BAKER: Yeah. IT is obviously a key player, but they can’t be the only player in something like this. I’m sure any folks from an IT background would say that the solutions they implement are only as good as the business requirements and the clarity that come into those solutions.

And so although IT is a key stakeholder, we’ve got to make sure we include business and policy people, the people who actually use the information that IT is trying to safeguard and move around between us. They need to have a path in. And if the conversation around interoperability is limited to the technology layer, they’re not at the table.

And so that was one of the key founding principles in CASRAI, is we’re not really a technology organization. We’re a policy agreement organization. And those agreements can then get unambiguously, we hope, implemented in various technologies, and we think that’s an important thing.

You’re mentioning data. Whenever someone in policy or the business area, the programs area within an organization, hears data, depending on the person’s background, they may say, oh, that’s an IT problem, not my problem. I would argue that it’s the opposite. The lifeblood of everything you need to do at the policy or business or program level rests upon information, good information, good data. And so you need to have direct control of the business or policy intent of the data. Let the IT folks be in direct control of the technology aspects of the data, but it needs to be in tandem.

And you mentioned standards also. Standards is a tricky word. Each person can define it in a different way. There are formal standards, there are informal standards, there are de facto standards. And they all are important. For example, Microsoft Word could be seen as a standard because a lot of people use it, but it’s not an open standard. It’s a de facto standard. It’s like Adobe Acrobat or PDFs. It’s become a “standard” through widespread use. It’s got wide use, so let’s just continue that momentum.

But at a more open standards level, you then have other subcategories of formal, technical standards, from organizations like NISO and ISO. But sometimes people get hung up on the word standards. To us, it’s a relatively simple word. A standard is a neutrally governed expression of agreement among experts that is adopted by enough of us to matter. The format then becomes a question of whether it’s a formal technical standard that needs a lot of other activities placed around it to ensure robustness and good adoption.

But when it’s at the policy and agreement level, different people need to be at the table, and it’s a different conversation. If you try to bring technology to that table, you will lose the input of the policy owners, so you need clear agreement at that level so that it can be passed down to technical implementers. So we have a broader view of standards.

KENNEALLY: Right. I really appreciate that, David, and I think our audience will too. First of all, we are very happy that we do have that spectrum of people listening to you right now, so clearly that message is getting through. And I have to repeat what you just said was your definition of standards, because it’s going to become my definition of standards moving forward. An expression of agreement adopted by enough of us to matter. And really, I think that puts it as clearly as you could, and so I thank you for that.

So we will continue this discussion about standards right now. And we’ll take a look at the particular challenge that CASRAI is looking to address. And we spoke about two-headed challenges here, and this is our very schematic way of presenting that two-headed dragon, David. But your notion is that you can solve two problems by attacking the body itself. And rather than each of the heads, you go right for the body. And that is this concern around information flow.

BAKER: Yeah, exactly. Often, when you’re feeling the problem at the administrative burden level, which I’ll discuss in a moment, or at the, I’ll just generally call it the evaluation level, you think you’re dealing with the problem at the root, but there’s actually – we feel that the common root of both those problems is this friction, this lack of the ability to flow information within organizations or between organizations.

When you’re dealing with information, I don’t get hung up on the definitions: is it information, is it data, is it metadata, is it X, Y or Z? To us, it’s all information. And at the base of information is some form of data. We broadly define data to also include text. That’s data. It’s stuff that a computer can receive and, hopefully, either store or do something with. We don’t just see it as columns and rows. It’s data that makes up information, and it underlies all of our processes.

We also tend not to think in terms of is it a repository, is it a CRIS, is it an HR system? To us, all information rests upon databases. That database might be primarily designed for the purpose of a repository. It might be primarily designed for the purpose of HR or finance. But ultimately, we think it’s helpful to think of them all as databases. It’s databases all the way down, to borrow the turtle analogy from other conversations.

So it’s data all the way down, and each of our organizations has more than one of these databases. These databases all refer to similar things: people, organizations, products, projects, funding. The concepts being managed inside those databases have a very high level of commonality, but the actual implementation of those concepts is a legacy of whatever closed process each organization used to develop them between its business users and IT folks.

So the separation, the friction, is not something that’s on purpose. There’s just never been a really good, sustainable way of capturing those agreements early in the process. The later you go in the process, the more it becomes a mapping exercise. Oh, yeah, technology. Oh, yeah, we can map from this system to that system. That might be OK if there are a few systems. But when you’re getting into a global view of being able to share information, bilateral mappings are not sustainable. They’re very expensive and brittle.
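Baker’s point about bilateral mappings can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each pair of systems needs one mapping of its own, and that with a shared dictionary each system maps once, onto the common terms. The function names are hypothetical.

```python
def bilateral_mappings(n_systems: int) -> int:
    """Point-to-point integration: every pair of systems needs its own mapping.

    Counts one mapping per unordered pair; if each direction has to be
    mapped separately, the figure doubles.
    """
    return n_systems * (n_systems - 1) // 2


def shared_dictionary_mappings(n_systems: int) -> int:
    """Hub integration: each system maps once, onto the shared vocabulary."""
    return n_systems


if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(f"{n:>5} systems: {bilateral_mappings(n):>7} bilateral, "
              f"{shared_dictionary_mappings(n):>5} via a shared dictionary")
```

With three systems both approaches need three mappings, which fits the remark that mapping “might be OK if there are a few systems”; at 100 systems it is 4,950 bilateral mappings against 100.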

So it comes down to the lack of information flow. In the case of administrative burden, to us, that primarily manifests itself in massive duplication: retyping information for either a funder or a partner. You know the information already exists in your database, and you want to get it into their database, but you’ve got to retype it because, even if you can export it, they can’t really do anything with it. So that creates a lot of burden because of a lack of information flow and reuse.

And then when it comes to evaluation, as a funder, for example, the data you need to assess your open access policies and monitor compliance does not exist inside your borders. It exists in a fragmented, shotgun way across many, many sources or databases, in your country or around the world. Because you can’t get to it directly, you fall back on the other head of the dragon: forcing people to retype information in annual or end-of-research reports. That creates lots of data, but you can’t do anything with it without, again, adding more administrative burden.

So we feel that these kinds of standards, which start at the policy level and can then be implemented at the technology level, allow information to flow. It’s not about open flow. It’s about controlled flow. Some information should remain private, obviously. But where it is shared, even privately, both parties need to know what’s coming.

KENNEALLY: Well, indeed. And to know what’s coming, I think one other way that you describe what CASRAI is up to is to publish a dictionary of shared common language as far as research management goes. Is that right?

BAKER: That’s right. For us, the starting point is what I’m going to call the business or policy user, depending on your background. When I say business, I don’t mean industry. I mean that the business of each organization is not its technology. It’s what the organization actually does. For businesspeople to understand the concepts, the simple data relationships needed for business process one or business process two, they can’t be looking at a relational database schema or a technical ontology. They need to be looking at something in their own terms. And for us, that instrument is the online dictionary.

It’s not complex. It’s not much different from any other dictionary. There’s a bunch of terms stored alphabetically, and there’s some definitions about those terms, and also some object and attribute definitions at a very high level that allow people to understand, well, a CV is a thing, but what is actually included in a CV? A financial report, OK, we understand what that means, but what kind of information elements might be included in a complete financial report? We’re not even talking technology yet, just basically a recipe list of things that you need.

And so the dictionary has to have a business view or a policy view for the people who have to agree that it’s a valid expression of their agreement, but then it needs to have a technical view. And Anna, my co-speaker, will speak a bit about one of the standards, the CERIF standard, which is more of a technical representation of such agreements.

And there are others as well, other modes of expressing that agreement, including things like the VIVO Integrated Semantic Framework. So to us, those are technology implementations of agreements reached by policy folks. So the dictionary has a business view and a technology view.

KENNEALLY: And though you are describing it in fairly simple terms, as this slide indicates, this is a pretty complex system that we’re speaking about here when it comes to research and the publication of research. And so, again, the earlier in the process, the earlier in the development of systems that we reach these definitions, the better off we are.

BAKER: I think so. But in order to do that – coming back to my definition of standards, that expression needs to be a neutral expression. It shouldn’t just be expressed in one particular vendor’s product. It shouldn’t be expressed in just one particular database installation. It should be expressed at some kind of community agreement layer, where a smaller group of subject experts can tackle the problem, expose it to a wider group for review and an even wider group for review from subject experts, not technology experts, the subject experts, so that they can say yeah, OK, that reflects what I’m after.

And if you look at this slide, it’s still not complete; it’s just what can fit on one slide. But even though this looks very complex at one layer, I think later slides break it down: financial reporting still concerns information about people or projects or organizations, but so does reporting on social, economic and cultural impacts. We’re reusing a lot of the core entities reflected in our local databases, but we don’t always have a mechanism for agreeing on what we mean. And without that agreement, we can’t mix and match and combine and aggregate information across our borders to make sense of it.

KENNEALLY: Right. And indeed, and it’s that active aggregating that’s going to really make the data much more powerful than when it stands alone. David Baker, we are chatting with you today about what we are calling Double Duty: How Open Standards Enable Open Access. And David Baker joins us from Ottawa, Canada as executive director of CASRAI.

And as you pointed out, David Baker, different organizations use information in different ways. And this slide is an attempt to see that all in a single view. And clearly a lot is, as you say, left out. But it really does, I think, show that what we are seeing in an open access publishing environment is that there are many different agents who are interacting. We’ve got funding agencies, as we see on the screen there, institutions, government, as well as universities, and publishers and indeed authors are a part of this as well.

BAKER: Yes. What this slide tries to do is show a simplified and, hopefully, aspirational view of what’s common. At funding agencies, for example, in order to properly manage the information coming in and going out, they need policy and business and program folks sitting together with IT folks to, as I say here, have mechanisms for uniquely identifying things. Even if they’re just uniquely identifying things within their own local system, like personal PIN numbers and things like that, they still have to do it. Otherwise, it’ll be a mess.

They need to define things, with some form of data dictionary, probably a local spreadsheet, so that all the people around the table at that one funding agency agree that this is the list of things and the definition for each of them. They need to classify things, whether with simple classifications or complex taxonomies, so that a funding agency can understand, over time, how its portfolio is classified. Is this forestry or is this diabetes? Is this Alzheimer’s? How do we classify that by research domain and application?

And then grouping. If anyone on the call has ever filled in an online, Web-based funding application, all the pages, the 12 or 15 or more pages, contain a whole bunch of information, but there are natural groupings of things. When you’re working with the CV information in an application, there’ll be a subset of pages that group together all the things that tend to be useful to know for that part of the information puzzle. And when you get down to the financial and collaborative side, there’ll be another sub-grouping of information. That kind of grouping tends to have a high level of commonality as well. Almost by default, when you’re talking about a person’s history, you’re going to talk about their outputs, their educational background and their employment history, without anyone having to check whether those are the right things to talk about.

And institutions need to do the same, but they’re doing it in silos, for no reason other than that they have to get a purpose-built system out there. And of course governments, either the governments of which the funding agencies are extensions or the higher-level departments that need to look at general workforce productivity, are also trying to think about people, orgs, projects, equipment, and they need to define them, identify them and classify them.

And so it ends up being a lot of silos. And the trick is trying to find a sustainable way of integrating those efforts so that we can remove as many silos as we can, without disrupting the regular process that those subgroups have to do for their own mandates.

KENNEALLY: Right. Well, as you say, this tendency to work in silos isn’t necessarily nefarious. It’s just the way things have been done in the past. But one of the aspects of the Web, which is really another word for a network, is that it brings us all together. And you mentioned the information puzzle, so we’ve got that all put together here. And clearly, you could speak to it at a high level. There are real benefits from having standards that enable everyone to work together across this workflow.

BAKER: Yeah, agreed. And again, it comes back to properly defining the word standards. Technical standards play a role. But how do these groups reach the kind of agreements our definition of standards refers to, in a sustainable way, so that they can then step away from each other again?

The whole point is this: if all of these sectors on this slide were forced to share a common software portal in order to do this, that wouldn’t relieve stress on their own internal database systems. They’d still have to invest in those, but now they’d also have a third-party tool to invest in, which is not cheap. And that comes back again to the question: what is the thinnest layer of agreement and coordination?

If you’re trying to agree on basic definitions for terms, but also which technology you all have to adopt, also which database vendor, it keeps building up that stack into a cumbersome – a fatter thing that we have to agree on, whereas perhaps all we needed to agree on was the lower level. What are the terms, and what kind of information – which of those terms need to be encompassed in a particular piece of information in order for a business process between us to be useful?

And then that’s it. Step away at that point and allow local implementations and decisions about vendors, etc., etc. to work according to things like product quality, pricing and things like that, things that are ancillary to this but not as important at this level of agreement.

KENNEALLY: Right. And David Baker, you’ve got a really interesting analogy that is about as far away from open access publishing as I can imagine, and it involves how cargo was moved around the world in the time before 1960. Those were the days when it was on the longshoremen’s backs that cargo got moved around. Tell us about that situation, and then we’ll get into how it’s analogous to where we are today in open access publishing.

BAKER: Yeah, sure. I should say that I didn’t come up with this analogy. I lifted this analogy from another group that was trying to tackle a similar problem, but at very much of a technology level. But it’s important to have these kinds of analogies when people are trying to understand something new. What CASRAI is trying to do is not form a new technical, formal standards development organization. It’s trying to find another layer of agreement that can enable a lot of other things and other problems to be solved.

And so we sometimes need an analogy, because one of our primary audiences is executives at the business layer. They want to understand what this is and what we’re trying to do. And so this kind of analogy of cargo transport is important. As you’ve got on the slide here, we’ve got a multiplicity of goods in that case, and we think there’s a pretty clear corollary not just to open access, but to the entire information flow underlying the research and scholarship enterprise. Lots of different information, lots of different things that we need to talk about.

And there’s a multiplicity of carriers. In transport, it’s trains, buses, boats, etc., but in our world, it’s different databases, different systems that have to take that cargo, if you will, make sense of it and move it from point A to point B and, in some cases, aggregate point A and point B into a new view.

And in the case of cargo, there are obvious physical reasons for separation, like the fact that coffee beans and spices shouldn’t be stored together. In our world there are other reasons, like privacy: some things can be open, some need to be controlled or closed, and they need to be kept separate while still being transported smoothly alongside everything else. There’s a point at which you can overstress this analogy, but it rang a bell for me when I saw it.

KENNEALLY: Well, it works for me. And I appreciate the fact that you acknowledge your source for this information. I always tell people I get my ideas in two places, David. I think of them and I steal them. So I think this is an approach that others will recognize in their own work.

But what happened in the days before 1960 was that a longshoreman with a canvas bag and a hook was how things got around. And then something was developed that is so common today you’d imagine it was always there: the intermodal shipping container.

BAKER: That’s right. It was nothing short of a revolution. I don’t have the actual numbers in front of me, but it went from something like $5.60 per unit to move things down to pennies per unit. And obviously, when you spread that across a global landscape, that’s a revolution waiting to happen.

And it took a while to get off the ground. It was a new thing. There are corollaries there as well. People were skeptical. People were doubtful. They said, well, that’s not the way we’ve always done it. Well, that’s not a reason not to do something. It’s a reason to understand what’s needed to get from A to B, but it’s not a reason not to try to improve things. And it was a massive revolution in that particular market, if you will. We think it has similar potential if we can properly apply the same kind of pattern to the problems we face.

KENNEALLY: Right. And indeed. While we are stressing the particular challenge of open access in our presentation today, I don’t want to let it go unsaid that CASRAI’s approach is useful not only for open access but for a variety of problems confronting publishing today. But to keep up with that container analogy, open standards, in your view and CASRAI’s view, become the shipping container for research administration.

So how does that work? I understand there will be something developed called a CASRAI profile. That is the container.

BAKER: Yes. Well, the profile is part contents and part container. A profile is a kind of first-class citizen in the dictionary. Anyone going to the dictionary after this call will find profiles missing. By bad timing, we’re just in the process of moving our dictionary from one platform to another, and the profiles are being combined and brought back into the system over the next 10 days or so.

But again, the idea is that you’re going to have thousands of terms in your research and scholarship enterprise world, and knowing every one of them is not important. What you need to be able to do is identify the subset that’s needed for this step in the process. What do we need to ask for from a database? Which elements do we need in order to monitor an open access policy from a funder’s perspective? Or what do we need if two institutions want to ask a question of each other’s databases?

In some cases, the answer will be that you need two or three information elements; in another case, you need 20 or 100. But we need a way of defining what those three, 20 or 100 things are. For example, a CV, for us, is a profile. It’s a subset of information about a person, and about outputs and organizations, that focuses on that person, but it leaves out a lot of the thousands of terms.

And so, for us, a CV is not a PDF or Microsoft Word document. It’s a concept of a stack of information. If we can agree on what’s included in an abridged CV, in a full CV, in a dean’s report, or whatever information about a person is needed, in CASRAI parlance we define that as a profile, so it’s unambiguous. Then an institution can ask, what do you need to know for a CV? Well, we adopt this particular standard profile. OK, great, thanks, we know what you need; it contains pretty much everything we need. So it’s just a basic way for businesspeople to agree. And to us, that’s a profile.

So in a sense, it’s like the container in that it defines some boundaries around what’s in and what’s out. And for what’s in, what does each term mean? Is that an attribute, is it an object? What are we talking about here? And then there’s the technology part, coming back to where I said the dictionary has to have a business view and a technology view.
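Baker describes a profile as a named subset of dictionary terms with a boundary around what’s in and what’s out. The Python sketch below illustrates that idea under stated assumptions: the term names and definitions, the `Term` and `Profile` classes and the `check` method are all hypothetical, not CASRAI’s dictionary or any published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    kind: str        # "object" or "attribute", echoing the dictionary's high-level definitions
    definition: str


# Hypothetical entries for illustration; not CASRAI's actual terms or definitions.
DICTIONARY = {
    t.name: t
    for t in (
        Term("Person", "object", "An individual involved in research."),
        Term("Family Name", "attribute", "A person's surname."),
        Term("Given Name", "attribute", "A person's first name."),
        Term("Degree", "attribute", "An academic qualification a person holds."),
        Term("Employment", "attribute", "A position a person holds at an organization."),
        Term("Output", "object", "A product of research, such as an article."),
        Term("Grant Amount", "attribute", "Funding awarded, in a stated currency."),
    )
}


@dataclass(frozen=True)
class Profile:
    """A named subset of dictionary terms: the boundary around what's in and what's out."""
    name: str
    terms: frozenset[str]

    def __post_init__(self):
        # Every term in a profile must already be defined in the shared dictionary.
        unknown = set(self.terms) - set(DICTIONARY)
        if unknown:
            raise ValueError(f"{self.name}: terms not in dictionary: {sorted(unknown)}")

    def check(self, record: dict) -> tuple[list[str], list[str]]:
        """Return (missing, out_of_scope): profile terms absent from the record,
        and record fields that fall outside the profile's boundary."""
        missing = sorted(set(self.terms) - set(record))
        out_of_scope = sorted(set(record) - set(self.terms))
        return missing, out_of_scope


# Two hypothetical profiles; the full CV extends the abridged one.
ABRIDGED_CV = Profile("Abridged CV", frozenset({"Family Name", "Given Name", "Degree"}))
FULL_CV = Profile("Full CV", ABRIDGED_CV.terms | {"Employment", "Output"})
```

Checking a record that holds a family name, a given name and a grant amount against `ABRIDGED_CV` reports `Degree` as missing and `Grant Amount` as outside the profile. Defining profiles as sets makes the “abridged CV is part of a full CV” relationship a plain subset test.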

Taking CERIF as an example, and again, I think Anna will speak a bit to CERIF, the technical outline or structure of that container could be expressed as a CERIF XML file, but it could also be expressed as an RDF file. Agreeing on the basic dimensions of the profile of information we need is one part of the problem. But even if everybody agrees on what’s in an abridged CV or what’s in an open access repository report, it doesn’t mean you all have to use the same technology to move the container around. And so again, we might be pushing the analogy a bit, but that’s the basic idea.
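To illustrate the point that agreeing on a profile’s contents doesn’t fix the technology used to move it, here is a sketch that writes the same record two ways: as XML and as RDF N-Triples. The element names, the `example.org` base URI and the functions are hypothetical; this is not CERIF XML and does not use any real CERIF, CASRAI or VIVO vocabulary.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical base URI for illustration; not a CERIF, CASRAI or VIVO namespace.
BASE = "http://example.org/profile/"


def to_xml(profile_name: str, record: dict[str, str]) -> str:
    """Serialize a record in a simple, made-up XML layout (not CERIF XML)."""
    root = ET.Element("record", {"profile": profile_name})
    for term, value in sorted(record.items()):
        field = ET.SubElement(root, "field", {"term": term})
        field.text = value
    return ET.tostring(root, encoding="unicode")


def _escape_literal(value: str) -> str:
    # N-Triples string literals must escape backslashes, quotes and line breaks.
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(record_id: str, record: dict[str, str]) -> str:
    """Serialize the same record as RDF N-Triples, one triple per term."""
    subject = f"<{BASE}record/{quote(record_id)}>"
    lines = []
    for term, value in sorted(record.items()):
        # Percent-encode term names so spaces don't break the IRI.
        predicate = f"<{BASE}term/{quote(term)}>"
        lines.append(f'{subject} {predicate} "{_escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    cv = {"Family Name": "Smith", "Given Name": "Ada", "Degree": "PhD"}
    print(to_xml("Abridged CV", cv))
    print(to_ntriples("cv-001", cv))
```

Both outputs carry the same term and value pairs; only the packaging differs, which is the sense in which the profile is the agreement and the file format is a local implementation choice.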

KENNEALLY: Right. And just to jump to a particular OA, open access use case, and really I think this is where we’re getting at the heart of the matter, it’s around reporting and compliance and so forth. So funders really care about getting the right kind of information so that they know their dollars or pounds or euros are being spent in the ways that they expect them to be. And so what CASRAI is about is really ensuring that these reports are standard across publishing, across the various funders and so forth.

BAKER: That’s right. Imagine a funder sitting in Germany who knows that, in order to answer its question on open access compliance for the things it funded, it’s got to send its multi-handled crane out and pick up a number of containers from various places. Obviously that’s hard to do in the physical world, but it doesn’t have to be hard here if the organization hosting the crane, in this case the German funder, and all the nodes it needs to reach with the hook have a mechanism for agreeing on the minimum information needed to comply, and have implemented it locally. Then, if you do make a request, and the node has decided to make that information public, you know where to go and you know what you’re getting.

Depending on the technical background of the folks on the call, you could think of it as applying a standard API approach within the research enterprise, whether that’s an API connecting to publishers or repositories or connecting the funders themselves. People also have information questions for funders that maybe machines could ask, instead of people having to visit Websites and type in queries.
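The crane picture amounts to a funder querying many nodes that share an agreed minimum profile and merging the answers into one view. The sketch below assumes the nodes have already returned their records as Python dicts, so the network calls a real API would involve are left out. The field names in `MINIMUM_PROFILE`, the `aggregate` function and all identifiers are hypothetical.

```python
# Hypothetical minimum profile a funder and its data holders might agree on.
MINIMUM_PROFILE = ("grant_id", "doi", "licence", "repository_url")


def conforms(record: dict) -> bool:
    """True if every agreed element is present and non-empty."""
    return all(record.get(element) for element in MINIMUM_PROFILE)


def aggregate(funded_grants: set[str], responses: dict[str, list[dict]]):
    """Merge answers from many nodes into one compliance view.

    responses maps a node name to the list of records it returned.
    Returns (outputs_by_grant, grants_without_outputs, rejected), where
    rejected holds (node, record) pairs that did not match the profile.
    """
    outputs_by_grant: dict[str, set[str]] = {}
    rejected = []
    for node, records in responses.items():
        for record in records:
            if not conforms(record):
                rejected.append((node, record))
            elif record["grant_id"] in funded_grants:
                # A set, so the same article reported by two nodes counts once.
                outputs_by_grant.setdefault(record["grant_id"], set()).add(record["doi"])
    grants_without_outputs = sorted(set(funded_grants) - set(outputs_by_grant))
    return outputs_by_grant, grants_without_outputs, rejected
```

Because every node answers in the same agreed shape, the funder can combine a publisher’s and a repository’s reports on the same article without any per-source mapping; records that fall short of the profile are set aside rather than retyped.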

KENNEALLY: OK. Well, David Baker, the executive director of CASRAI in Ottawa, Canada, thank you for that. Please stand by. And we’ll bring in to our program now someone you have mentioned, someone we know well here at Copyright Clearance Center, coming to us today from the University of St. Andrews in Scotland. Anna Clements, welcome to the program.

CLEMENTS: Hi, Chris, and thanks very much for welcoming me.

KENNEALLY: Well, indeed, we’re very happy you can join us. At the University of St. Andrews, Anna Clements takes up the post of assistant director of digital research for the university library next week, in April. Her new post covers open access, research data management, digital humanities and bibliometrics. Previously, she was head of research data and information services. St. Andrews is Scotland’s first university and the third oldest in the English-speaking world, founded in 1413.

Anna is a tireless advocate of the basic management information principles of entering data once, something that I think we can all identify with, and using open standards to improve data quality. She chairs the CASRAI UK data management planning working group. And again, welcome, Anna Clements.

And I think you have a pretty simple message here, one that, given the scientific nature of this particular challenge, we’ve spelled out with some elements from the periodic table. And that is harmony. You really are about harmony. So maybe at the end of the program here, we can bring David back and we can all try to harmonize together. But what do you mean by harmony when it comes to open access? And how do the kinds of open standards that David is speaking to help us all sing more beautifully together?

CLEMENTS: Well, I think David’s hit the nail on the head about ensuring that the right people are around the table to discuss these. And I think if we start at the wrong end, the technology end, then we’re not really going to be very harmonious at the business end.

So I think that’s one of the key areas. And also, if we’re talking about a specific case where we want to exchange information as part of a process, which in open access could be the manuscript submission process or compliance reporting, then we need to agree between ourselves, the people who are providing the information and the people who are receiving it, precisely what we want the information for, and why and how often we want it. So it’s that kind of process which I think CASRAI is a good facilitator for.

KENNEALLY: Well, indeed. And so obviously one of the reasons we need to try to move towards harmony, and clarity, I suppose, goes with that, is just the tremendous number of funders out there and the various policies that they have –

CLEMENTS: Absolutely. Yeah.

KENNEALLY: – obviously the multiplicity of publishers and indeed the community of authors around the world, which numbers in the millions.

CLEMENTS: I think that’s absolutely correct. If I think about the three main areas that I think would help improve the open access process, particularly the APC side of things, then I think effectively the plethora of different publisher and funder policies we have is a real issue. And I think if there’s some way we can work together to harmonize those or at least clearly define them or categorize them, that would be great, so that they’re understandable both by humans and by machines, obviously the latter to help automate things.

The second thing, I think, is the workflows around publication, and particularly the APC payments and so on. If we can work out a fairly standard workflow that people can plug into, that would be great.

And the third thing comes more to the technology side: the definitions of precisely what we want, the metadata and various other elements that we want to exchange between all the stakeholders at different parts of that workflow.

So those are three areas where, hopefully, we can achieve more harmony than we have at the moment.

KENNEALLY: Well, indeed. And that is an objective of Copyright Clearance Center as well. Anna, you joined a roundtable discussion that Copyright Clearance Center sponsored last fall, in October, there in London to take a look at this and to hear from institutions. One approach we have taken here at Copyright Clearance Center, thinking about the challenges that open access brings to the publication workflow, is to hear from the various participants.

And I think it may be interesting to some of the publishers on the call today to realize that the world has changed, and the role of the institution, which you represent here, is very different today than it was in the past. In a simplified view of the traditional library, information flowed in a single direction: from publishers, and by extension their authors, into the library. Now in the digital library, where you are very much at the forefront of digital research, that information flow is two-way.

CLEMENTS: Absolutely. And I think it was very, very reassuring at the roundtable that I attended that there were two or three publishers there who were understanding what we were saying and understanding that the institution, usually represented by the library, does have a role to play in these workflows. And at the moment, I think the problem is that the publisher workflows haven’t yet adapted to that. And that’s no criticism, because they’re not used to working that way. So hopefully we can work together to improve that. And obviously intermediaries like Copyright Clearance Center and so on are key to that as well.

KENNEALLY: Well, indeed. And, if I can put it that way, it seems obvious now, having listened to David Baker, that the open standards CASRAI is seeking to develop are really going to be part of that solution. And I know that your work there at St. Andrews involves you with a number of different bodies. David Baker has referenced an organization called CERIF. And this world of metadata and standards really populates the landscape with abbreviations of all kinds. Tell us what CERIF is and its contribution to the solution.

CLEMENTS: Well, CERIF is actually a data model format. The organization that has stewardship over CERIF is called EuroCRIS. And although it’s based in Europe, it’s actually an international organization.

CERIF itself stands for Common European Research Information Format. Basically it’s a model of the research landscape, so you’ll have entities or items in it which represent all the different things that David was mentioning earlier, like people and organizations. And organizations can be funders, institutions, publishers, etc. It has projects, activities, impact elements that you can represent, CV elements, equipment, all these things that, when you’re dealing with any aspect of the research information landscape, you think, oh yes, I need information on that. So it has a huge array of entities within it.

And the very interesting thing about the model, and the reason I like it so much – my background is in IT, and I moved over to the library about 12 months ago or so, but within IT, I was very keen on data modeling and ensuring that the models that we used were efficient and effective. And I could see that CERIF was that, because it allows you to relate these different entities together. And also, when you relate them together, you can say what the relationship is between them. A simple example: if you have a funder who funds a project, you can say that. You can also indicate the timeframe as well, so you can say when the funder was funding that project and so on. Or if you take a person who is working at an institution, an author who’s working at one institution, you can indicate when they started working at that institution, when they left and so on. So it’s really rich in the way it can represent research information.

KENNEALLY: Well, indeed. And finally, Anna Clements at St. Andrews, there’s the point that David Baker made as well, which I’m sure you would agree with: it is about the aggregation of this data. It’s about bringing it all together in ways that can be analyzed and used in reporting and other second-stage uses. It’s not just gathering the data. It’s using it and learning from it that’s really critical.

CLEMENTS: I think it is, absolutely. And I think David as well was absolutely correct when he talked about silos of information. And I think that happens within institutions as well as between institutions. And really, if we want to get the most effective use out of our data, then we have to find ways of breaking those silos down. It’s really a simple message.

KENNEALLY: Well, indeed. Well, thank you for that, Anna Clements.

And finally, now I want to turn to my colleague here at Copyright Clearance Center, Jen Goodrich, who is the director of product management for Copyright Clearance Center and very closely involved with developing our own RightsLink for open access solutions. So welcome, Jen.

GOODRICH: Thank you, Chris.

KENNEALLY: It’s great to have you join us here, because you’ve had a chance to not only hear what David and Anna have said today, but really, as part of your role with Copyright Clearance Center, you’ve been talking to individuals, including David and Anna, for some time now, because what we’re about is really the point that Anna was just making, which is breaking down those silos, being sure that the various participants in the publishing process, in that workflow, can work together in, and here comes that big IT word, an interoperable way. Right?

And so we are working at Copyright Clearance Center with CASRAI, with other parties and with institutional organizations to help standardize metadata as it relates to APCs. So expand a bit on what you’ve just heard from both David and Anna. I’m sure this is very familiar to you and your colleagues here.

GOODRICH: Yes, Chris. We definitely are very committed to metadata and standards and interoperability. In fact, the way our platform works, it’s really driven by metadata. So upstream, we integrate with third-party manuscript management systems, like Aries and BenchPress and ScholarOne and eJournalPress. And it’s really that metadata that’s passed to us which drives the publishers’ pricing and discount rules within our APC workflow. So really, the better the metadata, the better the author experience and the more author-centric the experience.

So we’re really trying to – we know who the author is when they arrive in our workflow. And you would see many kinds of standard metadata in the metadata bar throughout our transaction. ORCID IDs, Ringgold IDs, FundRef IDs, ISNIs, DOIs and much more.

KENNEALLY: And the other piece of this is taking on that challenge that Anna spoke to, and that is making it clear to publishers why all these things really matter. And so in those conversations that you’re having, what’s working? What’s really resonating with them?

GOODRICH: Well, the interoperability is really working on their behalf, so our integration with their vendors and the authors coming seamlessly through, from the submission of their manuscript right into our workflow, to be able to pay their APCs, to see what the charges might be.

The standards are evolving. So the more vendors like CCC are leveraging the standards, the more the downstream reporting is leveraging standards, the easier it is for the publishers to make that commitment to help their authors get ORCIDs, to make sure that their manuscript management systems are capturing the funder and grant information so that, once a transaction is complete, all of that downstream metadata is flowing back to the publisher, to the institution and to the funder, again without that re-keying.

So I think the fact that we sit in the middle and can help upstream and downstream is really resonating with publishers and the ecosystem at large.

KENNEALLY: Well, indeed. And you’ve got it there in red with an exclamation point: what this is about is an author-centric workflow. And that’s of concern not only to publishers today, as it obviously would be, since authors are the source of their material, but also to institutions, like the one that Anna represents.

GOODRICH: Yes, absolutely. We are a new platform, and we focus very intensely on getting the workflow right. And we’re broadening that now and trying to really make sure that the institutions get both the up-front information they need to be able to help their researchers pay for the APCs out of the appropriate funds and, downstream, make it easy for them to reconcile and work with their internal departments and their third-party funders.

KENNEALLY: All right. Well, Jen Goodrich, my colleague here at Copyright Clearance Center, works very closely on the RightsLink for open access project and is director of product management here. Thanks for that.

David Baker, a question, if I can, about CASRAI. How does somebody listening to this call, who is intrigued with what you’re doing and wants to contribute to it, participate? What do they need to do to join this CASRAI community?

BAKER: (break in audio) us and express your interest in getting involved, and we’ll get going. CASRAI is a nonprofit membership-based organization. There needs to be some kind of sustainability model for any kind of initiative to work. We don’t think the approach is to survive on project-by-project grants. And so CASRAI is not an expensive organization, because it’s – again, it’s not developing software or hosting data. It’s just trying to host and convene policy agreements that get published in the dictionary. But it does have costs, and so the way we cover those costs is through membership at various tiers.

And so we’re actually – given some of the growth that we’ve been having lately, we’re coming out with a new set of membership tiers and fees to reflect that there’s more organizations involved. We’re not a for-profit organization so, as more people get involved, it can become cheaper for each organization through lower fees. So I would just say reach out to us and start a conversation.

KENNEALLY: Right. Well, thank you for that, David. And of course the place to go for all of that would be your Website. The URL is casrai.org. So do look into that if you’re on the call today and consider participating in that discussion and joining the community that David has described.

Finally, if I can, a question, Anna Clements, for you there at St. Andrews. I mentioned it’s one of the oldest universities in the English-speaking world, the oldest in Scotland, founded in the 15th century. My goodness, a different world entirely. But we’ve heard about silos in government and silos in institutions and funders and so forth and so on. I imagine you have a challenge with regard to silos and breaking down silos even within your own institution. Is that true?

CLEMENTS: Well, yes, to a certain extent. But we’re also quite a small institution: we have about 8,000 to 9,000 students and 2,000 faculty. That actually does help the conversations, because people tend to meet more people, talk to them, find out what is going on and break those silos down.

And really we have had a policy and a program of trying, from a technical point of view and from a data management point of view on the administrative side, to break the silos down, bring information together centrally into a data warehouse and so on, and try to ensure that everybody who needs access to information can get it at the institutional level. So in some ways it’s difficult, but in other ways our size and some of our past expertise have helped us start to break those silos down.

KENNEALLY: Well, maybe you can teach your colleagues at other institutions how that all works. That would be, I think, important for them.

David Baker, cofounder and executive director of CASRAI, thank you for joining us.

BAKER: Thanks for having me.

KENNEALLY: Anna Clements, assistant director, digital research at the university libraries of the University of St. Andrews in Scotland, thank you indeed.

CLEMENTS: Many thanks to everybody.

KENNEALLY: And Jen Goodrich, nice to see you here at Copyright Clearance Center.

GOODRICH: Thank you. And thank you, David and Anna.

KENNEALLY: My name is Chris Kenneally, director of business development for Copyright Clearance Center. Thanks for joining us.
