Interview with James Bryant, Co-founder & CEO, Trajectory
For podcast release Monday, December 14, 2015
KENNEALLY: With every passing day, the book business is more and more of a numbers game for authors and publishers. The numbers we’re used to tracking – bestseller lists and Amazon rankings – aren’t even the half of it. In a digital world, books are the sum of their parts, and the parts are parsed as data.
Welcome to Copyright Clearance Center’s podcast series. I’m Christopher Kenneally for Beyond the Book. The Book Industry Study Group annually confers its Industry Innovation Award for success at boldly reimagining what publishing is and can be. The winner for 2015 is Massachusetts-based Trajectory, cited for a pioneering natural language processing engine and machine-based metadata development. Cofounder and CEO Jim Bryant joins me now. And welcome to Beyond the Book, Jim.
BRYANT: Hi, Chris. It’s nice to join you on Beyond the Book.
KENNEALLY: Well, we’re happy to have you here. And it’s an interesting subject – and congratulations on the BISG award. But these phrases – natural language processing, machine-based data – they weren’t thrown around in editorial meetings or publishers’ sales conferences until very recently. And behind all that IT jargon is a fairly basic drive to improve discoverability in a book world of millions of available titles. And rather than recommend a book based on what others have bought, as Amazon does, the Trajectory algorithm claims to recommend book titles based on which books a reader has previously read. And that’s a fundamental breakthrough, isn’t it?
BRYANT: You know, it’s actually really interesting. In an industry comprised of content, big data is kind of a new player, in general. And it’s really exciting to apply big data concepts to books and to be able to identify similarities between one book and another book, which had once again surprisingly never been done before.
KENNEALLY: Right. But the focus that you have is identifying books by their textual data – their contextual data, I suppose, and then that yields a kind of a profile of a personality. But before we get into that, many in the audience here are going to be scratching their heads as they hear phrases like natural language processing. Briefly, tell us, what is natural language processing?
BRYANT: Sure. So natural language processing is really nothing more than a series of algorithms that exist. We’ve developed some. There are quite a few that are out there right now that are analyzing the context of text.
In the case of books, what we’re doing is deconstructing every sentence that appears in a book, and we’re identifying parts of speech, we’re identifying people as people and places as places. We’re identifying co-occurrences that occur within a sentence. So if there is a collection of verbs or adjectives that surround a location or a character within the story with some degree of repetitive nature, we’re able to start to develop a profile of what that location or person may be, based on the words that surround that particular word.
But in general, at a very high level, some of the things we’re doing is developing a simple index – the total number of words, the total number of unique words, the total numbers of different parts of speech. We’re applying some pretty well accepted algorithms to develop measurements of complexity and the average grade level that would be necessary to read a book.
And we’re doing some really interesting comparisons to external datasets. So one of the external datasets that we’re working with is the well known and highly feared SAT database, which is a list of several thousand words that high school students are requested to master before taking that test. And we’re simply analyzing the number of words that appear within a book that appear within that database. And it’s interesting to have that as an integer to put alongside the book, because it may become an incentive for someone to buy the book if they’re trying to master those words.
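(Illustrative sketch, not part of the recorded conversation.) The simple index Bryant describes, with total words, unique words, and the number of words that also appear in an external word list, can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Trajectory's implementation: the function name `book_index`, the regular-expression tokenizer, and the three-word study list are all hypothetical stand-ins.

```python
import re
from collections import Counter

def book_index(text, study_words):
    """Return simple counts for a text: total words, unique words, and
    how many distinct words also appear in a reference word list."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Distinct words shared between the text and the reference list.
    overlap = set(counts) & {w.lower() for w in study_words}
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        "study_word_matches": len(overlap),
    }

# Hypothetical sample text and a tiny stand-in for an exam vocabulary list.
sample = "The candid reply was candid and terse. The reader found it lucid."
stats = book_index(sample, ["candid", "lucid", "abate"])
print(stats)
```

The match count comes out as a single integer per book, which is the kind of figure Bryant suggests placing alongside a title. Part-of-speech counts and grade-level measures would need a tagger and a readability formula, which this sketch leaves out.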
KENNEALLY: It’s interesting stuff. And we’ve come across NLP – natural language processing – at Copyright Clearance Center just recently with our efforts to understand better and to work in the text and data mining space. As I understand how it works in a research laboratory, they do this text and data mining to analyze a corpus of work to find matches. So if we’re looking for certain identifiers that may lead to a cancer or to some other kind of disease, we can go through this entire corpus of all these articles and find matches, and so discover information that would be impossible to do without the machines – impossible for any individual reader to do.
In this case, discoverability lies at the heart of it, but you’re not trying to discover a cure for cancer, or, for that matter, just what are common elements in a variety of books. You’re really trying to discover the book to drive it into the hands or into the devices of readers, and that’s something very important for publishers.
BRYANT: Absolutely. And you mentioned the challenges that the publishing industry faces today as we migrate from print to digital and from local to global. Today there are, by all accounts, just a massive number of books that are available for anyone to download anywhere on the earth with just a push of a button. And because of the number of books, discovery is becoming more and more of an issue that the industry’s talking about.
And a lot of this is being driven based on the republication of backlist titles and the massive emergence of independent published works that are now generating upwards of 20% of retail sales, we’ve been told, and of course the emergence of foreign publishers, who are now distributing their titles across borders.
KENNEALLY: Right. And in the old days, when we had a physical library to go to, we might have had a few thousand books we could sort of shuffle through to look for something we want. But in that universe of literally millions of titles, discovering the next book that we want to take home with us and curl up with is very much of a challenge. And as you say, it’s not just a national sales challenge. It’s a global sales challenge. Some of the opportunities that the work you’re doing yields for publishers and for authors who are self-publishing would apply, for example, to subscription systems.
BRYANT: Absolutely. It’s really fascinating to look at the individual reading habits. And imagine if you’re a subscription service or you’re a local library, for that matter, and you know what your customer or your patron likes to read. Being able to make recommendations based on the content is something that is just very natural.
So one of the things we’re doing, when we process a book, we identify not only the statistical information that I mentioned but we identify all of the keywords, concepts and themes that are being mentioned in a book. And gathering that information together with understanding the complexity of the writing style and the actual style of writing, we’re able to make some remarkably accurate recommendations to similar books.
KENNEALLY: Now, we’re talking today with Jim Bryant. He’s the cofounder and CEO of Massachusetts-based Trajectory, the winner of this year’s Book Industry Study Group Industry Innovation Award. And at Trajectory, the kinds of profiles and personalities for books and for collections of books that you have developed – do they yield surprising insights as far as what it is that readers find interesting and intriguing, what attracts them to books, or, for that matter, what it is about a particular author’s work that is individual, is kind of like a fingerprint for that author?
BRYANT: That’s a really fascinating question. One of the things we’ve found within certain genres, like romance, is that there are style – there’s a style of writing where we can track the flow of sentiment throughout a storyline, and in fact also the flow of intensity throughout a storyline that’s applied thematically to multiple books written by the same author. So being able to visualize – imagine you’re looking at a standard chart that goes from the beginning of the book to the end of the book, and you have this wave reflecting the highs and lows of sentiment that are expressed. And it’s interesting to see those duplicated amongst successful authors.
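(Illustrative sketch, not part of the recorded conversation.) The chart Bryant describes, a wave of sentiment running from the start of a book to its end, can be approximated by splitting the text into consecutive chunks and scoring each one. The sketch below assumes a tiny hand-made word lexicon and equal-sized chunks; the names `POSITIVE`, `NEGATIVE`, and `sentiment_arc` are hypothetical, and a production system would presumably use a far richer sentiment model.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons hold thousands of entries.
POSITIVE = {"love", "joy", "happy", "hope"}
NEGATIVE = {"loss", "fear", "sad", "grief"}

def sentiment_arc(text, segments=4):
    """Split a text into up to `segments` equal chunks and return the
    average sentiment score (-1.0 to 1.0) of each chunk, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    size = max(1, -(-len(words) // segments))  # ceiling division
    arc = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        # +1 for each positive word, -1 for each negative word, 0 otherwise.
        score = sum((w in POSITIVE) - (w in NEGATIVE) for w in chunk)
        arc.append(score / len(chunk))
    return arc

# Hypothetical eight-word "story": a hopeful first half, a bleak second half.
story = "love joy hope happy fear loss grief sad"
arc = sentiment_arc(story, segments=2)
print(arc)
```

Plotting the returned list against chunk position gives the beginning-to-end curve; comparing curves across several titles by one author is one way to look for the repeated pattern Bryant mentions.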
KENNEALLY: Right. And great potential here for the marketing of books. You mentioned the potential use in libraries, and libraries are under such pressure these days to really deliver the goods, if you will. And if this kind of algorithm can help them provide books that are actually what readers are looking for, then they can justify the spending.
But in the big publishing houses, and, for that matter, in the independent publishing houses, they need the data that’s going to make a book a success, and they need it to be a success pretty much right out of the gate. And I would think that Trajectory’s algorithms would help them with that objective.
BRYANT: Actually, it is, on multiple levels. We’re working with one of the big five, and we processed about 25,000 of their e-books. And they actually now have the ability, for the first time, to be able to search those books for certain themes related to marketing activities or related to other plans. So, for example, if they’re trying to identify all of their backlist titles that correlate to Christmas, they can now type in Christmas themes and identify all of their books that contain these themes.
There’s also an interesting product that we’re developing right now called the Manuscript Evaluator. And what it is doing is it’s comparing a non-published work to all the works that have been published to try to find the closest match. That may provide some level of an indication of commercial success if the publisher knows what the success was for the books that it’s being recommended to.
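(Illustrative sketch, not part of the recorded conversation.) Finding the closest published match for an unpublished manuscript is a nearest-neighbor search over some numeric representation of each text. The sketch below assumes plain word counts compared by cosine similarity, which is a common baseline and not necessarily what the Manuscript Evaluator does. The catalog titles and texts are invented for illustration, and `vectorize`, `cosine`, and `closest_match` are hypothetical names.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_match(manuscript, catalog):
    """Return the catalog title whose text is most similar to the manuscript."""
    target = vectorize(manuscript)
    return max(catalog, key=lambda title: cosine(target, vectorize(catalog[title])))

# Hypothetical two-book catalog; a real one would hold full texts of many titles.
catalog = {
    "Harbor Lights": "a sailor returns to the harbor and the sea",
    "Winter Feast": "a family gathers for a christmas feast in the snow",
}
best = closest_match("snow falls on a christmas family dinner", catalog)
print(best)
```

If the publisher attaches sales figures to each catalog entry, the sales history of the best match can serve as the rough commercial indicator Bryant describes.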
KENNEALLY: Well, it’s fascinating, Jim Bryant, because at one point publishing was really a kind of – if not a seat of the pants business, it was a business based on gut, and there were a lot of (inaudible) out there that told someone – an editor – whether a book was going to be a success. It sounds like those guts aren’t really needed anymore.
BRYANT: Yeah. We hear the term moneyball for books being applied to what we’re doing. (laughter) And for those of you who’re familiar with the book Moneyball, of course, which applied to baseball, there’s a bit of that happening right now, I think. But discovery is really a huge challenge facing the industry today. And it’s not just with English language books in North America. It’s English language books globally but also other languages globally, as publishers try to reach out beyond their borders.
KENNEALLY: Right. But the thing about Moneyball that was so fascinating is that all the guts out there were wrong so much of the time.
BRYANT: Yeah. (laughter)
KENNEALLY: If a game started with a strike, supposedly that was good. But, well, the numbers didn’t show that.
BRYANT: Well, exactly. I think, though, to be fair, I think, in publishing, there’s such a nuance to good writing and to an editor’s gut feeling on what may be successful, that will probably always outbalance any statistical data that we can do. But what I’m very happy with is the fact that we’re able to deliver candidates for comparison purposes at a much more efficient basis than an editor may be able to generate on their own or –
KENNEALLY: You’re helping out with the slush pile, as it were.
BRYANT: Exactly. Yeah, exactly.
KENNEALLY: Now, one aspect of what you’ve been doing that I think is really fascinating is the potential for cross-pollination as far as language goes. You’ve got a partnership with a Chinese publisher called Tencent that is helping to begin to sell some of their titles into this marketplace, as well as English language titles into China. Tell us about that.
BRYANT: Yeah. So we’re actually working with a couple dozen different Chinese publishers, most of whom are pretty large. And we’re helping them export their titles in Chinese around the world right now. And we’ve also developed a series of algorithms that we’re applying to Chinese books. And this is helping nonnative Chinese speakers identify what these books are. And real simply, one of the things we’re doing is we’re generating keywords from the Chinese language books, and we’re translating those into English and German and Spanish. And we’re doing the reverse with English language books and German language books right now too, where we’re processing those three languages currently.
One of the really interesting things, as you mentioned, on this cross-pollination side, is having the ability to identify not only the closest matching English language book but also the closest matching Chinese book. So if you’re a student of Chinese and you’re looking for something similar to a subject matter that you’re familiar with or to a book you’ve just finished reading, this will help you identify those books.
KENNEALLY: Well, you know, Jim Bryant, finally, it seems to me that – I think I got into the media world and into publishing because I’m no good at math. But it sounds like, if I want to get into publishing these days, I’m going to have to be pretty good at numbers.
BRYANT: Yeah. So I think that it’s fascinating to take a look at statistics and how statistics can be used to measure the elements of the base material of what this industry is constructed on, which is content. So how can we use that? How can we use statistical analysis to be able to determine – better understand what books may be selling better at what time of year is something that’s really fascinating.
KENNEALLY: Indeed it is. And thank you, Jim Bryant, cofounder and CEO of Trajectory, for joining me today on Beyond the Book.
BRYANT: Chris, it was great being here with you.
KENNEALLY: Beyond the Book is produced by Copyright Clearance Center, a global rights broker for the world’s most sought-after materials, including millions of books and e-books, journals, newspapers, magazines and blogs as well as images, movies and television shows. You can follow us on Twitter, find Beyond the Book on Facebook and subscribe to the free podcast series on iTunes or at our Website, beyondthebook.com.
Our engineer and co-producer is Jeremy Brieske of Burst Marketing. My name is Christopher Kenneally. For all of us at Copyright Clearance Center, thanks for listening to Beyond the Book.