When the machines read your book
The Bookseller on the stands in London this morning reports that Bowker -- the ProQuest-owned US ISBN agency and publishing research firm -- is in final talks for a partnership with a company called Trajectory.
The aim at Bowker is to offer authors and small publishers a new way to generate book recommendations for their readers.
Trajectory, based in Boston and founded in 2012, is also in talks with major retail and distribution companies. It has worked with many of them in developing this new approach to the discoverability challenge for both commercial and library settings.
"Trajectory has developed a machine-based process that for the first time in history is able to recommend related books based on a variety of proprietary algorithms."
That's Trajectory's c.e.o. Jim Bryant, formerly with Sony's Data Discman. Bryant and his partner Scott Beatty -- who with Bryant created InfoPlease.com, which they sold to Pearson -- this week are meeting with potential clients and partners: retailers and distributors.
And what they're discussing is a process that begins with a form of "machine learning."
- A book is loaded into Trajectory's system.
- Automated "Natural Language Processing" (NLP) parses the text, "reading" and categorizing the data.
- When an analysis of the book's key elements is made, the book's characteristic profile, its "personality," is exposed.
- Then search algorithms find matches to the book within the growing database.
- Those matches generate recommendations.
This is numbers applied to words: a series of mathematical interpretations of textual data based on techniques including vector space modeling, "term frequency-inverse document frequency," cosine similarity, and least squares. If the Trajectory textual analysis programming and algorithms are implemented deeply and widely, the company's executive team says, a new mode of discoverability will soon be at hand. The company has referred to this at times as a "grace note for ebooks."
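For readers who do want a feel for the math, here is a minimal sketch of how term frequency-inverse document frequency and cosine similarity can turn texts into comparable vectors and rank matches. It is not Trajectory's code, which is proprietary. The function names, the smoothing formula, and the three toy "books" are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, stripping surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Map each title to a {term: weight} dict using smoothed TF-IDF."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many books does each term appear?
    df = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        vectors[title] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(words)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, vectors, top_n=3):
    """Rank every other title by similarity to `title`, best match first."""
    scores = [
        (other, cosine_similarity(vectors[title], vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy stand-ins for full book texts (invented for illustration).
books = {
    "spy_memoir": "the agency briefed the president on intelligence and terror threats",
    "policy_history": "the agency and the state department argued over intelligence on terror",
    "horse_story": "the horse ran through the meadow and rested in the stable",
}
vectors = tfidf_vectors(books)
print(recommend("spy_memoir", vectors))
```

In this toy run the policy title ranks above the horse story for the spy memoir, because the two share rarer terms such as "agency" and "intelligence" rather than only common words like "the."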
You don't have to understand the mathematical analysis behind Trajectory to know that the grace of a little automated discoverability would make a lot of people in the publishing industry very happy about now.
In response to The Bookseller's inquiry, Bowker director of identifier services Beat Barblan said, "We see tremendous value in offering authors and publishers the opportunity to process their works with the Trajectory system for matching readers to books. Their natural language processing and recommendation results are indeed impressive."
The Trajectory-Bowker agreement may create one of the first showcases in which the publishing industry can see what the Trajectory technology makes possible.
Text as data
Simply put, where most algorithmic recommendations so far have been based on sales (as in Amazon's "customers who bought this item also bought..."), Trajectory's algorithms deploy a book's text. First that text is broken down into various "vectors."
The Trajectory analysis uses myriad vectors to model "high-level abstractions."
By comparing one book's unique characteristics to others' -- time period, action, pace, "intensity," word types, book length, dialog, "distinct word prevalence," mood, gender, movement, specific references, and more -- the system then is able to offer matches, recommendations of ebooks based on their content rather than on their sales history.
eBooks a reader discovers through Trajectory's analysis, then, are reflective of other work the reader has enjoyed.
It's easy to see how a major online retailer might want to use such technology. If you buy a book, the Trajectory system can then generate recommendations of other books based on the first purchase's characteristics.
Bryant points out that the same capability is available to libraries.
"Imagine you're a library," Bryant says, "and you're eager to increase your funding. Your funding is based on the number of books you check out. Imagine being proactive and able to email your library patrons and say, 'Hey, you read this book, and we think there are some other books you might be interested in,'" based on the kind of textual analysis that Trajectory's system can provide.
"I think we'll see a lot of libraries try to redefine themselves this year," Bryant says, "and start to be more proactive, reaching out to patrons to give them an edge."
Bowker's partnership announcement is significant because it provides a way for authors and small publishers to utilize the Trajectory technology, which until now has been primarily -- and quietly -- introduced only to points of retail distribution and to some library distributors. Bryant and Beatty say that even as they formalize deals with major retail players and publishers, they're also intent on getting the technology into the hands of independent authors, as the competition of millions of titles mounts.
"Our primary target market now are the points of distribution," Bryant says, "meaning every ebook retailer or distributor that we're currently working with. One of the competitive advantages we think we have in launching this is that the deliverable is pretty straightforward. It's a list of keywords that can be ingested into a retailer's own search engine to allow readers on their site to find books that contain certain people, places, or subject matter."
Graphing a book -- on sentiment, intensity, keywords
The system creates a number of "deep learning" representations, including a "sentiment curve" that characterizes sentences, paragraphs, chapters, and entire books with numeric ratings of a book's emotional qualities -- which Trajectory's algorithms then match to other books.
In one comparison, for example, Charles Dickens' A Christmas Carol has an emotional peak about two-thirds of the way into the book, then soars to a high at the end, while Anna Sewell's Black Beauty, graphed against it, runs a far subtler course, never reaching the emotional intensity of the Dickens book. L. Frank Baum's The Wonderful Wizard of Oz provides a nearly opposite curve from the Dickens, dropping in intensity two-thirds of the way in, but then rising powerfully to an emotional finish.
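To make the idea of a sentiment curve concrete, here is a rough sketch that splits a text into equal segments and scores each against a small word list. Trajectory has not published how it computes its curves. The lexicon, its scores, and the function names below are hypothetical, and a lexicon average is a deliberately simple stand-in for whatever the company actually uses.

```python
# Hypothetical mini-lexicon; a real system would use a far larger resource.
SENTIMENT_LEXICON = {
    "joy": 1.0, "merry": 1.0, "love": 1.0, "happy": 1.0, "glad": 0.5,
    "cold": -0.5, "dark": -0.5, "fear": -1.0, "grief": -1.0, "dead": -1.0,
}


def score_segment(words):
    """Average lexicon score over all words in a segment (0.0 if empty)."""
    if not words:
        return 0.0
    return sum(SENTIMENT_LEXICON.get(w, 0.0) for w in words) / len(words)


def sentiment_curve(text, n_segments=10):
    """Split the text into roughly equal segments and score each one."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    # Never ask for more segments than there are words.
    n_segments = max(1, min(n_segments, len(words)))
    size = len(words) / n_segments
    curve = []
    for i in range(n_segments):
        segment = words[round(i * size):round((i + 1) * size)]
        curve.append(score_segment(segment))
    return curve


sample = "Dark cold fear. Grief and the dead. Then joy, merry and happy love."
print(sentiment_curve(sample, n_segments=3))  # [-0.75, 0.0, 0.75]
```

Plotted from left to right, that list is the curve: the sample starts low and climbs to a positive finish. Two books could then be compared by lining up their curves, as in the Dickens and Sewell example.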
Similar to the process used by a retailer, Bryant says, "through the API, we can allow a library distributor to let a library generate specific shelves [of ebooks] for a patron. If the library shares with us a list of books the patron has read, we'll then be able to present a customized shelf" of similar books for that patron.
From analysis to algorithmic recommendation
Beatty and Bryant's Trajectory.com site displays collections of books being analyzed by the system and resulting recommendations.
One collection focuses on London, another on outdoors stories, another one on news-related books referencing the CIA, etc.
George Tenet's At the Center of the Storm: My Years at the CIA (HarperCollins, 2009), for example, can be viewed not only for its "sentiment" graph but also for its "intensity," as well as for its places, people, adverbs, adjectives, verbs, nouns, and for statistics including its "adult reading time" (11 hours, 27 minutes), the average lengths of its sentences and words, total word count (171,852), unique words (8,086), and more.
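Statistics of this kind are simple to approximate. The sketch below counts words, unique words, and average lengths, and converts a word count into reading time. The 250-words-per-minute default is an assumption inferred from the figures quoted above (171,852 words in 11 hours, 27 minutes works out to about 250 words a minute); the site does not state the rate it uses, and the function names here are illustrative.

```python
import re


def reading_time(total_words, words_per_minute=250):
    """Convert a word count into (hours, minutes) at an assumed reading speed."""
    return divmod(round(total_words / words_per_minute), 60)


def book_statistics(text, words_per_minute=250):
    """Return basic counts of the kind shown on a book's profile page."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on terminal punctuation; real text needs more care
    # (abbreviations, dialog, ellipses).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "reading_time": reading_time(len(words), words_per_minute),
    }


print(reading_time(171_852))  # (11, 27)
print(book_statistics("The cat sat. The cat ran!"))
```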
In At the Center of the Storm's recommendations section, you see Bush vs. the Beltway: How the CIA and the State Department Tried to Stop the War on Terror by Laurie Mylroie (HarperCollins, 2010) making a strong showing in many categories of comparison.
"The recommendations, I think, are our finest work product," Bryant says. "We're continuing to enhance them with new sets of data that we can fold in. Last week, we found a way of identifying and refining the level of complexity of a book."
As a side-product of this work, Bryant adds, plagiarism can be detected by the Trajectory system as it attaches confidence ratings to its recommendations. A match of text at an extremely high level between two books might indicate plagiarised content.
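One common way to check for near-verbatim overlap, offered here only as an illustration and not as Trajectory's method, is to compare the sets of short word sequences ("shingles") two texts share. The five-word shingle size and the 0.5 threshold are arbitrary assumptions, and a flag from a check like this would only mean a pair deserves a human look.

```python
def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_possible_copying(text_a, text_b, threshold=0.5, size=5):
    """Return (score, flagged); flagged is True when overlap meets the threshold."""
    score = jaccard(shingles(text_a, size), shingles(text_b, size))
    return score, score >= threshold


original = "it was the best of times it was the worst of times"
unrelated = "the horse ran through the meadow and rested in the stable"
print(flag_possible_copying(original, original))   # (1.0, True)
print(flag_possible_copying(original, unrelated))  # (0.0, False)
```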
Trajectory "is providing a decimal" to clients as it operates, Bryant says, "and the decimal reflects the uniqueness of a keyword and its use in the book and in the English language. If the client -- a retailer or library -- has the ability to weight words provided to them and only use words of a certain weight, then another criterion of recommendation is available involving the linguistic sophistication -- or at least range -- of a book.
"It's also really interesting to pull out vectors like sentiment, then take all of an author's books and see the similarities within one writer's work" on such a scale. Can you, as it were, recognize an author's work on such a concept as a sentiment graph across his or her books?
What's more, Trajectory's analysis is also being used now, Bryant says, to allow customers of certain retail/distribution partners in China to select English-language books. By matching simplified Mandarin characters to keywords that Trajectory develops from a book's text, that book may become discoverable in a Chinese search. Publishers in China eventually may be interested, on the other hand, in having their own books searchable to English readers -- enough search interest in a title could be a prompt to get a text translated and produced for new markets.
As authors including Hugh Howey, the self- and traditionally published author of the new The Shell Collector, have written, something many in publishing have wanted from their technology is the ability to tell when readers stop reading an ebook, or skip over passages, or re-read or abandon sections or entire books.
One partner Bryant declines to name -- contract talks are ongoing -- is using complexity ratings to help discern what may be causing readers to stop reading a book at a given point. "Complexity," he says, "turns out to have a huge impact on this 'exit data'" on when a reader "exits" the book. "In terms of recommendations, if somebody likes a book that's fairly complex, then the opportunity to use that in making recommendations is good."
And we'll hear more about this type of inquiry next week at Digital Book World (DBW), the conference in New York City produced by F+W Media. There, UK-based Andrew Rhomberg is to discuss an element of his Jellybooks "Reader Analytics" offering. According to information at the Rhomberg site, the approach, now in focus-group testing, will be able to indicate for authors, agents and publishers:
- When readers open a new ebook chapter
- Average reading speed
- Length of reading sessions
- Position at which readers abandon an ebook
- Time of day an ebook is read
- Readers clicking on links or images in the ebook
Competition and comprehension
The fact that Trajectory's offering is capable of similar output shows how competitive the marketplace can become in particular areas of publishing at given times.
"Once raw data is exposed," as Bryant puts it, it might come down to whose idea of how to apply and deploy that data is the most attractive and salable.
For example, he points out, "At the end of the year, there are so many popular-book lists. And Scott [Beatty] had circulated a link to Google's most popular search terms for 2014. What we'd like to do is publish a statistical bank that would show the most popular words of a year used in books published at the time. We could see trends in character and keywords over time." Something like this, especially compared to high-ranking keywords in a matched historical time period, might help explicate how literature reflects cultural concepts in westerns or romance or science-fiction.
The list of ebook distributor-partners Trajectory is working with is a who's who of international channels, including Ingram, Apple, Amazon -- with particularly strong ties to JD.com and Amazon China ("a very progressive company," Bryant says) -- as well as Google, Foyles, OverDrive and more.
And in its Small Demons asset acquisition, the company gained the HarperCollins library, and also is working with publishers including Abrams, Hachette Audio, Scholastic, Canongate, and more.
Bryant doesn't say this, but it seems logical to think that if one of the majors' libraries -- HarperCollins' -- is in the Trajectory database because of the Small Demons acquisition, then other majors might be keen on seeing their own catalogs "read" by the machines in Boston so that searches for recommendations can turn up their books, as well.
More announcements of formalized partnerships are to come shortly, Bryant and Beatty say, as Trajectory's system of semantic analysis comes online.
And, meanwhile, Bryant says, "One of the coolest early advantages we see is an opportunity to elevate books in backlists," particularly in subscription settings where many largely unknown titles can go unnoticed. "It would probably be to their advantage to offer books to customers that aren't being read."