Part 2: KU or KO? On streaming, subscription and big data

Editor's note: The FutureBook community’s Eric Briys, co-founder of France’s Cyberlibris, has provided us with this extensive discussion of ebook subscription services. We have it for you in two parts, of which this is the second.

Is there life beyond (proper) pricing? The hidden treasure of reading data

Beyond the money itself, publishing houses can gain and learn a lot from properly designed subscription services. Strategies similar to the ones that led publishing houses to market hardcovers first, then paperbacks, pocket books and so on, are available here too.

Take the following example: the latest book by a best-selling author. What about making a digital pre-release of the book readable at no additional fee, say for one week, in the subscription-based library? After that week, the book is accessible through purchase/download only. Finally, after a few months, the book is available again in the digital library through the subscription service. After all, this is what Le Livre de Poche is all about.
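The windowing sequence above can be written down as a simple schedule. This Python sketch is only an illustration: the one-week preview comes from the text, while the 90-day purchase-only window is a hypothetical stand-in for "a few months", and the channel labels are made up for the example.

```python
from datetime import date

# One-week subscription preview, as suggested in the text.
PREVIEW_DAYS = 7
# Hypothetical: 90 days stands in for "a few months" of purchase-only sales.
PURCHASE_ONLY_DAYS = 90


def channel(release: date, today: date) -> str:
    """Return how the book can be read on `today` under the windowing scheme."""
    elapsed = (today - release).days
    if elapsed < 0:
        return "not yet released"
    if elapsed < PREVIEW_DAYS:
        # Window 1: free pre-release for subscribers.
        return "subscription (pre-release)"
    if elapsed < PREVIEW_DAYS + PURCHASE_ONLY_DAYS:
        # Window 2: purchase/download only.
        return "purchase/download only"
    # Window 3: back in the subscription library, like a pocket edition.
    return "subscription (back catalogue)"
```

Making the window lengths explicit parameters is the point: a publisher could vary them per title and compare how readers behave in each window.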

In the medium term, such a sequence of events helps separate impatient readers from patient ones. Many other ideas in the same vein can be explored, such as testing manuscripts before fully investing in them. Whatever the idea tested, it will generate data points. This is where subscription-based streaming services are hard to beat: they produce formidable flows of reading data. Readers leave footprints that are most valuable if properly understood, managed and used.

To understand what's at stake, let's go for a moment to the places where books are traditionally read and sold, namely brick-and-mortar libraries and bookstores. These spaces are a tribute to Euclid, the famous Greek mathematician, father of geometry and author of its no less famous postulates. Through two given books runs one straight shelf only. Two straight shelves never cross each other (otherwise... books would fall, etc.).

As a result, browsing in a physical library or a bookstore is a highly structured, organized experience. Books sit (and wait) on shelves. Not just any shelves, though: these are the shelves that librarians’ classification schemes have shaped over years of cataloguing effort. Libraries and bookstores are therefore great interfaces (not all of them, though!) where the eye can take in a lot of information at a glance. The physical space is structured so that the books’ appeal is maximized for patrons or customers walking along the shelves. This is a great plus. On the minus side, these are spaces where hardly any reading data are collected.

Browsing in a digital library is very different. Even though the library is easily accessible once the subscription is paid, browsing it can be a frustrating experience. Frustration again! Indeed, the reader is confined to a small 2D screen where not much can be displayed at once. This is a hard constraint, a minus. On the plus side, lots of data can be gathered: which books are read and which are not, which books are placed on digital bookshelves, which books are commented on, etc. New metadata are created, since we readers are, in a sense, metadata of the books we read and love. As a result, we have two sets of data: the traditional metadata and the usage metadata.

A natural question follows: how would one (spatially) organize the digital library once all the available data are taken into account? Which books would sit close to which others? Which would not? The question is more ambitious than Amazon’s famous recommendation engine: it addresses the organization of the whole library and is not restricted to the neighborhood of a given book. To use a biological metaphor, with traditional metadata one can “compute” the genome of the library; with the additional data provided by streaming services, one can “compute” its phenome.

This is why we called our internal project the Book Phenome Project (BPP). The project had a clear objective: to compute the social graph of any digital library and then visualize it.

By social graph, we mean the graph of the books contained in the library. The structure of the graph (the location of books, the links and distances between them) is shaped both by traditional metadata (genotype) and by environmental data (reading habits, phenotype). Achieving this ambitious goal requires some modern artillery, namely machine learning techniques along with other big data and information-visualisation tools.
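The following is a minimal Python sketch of the genotype-plus-phenotype idea, not the BPP's actual method (the article does not describe its algorithms). It assumes hypothetical toy data: subject tags stand in for traditional metadata, and sets of anonymised reader IDs stand in for reading data. Each pair of books gets an edge weight that blends a Jaccard similarity of subjects with a cosine similarity of readerships; `alpha` and `threshold` are illustrative parameters.

```python
from itertools import combinations
from math import sqrt

# Hypothetical "genotype": subject tags from traditional metadata.
SUBJECTS = {
    "corporate_finance": {"finance", "business"},
    "options_pricing": {"finance", "mathematics"},
    "game_theory": {"economics", "mathematics"},
    "french_cooking": {"cooking"},
    "kitchen_garden": {"gardening", "cooking"},
}
# Hypothetical "phenotype": which anonymised readers opened which book.
READERS = {
    "corporate_finance": {"r1", "r2", "r3"},
    "options_pricing": {"r2", "r3"},
    "game_theory": {"r3", "r4"},
    "french_cooking": {"r5", "r6"},
    "kitchen_garden": {"r5", "r6", "r7"},
}


def jaccard(a: set, b: set) -> float:
    """Share of subject tags two books have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def co_reading(a: set, b: set) -> float:
    """Cosine similarity between two books' reader sets (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def book_graph(alpha: float = 0.5, threshold: float = 0.2) -> dict:
    """Weighted edges: alpha * metadata similarity + (1 - alpha) * usage similarity."""
    edges = {}
    for x, y in combinations(sorted(SUBJECTS), 2):
        w = alpha * jaccard(SUBJECTS[x], SUBJECTS[y]) + (1 - alpha) * co_reading(
            READERS[x], READERS[y]
        )
        if w >= threshold:  # keep only links strong enough to draw
            edges[(x, y)] = round(w, 3)
    return edges
```

With the default blend, `corporate_finance` and `game_theory` are linked only because they share a reader; with `alpha=1.0` (metadata only) that edge disappears. A layout algorithm, for instance a force-directed one, could then place strongly linked books close together on a map.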

The end result can be quite stunning. The BPP output is called the DICE (Digital Content Explorer). Here are two screen captures of the DICE, one computed on ScholarVox (a digital library for business schools) and one on BiblioVox (a digital library for public libraries):




Each point in the visualisation is a book. Each book emits a color halo which indicates the topic to which it is related.

For instance, in the ScholarVox DICE example, red means Finance, and blue means Economics and Decision Sciences.

There is a striking difference between the two graphs: one mixes colors a lot and the other does not. This shows that academic users don't read the same way public library users do. The algorithm detects this from the data. As a result, books are not mapped the same way. In an academic setting, the motivation for reading is tightly coupled to the curriculum and to how the curriculum is organized over the academic year. In a public library setting, patrons are not guided by any curriculum; they follow their own tastes. This explains why cookbooks form a well-identified area close to gardening books (art de vivre), while business books sit in a different area. The lesson from these two maps is that there is a lot to learn from reading data processed by machine learning.

Suppose a given publisher wants to know what its social map (in other words, its client graph) looks like globally (across all countries covered by the service), then locally (say, in France or Senegal), and finally at a given institution in France. This is indeed possible, and easily visualised.

The next screen capture uses the following scenario:
Service = BiblioVox
Publisher = Eyrolles
Field = Sciences
Location = Levallois-Perret (suburb of Paris)



For the first time, publishers benefit from highly granular information drawn from a thorough analysis of reading data.

The DICE and its variants will soon land on publishers’ desktops, allowing them to map their books, filter these maps (by users, location, topic, etc.) and trigger the appropriate business actions. Publishing has so far evolved without any proper map, amid a lack of relevant information. Those days are over. Rich maps are now available. Publishers will no longer venture into the dark.

This visualisation is made possible by the vast flow of reading data that subscription/streaming services are able to capture. Without these data, designing the maps would have been impossible. The maps are valuable not only to publishers but also to every party involved with the service. They are, for instance, a great curation tool for librarians. One can easily imagine large screens on library walls displaying the DICE or its equivalents.

Last but not least, data trigger a virtuous circle: the more data are deciphered, the closer pricing gets to fairness, the more people the service appeals to, and so forth.

Show me the money, show me the map!

With the disruption caused by Amazon's KU, publishers face what was obvious 14 years ago to the few pioneers of subscription/streaming/revenue-sharing services such as Cyberlibris or ebrary: the digitisation of books will change the way they are consumed (not to mention the way they are produced and the way they are written).

This has already happened in the music and movie industries. It has taken longer for books because a book is not the same thing as a CD. Gutenberg (and his predecessors) gave us a genius invention that is both hardware and software. The print book is a marvel of design and autonomy: content and device in one. Hat tip, Mr. Gutenberg!

It would be silly to go head-to-head with Gutenberg. One has to figure out how to be orthogonal to Gutenberg.

A good place to start is uncovering current reading (and publishing) frustrations.

Another good idea is to look at recipes that have proven their value over a long period of time. In that respect, the library is a great place to focus on. A print library has a lot to offer; this is why we still use libraries. But a library also has weaknesses that the Internet and digitisation may help circumvent. As with most things in life, we face a recurring trade-off that Kevin Maney aptly named the fidelity-convenience trade-off: as long as what we lose on the fidelity front is regained on the convenience front, we're all right and willing to accept the new deal. This is what distinguishes a good design from a poor one.

If subscription/streaming services are to succeed, they have to offer convenience gains that compensate fidelity losses.

Properly designed (and priced) subscription/streaming services can deliver what print books and physical libraries can't deliver. This does not mean that both print books and physical libraries are doomed. Again, what matters is the trade-off equation.

· A print book scores high on fidelity, less on convenience.
· A digital library scores high on convenience, less on fidelity.

The good news is that we can access both trade-offs, print and digital, and be better off as a result.

What do all these business musings tell us about the KU event?

One word comes to my mind: coordination. Markets often fail because of a lack of coordination between market players, that is, because of coordination costs. This is the famous prisoners’ dilemma, in which the two prisoners, unable to coordinate their actions, both confess and end up with heavier sentences than if they had both kept silent. Had they been able to devise a joint strategy, their jail time would have been significantly reduced.
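To make the coordination argument concrete, here is a minimal Python sketch of the classic two-player prisoners' dilemma. The sentence lengths are hypothetical textbook-style numbers, not taken from the article. The code finds the Nash equilibrium (each player's choice is a best response to the other's) and compares it with the outcome the two would reach if they could coordinate.

```python
from itertools import product

# Hypothetical sentences in years (lower is better): (row player, column player).
SENTENCES = {
    ("silent", "silent"): (1, 1),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}
ACTIONS = ("silent", "confess")


def best_response(other_action: str) -> str:
    """Action minimising the row player's own sentence, given the other's action."""
    return min(ACTIONS, key=lambda a: SENTENCES[(a, other_action)][0])


def nash_equilibria() -> list:
    """Profiles where each player's action is a best response to the other's.

    The game is symmetric, so the column player's best response to an
    action equals the row player's best response to that same action.
    """
    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if best_response(b) == a and best_response(a) == b
    ]


if __name__ == "__main__":
    for profile in nash_equilibria():
        print(profile, "->", SENTENCES[profile])
    print("coordinated:", SENTENCES[("silent", "silent")])
```

Running it prints `('confess', 'confess') -> (5, 5)` and then `coordinated: (1, 1)`: confessing is each prisoner's best response whatever the other does, yet the resulting outcome is worse for both than the one a joint strategy would secure.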

The book market is replete with coordination costs: think of book inventories, for instance. KU once again raises the question of coordination. If any firm has efficiently internalized coordination costs, it is Amazon. KU is a natural output of Amazon's coordination efforts, and that's why Amazon is powerful enough to dictate its rules. If players don't try to coordinate fairly and openly, Amazon will do the coordinating!

This does not, however, mean that “la messe est dite” (that it is all settled). Publishers hold many valuable assets; so do firms like Cyberlibris and 24Symbols.

The key to combining those assets synergistically is to talk and think together, and to take the time to fully uncover (as I have tried to do here) what a subscription/revenue-sharing model truly is and truly does, and what dividends one can draw from it.

Yes, it takes time, patience and will. It is often said that well-established firms should think and act like start-ups. After 14 years of spreading the subscription gospel, I am still puzzled by how poorly understood subscription is, how feared it still is. This is odd!

Again, one has to put readers and their reading environment at the forefront, because this is where the money is. Given technological developments, how can one enhance the lives of readers? Putting oneself in readers’ shoes is a great and profitable exercise that helps one reconsider Terra Cognita and embrace Terra Incognita more actively. And Terra Incognita is not that scary anymore once one starts designing proper maps.

Subscription services will not be designed in a vacuum. Yes, firms like Cyberlibris are there to push boundaries and provide the initial momentum. But for that momentum to be sustained, and even to grow stronger, all the players need to collaborate and coordinate closely.

One should no longer be begging for content as if it were locked in a sacred safe. One should rather open the safe and, as Carl Shapiro and Hal Varian once put it in their book Information Rules, be aggressive instead of greedy!

Digitisation and the Internet make productivity gains achievable, and a smart, profitable move is to redistribute them fairly. In that last respect, subscription services can deliver a lot, as long as they make life easier for readers.

There is no reason to be KU or KO, unless, of course, the status quo becomes the business rule.

