Data Rights?

A long, long time ago - well, about four years ago, but back then people were still saying things like “ebooks will only ever make up 1% of the market” and other stuff which now looks a little odd - I suggested that treating a book as a data set, running data analysis tools over it, and then using the results to create digital tools such as algorithms might be a new derivative use, subject to the usual rules of copyright ownership. Since it’s contentious in a digital setting to extend IP - the general direction of most discussion at the moment is towards loosening IP rules - let me just unpack my motivations a bit. They’re not sophisticated, and I am definitely floating this idea speculatively rather than strongly advocating it.

With due respect both to the valid point that obscurity is more problematic than “piracy” - which I tend to think of as eTWOCing - and the contention that unsanctioned sharing leads to more sales, I do occasionally worry about the creator business model over the next few decades. Specifically, I worry that ad-funded and freemium models may not support very many creators, and that it’s possible to envisage a scenario where the price people are willing to pay direct to content creators for a given work is not sufficient in most cases to cover the costs of production, even in what would traditionally be the mid-list (or the B-movie, or the support act) layer from which a decent percentage of A-list favourites emerge. (This is an avoidable problem in many ways, but I’m not convinced that anyone is doing enough to avoid it; the drive I see in mainstream publishing, for example, is towards fewer and more expensively produced ‘event books’ - the kind which can be supported with pretty hardbacks and which make good seasonal gifts, which are responsible for the bulk of the 4.2% in the UK’s hardback market last year. I won’t labour the downsides of that way of doing things for consumers, as I think they’re obvious enough.)

So here’s the thing: let us suppose that the Pink & Purple Banking Corporation of Frankfurt buys a load of ebooks, analyses the text as data, and produces a natural language interface tool which allows them to handle the first tier of their online customer support system with algorithms at a fraction of the cost of their call centre operation. In case that sounds a bit far-fetched: there are already a few automated systems around, used by the likes of Michelin and SFR, and within their limits they work pretty well - they provide the right service, but they fall down on the rhythms and nuances of natural human speech - exactly where a broader base of language use could help. I should say, incidentally, that I have no idea how the presently-existing systems were created, and I'm not suggesting that they specifically fall into the same category as the system in my example.

So by deploying the tricks it learns from the work of authors (who after all specialise in creating credible and sympathetic characters through nuances in language) the Pink & Purple Banking Corporation’s new system can pass a medium-tricky Turing Test inside its area of competence with flying colours - not hard in a situation where many human call centre employees are required to stick to a script in a second or third language anyway. Since the machine never gets pissed off, and since it’s programmed to be polite and helpful as well as credible, P&PBC actually see an uptick in positive reactions to their support team, and the bank profits.

See where I’m going? Why shouldn’t the authors whose skills they’re using get paid a royalty for their part in the creation of the system, just as they would if P&PBC was making movies or games? All these uses require that the raw material be processed into a new form which can then be used to engage with the audience in another setting, and all require the input of good writing to work. That the work is transformed into a new medium does not alter the fact that the platform on which it rests is a copyright property - or more likely, in the case of the algorithms, a lot of them.

The interesting thing about this as an idea is that it means dealing largely with corporate use on a macro scale rather than chasing individuals for (effectively) micropayments. It’s even quantifiable how much an author has influenced use of language - you can find their footsteps in the text. It seems inevitable, given that the system is learning to interact with a modern audience, that it would use more modern - and hence still copyrighted - works to learn its language. And if companies using this technology are trading at the level of Michelin, which made just over €1bn in profit in 2010, it seems unlikely that a reasonable payment for use of such a system would be an insurmountable obstacle to its success.

Data, former US FTC member and technology lawyer Pamela Jones Harbour wrote in the New York Times recently, is a new asset class. It’s true: all manner of data, from our personal details to music sales charts to fish stocks, is traded freely around the world. That’s the meaning of the “information economy” - and authors are producers of meaningful information: data arranged in a deliberate way according to an acquired skill or an inborn talent or whatever mixture of the two produces a Jeanette Winterson or a Tan Twan Eng - or even a William McGonagall. It’s also possible with modern analysis to pick the footsteps of one author out of the surrounding noise: our skill is detectable and to some extent quantifiable, it seems - in which case, surely, so is an author’s contribution to a specific pattern of interaction in a piece of software derived from the work of many. So how is it the case that a company can make free with a unique assemblage of data, creating a derivative work in another medium from it, and not expect to pay? Well, maybe it’s just not the case and no one’s ever made an issue of it.
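For a rough flavour of what picking one author's footsteps out of the noise might look like, here is a minimal sketch in Python. It assumes one of the simplest stylometric tricks: comparing how often a handful of common "function words" turn up in different texts. The word list, the function names and the sample sentences are all made up for illustration, and I am not suggesting any real system works this way.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words.
# Serious stylometric work uses far longer lists and far longer texts.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_author(sample, corpus):
    """Return the author in corpus (name -> text) whose profile best matches sample."""
    target = profile(sample)
    return max(corpus, key=lambda name: cosine(target, profile(corpus[name])))


if __name__ == "__main__":
    # Invented snippets standing in for two authors' collected works.
    corpus = {
        "Author A": "the fox and the hen and the cow",
        "Author B": "it is not that it is to be",
    }
    print(closest_author("the cat and the dog and the bird", corpus))
```

On a toy like this the answer is only as good as the word list, but the principle is the point: if a similarity score can be attached to an author's habits of language, then in principle a share can be too.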

When I first pondered this idea, I didn’t think about it very hard and no one really took me up on it, but it came back into my mind when I was looking over the Hargreaves Report on IP, which goes out of its way to suggest an exemption to

“enable scientific and other researchers to use modern text and data mining techniques, which copyright prohibits.”

If there’s a need to produce an exemption, doesn’t that suggest that such a right does exist? Or at least, that there’s an argument - all the law ever really affords us by way of certainty - that it might?

In this new environment of shifting models and new interpretations of IP - and challenges to the need for it at all - authors are supposed to be looking for secondary and tertiary revenue streams to survive. I don’t insist on this one, but I do think it’s worth exploring, not least on the simple principle that work done and work used should be work remunerated.