Technologies involving big data analysis are becoming so cheap to work with that academic publishers should be innovating in the area even if they are unsure what they are looking to create, Sage's Ian Mulvany told the Association of Learned and Professional Society Publishers (ALPSP) conference yesterday (15th September).
Mulvany, formerly of open access journal eLife, joined Sage as head of product innovation earlier this month.
Speaking in a panel discussion on Research and Scholarly Publishing in the Age of Big Data, Mulvany warned: "Increasingly we are going to have to understand the needs researchers face in their challenge with data." While some science disciplines, such as astronomy, have their data issues "nailed", other disciplines now have "researchers who realise they have lots of data but no idea what to do with it," he said.
An example was a typical experiment in microscopy in life sciences, which could generate 15-20 terabytes of data a week. "How do you publish that and share it with colleagues?" he asked. "There are increasing requirements from funders to make data available - but people with data too big to publish."
Sage global publishing director Ziyad Marar said the publishing house was interested in the use of big data in social research, where it is being done "on a scale never seen before." As an example, he cited Berkeley sociologist Nick Adams, who works on social protest movements that turn violent. Adams studied the Occupy movement across 192 cities, using reports generated by police, journalists and others to build up a corpus of text describing what had taken place. This produced information that machine algorithms could work with, meaning the whole corpus could be coded in a single year, whereas traditional methods would have taken 15 years. The result was to uncover information about police strategy patterns.
"The work has shown the potential of asking new questions, and old, and finding new ways to answer them," Marar said.
He warned that social research was coming into the use of big data "slowly and cautiously" because of ethical concerns around informed consent to the use of the data, and the potential for sampling bias. But he said: "It's an incredibly important and exciting thing to be making headway on."
Francine Bennett, c.e.o. of tech company Mastodon C, which has worked on big data analysis with publishers, several other industries and DEFRA, said there were now standard things most businesses could do with big data as "bread and butter" activities. One example was predictive analysis to show which of their customers were likely to leave in the near future, so they could be targeted with incentives to stay.
While she had come across the problem of experts in a given field distrusting the use of algorithms, it was important to work with them to identify the parts of their jobs that an algorithm could safely do, to save them time and effort, she said. It was also important to create algorithms without unconscious bias, Bennett added.
The ALPSP annual conference is being held at the Park Inn conference centre at London Heathrow and will finish on Friday (16th September).