Springer Nature hopes to spark an industry-wide AI debate with its first machine-generated chemistry book

From the new book by Adobe’s Chris Duffey to Editions at Play’s latest project We Kiss The Screens, it seems that using artificial intelligence to help create books is all the rage. Now Springer Nature has thrown a title into the AI ring with its first machine-generated book in chemistry, and the company hopes it will be the first of many.

Working in close collaboration, the team at Springer Nature and researchers from Goethe University Frankfurt created an algorithm called Beta Writer to select, consume and process relevant publications in the field of lithium-ion batteries, sourced from Springer Nature’s content platform SpringerLink. Based on this peer-reviewed and published content, the Beta Writer used a similarity-based clustering routine to arrange the source documents into coherent chapters and sections, then created succinct summaries of the articles. The extracted quotes are referenced by hyperlinks which allow readers to further explore the original source documents, while automatically created introductions, table of contents and references help them navigate within the book.

But has the tech really enhanced the reader experience? What were the struggles in working with an algorithmic author? And what other applications might this model have for books in the future? We interviewed Dr. Niels Peter Thomas, managing director of books at Springer Nature, and Henning Schoenenberger, director of product data & metadata management and project lead for the development of the book, to find out.

What motivated you to create a machine-generated book?

NPT: Building on a long tradition and expertise in academic book publishing, Springer Nature is aiming to shape the future of book publishing and reading. Progress in natural language generation, which focuses on producing text narratives from data sets, is advancing fast, and new technologies around artificial intelligence offer promising opportunities for generating scientific content automatically.

With this prototype, we want to explore both the opportunities and limitations of machine-generated research content, while also addressing a number of questions about the impact of artificial intelligence on scholarly publishing: for example, who is the originator of machine-generated content, and who is accountable for it? We hope that our first machine-generated book initiates a public debate on the opportunities, implications and potential risks of machine-generated content in scholarly publishing. As a global publisher, we have a responsibility to take these implications into consideration and provide a solid framework for this new type of content.

What does it provide that a human-generated book couldn’t?

NPT: With such a large volume of new findings and data, it is increasingly difficult for scientists to keep track of their fields of research, and this prototype is one solution to the problem of managing information overload efficiently. With the help of algorithms we can create new works based on existing published content and data sets, make substantial connections in the existing literature and come to new conclusions.

By providing a structured summary from a potentially vast set of papers, a machine-generated book can be a great help to anyone having to write a literature survey or find their way into a topic. Even if the excerpts are somewhat clunky and clearly recognizable as being machine-generated, it should still speed up the literature digestion process. At the same time, if needed, readers are always able to identify and click through to the underlying original source in order to dig deeper and further explore the subject.

In layman’s terms, how does the technology work?

HS: A machine-generated book is a book that has been automatically generated by a computer algorithm programmed to extract material on a particular area or topic. Take “lithium-ion batteries”, which is the scope of our book. The algorithm pipeline retrieves relevant content from a given repository, for example SpringerLink, and automatically clusters it. The algorithm then generates a chapter structure, and summarizes and paraphrases the most relevant documents. All extracted quotes are referenced by hyperlinks, which allow readers to trace the quotes back to their source documents. The table of contents and references are created automatically to help readers navigate the book.
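The clustering step described above can be illustrated with a minimal Python sketch. It is not Beta Writer's actual code: the bag-of-words representation, the cosine similarity measure, the greedy grouping strategy, the 0.3 threshold and all function names are assumptions chosen only to show what "similarity-based clustering" of documents might look like in its simplest form.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens (an assumed, very simple representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedily group documents: each document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster.
    Returns lists of document indices (hypothetical "chapters")."""
    vectors = [bag_of_words(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for group in clusters:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "anode materials for lithium batteries",
    "lithium anode materials degrade",
    "electrolyte safety and thermal runaway",
    "thermal runaway in electrolyte",
]
chapters = cluster(docs)
```

In this toy input, the two anode-related titles and the two electrolyte-related titles end up in separate groups, which is the kind of grouping a chapter structure could be built on. A production system would presumably use richer document representations and a more robust clustering method.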

What was the greatest technical challenge?

HS: The greatest challenge was to not get over-excited by the technology, but to find the right balance between the use case, the creation of meaningful text and the technical feasibility of the chosen methods. This technology has great potential uses such as auto-summarization and abstractive summarization. However, as the resulting texts are not yet of good enough quality, we decided to focus for the time being on extractive summarization, which has been well tested.

We did explore more advanced modules and while we expect them to eventually yield better results, for now they will be held in reserve as we move forward upon the solid foundation of this initial publication.
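To make the distinction concrete: extractive summarization selects sentences verbatim from the source, whereas abstractive summarization generates new wording. The following Python sketch shows one simple extractive approach, scoring each sentence by the average frequency of its words in the whole text. The scoring rule, the sentence splitter and the function name are illustrative assumptions, not a description of the method used for the book.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=1):
    """Return the highest-scoring sentences, unchanged and in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # Average document-wide frequency of the words in this sentence.
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    keep = sorted(ranked[:max_sentences])  # restore source order
    return [sentences[i] for i in keep]


text = "Lithium batteries store energy. Lithium batteries degrade. Cats sleep."
summary = extractive_summary(text, max_sentences=1)
```

Because every output sentence is copied from the input, each one can be linked back to its source, which fits the hyperlinking of extracted quotes described earlier in the interview.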

Did you make any major pivots along the way?

HS: Throughout the entire development process, full transparency was essential for us to explore both the opportunities of machine-generated content and the current limitations that technology still confronts us with. Using agile methods, we were able to address and solve issues as soon as they arose and constantly adjust our own procedures. We are genuinely convinced that exposing the way we work, being clear that failure is an integral part of progress, ensuring there is a continuous feedback loop into development, and encouraging criticism and learning from it will help us turn this prototype into a successful long-term product that allows researchers to spend their time more effectively. This is why we have outlined the technological developments of the implementation in the book’s preface.

What other potential applications can you see for AI in academic publishing?

NPT: We believe that in the future content will be created in many different ways, from entirely human-created content to a variety of blended man-machine text generation to entirely machine-generated text. What is certain is that AI technology, with its ability to help authors create text or parts of texts from scratch, or to generate a data-driven scope (table of contents) of a given area which can then be re-edited, has the potential to be a valuable asset for authors.

If the technology turns out to be reliable, we plan to increase the use and creation of machine-generated content. However, research articles and books written by researchers and authors will continue to have the most important role in scientific publishing. Artificial intelligence is not yet able to generate anything similar to a full-scope and meaningful research article. Algorithms, due to a lack of contextual understanding, still have a very hard time remembering what was said three pages before and building a storyline that appeals to readers. Therefore, we believe that the future will show a promising co-existence between traditional publishing models and machine-generated content.

What do you think might be the greatest dead ends/pitfalls?

HS: One of the greatest pitfalls is that machine-generated content might be perceived as more reliable than content created by humans. As with other new and emerging technologies, quality will most likely not be comparable with human-created content in the beginning, and readers should take this into consideration. We are also aware that the quality of machine-generated content can only be as good as the underlying sources used to curate it. Therefore, we have decided to use only peer-reviewed, robust research from our content platform SpringerLink for this prototype.

Hence, we aim to be transparent about the current stage of this development and also address potential limitations and shortcomings. Further improvements will be necessary to constantly increase the level of quality which can be delivered by machine-generated content.

What’s next for Springer Nature and AI?

NPT: The current implementation will be subject to ongoing refinement. We will use the prototype on lithium-ion batteries as a basis to explore further development of the technology. This also includes how it can be integrated into production workflows for scientific literature.

For the first prototype, we decided to focus on a current chemistry topic. We are planning to publish prototypes in other subject areas as well, including the Humanities and Social Sciences, with special emphasis on an interdisciplinary approach, acknowledging how difficult it often is to keep an overview across the disciplines.