The mammoth project to digitise our archive has thrown up a host of challenges and discoveries worth sharing.
At De Gruyter, we’ve reached the final stages of the largest digitisation project we’ve ever undertaken. Between 2017 and 2021, our publishing house digitised over two and a half centuries of scholarly history. The archive now contains 53,000 titles – every book we’ve published (or at least the ones we’ve located so far) since 1749.
Now one of the largest collections of digital humanities books in the world, the archive contains rare gems by Johann Wolfgang von Goethe and the Brothers Grimm, and early works from more contemporary scholars such as Milton Friedman and Noam Chomsky. So, what did we learn, what would we do differently, and what tips would we pass on to other publishers?
If you’re embarking on a project of this scale, step one is to understand why you’re doing it. We digitised every book because we believe it’s our role to publish, not to decide how future generations of researchers will use the archive. We also wanted to make a rich cultural heritage, locked away on library bookshelves, more readily available to those researchers. While every publisher will have their own reasons, deciding on your "why" early on will give you the clarity you need when the tough decisions arise.
During the digitisation process we had to make many decisions we never thought we’d have to make. What’s the difference between a "journal" from a hundred years ago and a book? Should we digitise the 1st, 5th and 15th edition of a book, or is that a duplication? Should every volume of every collection be included? In the end, we stuck to our "why" and digitised everything. While we’d still have had to grapple with these thorny questions, having a more detailed inventory before we started might have made the decision-making process simpler and faster. Dealing with issues such as these took more time, which explains why…
Digitisation sounds like it should be a swift, automated process, but that’s far from the case when you’re working with centuries-old books stored in two libraries. Overall, the project involved more time and more people-power than we expected. It was only when we and our valued partners started taking books off the shelves and scanning them that we found issues. Many books were fragile and prone to cracking and flaking, and many contained ornate gothic lettering that baffled the scanning machines. Plus, many titles had no metadata, or the metadata we had was in poor shape. Again, this added to the amount of time and person-power (not just from within the company) we needed to dedicate to the project.
As a commercial publisher we needed to partner with customers in the decision-making process. One key learning point for us was to know which books were on the shelves, and what condition they were in, before offering pre-publication lists to customers. Don’t assume that the lists you have accurately reflect what will be digitised – always expect surprises! Sometimes we’d go to the shelves and books were missing. Then we’d find extra titles not on our lists that we’d need to find metadata for. Many books were too damaged to scan. Sometimes pages and whole sections would be missing. Our advice? Be upfront with early customers but don’t advertise specifics – just give a broad understanding of which subject areas or authors might be featured.
We thought that, in terms of usage, the titles that had been digitised before we started the project (around 11,000) would be less popular than our freshly digitised books. Not so. Digitised titles that had been in the market for years continue to be very valuable to customers. While our project as a whole is now at break-even, be aware when you digitise an archive as a commercial publisher that break-even for an individual title is unlikely to arrive within the first two to five years without significant marketing spend.
The commitment to digitise everything came with some complicated ethical considerations. As a Berlin-based publisher, we had to decide what to do with highly sensitive content published during Nazi rule between 1933 and 1945. We worked with a subject expert to identify this content and decided that while it should be digitised, it should not be for sale and should be available only on request. While every academic publisher is different, most will have some sensitive content somewhere in their archive. We decided not to hide it – something I’m personally very proud of – but to make sure access was managed. We also made sure that anyone who finds sensitive content has a route to get in touch so we can act on it.
With a project like ours, you never really get closure. Don’t expect it to have a neat start and end date – it needs to be someone’s responsibility moving forward. There will always be loose ends that need tying up. There will always be follow-up questions that need answering. When you’ve been publishing books for over 270 years, researchers and libraries will contact you with titles that need investigating, and new and as yet undiscovered collections that might need to be included. When you embark on a digitisation project of this scale, think of it as a long-term commitment that needs a strategy and buy-in from every aspect of the organisation.
While every digitisation project is different, with its own challenges, obstacles and barriers along the way, learning from your experience is key. What went well? What didn’t go so well? What would you change for next time? Our review process – something I’ve just shared with you – was crucial in keeping us on track and may prove useful to you should you be considering a similar approach. Good luck!
Learn more about how the archive was digitised and why digital archives matter to researchers and librarians.
