Digital researcher Bernard Ogden reflects on using experimental technology to foster co-learning between digital and collection experts at Wellcome Collection.

Digital technologies promise new ways for archives and other heritage institutions to search, interrogate, interpret, and understand our collections. However, they also bring a challenge for successful collaboration between different specialists.

While digital specialists (such as me) understand the potential of software and data, they do not necessarily understand the collections, their history, or their context. In contrast, collections specialists have deep knowledge of the collections and their audiences, but do not necessarily understand how best to use digital methods to work with them. Each discipline can have different understandings and assumptions about how things can or should work.

How can we work together to make digital potential a reality?

One answer is to make everything as concrete as possible. A way to do this is to borrow from a common software development method called Agile. An Agile approach features short periods of programming followed by demonstrations of results. As well as talking about what we might be able to do, we also try to actually do it.

Making digital potential a reality

In 2023, I had an opportunity to explore this in more detail when I was selected to become a fellow of The National Archives and Research Libraries UK (RLUK) Professional Fellowship Scheme. The scheme allows a member of The National Archives staff to work with a member of Research Libraries UK (or vice versa) and allows 'staff from both organisations to gain experience and insight from one another, to strengthen and diversify the relationship between them, and to overcome some of the collective challenges facing research and cultural organisations.'

My project set out to contribute to current conversations around the role of research software engineers (RSEs) by investigating RSE-led prototyping as a method of interdisciplinary collaboration and co-learning in digital cultural heritage.

My RLUK host was Wellcome Collection, which has a large collection of digitised text from its library of historical medical books. Over the course of the last year, I visited Wellcome several times to run workshops with staff from its digital and collections teams. These workshops focused on methods for working with Wellcome’s digitised collection of (broadly) medical books from the 1700s, particularly using topic modelling.

Using Voyant tools to demonstrate text mining of a part of Wellcome Collection’s library.

What is topic modelling?

Topic modelling is a statistical technique for discovering what a collection of text is 'about'. By looking at associations between words, it collects words that somehow appear to be important to a so-called topic.

But topic modelling cannot tell you what the topic actually is; it just tells you which words are important to it. A human has to look at the topics and make a judgement about what, if anything, they are about.
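To make the description above a little more concrete, here is a minimal sketch of one common topic modelling technique, latent Dirichlet allocation fitted by collapsed Gibbs sampling, written in plain Python. It is not the code used in the project's prototypes. The function names (fit_topics, top_words), the parameter values, and the four tiny "documents" are all invented for illustration and are not drawn from Wellcome's corpus.

```python
import random
from collections import Counter

# Invented toy documents (NOT from the Wellcome corpus), already split into words.
docs = [
    "horse stable hoof farrier horse saddle".split(),
    "fever physician remedy fever bleeding pulse".split(),
    "horse farrier hoof stable remedy".split(),
    "physician pulse fever bleeding remedy".split(),
]

def fit_topics(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA via collapsed Gibbs sampling; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Start by assigning every word occurrence to a random topic.
    assignments = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for w, t in zip(doc, assignments[d]):
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this occurrence out of the counts...
                t = assignments[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # ...then resample its topic: a topic is favoured if it is
                # already common in this document and already uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word

def top_words(topic_word, n=4):
    """List the n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

topics = fit_topics(docs)
for number, words in enumerate(top_words(topics)):
    print(f"Topic {number}: {', '.join(words)}")
```

The output is only a list of words per numbered topic. Nothing in the code names a topic or says whether it is meaningful; that judgement is left to the person reading the lists.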

At each workshop during the main part of the project, I showed a prototype applying topic modelling to Wellcome’s 18th-century books. We discussed some of its implications for the collection, and participants spent time considering topics and interacting with the latest prototype. Between workshops I developed a new version of the prototype, learning the techniques that I was using as I went. This meant that prototypes were thrown together very quickly based on the understanding that I had at the time. While this works for exploring ideas and possibilities, it also means that nobody should be drawing any research conclusions about the collection based on them!

Six topics generated from a Wellcome Collection corpus. Word clouds like these are an intuitive way to represent topics, but it is also easy to read meaning into them. Some topics may be clearer than others – some may just be nonsense.

What did we learn?

For one thing, Wellcome is an even more varied collection than I realised. Topics about horses and agriculture were a surprise to me, but not to Wellcome’s specialists, which perhaps speaks to the potential of topics as an exploratory tool.

While this project was really about experimenting with cross-disciplinary co-learning and collaboration, the experience made me think that organisations would benefit from specialists engaged with digitised collections. Just as organisations benefit from the expertise of specialists engaged with their physical collections, digital collection specialists would build a deep knowledge of how specific collections respond to specific digital methods.

Title page of one of the books in the main corpus used for this project. Source: Wellcome Collection.

The prototype and the software

One key part of the project’s intention was to root the work in concrete detail by building real prototypes for a real collection. I followed this approach for early workshops, but later workshops took a step back to look at ideas for developing Wellcome’s digital audience.

This higher-level approach seemed to generate more useful discussion, but the earlier detailed work had laid the ground for this. To paraphrase one of the participants, we needed to start with the detailed technical work to understand what we were thinking about. It does, therefore, seem that an early focus on detail can be effective.

On the software side, while learning on the go and working at speed resulted in a lot of learning, it was also quite a stressful way to work. It may well be that more pre-project training, or a slower pace, would be beneficial.

There is also a question around the balance between using existing tools and developing new software. Existing tools can be a quick option, but they take time to learn and may need adaptation. Developing new software tailored to the project’s needs can be time-consuming, but perhaps no more so than experimenting with the suitability of a range of tools, arguably making it a less risky method for an experienced software developer. Starting from open source code is a sort of middle ground and my work here owes a lot to the HaToRI project, which topic modelled Hansard.

Time vs cost

As with any experiment, some parts worked better than others, and I would change a lot of things if I were to do it over again.

I would spend more time early on understanding participants' needs, and I would try to develop a better method of assessing how effective the project actually was. I would also slow the pace down to allow more time for learning, for reflection, and for building on that foundation of detailed technical work.

However, the more time we spend on something, the more it costs. We must be confident that it is worth the time. For organisations with the capacity to spare a digital specialist, and for digital specialists with the temperament to work at this kind of pace, there may be some potential in the approach.

Looking to the future

The project produced several outputs, including slides from the workshops, a poster and a Jupyter notebook which walks through an example engagement with Wellcome’s digitised text. Prototype software developed during the project can be found – in a rough and ready state! – in the wellcome_pipe GitHub repository. A report on the project will appear in due course.

We are still learning how to explore collections digitally, and how to collaborate to do so. Whether we go about it like this or in some other way, digital and collections specialists will surely carry on working together. The more we do it, the more we can refine our methods, the better we will get at learning together, and the more we will realise the promise of digital methods for our collections.