This paper focuses on the challenges and implications of an AHRC-funded Big Data project that will make over a million book illustrations from the British Library’s collections searchable online. The images span the late eighteenth to the early twentieth century, cover a variety of reproductive techniques (including etching, wood engraving, lithography and photography), and are taken from around 68,000 works of literature, history, geography and philosophy.
The paper identifies the following issues, which affect our understanding of ‘the image’ in Digital Humanities and the negotiation of Big Data more generally:
1. Adding to bibliographic metadata
Although each image is accompanied by the BL catalogue entry, this information is not always complete. Moreover, information embedded in the title (e.g. the name of the illustrator or engraver) needs to be identified and extracted so that the archive can be searched using these terms. We will discuss the algorithms that we have used to enrich this metadata.
2. Analysing the iconographic features of the images
This is a particular challenge because of the sheer number of images in the dataset. Our approach combines image recognition software, crowdsourced tagging and machine learning.
3. New research questions
We will outline the ways in which this searchable illustration archive will offer new ways of ‘reading’ images, allowing for the further development of Illustration Studies.
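By way of illustration of the first issue, the following is a minimal sketch of how contributor names might be pulled out of a catalogue title string. It is not the project's actual algorithm: the cue phrases, the role labels, the function name `extract_contributors` and the sample titles are all assumptions made for this example, and real catalogue titles would need a much broader set of patterns.

```python
import re

# Hypothetical mapping from cue stems found in titles to a role label.
ROLES = {
    "illustrat": "illustrator",
    "engrav": "engraver",
    "etch": "etcher",
    "drawn": "illustrator",
}

# The cue phrase is matched case-insensitively; the name that follows must be
# a run of capitalised tokens (e.g. "A. B. Jones"), so matching stops at the
# first lower-case word or at punctuation such as a comma.
_CUE = re.compile(
    r"(?i:\b(?P<cue>illustrat|engrav|etch|drawn)(?:ed|ions|ings)?\s+by)\s+"
    r"(?P<name>[A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)*)"
)


def extract_contributors(title):
    """Return (role, name) pairs found after cue phrases in a title string."""
    found = []
    for match in _CUE.finditer(title):
        role = ROLES[match.group("cue").lower()]
        # Drop sentence punctuation that was swept up with the last token.
        name = match.group("name").rstrip(".,;")
        found.append((role, name))
    return found


# Invented sample title, for illustration only.
print(extract_contributors("A Tour in Wales. With illustrations by John Smith."))
```

A rule-based pass of this kind is easy to audit, which matters when the output is written back into catalogue metadata, but it will miss names introduced by unlisted phrases and may need a manual review step.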