
Machine Learning Suite for IIIF Resources

Hosting organisations
Österreichische Nationalbibliothek - Abt. für Forschung u. Entwicklung
Responsible persons
Christoph Steindl
Tags
IIIF, Jupyter notebooks, machine learning, object detection, colorisation, and tutorials

GLAM institutions like the Austrian National Library (ÖNB) are responsible for archiving a vast number of objects and data. Much information about these objects is available in digital form, and in many cases the objects themselves are already digitized. To manage, explore and analyze these data collections, machine learning (ML) approaches have been developed that extract new information and thereby create new views on particular collections.

Many GLAM institutions use the International Image Interoperability Framework (IIIF) to provide access to metadata and digital material. IIIF makes it easy to integrate digital resources into websites and to share them across institutions, collections and workflows, which makes IIIF-ready data well suited to ML use cases. The following graphic illustrates the stages of a conventional ML workflow (see the original machine learning workflow here) and the areas within it where the IIIF framework (i.e. the different IIIF APIs) can potentially be applied.
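As a minimal illustration of where the IIIF Image API can fit into the data preprocessing stage, the sketch below builds Image API 3.0 request URLs following the pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, so that cropping (for instance to a bounding box returned by an object detector) and resizing are done by the image server rather than locally. The function names and the base URL `https://iiif.example.org/iiif` are hypothetical; servers implementing Image API 2.x use `full` instead of `max` for the full size.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: float = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 3.0 request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers containing "/" or other reserved characters must be percent-encoded.
    ident = quote(identifier, safe="")
    # Write whole-number rotations as "90" rather than "90.0".
    rot = str(int(rotation)) if float(rotation).is_integer() else str(rotation)
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rot}/{quality}.{fmt}"


def region_from_bbox(x: int, y: int, w: int, h: int) -> str:
    """Express a pixel bounding box (e.g. from an object detector) as an IIIF region."""
    return f"{x},{y},{w},{h}"


if __name__ == "__main__":
    base = "https://iiif.example.org/iiif"  # hypothetical image server
    # Whole page, scaled server-side to 512 px width (height keeps the aspect ratio).
    print(iiif_image_url(base, "book1/page5", size="512,"))
    # Only a detected region, delivered in greyscale.
    print(iiif_image_url(base, "book1/page5",
                         region=region_from_bbox(100, 200, 300, 400), quality="gray"))
```

For example, the first call yields `https://iiif.example.org/iiif/book1%2Fpage5/full/512,/0/default.jpg`. Pushing crops and resizing to the server keeps downloads small, which matters when a model is applied to thousands of digitized pages.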

Many publicly available ML pipelines (e.g. on GitHub) use Jupyter notebooks to train, test and apply their models. Jupyter notebooks are also very common in the digital humanities (DH), in particular for documenting the generation, manipulation or analysis of datasets, and they are used in teaching as well. For example, the NewsEye project, which focuses on European newspapers, published a collection of notebooks with an in-depth analysis of its corpus (e.g. text classification or text similarity) intended for use in university courses.

The main goal of this project is to combine the different technologies (IIIF, ML and Jupyter notebooks) in order to support researchers in generating new knowledge from digitized cultural assets. Bringing these technologies together has several benefits: the project (1) provides an easy, standardized and reusable way to integrate IIIF materials into ML applications in general and (2) publishes these ML pipelines as Jupyter notebooks, which will be (3) well documented and can therefore (4) serve as boilerplate for new projects and (5) be easily applied by other institutions that support IIIF for their data. In addition to raw source code, the project also aims to use interactive widgets in the notebooks to make the software suite easy to use for users with less prior knowledge of computer science.
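One way to picture point (1) is a small helper that walks a IIIF Presentation API 3.0 manifest and turns each canvas into a (label, image URL) pair that an ML pipeline could then download as model input. This is a sketch under simplifying assumptions, not the project's implementation: it only considers `painting` annotations, takes the first image service it finds, ignores `Choice` bodies, and the function names and example URLs are hypothetical. It requests a fixed width via the Image API size syntax `w,`, which is valid in both Image API 2.x and 3.0.

```python
from __future__ import annotations


def first_label(label: dict) -> str:
    """Return the first string in a IIIF language map such as {"en": ["fol. 1r"]}."""
    for values in label.values():
        if values:
            return values[0]
    return ""


def image_inputs(manifest: dict, width: int = 512) -> list[tuple[str, str]]:
    """List (canvas label, image URL) pairs from a IIIF Presentation 3.0 manifest.

    If an image has an Image API service, the URL asks the server for a version
    scaled to `width` pixels; otherwise the static image URL is used as-is.
    """
    inputs = []
    for canvas in manifest.get("items", []):
        label = first_label(canvas.get("label", {}))
        for page in canvas.get("items", []):      # AnnotationPages
            for anno in page.get("items", []):    # Annotations
                if anno.get("motivation") != "painting":
                    continue
                body = anno.get("body", [])
                for res in body if isinstance(body, list) else [body]:
                    # ImageService3 uses "id"; ImageService2 keeps "@id".
                    svc_ids = [s.get("id") or s.get("@id") for s in res.get("service", [])]
                    svc_ids = [s for s in svc_ids if s]
                    if svc_ids:
                        base = svc_ids[0].rstrip("/")
                        inputs.append((label, f"{base}/full/{width},/0/default.jpg"))
                    elif "id" in res:
                        inputs.append((label, res["id"]))
    return inputs


if __name__ == "__main__":
    # Hypothetical two-page manifest, reduced to the fields the helper reads.
    manifest = {
        "type": "Manifest",
        "items": [
            {"type": "Canvas", "label": {"none": ["fol. 1r"]},
             "items": [{"type": "AnnotationPage", "items": [
                 {"type": "Annotation", "motivation": "painting",
                  "body": {"id": "https://iiif.example.org/iiif/p1/full/max/0/default.jpg",
                           "type": "Image",
                           "service": [{"id": "https://iiif.example.org/iiif/p1",
                                        "type": "ImageService3"}]}}]}]},
            {"type": "Canvas", "label": {"none": ["fol. 1v"]},
             "items": [{"type": "AnnotationPage", "items": [
                 {"type": "Annotation", "motivation": "painting",
                  "body": {"id": "https://iiif.example.org/static/p2.jpg",
                           "type": "Image"}}]}]},
        ],
    }
    for label, url in image_inputs(manifest, width=256):
        print(label, url)
    # fol. 1r https://iiif.example.org/iiif/p1/full/256,/0/default.jpg
    # fol. 1v https://iiif.example.org/static/p2.jpg
```

Because the helper only relies on the manifest structure defined by the specification, the same code could in principle be pointed at manifests from any IIIF-enabled institution; in a notebook, the resulting list could feed a download loop or an interactive widget for selecting pages.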

Machine learning applications are, and will remain, an essential part of data analysis in the scientific context. Regardless of whether small data collections or big data are processed, users need to be trained in applying machine learning modules. Such skills make it possible to understand complex and multidimensional problems and thus gain new insights into collections. The project encourages applying these innovative methods to already digitized data.