MHDBDB goes AI. Data preparation for the OER-LLM ParzivAI

Hosting organisations
University of Salzburg (MHDBDB)
Responsible persons
Katharina Zeppezauer-Wachauer, Julia Hintersteiner, and Alan Lena van Beek

The Middle High German Conceptual Database (MHDBDB) at the Paris Lodron University of Salzburg is developing an innovative Digital Humanities project that applies artificial intelligence to historical language data.
During the Heidelberg Hackathon “AI and the Middle Ages”, the concept for ParzivAI emerged – a Large Language Model (LLM) capable of automatically translating Middle High German texts into modern German.
This system will be released as an Open Educational Resource (OER) and follows the FAIR principles for open and sustainable research.
Objective: to make medieval texts easily accessible for schools, universities, and the general public.

About the Project

ParzivAI is designed for use in disciplines such as German Studies, Linguistics, Digital Humanities, Computational Linguistics, and Medieval Studies. The prototype was developed by Dr. Florian Nieser (Heidelberg Center for Digital Humanities, HCDH) and Thomas Renkert (Heidelberg School of Education). Initial tests with works such as “Armer Heinrich”, “Erec”, and “Parzival” have demonstrated the potential of AI-based Middle High German translation for research and education.

The MHDBDB contributed selected historical text and metadata sets and carried out the data preprocessing in collaboration with Dr. Alan van Beek and Peter Färberböck.

To fully leverage the MHDBDB’s extensive data, further data preparation steps are required, including:

  • Repairing faulty exports
  • Converting data into open formats (JSON, XML)
  • Providing TEI and RDF datasets
  • Developing an open API for lemma queries as a prototype for Retrieval-Augmented Generation (RAG)
  • Creating a Zotero interface for secondary literature to generate additional synthetic training data
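
Two of the steps above, conversion into open formats and lemma lookup for RAG, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields (`lemma`, `pos`, `gloss`), the sample entries, and the function names are hypothetical and do not reflect the actual MHDBDB export schema or the planned API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records; the real MHDBDB schema is not described here.
RECORDS = [
    {"lemma": "ritter", "pos": "noun", "gloss": "Ritter"},
    {"lemma": "minne", "pos": "noun", "gloss": "Liebe"},
]


def to_json(records):
    """Serialize records as UTF-8-friendly JSON (an open format)."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_xml(records):
    """Serialize records as simple XML (illustrative layout, not TEI)."""
    root = ET.Element("entries")
    for rec in records:
        entry = ET.SubElement(root, "entry", lemma=rec["lemma"], pos=rec["pos"])
        ET.SubElement(entry, "gloss").text = rec["gloss"]
    return ET.tostring(root, encoding="unicode")


def lookup(records, lemma):
    """Return all records whose lemma matches, ignoring case."""
    key = lemma.casefold()
    return [r for r in records if r["lemma"].casefold() == key]


def build_prompt(records, sentence):
    """Retrieve glossary entries for the tokens of a sentence and
    prepend them to a translation prompt (the core idea of RAG)."""
    hits = []
    for token in sentence.split():
        hits.extend(lookup(records, token.strip(".,;:!?")))
    context = "\n".join(f"{h['lemma']} ({h['pos']}): {h['gloss']}" for h in hits)
    return f"Glossary:\n{context}\n\nTranslate into modern German:\n{sentence}"


if __name__ == "__main__":
    print(to_json(RECORDS))
    print(to_xml(RECORDS))
    print(build_prompt(RECORDS, "der ritter reit"))
```

In a real setup, the in-memory list would presumably be replaced by calls to the lemma API, and the tokens would be lemmatized before lookup, since inflected Middle High German forms rarely match the dictionary form directly.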

Continuing the cooperation with Heidelberg requires targeted investment in external IT development and project-related travel expenses.