
Sentiment-annotated corpus of Austrian historical newspapers

Hosting organisations
University of Graz (Department of Digital Humanities)
Responsible persons
Lucija Krušić
Start
End

The project is developing the first sentiment-annotated corpus of historical Austrian newspapers spanning the years 1800 to 1938. The aim is to significantly advance sentiment analysis within Digital Humanities and historical language processing.
Sentiment analysis – the automated detection and classification of emotions, attitudes, and opinions in texts – is a vital method in text mining and computational linguistics. Currently, there is no dedicated sentiment-annotated resource for Austrian German from the 19th and early 20th centuries. This project addresses this gap, enabling researchers to explore social narratives about migration, minority groups, and political discourse across different historical periods and ideological contexts.

Project Information

Using curated collections from ANNO (Austrian National Library) and DIGITARIUM (Austrian Academy of Sciences), key newspapers such as Wienerisches Diarium, Neue Freie Presse, and Arbeiter-Zeitung are being manually annotated for sentiment. Trained annotators carry out the annotation in the Doccano tool, following established corpus annotation methodologies. Quality control relies on inter-annotator agreement metrics.
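The page does not specify which agreement metric is used. As an illustration only, the sketch below computes Cohen's kappa, a widely used chance-corrected agreement measure for two annotators labelling the same items. The three-way label set (positive, negative, neutral) is a hypothetical scheme, not the project's documented one.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six newspaper passages.
annotator_1 = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
annotator_2 = ["positive", "negative", "negative", "negative", "positive", "neutral"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.75
```

Here the annotators agree on five of six items (0.83), but because a third of that agreement would be expected by chance, kappa reports a more conservative 0.75.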
Completed work includes:

  • Improving OCR accuracy to 86% using machine learning techniques.
  • Thematic structuring with BERTopic to focus on migration, minority groups, labor, education, and nationalism.
  • Annotating over 700 instances covering multiple timeframes and topics.
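The page does not define how the 86% OCR accuracy figure is measured. One common convention, assumed here, is character accuracy: one minus the character error rate (CER), where CER is the Levenshtein edit distance between the OCR output and a manually corrected transcription, divided by the length of that transcription. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 - CER, clipped at 0 (assumed metric, not confirmed by the project)."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    cer = levenshtein(ocr_text, ground_truth) / len(ground_truth)
    return max(0.0, 1.0 - cer)


# Hypothetical example: a typical e/c confusion in a Fraktur scan.
print(character_accuracy("Arbciter", "Arbeiter"))  # 0.875
```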

The project’s next steps involve expanding the sentiment annotation to additional periods (1800–1850, 1900–1938) and publishing the final dataset as an open-access resource on Zenodo in compliance with the FAIR data principles. The dataset will also be integrated into the GAMS repository and made discoverable through the CLARIN Virtual Language Observatory (VLO), strengthening global research infrastructure in Digital Humanities.