Esperanto Newspaper Excerpts

Hosting organisations
Austrian National Library (ÖNB)
Responsible persons
Simon Mayer

The project CLARIAH Esperanto Newspaper Excerpts works with digitized newspaper excerpts about Esperanto that are preserved in the Department of Planned Languages and Esperanto Museum of the Austrian National Library. The collection contains excerpts from a wide array of newspapers from different countries, written in 22 languages. It is of significant interest to the Esperanto community because it makes it possible to trace the history and development of the language over time. The goal of the project is to make the collection full-text searchable. Because the newspaper articles have complex layouts, we employ a two-stage approach. First, a layout recognition model detects all text blocks. Second, each text block is passed to Optical Character Recognition (OCR) software, which recognizes the individual letters. This approach is intended to address both the complex layouts and the challenging multilingual nature of the data set. The result is a data set in which each image is annotated with metadata and full text. The data set will be published in accordance with the FAIR and Open Data principles and will be accessible via ÖNB Labs.
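
The two-stage approach described above can be illustrated with a minimal sketch. It is not the project's implementation: the names `process_page`, `TextBlock`, and `PageResult` are hypothetical, and the layout model and OCR engine are passed in as plain callables because the description does not say which tools are used. Sorting blocks from top to bottom is likewise a simplifying assumption about reading order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A bounding box as (left, top, right, bottom) in pixels (assumed convention).
Box = Tuple[int, int, int, int]


@dataclass
class TextBlock:
    """One detected text region and the text recognized inside it."""
    box: Box
    text: str = ""


@dataclass
class PageResult:
    """An image annotated with metadata and full text."""
    image_id: str
    metadata: Dict[str, Any]
    blocks: List[TextBlock] = field(default_factory=list)

    @property
    def full_text(self) -> str:
        # Join non-empty blocks with blank lines to get searchable text.
        return "\n\n".join(b.text for b in self.blocks if b.text)


def process_page(
    image_id: str,
    image: Any,
    metadata: Dict[str, Any],
    detect_blocks: Callable[[Any], List[Box]],
    recognize_text: Callable[[Any, Box], str],
) -> PageResult:
    """Run layout detection first, then OCR on each detected block."""
    # Stage 1: the layout model returns the text block regions.
    boxes = detect_blocks(image)

    # Naive reading order: top to bottom, then left to right (assumption).
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))

    # Stage 2: each block is passed to the OCR engine on its own.
    blocks = [
        TextBlock(box=b, text=recognize_text(image, b).strip())
        for b in ordered
    ]
    return PageResult(image_id=image_id, metadata=metadata, blocks=blocks)
```

Keeping the two stages behind separate callables means that either the layout model or the OCR engine could be exchanged, for example per language, without touching the rest of the pipeline.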
