Hosting organisations
Uni Innsbruck - Sprachwissenschaft
Responsible persons
Claudia Posch

No activity in healthcare is possible without language: all of the various settings in which healthcare takes place rely on different forms of communication. Previous studies suggest that implicit forms of discrimination persist in healthcare communication and that patients are often discriminated against on the basis of social factors such as age, status, nationality or gender. This mostly occurs on an implicit, unconscious level, which makes such discrimination difficult to grasp. The project MedCorpInn looks for patterns of language use that are connected to such biases by investigating a large data set. The pre-existing digital data collection (corpus KARBUN) consists of 100,000 pre-anonymized radiology reports from the University Clinic of Innsbruck. Since only a few comparable corpora of clinical texts exist, this data set can be regarded as unique.

The project aims to technically improve and further develop the existing data set and to analyse the corpus with a range of tools and methods. For the technical improvements, the project group will work closely with its partners from DBIS (Databases and Information Systems, UIBK). This collaboration will include enhancing the extensive metadata (age, gender, type of insurance, mode of examination etc.) as well as measures for automated data processing and further anonymization. Moreover, the existing part-of-speech annotation will be improved, which makes it possible, for example, to identify the most frequent parts of speech in the corpus.

Once the data have been prepared, different corpus-linguistic and discourse-analytic methods and tools can be applied. The applicants want to find out whether differences in language use, if there are any, are connected to social factors. To this end, the texts will be divided along certain categories (e.g. female/male, private insurance/public insurance etc.) and investigated with regard to statistically significant linguistic differences and patterns.
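To illustrate what a comparison of two subcorpora could look like, the following is a minimal sketch of a log-likelihood (G2) keyness score, a statistic commonly used in corpus linguistics. It is not taken from the project: the function names and the toy token lists are invented for illustration, and the project may well use other measures or tools.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Two-term log-likelihood (G2) for one word in two subcorpora.

    a, b: observed frequency of the word in subcorpus A and B.
    total_a, total_b: total number of tokens in subcorpus A and B.
    """
    # Expected frequencies if the word were equally common in both subcorpora
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keyness(tokens_a, tokens_b):
    """Rank all words by how strongly their frequency differs between A and B."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {
        word: log_likelihood(counts_a[word], counts_b[word], total_a, total_b)
        for word in set(counts_a) | set(counts_b)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data standing in for two subcorpora (e.g. split by a metadata field)
subcorpus_a = ["normal"] * 5 + ["finding"] * 5
subcorpus_b = ["finding"] * 10
ranking = keyness(subcorpus_a, subcorpus_b)
```

A G2 value above 3.84 corresponds to p < 0.05 for a single comparison with one degree of freedom. When many words are tested at once, a stricter threshold or a correction for multiple testing would be needed.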

Furthermore, questions of gender medicine can be examined within the corpus, for example whether medical procedures are linked to certain social factors, or whether there are gender-specific differences in the precision of measurements (e.g. of lengths/diameters of organs, tumours or injuries) and, if so, why.
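One conceivable way to operationalise the precision of measurements is to count the decimal places of the reported values. The sketch below assumes this proxy; the regular expression, the group labels and the sample sentences are invented for illustration and are not drawn from the KARBUN corpus or from the project's actual method.

```python
import re
from collections import defaultdict

# Matches values such as "12,5 mm", "3 cm" or "1.25 cm" (comma or point as decimal sign)
MEASUREMENT = re.compile(r"(\d+)(?:[.,](\d+))?\s*(mm|cm)\b")


def decimal_places(text):
    """Return the number of decimal places of every measurement found in text."""
    return [len(match.group(2) or "") for match in MEASUREMENT.finditer(text)]


def mean_precision_by_group(reports):
    """Average decimal places per group for (group_label, report_text) pairs."""
    places = defaultdict(list)
    for group, text in reports:
        places[group].extend(decimal_places(text))
    # Groups without any measurement are left out to avoid dividing by zero
    return {group: sum(v) / len(v) for group, v in places.items() if v}


# Invented sample reports with hypothetical group labels
sample_reports = [
    ("f", "Zyste 3 cm"),
    ("m", "Laesion 12,5 mm"),
    ("m", "Knoten 1,25 cm"),
]
precision = mean_precision_by_group(sample_reports)
```

Any difference between groups found this way would still need a significance test and a check for confounding factors such as the mode of examination.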

MedCorpInn aims to find new methods of detecting structural biases on the linguistic surface of large data sets. The project also aims to contribute to medical practice by developing ideas for guidelines to eliminate implicit biases and discrimination that are potentially harmful to patients.

