SARIT—Search and Retrieval of Indic Texts

SARIT—Search and Retrieval of Indic Texts—is an international effort towards making the vast corpus of Indic texts accessible in a uniform way. “Indic texts” refers to premodern literary works in Indian languages.

SARIT develops guidelines on how to encode Indic texts, applies these guidelines to its own set of texts, and provides an example interface for accessing and searching these texts.

SARIT’s output thus belongs to three distinct areas:

Documentation of how to encode Indic texts: https://github.com/sarit/SARIT-corpus/blob/master/schemas/odd/sarit-guidelines.xml (rendered at https://sarit.indology.info/sarit-pm/docs/encoding-guidelines-simple.html). SARIT marks up its texts using a defined and documented subset of the vocabulary provided by the Text Encoding Initiative Consortium’s Guidelines for Electronic Text Encoding and Interchange (https://www.tei-c.org/P5/). This documentation work led to the formation of the Special Interest Group “Indic Texts” under the Text Encoding Initiative’s umbrella (https://wiki.tei-c.org/index.php/SIG:IndicTexts).
A library of texts encoded according to these guidelines, hosted at https://github.com/sarit/sarit-corpus. The texts are freely available under liberal Creative Commons licenses. The library contains Indic texts in several languages and genres, reflecting the skills and interests of the researchers involved.
A client for reading and searching these texts, running at https://sarit.indology.info/. Currently, this client is a heavily modified implementation of the “eXist-db Native XML Database” (https://github.com/eXist-db/exist). Its source code is freely available under open-source licenses from https://github.com/sarit/sarit-existdb.
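
Because the texts are encoded in TEI XML, they can also be processed with general-purpose XML tools outside the SARIT client. The following is a minimal sketch in Python using only the standard library. The sample document is hypothetical and heavily simplified (it is not taken from the SARIT corpus), and the use of the TEI elements lg and l for verse is an assumption made for illustration; the encoding guidelines linked above define which elements SARIT actually uses.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified TEI document invented for this example;
# real corpus files are larger and carry a much richer header.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <publicationStmt><p>Illustration only</p></publicationStmt>
      <sourceDesc><p>Invented for this example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text xml:lang="sa-Latn">
    <body>
      <lg>
        <l>first verse line</l>
        <l>second verse line</l>
      </lg>
    </body>
  </text>
</TEI>"""


def title_and_lines(xml_string):
    """Return the title and the text of every verse line (<l>) in a TEI string."""
    root = ET.fromstring(xml_string)
    # The title lives in the header, under fileDesc/titleStmt.
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    # Collect verse lines anywhere inside the <text> element.
    lines = [line.text for line in root.findall(".//tei:text//tei:l", TEI_NS)]
    return title, lines


title, lines = title_and_lines(SAMPLE)
print(title)
for line in lines:
    print(line)
```

To try this on a file from the library, the string could be replaced with the contents of a downloaded XML file, with the element paths adjusted to match that file's actual structure.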

Between 2013 and 2017, SARIT received major funding from the NEH/DFG Bilateral Digital Humanities Program within a project directed by Prof. Sheldon Pollock (Columbia University) and Prof. Birgit Kellner (University of Heidelberg). It is currently being developed through the individual efforts of members of several institutes, including the IKGA.
