DiaUk – Corpus for research into the history of Ukrainian (16th-18th c.)

Corpus queries can be carried out via the search and visualization environment Annis (Krause & Zeldes 2016) at https://korpling.org/annis/diauk.

The first version of the DiaUk corpus was compiled by Iryna Parkhomenko as part of her dissertation "Agreement and Transitivity in Middle Ukrainian Resultative and Passive -no/-to Constructions: A Corpus-Based Diachronic Investigation" (2016), funded by the German Research Foundation as part of the project "Corpus linguistics and diachronic syntax: Subject case, finiteness and agreement in Slavonic languages" (ME4125/1-2, PIs: Roland Meyer and Björn Hansen). Most of the texts come from the website http://izbornyk.org.ua/ (with the kind permission of the site's author); the quality of the digitization was checked by Iryna Parkhomenko and Olesia Lazarenko. The inclusion of two larger administrative texts, the court records from Žytomyr (1590-1635) and Poltava (1668-1740), was kindly made possible by the Ukrainian Academy of Sciences (P. Ju. Hrycenko, V. M. Mojsijenko and U. M. Štandenko). The composition of the corpus is documented here.

The corpus texts were manually divided into sentences and clauses by Olesia Lazarenko and Iryna Parkhomenko. Iryna Parkhomenko also explicitly annotated, at token level, the grammatical information relevant to Parkhomenko (2016). Rule-based tokenization was carried out in the annotation tool GATE; after export, the texts were automatically tagged for parts of speech and dependency-parsed using Stanford's Stanza NLP library (UD-Ukrainian). This automatic annotation is still error-prone and will be improved step by step. Roland Meyer is responsible for the technical realization.
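For readers who want to reproduce a comparable annotation step, the following is a minimal sketch of how already tokenized sentences might be tagged and parsed with Stanza. It is not the project's actual script. The function names `tag_pretokenized` and `format_rows` are hypothetical, and the sketch assumes that Stanza is installed, that the Ukrainian models have been fetched with `stanza.download("uk")`, and that the tokens exported from GATE are available as lists of strings.

```python
from typing import List, Tuple

# One parsed word: (index in sentence, form, UPOS tag, head index, dependency relation)
Word = Tuple[int, str, str, int, str]


def format_rows(words: List[Word]) -> List[str]:
    """Render parsed words as tab-separated rows (a simplified, CoNLL-U-like view)."""
    return ["\t".join(str(field) for field in word) for word in words]


def tag_pretokenized(sentences: List[List[str]]) -> List[List[Word]]:
    """Tag and dependency-parse sentences that are already split into tokens.

    Assumption: `pip install stanza` and `stanza.download("uk")` were run beforehand.
    """
    import stanza  # imported lazily so format_rows() can be used without Stanza

    nlp = stanza.Pipeline(
        lang="uk",
        processors="tokenize,pos,lemma,depparse",
        tokenize_pretokenized=True,  # keep the externally produced token boundaries
    )
    doc = nlp(sentences)  # a list of token lists is accepted in pretokenized mode
    return [
        [(w.id, w.text, w.upos, w.head, w.deprel) for w in sent.words]
        for sent in doc.sentences
    ]
```

Calling `tag_pretokenized` on a list of token lists returns one list of word tuples per sentence, which `format_rows` turns into printable lines. Passing pretokenized input keeps the token boundaries aligned with the manual token-level annotation; the standard Ukrainian models are trained on modern text, so their output on Middle Ukrainian would need manual checking.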

We would like to thank Martin Klotz and Thomas Krause for their support with Annis and Olesia Lazarenko for her invaluable help with the annotation.

If you have any questions or comments, please write to us at the contact address below.

References: