The Beserman Udmurt documentation project is a long-term undertaking aimed primarily at collecting lexicographic and corpus data in the field. During our work on the project, we developed a pipeline for collecting, annotating and publishing our data. In this paper, we describe this pipeline and present the web interface we developed to provide public access to Beserman materials. We use the TLex lexicographic software for working on the dictionary and FieldWorks FLEx for annotating the corpus. After the data have been annotated, they are exported to XML and stored in the web interface, where these two types of data become interconnected and searchable. We propose solutions to challenges that arise in projects of this kind and reflect on the constraints imposed on lexicographic databases developed in long-term projects aimed at describing under-resourced languages. We suggest that the proposed pipeline and the web interface could be employed by similar projects dealing with other minority languages. The web interface, based on the database and a corpus of oral Beserman texts, is available online at beserman.ru.