Parallel text corpora supply researchers with data for multilingual lexicographic research, translation studies, and language typology. The objectives of the ParRus research project at the University of Tampere are to compile a Russian-Finnish parallel corpus and to develop software for maintaining it. Text alignment is the crucial problem in compiling parallel corpora. The study of parallel texts shows that, in most cases, translators retain the paragraph divisions of the original. The Source Language/Target Language (SL-TL) quotient, i.e. the ratio of the number of words in an original to the number of words in its translation, is also a stable value. The aligning programme developed at the Department therefore compares the original with the translation paragraph by paragraph, adding further paragraphs to the extracts being aligned until their word counts match the SL-TL quotient. The system produces good results only if the translation is structurally close to the original. Lexical cues cannot replace this structural comparison, because the study of parallel texts also shows that the frequencies of words and of their translation equivalents usually do not match. Therefore, unless knowledge structures are exploited, paragraphs and larger text units are the only elements of formal text structure that can be used to compare parallel texts.
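The paragraph-accumulation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ParRus programme itself: words are counted by splitting on whitespace, the quotient is estimated from sample text pairs, and the hypothetical `tolerance` parameter (how close an extract pair's ratio must be to the quotient to count as a "match") is an assumption, since the abstract does not say how the match is judged. The rule for choosing which side to extend is likewise an assumption.

```python
def word_count(paragraphs: list[str]) -> int:
    """Count whitespace-separated tokens in a list of paragraphs."""
    return sum(len(p.split()) for p in paragraphs)


def sl_tl_quotient(sample: list[tuple[str, str]]) -> float:
    """Estimate the SL-TL quotient (source words / target words) from sample pairs."""
    return word_count([s for s, _ in sample]) / word_count([t for _, t in sample])


def align(src: list[str], tgt: list[str], quotient: float,
          tolerance: float = 0.15) -> list[tuple[list[str], list[str]]]:
    """Greedily pair runs of source and target paragraphs whose word-count
    ratio lies within `tolerance` (hypothetical) of the SL-TL quotient."""
    pairs = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        s_end, t_end = i + 1, j + 1  # start with one paragraph on each side
        while True:
            ratio = word_count(src[i:s_end]) / max(word_count(tgt[j:t_end]), 1)
            if abs(ratio / quotient - 1) <= tolerance:
                break  # extracts match the quotient closely enough
            if ratio > quotient and t_end < len(tgt):
                t_end += 1  # translation extract too short: add a target paragraph
            elif ratio < quotient and s_end < len(src):
                s_end += 1  # original extract too short: add a source paragraph
            else:
                # No paragraph left on the side that needs one:
                # collapse everything that remains into a single block.
                s_end, t_end = len(src), len(tgt)
                break
        pairs.append((src[i:s_end], tgt[j:t_end]))
        i, j = s_end, t_end
    # Attach any paragraphs left over on one side to the last aligned block.
    if pairs and (i < len(src) or j < len(tgt)):
        last_src, last_tgt = pairs[-1]
        pairs[-1] = (last_src + src[i:], last_tgt + tgt[j:])
    return pairs
```

Because the sketch compares nothing but word counts, it degrades in the way the abstract predicts when a translation departs structurally from its original: an omitted or merged paragraph forces the greedy search to absorb ever larger extracts, and in the worst case everything that remains is collapsed into one block.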