The question of which features shape the linguistic form of translated texts has been central to corpus-based translation studies. In the partially overlapping field of computational linguistics, previous studies have shown that the source language of an individual text can be detected automatically in both direct translations and indirect translations (i.e., translations made from translations). However, computationally oriented approaches have paid limited attention to the specific linguistic features that make successful classification possible. Consequently, the types of linguistic phenomena that characterize translations and the kinds of linguistic interference detectable in them remain underexplored. In this study, we examine the linguistic features that contribute to identifying the source language of direct translations from English, French, German, Greek, and Swedish, as well as of indirect translations from Greek into Finnish with English, French, German, and Swedish as mediating languages. The study builds on Halverson's (2017) gravitational pull model to explain the mechanisms behind our findings and to generate theoretically motivated, specific hypotheses for future research to test. The analysis uses keyness analysis as a supervised machine learning technique and exploratory factor analysis (EFA) as an unsupervised machine learning technique. The results indicate that sentence length, sentence-initial adverbs, and sentence-final specification are the linguistic features that set the different types of translations apart. Furthermore, in indirect translations, the salient features of the ultimate source language outweigh both those of the mediating languages and the entrenched parallels between specific language pairs.
Assis Rosa, A., Pięta, H., & Bueno Maia, R. (2017). Theoretical, methodological and terminological issues regarding indirect translation: An overview. Translation Studies, 10(2), 113–132. https://doi.org/10.1080/14781700.2017.1285247.
Berber Sardinha, T., & Veirano Pinto, M. (Eds.) (2019). Multi-dimensional analysis: Research methods and current issues. Bloomsbury Academic.
Biber, D. (1988). Variation across speech and writing. Cambridge University Press.
Biber, D. (1989). A typology of English texts. Linguistics, 27(1), 3–44. https://doi.org/10.1515/ling.1989.27.1.3.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324.
Čermák, F., & Rosen, A. (2012). The case of InterCorp: A multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3), 411–427. https://doi.org/10.1075/ijcl.17.3.05cer.
Egbert, J., & Staples, S. (2019). Doing multi-dimensional analysis in SPSS, SAS, and R. In T. Berber Sardinha & M. Veirano Pinto (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 99–114). Bloomsbury Academic.
Fabrigar, L. R. (2012). Exploratory factor analysis. Oxford University Press.
Gabrielatos, C. (2018). Keyness analysis: Nature, metrics and techniques. In C. Taylor, & A. Marchi (Eds.), Corpus approaches to discourse: A critical review (pp. 225–258). Routledge.
Hakulinen, A., Vilkuna, M., Korhonen, R., Koivisto, V., Heinonen, T. R., & Alho, I. (2004). Iso suomen kielioppi. Suomalaisen Kirjallisuuden Seura. http://scripta.kotus.fi/visk URN:ISBN:978-952-5446-35-7.
Halverson, S. L. (2015). Cognitive Translation Studies and the merging of empirical paradigms: The case of ‘literal translation’. Translation Spaces, 4(2), 310–340. https://doi.org/10.1075/ts.4.2.07hal.
Halverson, S. L. (2017). Gravitational pull in translation. Testing a revised model. In G. De Sutter, M.-A. Lefer, & I. Delaere (Eds.), Empirical translation studies: New methodological and theoretical traditions (pp. 9–46). De Gruyter.
Halverson, S. L. (2019). ‘Default’ translation: A construct for cognitive translation and interpreting studies. Translation, Cognition & Behavior, 2(2), 187–210. https://doi.org/10.1075/tcb.00023.hal.
Hareide, L. (2016). Is there gravitational pull in translation? A corpus-based test of the gravitational pull hypothesis on the language pairs Norwegian–Spanish and English–Spanish. In M. Ji, M. Oakes, L. Defeng, & L. Hareide (Eds.), Corpus methodologies explained. An empirical approach to translation studies (pp. 188–231). Routledge.
Islam, Z., & Hoenen, A. (2013). Source and translation classification using most frequent words. In Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 1299–1305). https://www.aclweb.org/anthology/I13-1185.
Ivaska, L. (2019). Distinguishing translations from non-translations and identifying (in)direct translations’ source languages. In J. H. Jantunen, S. Brunni, N. Kunnas, S. Palviainen, & K. Västi (Eds.), Proceedings of the Research Data and Humanities (RDHum) 2019 Conference: Data, Methods and Tools, 2019 (pp. 125–138). University of Oulu. https://urn.fi/URN:ISBN:9789526223216.
Ivaska, L. (2020). A mixed-methods approach to indirect translation: A case study of the Finnish translations of modern Greek prose 1952–2004. University of Turku. https://urn.fi/URN:ISBN:978-951-29-8234-9.
Ivaska, I., & Bernardini, S. (2020). Constrained language use in Finnish: A corpus-driven approach. Nordic Journal of Linguistics, 43(1), 33–57. https://doi.org/10.1017/S0332586520000013.
Ivaska, I., Bernardini, S., & Ferraresi, A. (2024). The complex case of constrained communication: A corpus-driven, multilingual and multi-register search for the common ground between non-native and translated language. In B. van Rooy, & H. Kotze (Eds.), Constraints on language variation and change in complex multilingual contact settings (pp. 191–222). John Benjamins Publishing Company.
Ivaska, I., & Ivaska, L. (2022). Source language classification of indirect translations. Target, [Special Issue]: What Can Indirect Translation Research Do for Translation Studies?, 34(3), 370–394. https://doi.org/10.1075/target.00006.iva.
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575.
Kanerva, J., Ginter, F., Miekka, N., Leino, A., & Salakoski, T. (2018). Turku neural parser pipeline: An end-to-end system for the CoNLL 2018 shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 133–142).
Koppel, M., & Ordan, N. (2011). Translationese and its dialects. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 1318–1326). http://www.aclweb.org/anthology/P11-1132.
Kotze, H. (2020). Converging what and how to find out why. In L. Vandevoorde, J. Daems, & B. Defrancq (Eds.), New empirical perspectives on translation and interpreting (pp. 333–371). Routledge.
Kruger, H., & van Rooy, B. (2018). Register variation in written contact varieties of English. English World-Wide. A Journal of Varieties of English, 39(2), 214–242. https://doi.org/10.1075/eww.00011.kru.
Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11.
Langacker, R. (2008). Cognitive grammar: A basic introduction. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195331967.001.0001.
Leech, G. (2006). New resources, or just better old ones? The holy grail of representativeness. In N. Nesselhauf, & C. Biewer (Eds.), Corpus linguistics and the web (pp. 133–149). Brill.
Lefer, M.-A., & De Sutter, G. (2022). Using the Gravitational Pull Hypothesis to explain patterns in interpreting and translation: The case of concatenated nouns in mediated European Parliament discourse. In M. Kajzer-Wietrzny, A. Ferraresi, I. Ivaska, & S. Bernardini (Eds.), Mediated discourse at the European parliament: Empirical investigations (pp. 133–159). Language Science Press. https://doi.org/10.5281/ZENODO.6977046.
Lynch, G., & Vogel, C. (2012). Towards the automatic detection of the source language of a literary translation. In Proceedings of COLING 2012: Posters (pp. 775–784). https://www.aclweb.org/anthology/C12-2076.
Mauranen, A. (2004). Corpora, universals and interference. In A. Mauranen, & P. Kujamäki (Eds.), Translation universals: Do they exist? (pp. 65–82). John Benjamins Publishing Company. https://doi.org/10.1075/btl.48.07mau.
Neumann, S. (2014). Contrastive register variation: A quantitative approach to the comparison of English and German. De Gruyter Mouton.
Pięta, H., Ivaska, L., & Gambier, Y. (2022). What can research on indirect translation do for Translation Studies? Target, 34(3), 349–369. https://doi.org/10.1075/target.00012.pie.
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/.
Rabinovich, E., Ordan, N., & Wintner, S. (2017). Found in translation: Reconstructing phylogenetic language trees from translations. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 530–540). https://doi.org/10.18653/v1/P17-1049.
St. André, J. (2020). Relay. In M. Baker, & G. Saldanha (Eds.), Routledge encyclopedia of translation studies (3rd ed., pp. 470–473). Routledge.
Tognini-Bonelli, E. (2001). Corpus linguistics at work. John Benjamins Publishing Company.
Toury, G. (2012). Descriptive translation studies – And beyond (Rev. ed.). John Benjamins Publishing Company. http://ebookcentral.proquest.com/lib/kutu/detail.action?docID=1053083.
Ustaszewski, M. (2021). Towards a machine learning approach to the analysis of indirect translation. Translation Studies, 14(3), 313–331. https://doi.org/10.1080/14781700.2021.1894226.
Winter, B. (2020). Statistics for linguists: An introduction using R. Routledge. https://doi.org/10.4324/9781315165547.
Woodstein, B. J. (2022). Translation and genre. Cambridge University Press.
Wright, M. N., & Ziegler, A. (2017). Ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01.