Authors:
Xiaoman Wang, Centre for Translation Studies, University of Leeds, United Kingdom
https://orcid.org/0000-0001-5863-5517
and
Binhua Wang, Centre for Translation Studies, University of Leeds, United Kingdom
https://orcid.org/0000-0003-2404-5214

Abstract

In corpus-based interpreting studies, typical challenges lie in the time-consuming, labour-intensive transcription of spoken data and in the identification of prosodic properties. This paper addresses these challenges by exploring methods for the automatic compilation of multimodal interpreting corpora, with a focus on English/Chinese consecutive interpreting. The results show that: 1) automatic transcription can achieve an accuracy rate of 95.3% in transcribing consecutive interpretations; 2) prosodic properties related to filled pauses, unfilled pauses, articulation rate, and mispronounced words can be extracted automatically using our rule-based programming; 3) mispronounced words can be identified effectively using the Confidence Measure, with any word whose Confidence Measure is lower than 0.321 treated as mispronounced; 4) automatic alignment can be achieved by combining automatic segmentation, sentence embedding, and alignment techniques. This study contributes to interpreting studies by broadening the empirical understanding of orality, enabling multimodal analyses of interpreting products, and providing a new methodological solution for the construction and use of multimodal interpreting corpora. It also has implications for exploring the applicability of new technologies in interpreting studies.
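
To illustrate the kind of rule the abstract describes, the following is a minimal Python sketch, not the authors' implementation. It assumes a speech recogniser that returns word-level timestamps and confidence scores. The `Word` record, the function names, the sample data, and the 0.25-second pause cut-off are all hypothetical; only the 0.321 confidence threshold comes from the abstract.

```python
from dataclasses import dataclass

# Threshold reported in the abstract: words below it count as mispronounced.
CM_THRESHOLD = 0.321
# Assumed minimum silence (seconds) for an unfilled pause; not from the paper.
MIN_PAUSE_S = 0.25


@dataclass
class Word:
    """Hypothetical record for one recognised word."""
    text: str
    start: float       # onset time in seconds
    end: float         # offset time in seconds
    confidence: float  # recogniser's word-level confidence, 0.0 to 1.0


def mispronounced(words, threshold=CM_THRESHOLD):
    """Return the text of every word whose confidence is below the threshold."""
    return [w.text for w in words if w.confidence < threshold]


def unfilled_pauses(words, min_pause=MIN_PAUSE_S):
    """Return (gap_start, gap_end) for silent gaps between consecutive words."""
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        if nxt.start - prev.end >= min_pause:
            gaps.append((prev.end, nxt.start))
    return gaps


# Invented sample data for illustration only.
sample = [
    Word("the", 0.00, 0.20, 0.98),
    Word("economy", 0.25, 0.90, 0.30),
    Word("grew", 1.60, 1.95, 0.91),
]
flagged = mispronounced(sample)
pauses = unfilled_pauses(sample)
```

In this invented sample, "economy" falls below the threshold and the silence between "economy" and "grew" exceeds the assumed pause cut-off. A real pipeline would also need to handle filled pauses and articulation rate, which this sketch leaves out.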


Across Languages and Cultures
ISSN 1585-1923 (Print)
ISSN 1588-2519 (Online)
