Translation assessment is a productive line of research in Translation Studies. A range of methods has been trialled to assess translation quality, from intuitive assessment and error analysis to rubric scoring and item-based assessment. In this article, we introduce a lesser-known approach to translation assessment called comparative judgement. Rooted in psychophysical analysis, comparative judgement rests on the assumption that humans tend to be more accurate in making relative judgements than in making absolute judgements. As both a methodological exploration and a feasibility investigation, we conducted an experiment in which novice and experienced judges were recruited to assess English-Chinese translation using a computerised comparative judgement platform. The collected data were analysed to shed light on the validity and reliability of the assessment results and on the judges’ perceptions. Our analysis shows that (1) overall, comparative judgement produced valid measures and facilitated judgement reliability, although these results appeared to be affected by translation directionality and judges’ experience, and (2) the judges were generally confident about their decisions, despite some emergent factors that undermined the validity of their decision making. Finally, we discuss the use of comparative judgement as a possible method of translation assessment and its implications for future practice and research.