With the advancement of machine translation, post-editing (PE) has become a dominant workflow, making the accurate assessment of post-machine translation editing competence (PMTE) critical. However, performance-based PMTE assessments are vulnerable to subjective rater effects, which compromise their validity. This study employs the many-facet Rasch model (MFRM) to conduct an in-depth analysis of rater-criterion interaction, a complex form of bias in PMTE evaluation. In the study, 144 examinees translated a scientific text and four expert raters scored the outputs against four criteria: logical relations, completeness, terminology, and fluency. The MFRM analysis successfully calibrated examinee ability, demonstrating high reliability (.86), and revealed significant variation in rater severity and criterion difficulty. Critically, the analysis identified specific quality-control issues, including inconsistent scoring by one rater and ambiguity in the “Terminology” criterion. The bias analysis uncovered significant rater-criterion interactions. These findings demonstrate that MFRM is a powerful diagnostic tool that transforms assessment from a purely evaluative act into a mechanism for data-driven improvement. It provides objective, actionable evidence for refining scoring rubrics and conducting targeted rater training, thereby enhancing the fairness and validity of PMTE assessment.
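For reference, the measurement model underlying such an analysis is the standard three-facet Rasch formulation for rater-mediated ratings (a sketch; the symbols below are illustrative rather than the study's own notation):

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \alpha_i - \beta_j - \tau_k

Here P_{nijk} is the probability that examinee n receives category k from rater i on criterion j, P_{nij(k-1)} is the probability of the adjacent lower category, \theta_n is the examinee's ability, \alpha_i the rater's severity, \beta_j the criterion's difficulty, and \tau_k the step difficulty of category k relative to k-1, all expressed on a common logit scale. A rater-by-criterion bias analysis adds an interaction term \varphi_{ij} to this linear combination; statistically significant estimates of \varphi_{ij} flag rater-criterion pairings that are scored more severely or leniently than the main effects alone would predict.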