A Study on the Performance of Large Language Models in Translating the Texts of Red Culture into English

Keywords

Red culture
Translation
AI translation evaluation
ChatGPT
ERNIE Bot
DeepSeek

DOI

10.26689/jcer.v9i4.10369

Submitted : 2025-03-30
Accepted : 2025-04-14
Published : 2025-04-29

Abstract

This paper takes Chinese red culture resources as its research object and evaluates the Chinese-English translation quality of three major AI platforms: ChatGPT-4.0, ERNIE Bot, and DeepSeek. Through automatic quantitative evaluation, it systematically analyzes their performance in translating red culture texts. The study selects a diverse corpus comprising historical documents, red classic texts, and culturally loaded terms, and employs three automatic evaluation metrics (GLEU, METEOR, and COMET) for a comprehensive assessment.
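To illustrate the kind of automatic scoring the study relies on, the following is a minimal, stdlib-only sketch of sentence-level GLEU (Wu et al., 2016): the minimum of pooled n-gram precision and recall between a hypothesis translation and a reference. This is an illustrative reimplementation, not the paper's actual evaluation pipeline; in practice a library implementation (e.g., NLTK's `gleu_score`) would typically be used, and METEOR and COMET require their own resources (WordNet data and a pretrained neural model, respectively).

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_gleu(reference, hypothesis, max_n=4):
    """Sentence-level GLEU: min of n-gram precision and recall,
    pooled over orders 1..max_n. Returns a score in [0, 1]."""
    ref, hyp = reference.split(), hypothesis.split()
    match = hyp_total = ref_total = 0
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
        # Clipped matches: an n-gram counts only as often as it
        # appears in both hypothesis and reference.
        match += sum((ref_counts & hyp_counts).values())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    if hyp_total == 0 or ref_total == 0:
        return 0.0
    return min(match / hyp_total, match / ref_total)
```

A perfect match scores 1.0, while omissions or substitutions lower both precision and recall; taking the minimum penalizes translations that are too short as well as those that over-generate.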

References

Rivera-Trigueros I, 2022, Machine Translation Systems and Quality Assessment: A Systematic Review. Language Resources and Evaluation, 56(2): 593–619.

Han C, 2020, Translation Quality Assessment: A Critical Methodological Review. The Translator, 26(3): 257–273.

Lauscher S, 2014, Translation Quality Assessment: Where Can Theory and Practice Meet? Evaluation and Translation, Routledge, New York, 149–168.

Salvagno M, Taccone FS, Gerli AG, 2023, Can Artificial Intelligence Help for Scientific Writing? Critical Care, 27(1): 75.

Thorp H, 2023, ChatGPT Is Fun, But Not An Author. Science, 379(6630): 313.

Jiao W, Wang W, Huang JT, et al., 2023, Is ChatGPT a Good Translator? Yes with GPT-4 as the Engine. arXiv. https://doi.org/10.48550/arXiv.2301.08745

Ghassemiazghandi M, 2024, An Evaluation of ChatGPT’s Translation Accuracy Using BLEU Score. Theory and Practice in Language Studies, 14(4): 985–994.

Nemergut M, 2024, Machine Translation Quality Based on TER Analysis from English into Slovak. L10N Journal, 3(2): 60–86.

Wang D, Lin L, Zhao Z, et al., 2023, EvaHan2023: Overview of the First International Ancient Chinese Translation Bakeoff, Proceedings of ALT2023: Ancient Language Translation Workshop, 1–14.

Ali JKM, 2023, Benefits and Challenges of Using ChatGPT: An Exploratory Study on English Language Program. University of Bisha Journal for Humanities, 2(2): 629–641.

Sahari Y, Al-Kadi AMT, Ali JKM, 2023, A Cross-Sectional Study of ChatGPT in Translation: Magnitude of Use, Attitudes, and Uncertainties. Journal of Psycholinguistic Research, 52(6): 2937–2954.

Wang J, Wen Q, 2010, A Review of Automatic Scoring Systems at Home and Abroad and the Enlightenment for Chinese Students. Foreign Languages, 2010(1): 75–81.

Akhtarshenas A, Dini A, Ayoobi N, 2025, ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2503.17403

Guo D, Zhu Q, Yang D, et al., 2024, DeepSeek-Coder: When the Large Language Model Meets Programming—The Rise of Code Intelligence. arXiv. https://doi.org/10.48550/arXiv.2401.14196

Lu H, Liu W, Zhang B, et al., 2024, Deepseek-VL: Towards Real-World Vision-Language Understanding. arXiv. https://doi.org/10.48550/arXiv.2403.05525

Wang A, Singh A, Michael J, et al., 2018, GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv. https://doi.org/10.48550/arXiv.1804.07461

Banerjee S, Lavie A, 2005, METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72.

Rei R, Stewart C, Farinha AC, et al., 2020, COMET: A Neural Framework for MT Evaluation. arXiv. https://doi.org/10.48550/arXiv.2009.09025

Graham Y, Baldwin T, Moffat A, et al., 2013, Continuous Measurement Scales in Human Evaluation of Machine Translation, Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, 33–41.

Snover M, Dorr B, Schwartz R, et al., 2006, A Study of Translation Edit Rate with Targeted Human Annotation, Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA), Cambridge, 223–231.

Lommel A, Uszkoreit H, Burchardt A, 2014, Multidimensional Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality Metrics. Revista Tradumàtica: Tecnologies de la Traducció, 12: 455–463.