Artificial intelligence (AI) is becoming routine infrastructure in higher vocational colleges, while labor education increasingly serves holistic talent cultivation through authentic tasks and practicum experiences. However, evaluating labor education is challenging because its intended outcomes combine competence with occupational dispositions, and the evidence for them is heterogeneous, process-oriented, and distributed across sites. This paper reframes labor education evaluation as a system design problem rather than a single-instrument measurement task. It proposes a layered conceptual model that separates constructs, evidence, analytics, decision procedures, and governance, and provides a minimal evidence taxonomy to support triangulated and longitudinal interpretation. Building on this foundation, the paper consolidates eight design principles and maps them to implementable system requirements and assurance considerations, emphasizing authentic-task anchoring, stake-sensitive human-in-the-loop decisions, traceability and explainability proportional to stakes, context-aware fairness, purpose-limited data practices, contestability, and continuous monitoring. The framework offers actionable guidance for designing AI-enhanced labor education evaluation systems and identifies directions for further research on construct operationalization, evidence integration, and governance-by-design.