Rethinking NLP Evaluation for LLM-Based Assistants in Education Through a Human-Centered Evaluation Framework
Keywords: NLP Evaluation, Large Language Models (LLMs), Human-Centered AI Evaluation
Abstract: Large language models (LLMs) are increasingly adopted as educational assistants and feedback providers; however, dominant NLP evaluation practices continue to emphasize technical metrics such as accuracy, fluency, and automated judgment. We argue that these approaches are misaligned with educational contexts because they overlook human values, learner agency, and pedagogical goals. This paper presents an argument-driven critique of NLP evaluation practices as applied to educational LLMs and introduces a literature-informed, human-centered, and sociotechnical evaluation framework with guiding questions to support its use. We highlight criteria such as explainability, consistency, and refinement, which are critical to instructional effectiveness in human-AI interaction yet absent from existing NLP benchmarks, and argue for more human-aligned, co-creative, and long-term evaluation practices for educational LLM systems.
Paper Type: Short
Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP
Research Area Keywords: NLP Evaluation, Large Language Models (LLMs), Human-Centered AI, Educational AI
Contribution Types: Position papers
Languages Studied: English
Submission Number: 8294