Rethinking AI Evaluation through TEACH-AI: A Human-Centered Benchmark and Toolkit for Evaluating AI Assistants in Education
Keywords: Generative AI Assistant, AI evaluation, Benchmark framework, Toolkit
TL;DR: The paper introduces the TEACH-AI benchmark framework and a practical toolkit for guiding the evaluation of generative AI assistants in education.
Abstract: As generative artificial intelligence (AI) continues to transform education, most existing AI benchmarks prioritize technical metrics (e.g., speed, accuracy) while overlooking human identity, agency, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned benchmark framework with measurable indicators and a practical toolkit to guide the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and accompanying toolkit checklist provide a foundation for scalable and value-aligned AI evaluation in education. The framework rethinks “evaluation” through sociotechnical, educational, theoretical, and applied lenses, engaging designers, developers, researchers, and policymakers across AI and education. Our work invites the community to reconsider what constitutes “effective” AI in education and to design model evaluations that promote co-creation, inclusivity, and long-term human, social, and educational impact.
Submission Number: 23