SafeTutors: Benchmarking Pedagogical Safety in AI Tutoring Systems

ACL ARR 2026 March Submission1982 Authors

17 Mar 2026 (modified: 07 Jun 2026) | ACL ARR 2026 March Submission | Readers: Everyone | CC BY 4.0
Keywords: AI tutor, Pedagogical Safety
Abstract: Large language models are rapidly being deployed as AI tutors, yet current evaluation paradigms assess problem-solving accuracy and generic safety in isolation, failing to capture whether a model is simultaneously pedagogically effective and safe across sustained interaction. We argue that tutoring safety is fundamentally different from conventional LLM safety: the primary risk is not toxic content but the quiet erosion of learning through answer over-disclosure, misconception reinforcement, and the abdication of scaffolding. To systematically study this failure mode, we introduce SafeTutors, a benchmark that jointly evaluates safety and pedagogy across mathematics, physics, and chemistry. SafeTutors is organized around a theoretically grounded risk taxonomy comprising 11 harm dimensions and 48 sub-risks drawn from the learning-science literature. We find that all models show broad harm, that scale does not reliably help, and that multi-turn dialogue worsens behavior, with pedagogical failures rising from 17.7% to 77.8%. Harms also vary by subject, so mitigations must be discipline-aware, and single-turn “safe/helpful” results can mask systematic tutor failure over extended interaction.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: AI tutor benchmark and evaluation, pedagogical safety
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 1982