Keywords: AI Tutors, Preference Optimization, Simulated Students, Educational AI, Learner-Centric Alignment
Abstract: Recent AI tutors are powered by large language models whose alignment is largely human-centric, relying on scarce, costly, and ethically constrained human preference data. Moreover, such alignment optimizes surface-level response quality and fails to reflect learner diversity. In this position paper, we identify challenges in current alignment methods and propose a framework that offers a scalable alternative to static human feedback by generating preference signals through interactions with diverse simulated students. This reframes alignment as a learner-conditioned optimization problem, enabling tutor policies to optimize for understanding, engagement, and productive struggle rather than surface-level response quality. The framework is compatible with modern preference-learning methods such as DPO, IPO, and KTO. Finally, a small-scale study on algebra tutoring demonstrates that preference distributions vary systematically across learner profiles, highlighting the importance of learner-aware alignment while directly addressing data scarcity and scalability challenges in AI tutor alignment.
Paper Type: Long
Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP
Research Area Keywords: human-in-the-loop, user-centered design
Contribution Types: Position papers
Languages Studied: English
Submission Number: 7017