Keywords: AI Tutors, Preference Optimization, Simulated Students, Educational AI, Learner-Centric Alignment
Abstract: Recent AI tutors are powered by large language models whose alignment is largely human-centric, relying on scarce, costly, and ethically constrained human preference data. Moreover, such alignment optimizes surface-level response quality and fails to reflect learner diversity. In this position paper, we identify challenges in current alignment methods and propose a framework that offers a scalable alternative to static human feedback by generating preference signals through interactions with diverse simulated students. This reframes alignment as a learner-conditioned optimization problem, enabling tutor policies to optimize for understanding, engagement, and productive struggle rather than surface-level response quality. The framework is compatible with modern preference-learning methods such as DPO, IPO, and KTO. Finally, a small-scale study on algebra tutoring demonstrates that preference distributions vary systematically across learner profiles, highlighting the importance of learner-aware alignment while directly addressing data scarcity and scalability challenges in AI tutor alignment.
Paper Type: Long
Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP
Research Area Keywords: human-in-the-loop, user-centered design
Contribution Types: Position papers
Languages Studied: English
Submission Number: 7017