From Rubrics to Rewards: Aligning Question Generation for Early Literacy
Keywords: AI4Edu, RL, DPO, Reward Model, Early Literacy, LLM, Question Generation
Abstract: Effective reading comprehension assessment is a cornerstone of early childhood literacy development, providing the scaffolding young learners need to transition from decoding text to deep conceptual understanding. While Large Language Models (LLMs) offer scalable solutions for automated question generation, they frequently fail to satisfy the specific pedagogical constraints required for emerging readers. This work extends Yang et al. (2025) by introducing a multi-stage alignment pipeline designed to ensure generated content is both pedagogically sound and developmentally appropriate.
Our methodology introduces Rubric Injection, a technique that embeds expert-defined literacy standards directly into the system prompt, grounding the model's initial outputs in established educational criteria (sketched below). To bridge the gap between generic model outputs and specialized classroom needs, we propose an expert-in-the-loop alignment process in which domain experts from a literacy team provide qualitative rubric-based scores and pairwise preference rankings over model-generated questions.
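A minimal sketch of how Rubric Injection might be implemented, assuming a chat-style prompt format; the rubric items below are illustrative placeholders, not the expert-defined standards used in the paper.

```python
# Illustrative sketch of Rubric Injection: expert literacy criteria are
# embedded in the system prompt before any question is generated. The
# rubric items are hypothetical placeholders.
RUBRIC = [
    "The question must be answerable from the passage alone.",
    "Vocabulary must be appropriate for emerging readers (e.g., grades K-2).",
    "The question should target comprehension, not rote word recall.",
]

def build_messages(passage: str) -> list[dict]:
    """Assemble a chat-style prompt with the rubric injected as grounding."""
    rubric_text = "\n".join(f"- {item}" for item in RUBRIC)
    system = (
        "You generate reading-comprehension questions for early readers.\n"
        "Every question you produce must satisfy this rubric:\n" + rubric_text
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Passage:\n{passage}\n\nWrite one question."},
    ]
```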
Unlike standard alignment approaches that use preference data solely for direct fine-tuning, we propose leveraging this expert feedback to train a Question Quality Reward Model. The reward model assigns scalar quality scores to generated questions, serving as a scalable proxy for human pedagogical judgment. These scores are then used to construct preference pairs for Direct Preference Optimization (DPO), iteratively refining the LLM. By centering the alignment process on expert-derived pedagogical rewards, this work aims to provide a robust framework for generating high-quality, accessible assessment tools that better support the unique needs of early learners.
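One common instantiation of this pipeline trains the reward model on expert preference pairs with a Bradley-Terry loss, then uses its scores to rank sampled candidate questions into chosen/rejected pairs for the standard DPO objective. A minimal PyTorch sketch under those assumptions (all names and shapes hypothetical):

```python
# Minimal sketch of the two alignment stages: (1) a pairwise
# (Bradley-Terry) loss for training the Question Quality Reward Model on
# expert preference rankings, and (2) the standard DPO objective applied
# to preference pairs derived from reward-model scores. All tensor
# arguments have shape (batch,).
import torch
import torch.nn.functional as F

def reward_model_loss(score_chosen: torch.Tensor,
                      score_rejected: torch.Tensor) -> torch.Tensor:
    # Push the expert-preferred question's scalar score above the
    # rejected question's score.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def dpo_loss(policy_logp_chosen: torch.Tensor,
             policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # DPO: widen the policy's log-probability margin on chosen vs.
    # rejected questions, measured relative to a frozen reference model.
    margin = (policy_logp_chosen - ref_logp_chosen) - \
             (policy_logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()
```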
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 168