Optimal Adaptive Difficulty Calibration via Contextual Bandits with Information-Theoretic Regret Bounds
Keywords: Contextual bandits, intelligent tutoring systems, adaptive learning, Zone of Proximal Development, Bayesian knowledge tracing, ASSISTments, Junyi Academy, regret bounds
TL;DR: InfoTutor achieves $O(\sqrt{KT\log T})$ regret for difficulty calibration; ZPD converges at $O(1/\sqrt{T})$; 12--18\% better post-test vs baselines.
Abstract: Intelligent tutoring systems must dynamically adjust task difficulty to optimize student learning outcomes. This problem is naturally framed as a contextual multi-armed bandit in which the tutor selects from $K$ difficulty levels at each interaction based on evolving estimates of student knowledge. We derive fundamental information-theoretic lower bounds on the regret of any difficulty calibration policy, showing that $\Omega(\sqrt{KT})$ regret is unavoidable. We propose \texttt{InfoTutor}, a principled algorithm that achieves $O(\sqrt{KT\log T})$ regret by leveraging Bayesian knowledge tracing to construct feature representations of student states. A key theoretical contribution is proving that incorporating knowledge state estimation reduces the effective problem dimensionality, enabling convergence of the estimated Zone of Proximal Development (ZPD) to its true value at rate $O(1/\sqrt{T})$. Empirical validation on the ASSISTments and Junyi Academy datasets demonstrates that \texttt{InfoTutor} outperforms baselines by 12--18\% in post-test performance while maintaining computational efficiency. Our framework bridges bandit theory with learning science principles, providing both theoretical guarantees and practical improvements for adaptive educational technology.
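The abstract does not include an implementation, but the setup it describes (a contextual bandit over $K$ difficulty arms, with context features derived from Bayesian knowledge tracing) can be sketched with a standard LinUCB-style learner. Everything below is an illustrative assumption, not the paper's \texttt{InfoTutor} algorithm: the `bkt_update` parameters, the `LinUCBTutor` class, and the two-dimensional context are all hypothetical choices for the sketch.

```python
import numpy as np

def bkt_update(p_mastery, correct, p_learn=0.1, p_slip=0.1, p_guess=0.2):
    """One standard Bayesian-knowledge-tracing posterior update.

    Parameters here are illustrative defaults, not values from the paper.
    """
    if correct:
        num = p_mastery * (1 - p_slip)
        den = num + (1 - p_mastery) * p_guess
    else:
        num = p_mastery * p_slip
        den = num + (1 - p_mastery) * (1 - p_guess)
    post = num / den
    # Account for the learning opportunity after the response.
    return post + (1 - post) * p_learn

class LinUCBTutor:
    """LinUCB over K difficulty arms; context = student knowledge features."""

    def __init__(self, k_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration width
        self.A = [np.eye(dim) for _ in range(k_arms)]    # per-arm Gram matrix
        self.b = [np.zeros(dim) for _ in range(k_arms)]  # per-arm reward stats

    def select(self, x):
        """Pick the arm maximizing estimated reward + confidence bonus."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

A usage loop would alternate `select` on the current BKT feature vector, observe a learning-gain reward, then call `update` and `bkt_update`; the confidence-bonus term is what yields $O(\sqrt{KT\log T})$-type regret in the linear-payoff setting.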
Submission Number: 156