Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in solving mathematical problems, yet their solutions often rely on knowledge beyond the cognitive level of target student groups, limiting their educational value. In this paper, we propose a novel training framework that enables LLMs to generate mathematically correct yet pedagogically appropriate solutions aligned with students' grade-level knowledge. By integrating a hierarchical knowledge graph (HKG) annotated with textbook-aligned difficulty levels and designing a multi-turn dialogue-based reward function, we extend Controllable Text Generation (CTG) to control the knowledge difficulty of generated content. Our adaptive cognition reward mechanism evaluates solutions based on their alignment with target-grade knowledge, guiding model optimization through a customized Group Relative Policy Optimization (GRPO) algorithm. Experimental results on a stratified subset of the OpenR1-Math-220k dataset demonstrate that our approach effectively reduces knowledge difficulty in generated solutions while maintaining correctness, offering a significant step toward grade-aware and instruction-friendly educational AI.
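To make the two core components concrete, the sketch below illustrates one plausible reading of the abstract: a cognition reward that penalizes solutions using knowledge points above the target grade (looked up in an HKG annotated with textbook-aligned grade levels), combined with the group-relative advantage normalization that defines GRPO. All names, the toy HKG slice, and the penalty weighting are illustrative assumptions, not the paper's actual formulation; only the group-relative normalization reflects the standard GRPO definition.

```python
# Minimal sketch of an adaptive cognition reward under assumed annotations:
# each knowledge point carries a textbook-aligned grade level, and a correct
# solution is penalized by how far its concepts overshoot the target grade.
# The concept names, grades, and weighting below are hypothetical.
from dataclasses import dataclass


@dataclass
class KnowledgePoint:
    name: str
    grade: int  # textbook-aligned grade level at which the concept is taught


# Toy slice of a hierarchical knowledge graph: concept -> annotation.
HKG = {
    "linear_equation": KnowledgePoint("linear_equation", grade=7),
    "quadratic_formula": KnowledgePoint("quadratic_formula", grade=9),
    "derivative": KnowledgePoint("derivative", grade=11),
}


def cognition_reward(used_concepts: list[str], target_grade: int,
                     is_correct: bool, alpha: float = 0.5) -> float:
    """Correctness term minus a penalty for exceeding the target grade."""
    if not is_correct:
        return -1.0  # incorrect solutions receive no positive reward
    # Penalize each concept by how many grades it overshoots the target.
    overshoot = sum(max(0, HKG[c].grade - target_grade)
                    for c in used_concepts if c in HKG)
    return 1.0 - alpha * overshoot / max(1, len(used_concepts))


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each sampled solution's reward
    against the mean and standard deviation of its sampling group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: a grade-7 problem solved via calculus is penalized relative to
# a grade-appropriate algebraic solution, even though both are correct.
r_hi = cognition_reward(["derivative"], target_grade=7, is_correct=True)       # -1.0
r_lo = cognition_reward(["linear_equation"], target_grade=7, is_correct=True)  # 1.0
print(group_relative_advantages([r_hi, r_lo]))  # calculus solution gets negative advantage
```

Under this reading, the group-relative normalization is what steers the policy toward grade-appropriate solutions: within a sampling group, correct-but-overshooting solutions receive negative advantages relative to correct, grade-aligned ones.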
Paper Type: Long
Research Area: Generation
Research Area Keywords: Generation, Machine Learning for NLP, Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment, Data resources, Theory
Languages Studied: English, Chinese
Submission Number: 6363