ExploraTutor: A Dataset for Children’s Exploratory Dialogue by Integrating Multiple Educational Theories
Keywords: child-centric dialogue system; supervised fine-tuning; dataset
Abstract: Large Language Models (LLMs) often lack the pedagogical intelligence needed for long-horizon, multi-turn interactions. This paper introduces an effective "Theory→Practice→Data→Model" pathway to address this challenge, focusing on guiding children's deep exploration. We distill implicit pedagogical knowledge from child–adult dialogues and abstract it into a systematic annotation framework. Leveraging this framework, we constructed the ExploraTutor dataset (2,045 high-quality dialogues, 17,682 Q&A pairs) through a dual-pathway approach combining real-data augmentation and theory-guided synthesis. Experiments on mainstream models show that fine-tuned models significantly outperform baselines in heuristic guidance and cognitive adaptability. This process internalizes educational principles as core model capabilities, transforming LLMs from knowledge answerers into cognitive facilitators and thereby mitigating the loss of alignment in multi-turn interactions.
Supplementary Material: zip
Submission Number: 85