ExploraTutor: A Dataset for Children’s Exploratory Dialogue by Integrating Multiple Educational Theories
Keywords: child-centric dialogue system; supervised fine-tuning; dataset
Abstract: Large Language Models (LLMs) often lack the pedagogical intelligence needed for long-horizon, multi-turn interactions. This paper introduces an effective "Theory→Practice→Data→Model" pathway to address this challenge, focusing on guiding children's deep exploration. We distill implicit pedagogical knowledge from child–adult dialogues and abstract it into a systematic annotation framework. Leveraging this framework, we constructed the ExploraTutor dataset (2,045 high-quality dialogues, 17,682 Q&A pairs) through a dual-pathway approach combining real-data augmentation and theory-guided synthesis. Experiments on mainstream models show that fine-tuned models significantly outperform baselines in heuristic guidance and cognitive adaptability. This process internalizes educational principles as core model capabilities, transforming LLMs from knowledge answerers into cognitive facilitators and thereby mitigating the loss of alignment in multi-turn interactions.
Supplementary Material: zip
Submission Number: 85