Keywords: LLM Agents, Document-Grounded Generation, Educational Content Generation
Abstract: Knowing and teaching differ fundamentally: effective instruction requires transforming knowledge into forms learners can grasp. Large language models, when asked to generate lessons (a concrete form of teaching), produce content lacking pedagogical depth. We trace this failure to three decisions that expert teachers make: \textit{selecting} content by recognizing each source's instructional role, \textit{sequencing} topics so that foundations precede applications, and \textit{synthesizing} components into a unified whole. To scaffold these decisions, we introduce \textbf{TeachCraft}, a framework with three agents: Explorer classifies sources by pedagogical intent to guide selection; Planner orders objectives from foundational to advanced; Generator produces lesson materials through a schema that ensures consistency across components. To evaluate this approach, we construct \textsc{LessonBench}, a benchmark of 40 expert-designed lessons, each paired with two to five heterogeneous source documents, on which TeachCraft achieves a 67.8\% win rate in human evaluation and 79.6\% in LLM-based evaluation against eight baselines, with ablations confirming that each decision contributes independently to overall lesson quality.\footnote{Source code is available at \url{https://anonymous.4open.science/r/TeachCraft-1672}}
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: AI / LLM Agents, Generation, NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7160