KCVR: Knowledge-Centric Video Reconstruction for Structured Pedagogical Summarization via Dynamic Graph Planning

ACL ARR 2026 January Submission6555 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | CC BY 4.0
Keywords: multimodal summarization; Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning; Generation; knowledge graphs
Abstract: Existing video summarization methods mainly compress content for gist browsing, but they often break the prerequisite logic of instructional videos and induce logical inversions (e.g., conclusions before premises). We formalize this problem as Structure-Pedagogical Reconstruction (SPR). SPR raises two challenges: (1) Structure Hallucination, where retrieved knowledge is topologically valid but not grounded in evidence from the blackboard; and (2) Logical Inversion, where soft, prompt-level graph injection fails to enforce prerequisite order during decoding. To address these challenges, we propose Knowledge-Centric Video Reconstruction (KCVR), a Plan-then-Generate neuro-symbolic framework that decouples epistemic planning from content generation. KCVR prunes a Dual-Layer Epistemic Graph into a minimal video-supported plan, then realizes the plan with visually anchored attention and topology-constrained decoding. We additionally release EduStruct, a 10-discipline benchmark for SPR and structure-centric evaluation. Experiments show that KCVR outperforms strong end-to-end baselines on Knowledge Progression Consistency and Learning Objective Coverage. Our code and data are available at https://anonymous.4open.science/r/video_sum-474D/.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodal summarization, video processing, neurosymbolic reasoning, knowledge graphs, educational applications, benchmarking, structured prediction, factuality, knowledge-augmented methods
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Submission Number: 6555