AutoQG: An Automated Framework for Evidence-Traceable Question Generation via Ontology-Guided Knowledge Graph Construction

18 Sept 2025 (modified: 05 Feb 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Knowledge Graph Construction, Question Generation, Multi-Agent Systems
TL;DR: We propose AutoQG, a multi-agent framework for evidence-traceable QA generation, introducing the AutoQG-20k benchmark (20k QA pairs) and the harder AutoQG-7k subset; AutoQG outperforms baselines under both human and LLM-as-a-judge evaluation.
Abstract: Large Language Models (LLMs) present unprecedented opportunities for generating scientific questions. However, existing approaches face two key limitations: they rely heavily on costly human annotation, and they produce brittle, unverifiable outputs. To address these challenges, we propose AutoQG, a fully automated multi-agent framework for evidence-grounded scientific QA generation. AutoQG comprises three complementary agents: (i) a KG Extraction Agent, which performs ontology-guided knowledge graph construction with section-aware prompts for precise information retrieval; (ii) a KG Evaluation Agent, a multi-dimensional evaluation module with iterative refinement that ensures accuracy and consistency; and (iii) a QA Generation Agent, which produces schema-constrained QA pairs grounded in reasoning paths and explicit textual evidence. Applied to over 4,000 scientific papers, AutoQG constructs 243k triples and yields AutoQG-20k, a benchmark of more than 20,000 QA pairs, each explicitly linked to its reasoning chain and supporting evidence to ensure transparency and verifiability. We further release AutoQG-7k, a challenging subset of questions that strong LLMs struggle to answer. Extensive experiments show that AutoQG consistently outperforms strong baselines under both human evaluation and LLM-as-a-Judge assessment. By turning free-form LLM generation into a controlled, auditable pipeline, AutoQG advances evidence-grounded AI for reliable scientific understanding. Source code will be released upon publication.
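To make the three-agent loop in the abstract concrete, below is a minimal Python sketch. Everything here is a hypothetical illustration, not the authors' (unreleased) implementation: the class names (`KGExtractionAgent`, `KGEvaluationAgent`, `QAGenerationAgent`), the `autoqg_pipeline` driver, and the toy extraction and scoring heuristics are all assumptions standing in for LLM-backed components.

```python
from dataclasses import dataclass

# Illustrative data structures; the paper's actual schema is not public.

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str
    evidence: str  # sentence the triple was extracted from

@dataclass
class QAPair:
    question: str
    answer: str
    reasoning_path: list  # list[Triple] justifying the answer
    evidence: list        # supporting text spans

class KGExtractionAgent:
    """Stand-in for ontology-guided, section-aware LLM extraction."""
    def extract(self, section, text):
        # Toy heuristic: treat "X <relation> Y." sentences as triples.
        triples = []
        for sent in text.split("."):
            words = sent.split()
            if len(words) >= 3:
                triples.append(Triple(words[0], words[1],
                                      " ".join(words[2:]), sent.strip()))
        return triples

class KGEvaluationAgent:
    """Stand-in for multi-dimensional triple scoring."""
    def score(self, triples):
        if not triples:
            return 0.0
        # Toy criterion: fraction of triples carrying evidence text.
        return sum(bool(t.evidence) for t in triples) / len(triples)

class QAGenerationAgent:
    """Stand-in for schema-constrained QA generation."""
    def generate(self, triples):
        return [QAPair(f"What does {t.subject} relate to via '{t.relation}'?",
                       t.obj, [t], [t.evidence]) for t in triples]

def autoqg_pipeline(sections, max_rounds=3, threshold=0.8):
    extractor = KGExtractionAgent()
    judge = KGEvaluationAgent()
    generator = QAGenerationAgent()
    accepted = []
    for name, text in sections.items():
        triples = extractor.extract(name, text)
        # Iterative refinement: re-extract until the judge is satisfied.
        for _ in range(max_rounds):
            if judge.score(triples) >= threshold:
                break
            triples = extractor.extract(name, text)
        accepted.extend(triples)
    return generator.generate(accepted)

if __name__ == "__main__":
    qa = autoqg_pipeline({"Methods": "AutoQG uses three agents."})
    print(qa[0].question, "->", qa[0].answer)
```

The refinement loop mirrors the feedback role the abstract assigns to the KG Evaluation Agent: extraction is retried until a multi-dimensional score clears a threshold, and only the accepted triples feed the QA generator, so every QA pair keeps a reasoning path and evidence by construction.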
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 13640