Abstract: Assessing and enhancing human learning through question-answering is vital, yet automating this process remains challenging. We propose Savaal, a scalable question-generation system using large language models (LLMs) with three objectives: (i) scalability, enabling question generation from hundreds of pages of text, (ii) depth of understanding, producing questions that go beyond factual recall to test conceptual reasoning, and (iii) domain-independence, automatically generating questions across diverse knowledge areas. Instead of providing an LLM with large documents as context, Savaal improves results with a three-stage processing pipeline.
Our evaluation with 76 human experts on 71 papers and PhD dissertations shows that Savaal generates questions that better test depth of understanding by 6.5$\times$ for dissertations and 1.5$\times$ for papers compared to a direct-prompting LLM baseline. Notably, as document length increases, Savaal's advantages in higher question quality and lower cost become more pronounced.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: educational applications, question generation, human evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 4724