Concept Extraction and Webb’s Depth of Knowledge: Comparing LLM Question Generation Pipelines for Educational Assessment

Dmitriy An; Andrew Paice; Petra Müller-Csernetzky; Aliaksei Andrushevich

11 Mar 2026 (modified: 19 May 2026), SwissText 2026 Conference Submission, CC BY 4.0

Track: Scientific Track

Keywords: Large Language Models (LLMs), Automated Exercise Generation, Webb's Depth of Knowledge (DOK), Concept Extraction, Educational Assessment, Higher Education, RAG (Retrieval-Augmented Generation)

TL;DR: This study finds that, among the pipelines compared, combining Concept Extraction with Webb’s Depth of Knowledge is the most effective for generating high-quality, complex academic exercises from unstructured course materials.

Abstract: This study compares LLM pipelines for automated exercise generation in higher education. We empirically evaluate two context preparation methods (Sliding Window vs. Concept Extraction) in combination with two instructional frameworks (Bloom’s Revised Taxonomy vs. Webb’s Depth of Knowledge). Through a mixed-methods evaluation with 21 university course coordinators, we find that Concept Extraction combined with Webb’s Depth of Knowledge yields the highest pedagogical quality, especially for technical disciplines. Human oversight remains necessary to catch out-of-scope hallucinations, but these pipelines can serve as efficient drafting engines for scalable, high-quality academic assessments.

Submission Number: 7
