LOGOS: LLM-Driven End-to-End Grounded Theory Development and Schema Induction For Qualitative Research

20 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Grounded Theory, Schema Induction, Retrieval-augmented Generation, Large Language Model, Information Retrieval
TL;DR: This paper introduces LOGOS, a framework that formulates schema induction as a global question answering problem and develops an automated pipeline for generating, refining, and evaluating grounded theory codebooks using large language models.
Abstract: Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding presents a major scalability bottleneck. Existing computational tools either fail on full automation or lack flexible schema construction. We introduce LOGOS, a novel, end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves a remarkable average 80.4% alignment with an expert-developed schema on complex datasets. LOGOS demonstrates a potential to democratize and scale qualitative research without sacrificing theoretical nuance.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24078