LOGOS: LLM-Driven End-to-End Grounded Theory Development and Schema Induction For Qualitative Research
Keywords: Grounded Theory, Schema Induction, Retrieval-augmented Generation, Large Language Model, Information Retrieval
TL;DR: This paper introduces LOGOS, a framework that formulates schema induction as a global question answering problem and develops an automated pipeline for generating, refining, and evaluating grounded theory codebooks using large language models.
Abstract: Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding presents a major scalability bottleneck. Existing computational tools either fall short of full automation or lack flexible schema construction. We introduce LOGOS, a novel, end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves an average 80.4% alignment with an expert-developed schema on complex datasets. LOGOS demonstrates the potential to democratize and scale qualitative research without sacrificing theoretical nuance.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24078