Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant

Published: 18 Apr 2026, Last Modified: 29 Apr 2026 · ACL 2026 Industry Track Poster · CC BY 4.0
Keywords: interactive systems, qualitative methods, benchmarking, LLMs, research tools
TL;DR: We present Muse, a deployed AI-assisted qualitative research system that achieves inter-rater reliability comparable to human research teams while offering steerability and scalability beyond traditional computational approaches.
Abstract: Qualitative research emphasizes constructing meaning through iterative engagement with textual data. This human-driven process requires navigating coder fatigue and interpretive drift, posing challenges when scaling analysis to larger, more complex datasets. Computational approaches to augmenting qualitative research have been met with skepticism, partly due to their inability to replicate the nuance, context-awareness, and sophistication of human analysis. LLMs, however, present new opportunities to automate aspects of qualitative analysis while upholding rigor and research quality. In this work, we present and benchmark Muse, an interactive qualitative research system that allows researchers to identify themes and annotate datasets, achieving inter-rater reliability between Muse and human coders of Cohen's $\kappa = 0.7$ for well-specified codes.
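The abstract reports agreement between Muse and human coders as Cohen's $\kappa$. For readers unfamiliar with the metric, a minimal sketch of how it is computed follows; the label sequences are purely illustrative and do not come from the paper's benchmark.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa between two annotators' label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected chance agreement from the marginal distributions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes assigned by a human coder and by the system.
human = ["trust", "cost", "trust", "usability", "cost", "trust"]
model = ["trust", "cost", "trust", "cost", "cost", "trust"]
print(round(cohens_kappa(human, model), 3))  # → 0.714
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is the regime the paper's $\kappa = 0.7$ result falls into.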
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 454