Keywords: Content Analysis, Large Language Model, Multiagent, Simulation, Computational Social Science, AI for Science
TL;DR: We propose SCALE, a novel LLM multi-agent framework that automates content analysis, a traditionally labor-intensive social science method, while integrating human oversight to enable scalable, high-quality annotations that approximate human judgment.
Abstract: Content analysis is a fundamental social science research method that breaks down complex, unstructured texts into theory-informed numerical categories. It has been widely applied across social science disciplines such as political science, media and communication, sociology, and psychology for over a century. This process often relies on multiple rounds of manual annotation and discussion. While rigorous, content analysis is dependent on domain expertise, labor-intensive, and time-consuming, posing challenges of subjectivity and scalability. In this paper, we introduce SCALE, a transformative multi-agent framework to $\underline{\textbf{S}}$imulate $\underline{\textbf{C}}$ontent $\underline{\textbf{A}}$nalysis via large language model ($\underline{\textbf{L}}$LM) ag$\underline{\textbf{E}}$nts. This framework automates key phases including text coding, inter-agent discussion, and dynamic codebook updating, capturing human researchers' reflective depth and adaptive discussions. It also incorporates human intervention, enabling different modes of AI-human expert collaboration to mitigate algorithmic bias and enhance contextual sensitivity. Extensive evaluations across real-world datasets demonstrate that SCALE exhibits versatility across diverse contexts and approximates human judgment in complex annotation tasks commonly required for content analysis. Our findings have the potential to transform social science and machine learning by demonstrating how an appropriately designed multi-agent system can automate complex, domain-expert-dependent interactions and generate large-scale, high-quality outputs invaluable for social scientists.
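To make the three automated phases named in the abstract (text coding, inter-agent discussion, and dynamic codebook updating) concrete, here is a minimal Python sketch of one SCALE-style round. It reflects our reading of the abstract, not the authors' released implementation; the `call_llm` stub, the agent prompts, and the disagreement-triggered discussion rule are illustrative assumptions.

```python
# Minimal sketch of one SCALE-style round (assumptions noted above).
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API client."""
    raise NotImplementedError("plug in a real LLM client here")


@dataclass
class Codebook:
    categories: dict  # category name -> coding instruction

    def render(self) -> str:
        return "\n".join(f"- {k}: {v}" for k, v in self.categories.items())


def code_text(persona: str, text: str, codebook: Codebook) -> str:
    """Phase 1: each coder agent independently assigns one category."""
    return call_llm(
        f"You are {persona}. Using this codebook:\n{codebook.render()}\n"
        f"Assign exactly one category to the following text:\n{text}"
    )


def discuss(text: str, labels: list, codebook: Codebook) -> str:
    """Phase 2: a moderated discussion reconciles disagreeing labels."""
    return call_llm(
        f"Coders assigned the labels {labels}. Discuss and return one final label.\n"
        f"Codebook:\n{codebook.render()}\nText:\n{text}"
    )


def update_codebook(codebook: Codebook, disagreements: list) -> Codebook:
    """Phase 3: refine coding instructions from recurring disagreements;
    a human expert could review the revision before it is adopted."""
    call_llm(
        f"Revise these coding instructions to resolve disagreements on "
        f"{len(disagreements)} texts:\n{codebook.render()}"
    )
    # Parsing the revised instructions back into the Codebook is omitted here.
    return codebook


def scale_round(texts: list, personas: list, codebook: Codebook):
    """One full round: independent coding, discussion on conflicts, codebook update."""
    final_labels, disagreements = [], []
    for text in texts:
        labels = [code_text(p, text, codebook) for p in personas]
        if len(set(labels)) > 1:  # coders disagree -> discussion phase
            disagreements.append(text)
            final_labels.append(discuss(text, labels, codebook))
        else:
            final_labels.append(labels[0])
    return final_labels, update_codebook(codebook, disagreements)
```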
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6097