Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities

ACL ARR 2024 August Submission 336 Authors

16 Aug 2024 (modified: 17 Sept 2024), ACL ARR 2024 August Submission, License: CC BY 4.0
Abstract: Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of that alignment remain challenging. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and for comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs, revealing unhealthy beliefs and successfully differentiating communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and broader applications in public health and social science research.
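
The abstract describes aligning an LLM with an online community via instruction-tuning. Below is a minimal sketch of that general recipe using Hugging Face Transformers; it is not the authors' released pipeline. The model name, the `community_posts.jsonl` file, its schema, and the prompt template are all illustrative assumptions.

```python
# Minimal sketch (assumptions throughout) of instruction-tuning a causal LM
# on posts from one online community so it generates text in that community's voice.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base model, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Hypothetical schema: {"subreddit": ..., "title": ..., "body": ...} per record.
dataset = load_dataset("json", data_files="community_posts.jsonl", split="train")

def to_instruction(example):
    # Frame each post as an instruction/response pair, so the tuned model
    # completes the community's text when given the instruction.
    prompt = (f"Write a post for the r/{example['subreddit']} community "
              f"titled: {example['title']}\n")
    return {"text": prompt + example["body"] + tokenizer.eos_token}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

tokenized = (dataset.map(to_instruction)
                    .map(tokenize, remove_columns=dataset.column_names + ["text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="community-aligned-lm",
                           per_device_train_batch_size=2,
                           num_train_epochs=3),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```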
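The abstract also mentions administering an eating disorder psychometric test to the aligned models. A hedged sketch of how a Likert-style questionnaire could be administered via prompting and scored is below; the items (written in the style of the EAT-26), the response scale, and the scoring map are assumptions for illustration, not the paper's protocol.

```python
# Sketch (assumed protocol) of giving a Likert-style eating-attitudes
# questionnaire to the community-aligned model and scoring the answers.

from transformers import pipeline

generator = pipeline("text-generation", model="community-aligned-lm")

SCALE = ["Always", "Usually", "Often", "Sometimes", "Rarely", "Never"]
ITEMS = [  # illustrative items in the style of the EAT-26; not the official form
    "I am terrified about being overweight.",
    "I feel that food controls my life.",
]

def administer(item):
    prompt = (f"Statement: {item}\n"
              f"Answer with one of {', '.join(SCALE)}.\nAnswer:")
    # Greedy decoding so the model's answer is deterministic across runs.
    reply = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    answer = reply[len(prompt):].strip().split()[0].rstrip(".,")
    # EAT-26-style scoring for forward-keyed items: Always=3, Usually=2,
    # Often=1, everything else 0.
    return {"Always": 3, "Usually": 2, "Often": 1}.get(answer, 0)

risk_score = sum(administer(item) for item in ITEMS)
print(f"Total score: {risk_score}")  # higher totals indicate higher ED risk
```

Summing item scores per community-aligned model yields a comparable risk score, which is how differing levels of eating disorder risk across communities could be surfaced.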
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Computational Social Science and Cultural Analytics, Resources and Evaluation
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 336