Keywords: human behavior simulation; simulation reliability
Abstract: Large language models (LLMs) are increasingly used to simulate human survey responses and behavioral reactions, yet the conditions under which such simulations are reliable remain unclear, making it difficult to pinpoint where errors arise and which configuration choices drive them. To make reliability analysis more interpretable and actionable, we propose the Simulation Reliability Prism (SRP), which decomposes simulation into three structured layers and analyzes how errors propagate across them along three key configuration dimensions: model capacity, profile completeness, and population coverage. The SRP jointly evaluates two complementary targets, individual-level reliability and population-level reliability. Across three survey tasks and eleven LLMs, we show that profile conditioning is necessary to avoid systematic distributional bias, whereas increasing profile completeness yields diminishing individual-level gains and transfers unreliably to population-level improvements, sometimes even reversing them. Increasing population coverage mainly reduces variance, and population-level reliability typically stabilizes with fewer than 100 samples. Our findings offer practical guidance for reliable LLM-based survey simulation.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: Computational Social Science and Cultural Analytics
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 2819