Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data

Published: 02 Jun 2026, Last Modified: 02 Jun 2026
Pluralistic-Alignment 2026
License: CC BY 4.0
Keywords: LLM social simulation, Silicon sampling, Small-sample learning, Simulation fidelity, Computational social science
Abstract: Large language models (LLMs) are increasingly used to simulate social survey responses, yet their outputs exhibit systematic biases: marginal distributions are skewed, response variance is poorly calibrated, and predictor–outcome relationships are attenuated. We ask a simple question: given a small pilot sample of human responses, can an LLM recover how the broader population responds? Using a COVID-19 misinformation survey, we benchmark three families of approaches: prompting, PPI (Prediction-Powered Inference) rectification, and PEFT (parameter-efficient fine-tuning). We decompose recovery along three axes: marginal fidelity, defined as cross-respondent distributional similarity; structural fidelity, defined as alignment in predictor–outcome relationships; and individual fidelity, defined as agreement on per-respondent summaries. PEFT using LoRA adapters with an MLP classifier head performed best on nearly all axes. These findings suggest that fine-tuning on small pilot samples offers a balanced approach to achieving multiple forms of fidelity.
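The three fidelity axes can be made concrete with simple statistics. The sketch below is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact metrics, so it assumes (1) marginal fidelity is one minus the total variation distance between human and simulated response-category frequencies for a survey item, (2) structural fidelity is the ratio of the simulated to the human simple-regression slope for one predictor–outcome pair, so values below 1 indicate attenuation, and (3) individual fidelity is the Pearson correlation between matched per-respondent mean scores. All function names are hypothetical, and the toy inputs are made up.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def marginal_fidelity(human, simulated):
    """Assumed metric: 1 - total variation distance between the
    response-category frequencies of one survey item (1.0 = identical)."""
    h, s = Counter(human), Counter(simulated)
    categories = set(h) | set(s)
    tvd = 0.5 * sum(abs(h[c] / len(human) - s[c] / len(simulated)) for c in categories)
    return 1.0 - tvd


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def structural_fidelity(x, y_human, y_sim):
    """Assumed metric: simulated slope / human slope for one
    predictor->outcome pair (1.0 = preserved, < 1.0 = attenuated)."""
    return ols_slope(x, y_sim) / ols_slope(x, y_human)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


def individual_fidelity(human_rows, sim_rows):
    """Assumed metric: Pearson r between per-respondent mean scores,
    where row i of each input belongs to the same respondent."""
    if len(human_rows) != len(sim_rows):
        raise ValueError("respondents must be matched one-to-one")
    return pearson([mean(r) for r in human_rows], [mean(r) for r in sim_rows])


# Toy, made-up data (not results from the paper).
print(round(marginal_fidelity([1, 2, 3, 3, 4, 5], [3, 3, 3, 3, 4, 4]), 3))  # 0.5
print(structural_fidelity([0, 1, 2, 3], [0, 2, 4, 6], [1, 2, 3, 4]))         # 0.5
print(round(individual_fidelity([[1, 2], [3, 4], [5, 6]],
                                [[2, 2], [3, 3], [4, 4]]), 3))              # 1.0
```

In the toy example the simulated answers pile up in the middle categories and halve the predictor's slope, so marginal and structural fidelity both drop to 0.5, while the per-respondent ranking is preserved exactly (r = 1.0). This illustrates why the axes can diverge and are worth reporting separately. Total variation distance is used only because it is bounded in [0, 1] and easy to read; a Jensen-Shannon or Wasserstein distance over ordinal categories would be an equally reasonable stand-in.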
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 139