Can Finetuning LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence?
Keywords: LLM, Simulation, Computational Social Science
Abstract: There is ongoing debate about whether large language models (LLMs) can serve as substitutes for human participants in survey and experimental research. While recent work in fields such as marketing and psychology has explored the potential of LLM-based simulation, a growing body of evidence cautions against this practice: LLMs often fail to align with real human behavior, exhibiting limited diversity, systematic misalignment for minority subgroups, insufficient within-group variance, and discrepancies between stated beliefs and actions. This study examines an important and distinct question in this domain: whether finetuning on a small subset of human survey data, such as that obtainable from a pilot study, can mitigate these issues and yield realistic simulated outcomes. Using a behavioral experiment on information disclosure, we compare human and LLM-generated responses across multiple dimensions, including distributional divergence, subgroup alignment, belief–action coherence, and the recovery of regression coefficients. We find that finetuning on small human samples substantially improves heterogeneity, alignment, and belief–action coherence relative to the base model. However, even the best-performing finetuned models fail to reproduce the regression coefficients of the original study, suggesting that LLM-generated data remain unsuitable for replacing human participants in formal inferential analyses.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: Computational Social Science and Cultural Analytics, Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4833
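The abstract names two of the paper's evaluation dimensions: distributional divergence between human and LLM-generated responses, and recovery of regression coefficients. The sketch below illustrates, in Python, how such a comparison could be set up in principle. It is not the authors' pipeline: the simulated data, variable names, and choice of Wasserstein distance and OLS are illustrative assumptions only.

```python
# Illustrative sketch (not from the paper): compare a human sample to an
# LLM-generated sample on (i) distributional divergence of the outcome and
# (ii) recovery of a regression coefficient. All data here are simulated
# placeholders standing in for disclosure-experiment responses.
import numpy as np
from scipy.stats import wasserstein_distance
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300

# Hypothetical human data: a disclosure score driven by a stated-belief covariate.
belief_human = rng.normal(0.0, 1.0, n)
disclosure_human = 0.5 * belief_human + rng.normal(0.0, 1.0, n)

# Hypothetical LLM sample with compressed variance and a weakened
# belief-action link (the failure modes the abstract describes).
belief_llm = rng.normal(0.0, 0.6, n)
disclosure_llm = 0.2 * belief_llm + rng.normal(0.0, 0.6, n)

# (i) Distributional divergence between the two outcome distributions.
div = wasserstein_distance(disclosure_human, disclosure_llm)

# (ii) Coefficient recovery: fit the same OLS model on both samples
# and compare the slope on the belief covariate.
def slope(y, x):
    return sm.OLS(y, sm.add_constant(x)).fit().params[1]

print(f"Wasserstein distance: {div:.3f}")
print(f"Human slope: {slope(disclosure_human, belief_human):.3f}")
print(f"LLM slope:   {slope(disclosure_llm, belief_llm):.3f}")
```

Under this setup, a finetuned model that improves heterogeneity would shrink the divergence in (i), while the abstract's negative result corresponds to the slopes in (ii) remaining mismatched even after finetuning.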