Can Finetuning LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence?
Keywords: LLM, Simulation, Computational Social Science
Abstract: There is ongoing debate about whether large language models (LLMs) can serve as substitutes for human participants in survey and experimental research. While recent work in fields such as marketing and psychology has explored the potential of LLM-based simulation, a growing body of evidence cautions against this practice: LLMs often fail to align with real human behavior, exhibiting limited diversity, systematic misalignment for minority subgroups, insufficient within-group variance, and discrepancies between stated beliefs and actions. This study examines an important and distinct question in this domain: whether finetuning on a small subset of human survey data, such as that obtainable from a pilot study, can mitigate these issues and yield realistic simulated outcomes. Using a behavioral experiment on information disclosure, we compare human and LLM-generated responses across multiple dimensions, including distributional divergence, subgroup alignment, belief–action coherence, and the recovery of regression coefficients. We find that finetuning on small human samples substantially improves heterogeneity, alignment, and belief–action coherence relative to the base model. However, even the best-performing finetuned models fail to reproduce the regression coefficients of the original study, suggesting that LLM-generated data remain unsuitable for replacing human participants in formal inferential analyses.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: Computational Social Science and Cultural Analytics, Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4833
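The abstract names two of the paper's evaluation dimensions: distributional divergence between human and LLM-generated responses, and recovery of regression coefficients. The sketch below illustrates, in Python, how such a comparison could be set up in principle. It is not the authors' pipeline: the simulated data, variable names, and choice of Wasserstein distance and OLS are illustrative assumptions only.

```python
# Illustrative sketch (not from the paper): compare a human sample to an
# LLM-generated sample on (i) distributional divergence of the outcome and
# (ii) recovery of a regression coefficient. All data here are simulated
# placeholders standing in for disclosure-experiment responses.
import numpy as np
from scipy.stats import wasserstein_distance
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300

# Hypothetical human data: a disclosure score driven by a stated-belief covariate.
belief_human = rng.normal(0.0, 1.0, n)
disclosure_human = 0.5 * belief_human + rng.normal(0.0, 1.0, n)

# Hypothetical LLM sample with compressed variance and a weakened
# belief-action link (the failure modes the abstract describes).
belief_llm = rng.normal(0.0, 0.6, n)
disclosure_llm = 0.2 * belief_llm + rng.normal(0.0, 0.6, n)

# (i) Distributional divergence between the two outcome distributions.
div = wasserstein_distance(disclosure_human, disclosure_llm)

# (ii) Coefficient recovery: fit the same OLS model on both samples
# and compare the slope on the belief covariate.
def slope(y, x):
    return sm.OLS(y, sm.add_constant(x)).fit().params[1]

print(f"Wasserstein distance: {div:.3f}")
print(f"Human slope: {slope(disclosure_human, belief_human):.3f}")
print(f"LLM slope:   {slope(disclosure_llm, belief_llm):.3f}")
```

Under this setup, a finetuned model that improves heterogeneity would shrink the divergence in (i), while the abstract's negative result corresponds to the slopes in (ii) remaining mismatched even after finetuning.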