Keywords: synthetic data, large language models, uncertainty quantification, simulation
Abstract: We investigate the use of large language models (LLMs) to simulate human responses to survey questions, and perform uncertainty quantification to assess the fidelity of the simulations. Our approach converts imperfect black-box LLM-simulated responses into confidence sets for population parameters of human responses. A key innovation lies in determining the optimal number of simulated responses: too many produce overly narrow confidence sets with poor coverage, while too few yield excessively loose estimates. Our method adaptively selects the simulation sample size that ensures valid average-case coverage guarantees. The selected sample size itself further provides a quantitative measure of LLM-human misalignment. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
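The abstract describes an adaptive rule: grow the simulation sample size to tighten the confidence set, but stop before LLM-human misalignment breaks coverage. The following is a minimal, purely illustrative Python sketch of that trade-off, not the paper's actual algorithm: it assumes binary survey responses, a Wilson score interval as the confidence set, and a held-out set of calibration questions with known human proportions used to check average-case coverage. All names and the stopping rule are hypothetical.

```python
import numpy as np

def wilson_interval(sim, m, alpha=0.1):
    """Wilson score interval for a proportion, built from the first m
    simulated binary responses (an illustrative choice of confidence set)."""
    z = 1.645  # ~two-sided 90% normal quantile for alpha = 0.1
    p = np.mean(sim[:m])
    denom = 1 + z**2 / m
    center = (p + z**2 / (2 * m)) / denom
    half = z * np.sqrt(p * (1 - p) / m + z**2 / (4 * m**2)) / denom
    return center - half, center + half

def select_sim_size(calib_sim, calib_human, alpha=0.1, max_m=200):
    """Pick the largest simulation sample size m whose intervals still cover
    the human calibration parameters at rate >= 1 - alpha on average.
    A stand-in for the paper's adaptive selection rule, assuming coverage
    roughly shrinks as m grows."""
    best = 2
    for m in range(2, max_m + 1):
        hits = [lo <= h <= hi
                for sims, h in zip(calib_sim, calib_human)
                for lo, hi in [wilson_interval(sims, m, alpha)]]
        if np.mean(hits) >= 1 - alpha:
            best = m   # narrower confidence sets, coverage still holds on average
        else:
            break      # shrinking further would undercover; stop here
    return best

# Synthetic demo: 20 calibration questions, 200 simulated responses each,
# with the LLM systematically biased by +0.05 relative to the human proportions.
rng = np.random.default_rng(0)
human_p = rng.uniform(0.3, 0.7, size=20)
calib_sim = [rng.binomial(1, p + 0.05, size=200) for p in human_p]
print("selected simulation sample size:", select_sim_size(calib_sim, human_p))
```

In this toy setup the selected sample size stops growing once the interval width becomes comparable to the simulated LLM bias, which mirrors the abstract's point that the chosen size doubles as a quantitative measure of LLM-human misalignment.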
Submission Number: 3