Keywords: synthetic data, large language models, uncertainty quantification, simulation
Abstract: We investigate the use of large language models (LLMs) to simulate human responses to survey questions, and perform uncertainty quantification to assess the fidelity of the simulations. Our approach converts imperfect black-box LLM-simulated responses into confidence sets for population parameters of human responses. A key innovation lies in determining the optimal number of simulated responses: too many produce overly narrow confidence sets with poor coverage, while too few yield excessively loose estimates. Our method adaptively selects the simulation sample size that ensures valid average-case coverage guarantees. The selected sample size itself further provides a quantitative measure of LLM-human misalignment. Experiments on real survey datasets reveal heterogeneous fidelity gaps across different LLMs and domains.
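The abstract describes an adaptive rule: grow the simulation sample size to tighten the confidence set, but stop before LLM-human misalignment breaks coverage. The following is a minimal, purely illustrative Python sketch of that trade-off, not the paper's actual algorithm: it assumes binary survey responses, a Wilson score interval as the confidence set, and a held-out set of calibration questions with known human proportions used to check average-case coverage. All names and the stopping rule are hypothetical.

```python
import numpy as np

def wilson_interval(sim, m, alpha=0.1):
    """Wilson score interval for a proportion, built from the first m
    simulated binary responses (an illustrative choice of confidence set)."""
    z = 1.645  # ~two-sided 90% normal quantile for alpha = 0.1
    p = np.mean(sim[:m])
    denom = 1 + z**2 / m
    center = (p + z**2 / (2 * m)) / denom
    half = z * np.sqrt(p * (1 - p) / m + z**2 / (4 * m**2)) / denom
    return center - half, center + half

def select_sim_size(calib_sim, calib_human, alpha=0.1, max_m=200):
    """Pick the largest simulation sample size m whose intervals still cover
    the human calibration parameters at rate >= 1 - alpha on average.
    A stand-in for the paper's adaptive selection rule, assuming coverage
    roughly shrinks as m grows."""
    best = 2
    for m in range(2, max_m + 1):
        hits = [lo <= h <= hi
                for sims, h in zip(calib_sim, calib_human)
                for lo, hi in [wilson_interval(sims, m, alpha)]]
        if np.mean(hits) >= 1 - alpha:
            best = m   # narrower confidence sets, coverage still holds on average
        else:
            break      # shrinking further would undercover; stop here
    return best

# Synthetic demo: 20 calibration questions, 200 simulated responses each,
# with the LLM systematically biased by +0.05 relative to the human proportions.
rng = np.random.default_rng(0)
human_p = rng.uniform(0.3, 0.7, size=20)
calib_sim = [rng.binomial(1, p + 0.05, size=200) for p in human_p]
print("selected simulation sample size:", select_sim_size(calib_sim, human_p))
```

In this toy setup the selected sample size stops growing once the interval width becomes comparable to the simulated LLM bias, which mirrors the abstract's point that the chosen size doubles as a quantitative measure of LLM-human misalignment.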
Submission Number: 3