Drawing Reliable Conclusions with Synthetic Simulations from Large Language Models

Yewon Byun; Shantanu Gupta; Zachary Chase Lipton; Rachel Leah Childers; Bryan Wilder

Published: 24 Jul 2025, Last Modified: 01 Aug 2025, Venue: Social Sim'25, License: CC BY 4.0

Keywords: Human-AI Collaboration, Synthetic Simulations, LLMs for Social Science

TL;DR: We introduce a principled approach for reliably incorporating synthetic samples simulated by LLMs into downstream statistical analyses.

Abstract: There is increasing interest in using large language models to generate synthetic simulations (e.g., social simulations) to support social science and human subjects research, for example by simulating survey responses or human behavior. However, it is not immediately clear how practitioners can combine such data with ground-truth human data and still draw reliable insights and conclusions from the result. In this work, we introduce a principled framework for reliably incorporating synthetic simulations from text-based foundation models into downstream statistical analyses. Our estimator is hyperparameter-free and comes with strong theoretical guarantees, allowing practitioners to retain key statistical properties even when incorporating imperfect, biased simulated data. We empirically validate the finite-sample performance of our estimator, which improves statistical efficiency, across different regression tasks in social science applications. To the best of our knowledge, our framework provides the first theoretically sound approach for safely incorporating synthetic simulations from foundation models for reliable statistical inference.
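
The abstract does not spell out the estimator itself. To make the core idea concrete (combining a small set of human responses with many biased LLM simulations without inheriting the simulator's bias), here is a minimal sketch of a different, well-known technique: a prediction-powered-inference-style estimate of a population mean. It is an illustration under stated assumptions, not the authors' method, and it targets a simple mean rather than the regression tasks studied in the paper. The names `debiased_mean` and `biased_simulator`, the paired-data setup (each human respondent also has a simulated answer), the assumption that human and simulated respondents are drawn from the same population, and the normal-approximation interval are all hypothetical choices.

```python
import math
import random
import statistics


def debiased_mean(y_labeled, f_labeled, f_unlabeled, z=1.96):
    """Hypothetical PPI-style estimate of a population mean (illustration only).

    y_labeled:   human responses for n respondents
    f_labeled:   LLM-simulated responses for those same n respondents
    f_unlabeled: LLM-simulated responses for N further respondents
    Returns (estimate, (ci_low, ci_high)) from a normal approximation.
    """
    n, big_n = len(y_labeled), len(f_unlabeled)
    if n != len(f_labeled) or n < 2 or big_n < 2:
        raise ValueError("need n >= 2 matched human/simulated pairs and N >= 2 simulations")
    # Residuals measure how far the simulator's answers are from the human answers.
    residuals = [y - f for y, f in zip(y_labeled, f_labeled)]
    # Mean of the cheap simulations, corrected by the simulator's estimated average error.
    estimate = statistics.fmean(f_unlabeled) + statistics.fmean(residuals)
    # The two terms use disjoint samples, so their variances add.
    se = math.sqrt(statistics.variance(f_unlabeled) / big_n
                   + statistics.variance(residuals) / n)
    return estimate, (estimate - z * se, estimate + z * se)


def biased_simulator(y, rng):
    """Hypothetical stand-in for an LLM: noisy and systematically 0.5 too high."""
    return y + 0.5 + rng.gauss(0.0, 0.3)


if __name__ == "__main__":
    rng = random.Random(0)
    population = [rng.gauss(3.0, 1.0) for _ in range(5000)]  # true mean is about 3.0
    humans, others = population[:200], population[200:]
    f_lab = [biased_simulator(y, rng) for y in humans]
    f_unl = [biased_simulator(y, rng) for y in others]
    est, (lo, hi) = debiased_mean(humans, f_lab, f_unl)
    print(f"synthetic-only mean: {statistics.fmean(f_unl):.3f}")  # inherits the +0.5 bias
    print(f"debiased estimate:   {est:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

The correction term stays unbiased no matter how wrong the simulator is; a better simulator only shrinks the residual variance and hence the interval. Variants of this estimator add a data-chosen weight on the simulated term to gain further efficiency; this sketch uses the plain unweighted form, which has no tuning parameter.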

Submission Number: 18

Loading