Keywords: Human-AI Collaboration, Synthetic Simulations, LLMs for Social Science
TL;DR: We introduce a principled approach for reliably incorporating synthetic samples simulated by LLMs into downstream statistical analyses.
Abstract: There is increasing interest in using large language models to generate synthetic simulations (e.g., social simulations) to support social science and human subject research, such as simulated survey responses or simulated human behavior. However, it is not immediately clear how practitioners can combine such data with ground-truth human data and still draw reliable insights and conclusions from the combined data.
In this work, we introduce a principled framework for reliably incorporating synthetic simulations from text-based foundation models into downstream statistical analyses. Our estimator offers a hyperparameter-free solution with strong theoretical guarantees, allowing practitioners to retain key statistical properties even when incorporating imperfect, biased simulated data. We empirically validate the finite-sample performance of our estimator, which improves statistical efficiency, across different regression tasks in social science applications. To the best of our knowledge, our framework provides the first theoretically sound approach for safely incorporating synthetic simulations from foundation models for reliable statistical inference.
Submission Number: 18