TL;DR: LLM social simulations face five key challenges (diversity, bias, sycophancy, alienness, and generalization), and there are promising directions to address each of them.
Abstract: Accurate and verifiable large language model (LLM) simulations of human research subjects promise an accessible data source for understanding human behavior and training new AI systems. However, results to date have been limited, and few social scientists have adopted this method. In this position paper, we argue that the promise of LLM social simulations can be achieved by addressing five tractable challenges. We ground our argument in a review of empirical comparisons between LLMs and human research subjects, commentaries on the topic, and related work. We identify promising directions, including context-rich prompting and fine-tuning with social science datasets. We believe that LLM social simulations can already be used for pilot and exploratory studies, and more widespread use may soon be possible with rapidly advancing LLM capabilities. Researchers should prioritize developing conceptual models and iterative evaluations to make the best use of new AI systems.
Lay Summary: In recent years, artificial intelligence (AI) systems have become much more powerful and humanlike. This has led many researchers to test using AI systems, particularly large language models (LLMs) such as ChatGPT and Gemini, to simulate human research subjects in studies of human behavior. However, many researchers remain skeptical of this approach, and there have been few applications of LLM social simulations beyond initial testing and proof-of-concept work.
In this paper, we argue that five challenges (diversity, bias, sycophancy, alienness, and generalization) stand in the way of widespread use of LLM social simulations. These are significant challenges, but we see exciting opportunities for progress on each. Our argument builds on a literature review of studies run to date and related work. We identify promising directions, including context-rich prompting and fine-tuning LLMs with social science datasets.
We believe that LLM social simulations can already be used for exploratory research and building new scientific theories. More widespread use across more applications may soon be possible. Researchers should prioritize developing conceptual models (better ways to make sense of these "digital minds") and evaluations of simulations so that we can track AI capabilities over time. Accurate and verifiable LLM social simulations can help humanity navigate technological and social change, and they can provide data to train safe and beneficial AI systems.
Verify Author Names: My co-authors have confirmed that their names are spelled correctly both on OpenReview and in the camera-ready PDF. (If needed, please update ‘Preferred Name’ in OpenReview to match the PDF.)
No Additional Revisions: I understand that after the May 29 deadline, the camera-ready submission cannot be revised before the conference. I have verified with all authors that they approve of this version.
Pdf Appendices: My camera-ready PDF file contains both the main text (not exceeding the page limits) and all appendices that I wish to include. I understand that any other supplementary material (e.g., separate files previously uploaded to OpenReview) will not be visible in the PMLR proceedings.
Latest Style File: I have compiled the camera ready paper with the latest ICML2025 style files <https://media.icml.cc/Conferences/ICML2025/Styles/icml2025.zip> and the compiled PDF includes an unnumbered Impact Statement section.
Paper Verification Code: ZWJiN
Permissions Form: pdf
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: LLM social simulations, sims, agents, machine learning, artificial intelligence, large language models, evaluation, fairness, economics, psychology, sociology
Submission Number: 70