Keywords: LLM Simulation, Computational Social Science, Digital Twins
Abstract: The use of large language models (LLMs) to simulate human behavior has gained significant attention, particularly through personas that approximate individual characteristics. Persona-based simulations hold promise for transforming disciplines that rely on population-level feedback, including social science, economic analysis, marketing research, and business operations. Traditional methods of collecting realistic persona data face significant challenges: they are prohibitively expensive and logistically difficult due to privacy constraints, and they often fail to capture multi-dimensional attributes, particularly subjective qualities. Consequently, synthetic persona generation with LLMs offers a scalable, cost-effective alternative. However, current approaches rely on ad hoc, heuristic generation techniques that guarantee neither methodological rigor nor simulation precision, resulting in systematic biases in downstream tasks. Through extensive large-scale experiments, including presidential election forecasts and general opinion surveys of the U.S. population, we show that these biases can lead to significant deviations from real-world outcomes. Based on these experimental results, this position paper argues that **a rigorous and systematic science of persona generation is needed to ensure the reliability of LLM-driven simulations of human behavior.** We call not only for methodological innovations and empirical foundations but also for interdisciplinary organizational and institutional support for the development of this field. To support further research in this area, we have open-sourced approximately one million generated personas for public access and analysis.
Lay Summary: Large language models (LLMs) can now simulate real humans by creating “personas” that mimic how real individuals might think and decide. These digital twins promise to change how researchers study society: instead of running costly surveys or experiments, we can generate millions of virtual participants and ask them questions instantly. Yet this convenience comes with hidden risks. Our study shows that current persona-generation methods, while scalable and realistic at first glance, systematically distort results compared to the real world. When used to predict real-world outcomes such as U.S. elections or national surveys, LLM-based personas often produce biased and overly homogeneous opinions. These errors arise not from the models’ answers themselves but from the biased personas we give them. We call for a new science of persona generation: one that grounds synthetic populations in rigorous data, evaluates bias quantitatively, and builds transparent benchmarks to guide progress. Beyond technical fixes, this effort requires collaboration among AI researchers, social scientists, and policymakers. Only by combining empirical grounding with methodological care can LLM-generated personas become reliable “silicon samples” for studying human behavior at scale.
Submission Number: 216