DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas

ICLR 2026 Conference Submission 10030 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Synthetic Data Generation, Synthetic Personas, Persona Generation, Human Simulation, LLM Personalization, Social Simulation, Large Language Models
Abstract: Simulating human profiles by instilling personas into large language models (LLMs) is rapidly transforming research in personalization, social simulation, and human-AI alignment. However, most existing synthetic personas remain shallow and simplistic, capturing minimal attributes and failing to reflect the rich complexity and diversity of real human identities. We introduce DeepPersona, a scalable generative engine for synthesizing narrative-complete synthetic personas through a two-stage, taxonomy-guided method. First, we algorithmically construct the largest-ever human-attribute taxonomy, comprising hundreds of hierarchically organized attributes, by systematically mining thousands of real user-ChatGPT conversations. Second, we progressively sample attributes from this taxonomy and conditionally generate coherent, realistic personas that average hundreds of structured attributes and roughly 1 MB of narrative text, two orders of magnitude deeper than prior work. Intrinsic evaluations confirm significant improvements in attribute diversity (32% higher coverage) and profile uniqueness (44% greater) compared to state-of-the-art baselines. Extrinsically, our personas improve GPT-4.1-mini’s personalized Q&A accuracy by 11.6% on average across ten metrics and substantially narrow (by 32%) the gap between simulated LLM "citizens" and authentic human responses in social surveys. DeepPersona thus provides a rigorous, scalable, and privacy-free platform for high-fidelity human simulation and personalized AI research.
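As a rough illustration of the second stage described in the abstract (progressive sampling from a hierarchical taxonomy, with each value generated conditionally on the profile so far), the following Python sketch may help. It is not the authors' implementation: the toy `TAXONOMY`, the uniform sampling of leaf attributes, and the `fill_value` stub standing in for an LLM call are all assumptions made for this example.

```python
import random

# Toy stand-in for the mined taxonomy (assumption); the real taxonomy is
# far larger and is not reproduced here.
TAXONOMY = {
    "demographics": {"age": {}, "occupation": {}},
    "lifestyle": {"hobbies": {"outdoor": {}, "creative": {}}, "diet": {}},
    "values": {"politics": {}, "religion": {}},
}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path in the nested-dict taxonomy."""
    for name, child in tree.items():
        path = prefix + (name,)
        if child:
            yield from leaf_paths(child, path)
        else:
            yield path


def fill_value(path, profile):
    """Hypothetical placeholder for an LLM call conditioned on the profile so far."""
    return f"<value for {'/'.join(path)} given {len(profile)} prior attributes>"


def build_persona(taxonomy, n_attributes, seed=0):
    """Sample attributes one at a time; each value sees all earlier ones."""
    rng = random.Random(seed)
    remaining = sorted(leaf_paths(taxonomy))
    profile = {}
    for _ in range(min(n_attributes, len(remaining))):
        # Uniform choice without replacement (assumption, not the paper's policy).
        path = remaining.pop(rng.randrange(len(remaining)))
        profile[path] = fill_value(path, profile)
    return profile


if __name__ == "__main__":
    for path, value in build_persona(TAXONOMY, 4, seed=1).items():
        print("/".join(path), "->", value)
```

The point of the loop is that each call to `fill_value` receives the partially built profile, which is what lets later attributes stay consistent with earlier ones. A real system would presumably replace the uniform draw with a sampling policy over the taxonomy and the stub with a model call.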
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10030