Exposing and Patching the Flaws of Large Language Models in Social Character Simulation

Published: 09 Jun 2025 · Last Modified: 08 Jul 2025 · KDD 2025 Workshop SciSocLLM · CC BY 4.0
Keywords: Large language models, social science, social simulation, reliability
Abstract: Large Language Models (LLMs) are increasingly used for social character simulations, enabling applications in role-playing agents and Computational Social Science (CSS). However, their inherent flaws—such as inconsistencies in simulated roles—raise concerns about their reliability and trustworthiness. In this paper, we systematically investigate these flaws and explore potential solutions. To assess the reliability of LLM-based simulations, we introduce TrustSim, a benchmark dataset covering 10 CSS-related topics. Through experiments on 14 LLMs, we uncover persistent inconsistencies in simulated roles and find that higher general model performance does not necessarily correlate with greater simulation reliability. To mitigate these flaws, we propose Adaptive Learning Rate Based ORPO (AdaORPO), a reinforcement learning-based algorithm that improves simulation consistency across seven LLMs. Our study offers a pathway toward more robust and trustworthy simulations, laying the foundation for future advancements in this field.
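The abstract names AdaORPO but does not describe its update rule, so the sketch below is only a rough illustration of the general idea: a standard odds-ratio preference optimization (ORPO) loss paired with a hypothetical learning-rate adaptation step. The functions `orpo_loss` and `adaptive_lr`, the EMA-based adaptation rule, the weight `lam`, and the toy tensors are all assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
              lam: float = 0.1) -> torch.Tensor:
    """ORPO-style objective: NLL on the chosen response plus a
    log-odds-ratio penalty that pushes the model to prefer the chosen
    response over the rejected one. Inputs are mean per-token
    log-probabilities of each response under the model."""
    # log-odds of each response, log(p / (1 - p)), computed in log space
    log_odds = (logp_chosen - logp_rejected) - (
        torch.log1p(-torch.exp(logp_chosen))
        - torch.log1p(-torch.exp(logp_rejected))
    )
    preference_term = -F.logsigmoid(log_odds)  # penalize reversed preferences
    nll_term = -logp_chosen                    # standard SFT term on chosen
    return (nll_term + lam * preference_term).mean()

def adaptive_lr(base_lr: float, loss_value: float, ema: float,
                beta: float = 0.9) -> tuple[float, float]:
    """Hypothetical adaptation rule (an assumption, not from the paper):
    shrink the step size when the loss spikes above its running average."""
    ema = beta * ema + (1.0 - beta) * loss_value
    return base_lr * min(1.0, ema / max(loss_value, 1e-8)), ema

# Toy usage: random log-probabilities stand in for model outputs.
logp_c = -torch.rand(8) - 0.05   # chosen responses, higher log-prob
logp_r = -torch.rand(8) - 0.50   # rejected responses, lower log-prob
loss = orpo_loss(logp_c, logp_r)
lr, ema = adaptive_lr(1e-5, loss.item(), ema=loss.item())
```

Any consistency-aware signal could replace the EMA here; the only point the sketch makes is that the preference objective and the step-size schedule are decoupled, so the learning rate can react to training dynamics without changing the loss itself.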
Submission Number: 17