Keywords: behavior simulation, socio-demographic prompting
Abstract: Socio-demographic prompting (SDP), which prompts Large Language Models (LLMs) to generate culturally aligned behaviors using demographic proxies, is commonly used to assess cultural biases in LLMs. However, its sensitivity to the prompt raises questions about its reliability for cultural assessment and user behavior simulation. Here, we explore inverse socio-demographic prompting (ISDP), a method that prompts LLMs to predict users' cultural backgrounds from their behaviors, offering a more robust alternative that maps behaviors to cultural proxies. We evaluate SDP and ISDP across four LLMs - Aya-23, Gemma-2, GPT-4o, and Llama-3.1 - using the Goodreads-CSI dataset (Saha et al., 2025), which captures cross-cultural non-understandability in book reviews from users in India, Mexico, and the USA. Our analysis reveals that ISDP is a substantially more robust way of assessing LLMs' cultural alignment than SDP. Next, we simulate user behavior and evaluate model performance when behavior is aggregated at different levels. At the group level, GPT-4o excels in ISDP with actual user behavior but struggles when the behavior is LLM-generated. At the user level, GPT-4o performs best when the behavior is generated by itself or comes from actual users. In contrast, at the group level, the other models perform better with LLM-generated behavior than with actual user behavior. We reason that this is likely because maximum-likelihood decoding leads LLMs to generate stereotypical outputs, whereas real-world user behavior is more nuanced and less normative: individuals do not exhibit all the stereotypes of their culture. These findings have significant implications for simulating user behavior with LLMs and position ISDP as a valuable framework for understanding the limitations of user behavior simulation and for studying cultural representation in LLMs.
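To make the contrast between the two prompting directions concrete, here is a minimal sketch of how SDP and ISDP prompts might be constructed. The templates, function names, and example inputs are illustrative assumptions, not the paper's actual prompts; the sketch only shows the direction of the mapping (demographic proxy to behavior for SDP, behavior to demographic proxy for ISDP).

```python
# Illustrative sketch of SDP vs. ISDP prompt construction.
# All wording and helper names here are hypothetical, not the study's prompts.

def sdp_prompt(country: str, book: str) -> str:
    """SDP: condition the model on a demographic proxy (country) and ask it
    to generate a culturally aligned behavior (a book review)."""
    return (
        f"You are a reader from {country}. "
        f"Write a short review of the book '{book}' "
        "as a typical reader from your country would."
    )

def isdp_prompt(review: str, candidates: list[str]) -> str:
    """ISDP: show the model an observed behavior (a review) and ask it to
    infer the user's cultural background from a fixed candidate set."""
    options = ", ".join(candidates)
    return (
        "Here is a book review written by a user:\n"
        f'"{review}"\n'
        f"Which country is the reviewer most likely from: {options}? "
        "Answer with a single country name."
    )

if __name__ == "__main__":
    print(sdp_prompt("India", "The God of Small Things"))
    print(isdp_prompt(
        "The pacing dragged, but the family saga felt very familiar.",
        ["India", "Mexico", "USA"],
    ))
```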
Submission Number: 10