Keywords: Large language models, Social Reasoning
Abstract: As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal situations becomes critical. To explore their capabilities, we introduce SCRTIPS, a 1k-dialogue dataset in English and Korean, sourced from movie scripts.
We further propose a social reasoning task based on SCRTIPS that evaluates the capacity of LLMs to infer the social relationship (e.g., friends, sisters, lovers) between the speakers in each dialogue.
Across evaluations of nine models, current proprietary LLMs achieve around 75–80% accuracy on the English dataset and 58–69% on the Korean one.
Strikingly, in 10–25% of responses in both languages, models predict relationships that human annotators labeled as Unlikely.
Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases.
In sum, current LLMs show significant limitations in their social reasoning capabilities, especially in Korean, highlighting the need for further efforts to develop socially aware LLMs.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: language/cultural bias analysis, sociolinguistics
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Korean
Submission Number: 7475