Abstract: Social skills such as collaboration and negotiation are essential for large language models (LLMs) to interact effectively with humans. While reinforcement learning (RL) has shown promise in enhancing the problem-solving abilities of LLMs, how to use RL to train social agents remains an open question. Compared with problem-solving tasks, social tasks differ in two key ways: (1) they require interaction with other agents whose social goals are unobservable, making them partially observable Markov decision processes (POMDPs) rather than standard MDPs; (2) their complexity demands multi-dimensional evaluation. To address these challenges, we propose SOTOPIA-RL, an utterance-level, attribution-based, multi-dimensional social reward design method trained with single-turn online RL. We first use state-of-the-art LLMs to attribute episode-level rewards for multi-turn social interactions to individual utterances. We then construct a combined reward that spans multiple dimensions beyond goal completion, which regularizes the optimization toward goal completion. These structured utterance-level rewards guide the RL training of social agents. Experiments in SOTOPIA, an open-ended social learning environment, show that (1) SOTOPIA-RL achieves state-of-the-art goal completion scores of 7.17 on SOTOPIA-hard and 8.31 on SOTOPIA-full, surpassing all prior non-reasoning methods, and (2) both reward attribution and reward combination significantly improve RL training stability and overall performance.
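As a rough illustration of the reward design described in the abstract (a minimal sketch, not the authors' released implementation; the dimension names, weights, and scores below are hypothetical), the combined utterance-level reward can be viewed as a weighted sum of per-dimension scores that an LLM judge attributes to each utterance:

```python
from dataclasses import dataclass

@dataclass
class AttributedUtterance:
    """One utterance with per-dimension scores attributed from the
    episode-level reward by an LLM judge (values are hypothetical)."""
    text: str
    scores: dict[str, float]  # e.g. {"goal_completion": 0.8, ...}

# Hypothetical dimension weights: goal completion dominates, while the
# other dimensions act as regularizers on the optimization.
WEIGHTS = {
    "goal_completion": 1.0,
    "relationship": 0.3,
    "knowledge": 0.2,
}

def combined_reward(utt: AttributedUtterance) -> float:
    """Weighted sum over reward dimensions for a single utterance."""
    return sum(w * utt.scores.get(dim, 0.0) for dim, w in WEIGHTS.items())

# Each utterance then carries its own scalar reward, so a multi-turn
# episode decomposes into per-utterance training signals.
episode = [
    AttributedUtterance("Let's split the cost evenly.",
                        {"goal_completion": 0.7, "relationship": 0.9}),
    AttributedUtterance("I won't discuss this further.",
                        {"goal_completion": 0.2, "relationship": 0.1}),
]
rewards = [combined_reward(u) for u in episode]
print(rewards)  # one scalar reward per utterance
```

Under this reading, assigning each utterance its own scalar reward is what lets the multi-turn interaction be optimized with single-turn online RL, as the abstract describes.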
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Social Simulation; Reinforcement Learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7444