Enhancing Social Intelligence in LLMs with Hierarchical Reasoning and Utterance-Level Goal Rewarding

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM, Reinforcement Learning, Social Intelligence
Abstract: Large language models (LLMs) excel in structured tasks but struggle with dynamic social interactions, where success requires long-term goal coordination and rapid adaptation. Current methods often apply uniform goal-based rewards to every utterance, overlooking the specificity of objectives at each dialogue turn and failing to account for the rationale of potential strategies. Inspired by the Theory of Planned Behavior, we propose the Think-Strategy-Response (TSR) framework, which decomposes social dialogue into two hierarchical stages: high-level strategic planning and low-level linguistic execution. To optimize TSR, we introduce Linearized Hierarchical Reinforcement Learning with Variance-Gated Rewards (LHRL-VGR), a novel algorithm that dynamically routes rewards, balancing goal completion and strategy adherence, based on the variance of goal achievement scores. Experiments on the SOTOPIA benchmark show that our approach fine-tunes a Qwen2.5-7B agent to surpass the GPT-4o baseline by 7.32% in goal completion success, demonstrating state-of-the-art performance in multi-agent social negotiation tasks.
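The variance-gated reward routing described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the specific gating function, the `threshold` parameter, and the `variance_gated_reward` helper are all hypothetical, assuming only that high variance in goal achievement scores shifts weight from the goal-completion reward toward the strategy-adherence reward.

```python
import statistics


def variance_gated_reward(goal_scores, goal_reward, strategy_reward,
                          threshold=1.0):
    """Blend goal-completion and strategy-adherence rewards by the variance
    of goal achievement scores across sampled rollouts.

    Hypothetical gating rule: the abstract only states that rewards are
    routed based on this variance, not the exact functional form.
    """
    var = statistics.pvariance(goal_scores)
    # High variance -> the goal signal is noisy for this utterance, so lean
    # on the strategy-adherence reward; low variance -> trust goal completion.
    w = 1.0 / (1.0 + var / threshold)
    return w * goal_reward + (1.0 - w) * strategy_reward
```

For example, identical goal scores (zero variance) return the goal-completion reward unchanged, while widely scattered scores pull the blended reward toward the strategy-adherence term.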
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7220