UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM agents, RL reward design, User-centric interaction
TL;DR: We present UserRL, a framework with diverse gym environments for training and evaluating user-centric agentic models through standardized interaction and reward shaping.
Abstract: Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet the ultimate value of such agents lies in their ability to assist users, a setting where the diversity and dynamics of user interaction pose distinct challenges. In this work, we propose **UserRL**, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal that: (i) SFT cold start is critical for unlocking initial interaction ability and sustaining RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitate training, open-source simulators (e.g., Qwen3-32B) remain a cost-effective and transferable option. Together, these results highlight that careful reward shaping and the choice of user simulator are as crucial as model scale, and we establish UserRL as a practical pathway for developing robust user-centric agentic models.
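
To make the "standardized gym environments paired with simulated users" concrete, here is a minimal sketch of that kind of interaction loop: an agent exchanges turns with an LLM-backed user simulator through a gym-style `reset`/`step` interface. All names here (`UserGymEnv`, `SimulatedUser`, `respond`) are hypothetical illustrations, not the UserRL API; the canned-reply simulator stands in for a real model such as GPT-4o or Qwen3-32B.

```python
from dataclasses import dataclass, field

@dataclass
class SimulatedUser:
    """Stand-in for an LLM-backed user simulator with a hidden goal."""
    goal: str

    def respond(self, agent_message: str) -> tuple[str, float, bool]:
        # A real simulator would query an LLM; here we return a canned
        # reply, a turn-level reward, and a done flag.
        satisfied = self.goal.lower() in agent_message.lower()
        reward = 1.0 if satisfied else 0.0
        reply = "Thanks, that's what I needed." if satisfied else "Not quite."
        return reply, reward, satisfied

@dataclass
class UserGymEnv:
    """Gym-style wrapper: reset() opens a dialogue, step() exchanges one turn."""
    user: SimulatedUser
    max_turns: int = 8
    history: list = field(default_factory=list)
    turn: int = 0

    def reset(self) -> list:
        self.turn = 0
        self.history = [{"role": "user", "content": "Hi, can you help me?"}]
        return self.history

    def step(self, agent_message: str):
        self.turn += 1
        reply, reward, done = self.user.respond(agent_message)
        self.history += [{"role": "assistant", "content": agent_message},
                         {"role": "user", "content": reply}]
        return self.history, reward, done or self.turn >= self.max_turns

# One rollout with a trivial scripted "policy" in place of the trained agent.
env = UserGymEnv(SimulatedUser(goal="book a flight"))
obs, done, turn_rewards = env.reset(), False, []
while not done:
    obs, reward, done = env.step("I can book a flight for you.")
    turn_rewards.append(reward)
```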
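The abstract's central experimental variable is how turn-level rewards are folded into a trajectory-level score and how those scores become advantages under GRPO. Below is a minimal sketch of that pipeline, assuming a discounted-sum aggregation and the standard group-mean/std normalization from GRPO; the aggregation rule is one illustrative choice among the formulations the paper compares, not the authors' specific scoring method.

```python
import numpy as np

def trajectory_score(turn_rewards: list[float], gamma: float = 1.0) -> float:
    """Collapse per-turn rewards into one trajectory-level score.

    A discounted sum is one plausible formulation; the paper studies
    several, and this is an illustrative stand-in.
    """
    return sum(r * gamma**t for t, r in enumerate(turn_rewards))

def grpo_advantages(scores: list[float]) -> np.ndarray:
    """Group-relative advantages as in GRPO: normalize each trajectory's
    score by the mean and std of its rollout group."""
    arr = np.asarray(scores, dtype=np.float64)
    std = arr.std()
    if std < 1e-8:  # all rollouts tied: no learning signal for this group
        return np.zeros_like(arr)
    return (arr - arr.mean()) / std

# Example: a group of four multi-turn rollouts from the same prompt.
group = [
    [0.0, 0.2, 1.0],  # turn-level rewards for rollout 1
    [0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 1.0],
]
advantages = grpo_advantages([trajectory_score(t) for t in group])
print(advantages)
```

Because the advantage is relative within the group, the choice of trajectory scoring changes the ranking of rollouts, which is why the abstract reports that deliberate trajectory scoring materially affects multi-turn learning.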
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8034