Policy Improvement with Style-Specific Demonstrations

Lingfeng Li; Yunlong Lu; Yongyi Wang; Wenxin Li

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Imitation Learning, Reinforcement Learning, Games

TL;DR: This paper proposes a method to enhance the proficiency of existing suboptimal agents while preserving their play styles.

Abstract: Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advances in game AI based on reinforcement learning (RL) have predominantly focused on improving proficiency, whereas methods based on evolutionary algorithms generate agents with diverse play styles but underperform RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies the loss objectives for online and offline samples and introduces an implicit constraint that keeps the learned policy close to the demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency comparable or even superior to that of purely online algorithms while preserving the demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 13042