Keywords: Reinforcement Learning, Agent Alignment
TL;DR: One-shot style alignment for RL agents via latent inference from a single trajectory and reward-guided finetuning, enabling controllable and generalizable behavior
Abstract: Reinforcement learning (RL) has achieved remarkable success in training agents with high-performing policies, and recent work has begun to address the critical challenge of aligning such policies with human preferences. While these efforts have shown promise, most approaches rely on large-scale data and do not generalize well to novel forms of preference. In this work, we formalize one-shot style alignment as an extension of the preference alignment paradigm. The goal is to enable RL agents to adapt to human-specified styles from a single example, thereby eliminating the reliance on large-scale datasets and the need for retraining. To achieve this, we propose a framework that infers an interpretable latent style vector through a learned discriminator and adapts a pretrained base policy using a style reward signal during online interaction. Our design enables controllable and data-efficient alignment with target styles while maintaining strong task performance, and further supports smooth interpolation across unseen style compositions. Experiments across diverse environments with varying style preferences demonstrate precise style alignment, strong generalization, and task competence.
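The abstract describes the mechanism only at a high level: a latent style vector is inferred from a single demonstration, and the pretrained policy is finetuned with a style reward from a learned discriminator added to the task reward. The sketch below is a minimal illustration of that reward-shaping idea; the names (StyleDiscriminator, infer_style_latent, the weighting coefficient beta) and the toy implementations are assumptions for illustration, not the paper's actual components.

    # Minimal sketch of reward-guided style finetuning. Assumes (hypothetically)
    # a discriminator D(s, a, z) scoring how well a transition matches a target
    # style latent z inferred from a single demonstration trajectory.
    import numpy as np

    class StyleDiscriminator:
        """Toy stand-in for a learned discriminator over (state, action, style)."""
        def __init__(self, state_dim, action_dim, style_dim, seed=0):
            rng = np.random.default_rng(seed)
            self.w = rng.normal(size=(state_dim + action_dim + style_dim,))

        def score(self, state, action, z):
            # Probability-like score in (0, 1); higher = more style-consistent.
            x = np.concatenate([state, action, z])
            return 1.0 / (1.0 + np.exp(-x @ self.w))

    def infer_style_latent(demo_states, demo_actions, style_dim):
        """Placeholder for one-shot latent inference from a single demonstration."""
        feats = np.concatenate([demo_states.mean(0), demo_actions.mean(0)])
        return feats[:style_dim]  # illustrative projection only

    def shaped_reward(task_reward, state, action, z, disc, beta=0.5):
        """Combine the environment's task reward with a discriminator style reward."""
        style_reward = np.log(disc.score(state, action, z) + 1e-8)
        return task_reward + beta * style_reward

    # Usage with dummy data
    state_dim, action_dim, style_dim = 4, 2, 3
    disc = StyleDiscriminator(state_dim, action_dim, style_dim)
    demo_s, demo_a = np.random.randn(10, state_dim), np.random.randn(10, action_dim)
    z = infer_style_latent(demo_s, demo_a, style_dim)
    print(shaped_reward(1.0, np.zeros(state_dim), np.zeros(action_dim), z, disc))

In this reading, beta trades off task performance against style fidelity, and varying or interpolating z at finetuning time would correspond to the style compositions mentioned in the abstract.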
Primary Area: reinforcement learning
Submission Number: 25633