CSO: Refining Robotic Policies via Skill Distribution Alignment and Skill-Grained Optimization

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Embodied AI, Imitation Learning, Reinforcement Learning, Latent Variable Models
TL;DR: We propose Cascaded Skills Optimization, a two-stage framework that first aligns a skill prior via rejection-sampling supervised fine-tuning, then refines it with skill-level policy optimization.
Abstract: Discretizing continuous actions into skills with methods such as VQ-VAE has emerged as a powerful paradigm for robotic manipulation. However, the quantization error introduced by this discretization yields a suboptimal training distribution for the skill prior, degrading its performance. While reinforcement learning offers a path to refinement, applying it directly is challenging: it suffers from unstable encoder updates and a granularity dilemma in importance sampling. To address these challenges, we introduce Cascaded Skills Optimization (CSO), a two-stage post-training framework. First, to rectify the initial policy's suboptimal distribution, CSO employs Rejection-Sampling Supervised Fine-tuning, which aligns the model's observation-to-skill mapping with the distribution of successful online trajectories. Second, to resolve the granularity dilemma, CSO introduces Skills Policy Optimization, which computes an independent, clipped importance ratio for each skill, enabling more stable and efficient updates. Our post-training strategy delivers highly competitive performance on challenging benchmarks such as LIBERO and MetaWorld, and its effectiveness is further validated on a physical robot.
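To make the two stages concrete, below is a minimal Python sketch of the ideas as described in the abstract: a rejection-sampling filter that keeps only successful rollouts for supervised fine-tuning, and a PPO-style surrogate loss that clips an independent importance ratio per skill. This is an illustrative reading under stated assumptions, not the paper's actual implementation; all names (`rollouts`, the `success` flag, `clip_eps`) are hypothetical.

```python
import numpy as np

def rejection_sample(rollouts):
    """Stage 1 (sketch): keep only successful online trajectories.

    The surviving (observation, skill) pairs would then serve as
    supervised fine-tuning targets for the skill prior.
    `rollouts` and the `success` flag are assumed structures.
    """
    return [traj for traj in rollouts if traj["success"]]

def skill_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Stage 2 (sketch): clipped surrogate with one ratio per skill.

    Each skill gets its own importance ratio, rather than a single
    trajectory-level ratio; this is one plausible reading of the
    "independent, clipped importance ratio for each skill".
    Inputs are per-skill log-probabilities and advantage estimates.
    """
    ratios = np.exp(logp_new - logp_old)          # independent per-skill ratio
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic PPO-style objective, averaged over skills.
    return -np.mean(np.minimum(ratios * advantages, clipped * advantages))

# Hypothetical usage with dummy numbers:
loss = skill_policy_loss(
    logp_new=np.array([-1.1, -0.7]),
    logp_old=np.array([-1.0, -0.9]),
    advantages=np.array([0.5, -0.2]),
)
```

Clipping at the skill level, as sketched here, sits between a single trajectory-level ratio (too coarse to credit individual skills) and per-action ratios (unavailable once actions are quantized into discrete skills), which is one way to interpret the granularity dilemma the abstract describes.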
Primary Area: applications to robotics, autonomy, planning
Submission Number: 10278