Keywords: Embodied AI, Imitation Learning, Reinforcement Learning, Latent Variable Models
TL;DR: We propose Cascaded Skills Optimization, a two-stage framework that first aligns a skill prior via rejection sampling, then refines it with our proposed skill-level policy optimization.
Abstract: Discretizing continuous actions into skills using methods like VQ-VAE has emerged as a powerful paradigm for robotic manipulation.
However, the quantization errors introduced by this discretization yield a suboptimal training distribution for the skill prior, degrading its performance.
While reinforcement learning offers a path to refinement, its direct application is challenging: it suffers from unstable encoder updates and a granularity dilemma in importance sampling.
To address these challenges, we introduce Cascaded Skills Optimization (CSO), a two-stage post-training framework.
First, to rectify the initial policy's suboptimal distribution, CSO employs Rejection-Sampling Supervised Fine-tuning, which aligns the model's observation-to-skill mapping with the distribution of successful online trajectories.
Second, to resolve the granularity dilemma, CSO introduces Skills Policy Optimization, which computes an independent, clipped importance ratio for each skill, enabling more stable and efficient updates.
Our post-training strategy delivers highly competitive performance on challenging benchmarks like LIBERO and MetaWorld, with its effectiveness further validated on a physical robot.
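The abstract's key algorithmic idea, a clipped importance ratio computed independently for each skill rather than one ratio over the whole trajectory, can be sketched as a PPO-style surrogate objective. This is a minimal illustration under assumed inputs (per-skill log-probabilities and advantages); the function name and signature are hypothetical, not the paper's implementation.

```python
import numpy as np

def per_skill_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """Hypothetical sketch of a skill-level clipped surrogate.

    logp_new, logp_old: log-probabilities of each sampled skill under the
        new and old policies (arrays of shape [num_skills]).
    advantages: advantage estimate attributed to each skill.
    Each skill gets its own importance ratio, clipped independently,
    instead of one trajectory-level ratio.
    """
    ratios = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (PPO-style) minimum, averaged over skills.
    return float(np.minimum(unclipped, clipped).mean())
```

With identical old and new log-probabilities every ratio is 1, so the objective reduces to the mean advantage; a skill whose ratio drifts outside [1 − eps, 1 + eps] contributes only the clipped term, which is what keeps per-skill updates bounded.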
Primary Area: applications to robotics, autonomy, planning
Submission Number: 10278