Keywords: Vision-Language-Action Models, Robotic Manipulation, Fine-Tuning, Preference Optimization, Reinforcement Learning, Stage-Aware Optimization
TL;DR: We propose Stage-Aware Optimization, which improves VLA fine-tuning by decomposing manipulation into stages for precise offline preference alignment and stage-conditioned online policy refinement.
Abstract: Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage-wise optimization. We therefore present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO yields Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference alignment and online intra-stage interaction, respectively. Building further on supervised fine-tuning as initialization, we propose Imitation→Preference→Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0% on SimplerEnv and 96.4% on ManiSkill3 tasks. Our code will be released publicly.
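The abstract describes a serial fine-tuning order (supervised imitation, then offline stage-wise preference alignment, then online intra-stage refinement) built around stage decomposition. The sketch below illustrates that structure only; since the code is not yet released, every function name and argument here (sft_finetune, sta_tpo_align, sta_ppo_refine, collect_rollouts, stage_segmenter) is a hypothetical placeholder, not the authors' API.

```python
# Structural sketch of the Imitation -> Preference -> Interaction (IPI) pipeline
# as summarized in the abstract. All helpers are hypothetical placeholders.

def ipi_finetune(vla_model, demos, preference_pairs, env, stage_segmenter):
    # Imitation: supervised fine-tuning on demonstrations as initialization.
    policy = sft_finetune(vla_model, demos)

    # Preference: offline stage-wise preference alignment (STA-TPO).
    # The stage segmenter (standing in for STARE) splits each trajectory into
    # semantically meaningful stages so preferences are compared per stage,
    # not over whole trajectories.
    for chosen, rejected in preference_pairs:
        policy = sta_tpo_align(policy,
                               stage_segmenter(chosen),
                               stage_segmenter(rejected))

    # Interaction: online intra-stage refinement (STA-PPO), using dense,
    # stage-aligned reinforcement signals.
    for episode in collect_rollouts(policy, env):
        policy = sta_ppo_refine(policy, stage_segmenter(episode))

    return policy
```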
Primary Area: applications to robotics, autonomy, planning
Submission Number: 17844