Keywords: VLAs, open-source
TL;DR: When fine-tuned on extra demonstrations, small-size VLAs exhibit increased consistency during training
Abstract: Robotics increasingly leverages behavioral cloning for contact-rich tasks where accurate simulators are infeasible and dense reward functions are difficult to define.
Because they are collected sequentially by humans, input trajectories are non-i.i.d. data, and are thus randomized to mitigate non-stationarity and to more closely adhere to the fundamental theoretical assumptions underlying statistical learning.
Rather than modeling single actions, modern visuomotor policies are trained to model action chunks, which are, crucially, treated in complete isolation during training.
However, empirical evidence suggests that powerful visuomotor policies pick up on the sequential nature of the input trajectories provided during training, producing increasingly consistent chunks despite not being instructed to do so.
In this opinion piece, we present initial empirical evidence substantiating the claim that, when fine-tuned on extra demonstrations, small-size VLAs might learn to exploit the sequential structure of the input data, self-learning consistency, in contrast to larger models, which in the same setting become less self-consistent.
Submission Number: 65