Keywords: data efficiency, vision-language-action models, data redundancy, coreset selection, robot learning, LIBERO
TL;DR: Random subsets (30%) of the LIBERO dataset match full dataset performance for VLA training, and outperform difficulty-based data selection, revealing significant data redundancy and favoring diversity over difficulty.
Abstract: Recent work suggests robotic benchmarks contain significant data redundancy. In this work, we empirically verify and quantify this on the popular LIBERO benchmark by evaluating two distinct data selection methods: random sampling and difficulty-based sampling. We first evaluate random sampling via random frame downsampling, finding that a sparse 30% coverage of the dataset is sufficient to match the performance of the full dataset. We then investigate whether difficulty-based sampling can improve on this by implementing a temporal surprise score (TSS). TSS identifies high-volatility action frames and selects them as dense, continuous clusters (including temporal neighbors) while discarding low-information transit frames. We find that random sampling matches, and in some cases exceeds, full-dataset performance, while difficulty-based sampling underperforms both. This suggests that maintaining broad coverage across diverse scenarios matters more than targeting difficult frames for VLA training. This work characterizes this diversity vs. difficulty trade-off, providing an initial empirical analysis of sparse versus dense data selection in VLA training.
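The abstract's description of TSS-style selection (score frames by action volatility, then keep the top scorers together with their temporal neighbors as dense clusters) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the L2-difference volatility proxy, and the `keep_frac`/`neighbor_radius` parameters are all assumptions.

```python
import numpy as np

def temporal_surprise_selection(actions, keep_frac=0.3, neighbor_radius=2):
    """Hypothetical sketch of temporal-surprise-style frame selection.

    actions: (T, D) array of per-frame action vectors from one demonstration.
    Scores each frame by the magnitude of its frame-to-frame action change
    (a simple volatility proxy, assumed here), then greedily keeps the
    highest-scoring frames together with their temporal neighbors, forming
    dense, continuous clusters around high-volatility segments.
    Returns the sorted indices of the selected frames.
    """
    T = len(actions)
    # Volatility proxy: L2 norm of consecutive action differences.
    diffs = np.linalg.norm(np.diff(actions, axis=0), axis=1)
    scores = np.concatenate([[0.0], diffs])  # first frame has no predecessor

    budget = max(1, int(keep_frac * T))
    selected = np.zeros(T, dtype=bool)
    # Take the most "surprising" frames plus their neighbors until
    # the frame budget is met; low-volatility transit frames far from
    # any selected cluster are discarded.
    for idx in np.argsort(scores)[::-1]:
        lo = max(0, idx - neighbor_radius)
        hi = min(T, idx + neighbor_radius + 1)
        selected[lo:hi] = True
        if selected.sum() >= budget:
            break
    return np.flatnonzero(selected)
```

For example, on a trajectory whose actions jump once mid-sequence, the selected indices cluster around the jump, while flat transit segments are dropped first.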
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 82