ProMix: Learning Optimal Data Mixtures for Robotic Imitation via Proxy-Reference Distillation

Published: 13 May 2026, Last Modified: 13 May 2026. Venue: ICRA 2026: From Data to Decisions (Poster). Readers: Everyone. License: CC BY 4.0
Keywords: Vision-Language-Action model, data mixture optimization, proxy-reference model, imitation learning
Abstract: This paper introduces ProMix, an efficient data-mixture optimization framework designed to handle the heterogeneity of large-scale robotic datasets. While Vision-Language-Action (VLA) models benefit from diverse data, naïve uniform mixing often leads to performance saturation. ProMix adopts a proxy-reference distillation architecture that learns optimal per-domain sampling weights by minimizing the worst-case excess loss against a frozen reference model. This mechanism effectively mitigates the "over-pessimism" common in traditional distributionally robust optimization (DRO) when applied to complex robotic distributions. Experimental results on 2.7M real-world actions from the Open-X Embodiment dataset demonstrate that ProMix improves average success rates from 68.4\% to 76.2\%, with substantial gains (12–18\%) in low-resource domains. Crucially, ProMix achieves a 17$\times$ reduction in computational overhead (11 vs. 192 GPU-hours) compared to existing adaptive mixing methods, providing a scalable pipeline for pre-training large-scale robotic foundation models.
Submission Number: 51
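
The abstract describes learning per-domain sampling weights by minimizing the worst-case excess loss of a proxy model against a frozen reference model. The abstract does not give the update rule, so the following is only a minimal sketch of one common way to do this, assuming an exponentiated-gradient step on clipped excess losses. The function name `update_domain_weights`, the parameters `step_size` and `smoothing`, and the example loss values are all hypothetical and not taken from the paper.

```python
import numpy as np

def update_domain_weights(weights, proxy_loss, reference_loss,
                          step_size=1.0, smoothing=1e-3):
    """One exponentiated-gradient step on per-domain sampling weights.

    Assumed rule (not confirmed by the paper): a domain is upweighted in
    proportion to its excess loss, i.e. how far the proxy model's loss
    exceeds the frozen reference model's loss on that domain.
    """
    weights = np.asarray(weights, dtype=float)
    proxy_loss = np.asarray(proxy_loss, dtype=float)
    reference_loss = np.asarray(reference_loss, dtype=float)

    # Excess loss, clipped at zero so domains where the proxy already
    # matches or beats the reference receive no extra weight.
    excess = np.maximum(proxy_loss - reference_loss, 0.0)

    # Multiplicative update, computed in log space for numerical stability.
    logits = np.log(weights) + step_size * excess
    logits -= logits.max()
    new_weights = np.exp(logits)
    new_weights /= new_weights.sum()

    # Mix with the uniform distribution so no domain's weight reaches zero.
    uniform = np.full_like(new_weights, 1.0 / new_weights.size)
    return (1.0 - smoothing) * new_weights + smoothing * uniform

# Hypothetical per-domain losses for three domains.
weights = np.full(3, 1.0 / 3.0)
proxy_loss = np.array([0.90, 0.50, 0.40])
reference_loss = np.array([0.50, 0.45, 0.50])
new_weights = update_domain_weights(weights, proxy_loss, reference_loss)
```

In this sketch the first domain, where the proxy lags the reference most, gains sampling weight, while the third, where the proxy already beats the reference, loses weight. Measuring loss relative to a reference rather than in absolute terms is one plausible reading of how the "over-pessimism" of plain DRO is mitigated: a domain that is intrinsically hard, with high loss for both models, is not upweighted merely for being hard.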