PyramidMix: Theoretically Grounded Data Mixture Scaling Laws for Generalizable Robot Policy Learning
Keywords: data mixing, scaling laws, robot policies
TL;DR: A theory-guided data mixture for robot policy learning yields closed-form optimal weights and significantly improves success on unseen embodiments over uniform sampling.
Abstract: Training generalizable robot policies on heterogeneous multi-source datasets requires deciding how much data to draw from each source, a choice usually made by grid search or intuition. We show that this problem has a rigorous theoretical solution. Treating each data tier as a source with a fixed quality score $q_k$ (measured by held-out behavioral cloning loss), we prove that the policy loss under a $K$-tier mixture follows a power law in the total trajectory count $N$, with an exponent that depends on the quality-weighted mixture vector $w$. From this result we derive closed-form optimal mixture weights $w_k^* \propto q_k^{\alpha^*}$, where the exponent $\alpha^*$ is determined by the quality spread of the data pyramid and the loss scaling exponent $\beta$. We instantiate these results in PyramidMix, a practical training recipe that (i) estimates $q_k$ from lightweight proxy losses, (ii) initializes the mixture weights at $w_k^*$, and (iii) refines them dynamically during training using gradient alignment scores. On 25 datasets from the Open X-Embodiment collection with Octo and OpenVLA backbones, PyramidMix improves task success rate on unseen embodiments by 15.8 percentage points over uniform mixing and by 11.6 percentage points over quality-only filtering; all improvements are significant at $p < 0.01$.
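To make the closed-form rule $w_k^* \propto q_k^{\alpha^*}$ concrete, here is a minimal sketch of step (ii), initializing mixture weights from per-tier quality scores, plus drawing tier indices for a training batch. It assumes the scores $q_k$ are already available as positive numbers and treats $\alpha^*$ as a user-supplied input, because the abstract does not give its expression in terms of the quality spread and $\beta$. The function names `closed_form_mixture_weights` and `sample_tiers` are hypothetical, and the gradient-alignment refinement of step (iii) is not shown.

```python
import numpy as np


def closed_form_mixture_weights(q, alpha):
    """Mixture weights w_k proportional to q_k ** alpha, normalized to sum to 1.

    q: positive per-tier quality scores q_k (assumed precomputed; the mapping
       from held-out behavioral cloning loss to q_k is not specified here).
    alpha: the exponent alpha*, passed in directly (assumption: its closed
       form in terms of quality spread and beta is not reproduced here).
    """
    q = np.asarray(q, dtype=float)
    if np.any(q <= 0):
        raise ValueError("quality scores must be positive")
    unnormalized = q ** alpha
    return unnormalized / unnormalized.sum()


def sample_tiers(w, batch_size, seed=0):
    """Draw one tier index per example according to the mixture weights w."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(w), size=batch_size, p=w)


if __name__ == "__main__":
    # Hypothetical three-tier pyramid with increasing quality.
    weights = closed_form_mixture_weights([1.0, 2.0, 4.0], alpha=1.0)
    print(weights)  # weights 1/7, 2/7, 4/7
    print(sample_tiers(weights, batch_size=8))
```

Under this parameterization, $\alpha = 0$ recovers uniform mixing, while a large $\alpha$ concentrates almost all weight on the highest-quality tier, approaching quality-only filtering; the two baselines in the abstract sit at the ends of the same family.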
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 13