Keywords: neural collapse, learning from demonstration, vision-based learning control
TL;DR: We discover a control-oriented law of clustering (similar to neural collapse) in the visual representation space of an end-to-end image-based control pipeline trained from behavior cloning.
Abstract: We initiate a study of the geometry of the visual representation space ---the information channel from the vision encoder to the action decoder--- in an image-based control pipeline learned from behavior cloning. Inspired by the phenomenon of *neural collapse* (NC) in image classification, we empirically demonstrate the prevalent emergence of a similar *law of clustering* in the visual representation space. Specifically,
- In discrete image-based control (e.g., Lunar Lander), the visual representations cluster according to the natural discrete action labels;
- In continuous image-based control (e.g., Planar Pushing and Block Stacking), the clustering emerges according to ``control-oriented'' classes that are based on (a) the relative pose between the object and the target in the input or (b) the relative pose of the object induced by expert actions in the output. Each of the classes corresponds to one relative pose orthant (REPO).
Beyond empirical observation, we show such a law of clustering can be leveraged as an algorithmic tool to improve test-time performance when training a policy with limited expert demonstrations. Particularly, we pretrain the vision encoder using NC as a regularization to encourage control-oriented clustering of the visual features. Surprisingly, such an NC-pretrained vision encoder, when finetuned end-to-end with the action decoder, boosts the test-time performance by 10% to 35%. Real-world vision-based planar pushing experiments confirmed the surprising advantage of control-oriented visual representation pretraining.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8620
Loading