Keywords: test-time adaptation, unsupervised learning, object classification
Abstract: One of the most fundamental goals of machine learning is to enable systems to perceive the world. When deploying pre-trained vision models, it is crucial for agents to adapt these models to new environments without relying on human annotations. In this paper, considering mobile agents, we advocate a model adaptation framework for *learning to see by moving without manual labeling*. Our approach stems from the following observation: predictions made by an agent vary significantly in quality as the agent moves, e.g., when observing an object at closer versus farther distances, implying that high-quality predictions can naturally serve as a teacher's output for adapting the model. Since incorrect teacher predictions can mislead adaptation, we develop a unified data sampling-and-weighting (SAW) framework based on prediction confidence, making the loss function an *unbiased* estimator of the clean loss. Experimental results demonstrate that our proposed scheme significantly outperforms prior schemes across various models and datasets.
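To illustrate the kind of mechanism the abstract describes, here is a minimal, hypothetical sketch of confidence-based sampling-and-weighting. It assumes each pseudo-labeled example is kept with probability equal to its confidence and, if kept, its loss is reweighted by the inverse of that probability (a Horvitz-Thompson-style correction), so the estimator is unbiased in expectation. The function name `saw_loss` and the specific sampling rule are illustrative assumptions, not the paper's actual algorithm.

```python
import random

def saw_loss(losses, confidences, rng=None):
    """Illustrative sampling-and-weighting (SAW) loss estimate.

    Each example i is sampled (kept) with probability p_i = confidences[i];
    a kept example's loss is reweighted by 1 / p_i, so the expected value
    of the returned estimate equals the plain mean of `losses` (unbiased).
    This is a sketch of the general idea, not the paper's exact method.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for loss, p in zip(losses, confidences):
        # Keep this example with probability p; if kept, apply the
        # inverse-probability weight so E[kept * loss / p] = loss.
        if p > 0.0 and rng.random() < p:
            total += loss / p
    return total / len(losses)
```

Because low-confidence examples are dropped more often, the estimator focuses computation on confident pseudo-labels while the inverse-probability weights cancel the resulting selection bias on average.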
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 837