Textual Prototypes Guided Balanced Visual Feature Learning For Long-Tailed Vision Recognition

TMLR Paper4427 Authors

09 Mar 2025 (modified: 06 Apr 2025) · Under review for TMLR · CC BY 4.0
Abstract: Pre-trained contrastive vision-language models such as CLIP have demonstrated remarkable multi-modal capability across diverse vision tasks. Yet, their potential for addressing the long-tailed vision recognition challenge has not been thoroughly investigated. In this study, we observe that the textual features produced by CLIP exhibit a more discriminative and balanced distribution than their visual counterparts. Leveraging this insight, we propose a novel approach that uses these balanced textual features as prototypes to guide the learning of robust, disentangled representations from biased visual features. Our method begins by fine-tuning CLIP with contrastive learning so that the encoders adapt to the target dataset. We then freeze the visual encoder and apply a linear adapter to enhance the visual representations. To achieve robust vision recognition, we integrate a linear classifier into our framework, initialized with the fine-tuned textual features so that its weights can be viewed as class prototypes. We further introduce a principled approach to robust visual representation learning by minimizing the optimal transport distance between the refined visual features and the prototypes, which facilitates the disentanglement of biased features and iteratively drives the prototypes towards the class centroids. Additionally, we introduce a supervised contrastive learning loss based on the transport plan to further strengthen the learned representations. Extensive experiments on long-tailed vision recognition benchmarks demonstrate the superiority of our method.
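
To make the core idea of the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of aligning adapted visual features to frozen textual prototypes by minimizing an entropic optimal-transport distance computed with Sinkhorn iterations. All names (`VisualAdapter`, `sinkhorn`, `proto_align_loss`), the uniform marginals, and the hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAdapter(nn.Module):
    """Linear adapter applied on top of the frozen CLIP visual encoder (assumed design)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, v):
        return F.normalize(self.proj(v), dim=-1)

def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropic OT: transport plan between a batch of samples and the class prototypes.

    cost: (B, C) cost matrix; eps and n_iters are illustrative hyperparameters.
    """
    B, C = cost.shape
    K = torch.exp(-cost / eps)                            # Gibbs kernel
    r = torch.full((B,), 1.0 / B, device=cost.device)     # uniform sample marginal
    c = torch.full((C,), 1.0 / C, device=cost.device)     # uniform (balanced) class marginal
    u = torch.ones(B, device=cost.device) / B
    v = torch.ones(C, device=cost.device) / C
    for _ in range(n_iters):
        u = r / (K @ v + 1e-8)
        v = c / (K.t() @ u + 1e-8)
    return u.unsqueeze(1) * K * v.unsqueeze(0)            # transport plan, shape (B, C)

def proto_align_loss(visual_feats, text_prototypes):
    """OT distance between adapted visual features and (frozen or slowly updated) textual prototypes."""
    cost = 1.0 - visual_feats @ text_prototypes.t()       # cosine cost, features assumed L2-normalized
    with torch.no_grad():
        plan = sinkhorn(cost)                             # plan treated as a fixed target
    return (plan * cost).sum()
```

In this sketch the balanced class marginal is what encourages mass to be spread evenly over prototypes despite the long-tailed batch statistics; the transport plan could likewise serve as the soft assignment weighting a supervised contrastive loss, as the abstract suggests, though that extension is not shown here.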
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sungwoong_Kim2
Submission Number: 4427
