Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition

Published: 20 Jul 2024, Last Modified: 05 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Long-tailed recognition (LTR) aims to learn balanced models from extremely imbalanced training data. Fine-tuning pretrained foundation models has recently emerged as a promising research direction for LTR. However, we observe that the fine-tuning process tends to degrade the intrinsic representation capability of pretrained models and to bias the model towards certain classes, thereby hindering overall recognition performance. To unleash the intrinsic representation capability of pretrained foundation models, in this work we propose Parameter-Efficient Complementary Expert Learning (PECEL) for LTR. Specifically, PECEL consists of multiple experts, each trained via Parameter-Efficient Fine-Tuning (PEFT) and encouraged to acquire distinct expertise on complementary sub-categories through a new sample-aware logit adjustment loss. By aggregating the predictions of the different experts, PECEL achieves balanced performance across long-tailed classes. Nevertheless, learning multiple experts generally introduces extra trainable parameters. To ensure parameter efficiency, we further propose a parameter-sharing strategy that decomposes and shares the parameters within each expert. Extensive experiments on 4 LTR benchmarks show that PECEL effectively learns multiple complementary experts without increasing the number of trainable parameters and achieves new state-of-the-art performance.
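
For illustration only (not the authors' implementation): the sketch below shows, in PyTorch style, the two generic ingredients the abstract names — a logit-adjustment loss that shifts logits by the log class prior, and a simple average over expert logits at inference. The sample-aware form of the loss and the exact way PECEL assigns complementary sub-categories to experts are defined in the paper; all function names, shapes, and the averaging rule here are assumptions.

```python
# Minimal sketch, assuming a standard logit-adjustment loss and plain
# logit averaging over experts; PECEL's sample-aware loss differs.
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by the log class prior (hypothetical helper)."""
    prior = class_counts / class_counts.sum()
    adjusted = logits + tau * torch.log(prior + 1e-12)  # push decision boundary toward tail classes
    return F.cross_entropy(adjusted, targets)

def ensemble_predict(expert_logits):
    """Aggregate complementary experts by averaging their logits."""
    return torch.stack(expert_logits, dim=0).mean(dim=0)

if __name__ == "__main__":
    # Hypothetical setup: 3 experts, batch of 8, 100 classes with a long-tailed count vector.
    counts = torch.tensor([1000.0] * 10 + [10.0] * 90)
    logits = [torch.randn(8, 100) for _ in range(3)]   # one logit tensor per expert
    targets = torch.randint(0, 100, (8,))
    loss = sum(logit_adjusted_loss(l, targets, counts) for l in logits) / 3
    probs = ensemble_predict(logits).softmax(dim=-1)
    print(loss.item(), probs.shape)
```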
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Real-world data typically conforms to a long-tailed distribution. In this work, we empirically observe that fine-tuning pretrained visual-linguistic foundation models (such as CLIP) on downstream long-tailed data degrades their intrinsic representation capability. To alleviate this issue, we propose Parameter-Efficient Complementary Expert Learning (PECEL). To the best of our knowledge, PECEL is the first work to learn complementary experts from pretrained foundation models. Specifically, PECEL addresses the degradation of representation capability by learning multiple complementary expert models specializing in different sub-categories. By ensembling the outputs of these diverse experts, PECEL achieves a more balanced performance across classes. In addition, we propose a parameter-sharing strategy to ensure parameter efficiency when learning multiple experts. The proposed PECEL achieves new state-of-the-art performance on 4 LTR datasets without increasing the number of trainable parameters. In a nutshell, PECEL contributes to the multimodal community by providing a better paradigm for fine-tuning foundation models on long-tailed data. A hedged sketch of one possible parameter-sharing scheme is given below.
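
As a hedged illustration of the parameter-sharing idea (decomposing and sharing parameters within each expert), the sketch below assumes a LoRA-style low-rank adapter in which the down-projection is shared across experts and each expert keeps only a small up-projection. This is an assumption for exposition, not the decomposition actually used in PECEL; the class name and layer sizes are hypothetical.

```python
# Minimal sketch, assuming LoRA-style decomposition with a shared down-projection.
import torch
import torch.nn as nn

class SharedLoRALinear(nn.Module):
    """Frozen base linear layer plus per-expert low-rank adapters sharing one factor."""
    def __init__(self, in_dim, out_dim, rank=8, num_experts=3):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)        # stands in for a frozen pretrained weight
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.shared_A = nn.Linear(in_dim, rank, bias=False)            # factor shared by all experts
        self.expert_B = nn.ModuleList(
            nn.Linear(rank, out_dim, bias=False) for _ in range(num_experts)
        )                                                              # small expert-specific factors

    def forward(self, x, expert_id):
        return self.base(x) + self.expert_B[expert_id](self.shared_A(x))

# Usage: one layer serves all experts; only shared_A and the small expert_B[e] are trainable.
layer = SharedLoRALinear(768, 768, rank=8, num_experts=3)
out = layer(torch.randn(4, 768), expert_id=1)
print(out.shape)  # torch.Size([4, 768])
```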
Supplementary Material: zip
Submission Number: 47