TL;DR: RSMerge merges CLIP models fine-tuned on balanced subsets and retrains the classifier on the full dataset, achieving SOTA head/tail accuracy trade-offs across five benchmarks.
Abstract: Class imbalance is a pervasive challenge in machine learning: head classes have abundant samples while tail classes are severely underrepresented. This imbalance significantly degrades predictive performance, particularly in scenarios where maintaining balanced accuracy is critical. Traditional fine-tuning methods for foundation models such as CLIP often prioritize head-class accuracy but distort pre-trained representations for tail classes, leading to suboptimal overall performance. Conversely, parameter-efficient fine-tuning (PEFT) methods preserve tail-class features but struggle to fully leverage head-class information. In this study, we first show empirically how different head-to-tail class ratios affect model performance, highlighting the limitations of existing fine-tuning methods across various imbalance distributions. To address these limitations, we propose a two-stage learning framework that merges models fine-tuned on balanced subsets via full-rank updates and then freezes the encoder to retrain the classifier on the full dataset. Validated across five benchmark datasets with distinct imbalance patterns, our method achieves superior trade-offs between head- and tail-class accuracies while maintaining generalizability.
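The two-stage framework from the abstract can be illustrated with a minimal sketch: uniform weight averaging of encoders fine-tuned on balanced subsets, followed by classifier retraining on the full (imbalanced) data with the merged encoder frozen. All names below are illustrative assumptions (the paper does not specify its merging rule or training details here); a small `nn.Linear` stands in for the CLIP image encoder.

```python
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts):
    """Uniformly average matching parameters across fine-tuned models (one merging choice)."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

torch.manual_seed(0)

# Stage 1: suppose enc_a and enc_b were fine-tuned on two balanced subsets
# via full-rank updates (all weights trainable); merge them by averaging.
enc_a = nn.Linear(8, 4)   # toy stand-ins for full CLIP encoders
enc_b = nn.Linear(8, 4)
merged_encoder = nn.Linear(8, 4)
merged_encoder.load_state_dict(merge_state_dicts([enc_a.state_dict(), enc_b.state_dict()]))

# Stage 2: freeze the merged encoder and retrain only the classifier
# on the full imbalanced dataset.
for p in merged_encoder.parameters():
    p.requires_grad = False

classifier = nn.Linear(4, 3)                 # 3 classes: head + tail mixed
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
x = torch.randn(32, 8)                       # toy "full dataset"
y = torch.randint(0, 3, (32,))
for _ in range(20):
    logits = classifier(merged_encoder(x))   # encoder runs forward but gets no gradient
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Uniform averaging is the simplest merging scheme; weighted or task-vector merging would slot into `merge_state_dicts` the same way.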
Primary Area: Deep Learning->Foundation Models
Keywords: imbalance classification, foundational model, fine-tuning, model merging
Submission Number: 5110