CO+3: Improved Collaborative Consortium of Foundation Models for Open-World Few-Shot Learning

Shuai Shao, Rui Xu, Bingfeng Zhang, Baodi Liu, Weifeng Liu, Yicong Zhou

Published: 01 Jan 2026, Last Modified: 05 Mar 2026. IEEE Transactions on Circuits and Systems for Video Technology. License: CC BY-SA 4.0
Abstract: Open-World Few-Shot Learning (OFSL) is a critical research domain focused on accurately identifying target samples under conditions where data is scarce and labels are unreliable. This field is highly relevant to real-world scenarios and holds significant practical implications. Currently, the field has only a few solutions, relying primarily on conventional techniques such as metric learning and feature aggregation; these methods often struggle in more complex scenarios. Recent breakthroughs in foundation models such as CLIP and DINO have demonstrated strong representational capabilities, even in resource-limited environments. These advances have led to a shift from "training models from scratch" towards "exploiting the extensive capabilities and expertise of pre-trained foundation models for OFSL". Inspired by this shift, we introduce the Improved Collaborative Consortium of Foundation Models (CO+3), an extension of CO3, first presented at AAAI 2024. CO+3 significantly improves the accuracy of OFSL by integrating the strengths of four foundation models. It comprises three decoupled blocks: (1) the Label Correction Block (LC-Block) rectifies unreliable labels, (2) the Data Augmentation Block (DA-Block) enriches the available data, and (3) the Text-guided Fusion Adapter (TeFu-Adapter) merges features from the different models and reduces the impact of noisy labels through semantic constraints. We evaluate CO+3 on eleven benchmark datasets against recent state-of-the-art methods. Our evaluations demonstrate that CO+3 consistently surpasses existing methods by a substantial margin, particularly in high-noise scenarios.
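The three decoupled blocks named in the abstract can be pictured with a minimal sketch. Everything below is a hypothetical illustration of the pipeline's shape, not the paper's implementation: the majority-vote correction rule, the jitter-based augmentation, and the weighted fusion are stand-ins for the actual LC-Block, DA-Block, and TeFu-Adapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def lc_block(noisy_labels, model_preds):
    """LC-Block stand-in: majority vote over the predictions of several
    pre-trained models; keep the given label when no strict majority exists.
    (Illustrative rule only; the paper's correction mechanism differs.)"""
    corrected = []
    for i, y in enumerate(noisy_labels):
        votes = [preds[i] for preds in model_preds]
        top = max(set(votes), key=votes.count)
        corrected.append(top if votes.count(top) > len(votes) // 2 else y)
    return corrected

def da_block(x, scale=0.1):
    """DA-Block stand-in: enrich scarce data by appending jittered copies."""
    return np.concatenate([x, x + rng.normal(scale=scale, size=x.shape)])

def tefu_adapter(feature_list, text_weights):
    """TeFu-Adapter stand-in: fuse per-model features with weights that, in
    the paper, would be derived from text-side semantic constraints."""
    w = np.asarray(text_weights, dtype=float)
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, feature_list))

# Toy usage: 4 models vote on 2 noisy labels, then their features are fused.
preds = [[1, 1], [1, 1], [0, 1], [1, 0]]       # one prediction list per model
labels = lc_block([0, 0], preds)                # -> [1, 1]
fused = tefu_adapter([np.ones((2, 3)), np.zeros((2, 3))], [1, 3])
```

The point of the sketch is the decoupling: each block consumes only the outputs of the foundation models (predictions or features), so any of the three can be swapped without retraining the backbones.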