Graph-Refined Representation Learning for Few-Shot Classification via CLIP Adaptation

19 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Few-shot representation learning, Vision-language models, Graph neural networks, CLIP adaptation, Knowledge representation
Abstract: Few-shot image classification remains a fundamental challenge, as representations learned from only a handful of examples often fail to generalize to unseen concepts. Recent advances benefit from pre-trained vision-language models such as CLIP, yet their inherent biases and limited task adaptability hinder robust performance. We propose a novel **graph-driven cache refinement framework** that augments CLIP's prior knowledge with task-specific representation learning while preserving its lightweight inference. The first stage, **Inductive Statistical Subspace Aggregation (ISSA)**, partitions each feature into subspaces, builds fully connected intra-sample graphs, and applies statistical aggregation to capture robust subspace-level dependencies. The second stage, **Feature Subspace Propagation (FSP)**, globally diffuses contextual signals across subspaces while preserving their individuality, yielding enriched embeddings that drive cache-based retrieval. Crucially, this refinement branch is active only during training: it produces enhanced cache keys while keeping inference graph-free and efficient. Across multiple benchmarks, our method consistently outperforms state-of-the-art approaches, establishing new performance standards in few-shot learning while retaining computational efficiency. Source code will be released to support reproducibility and further research.
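The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the partition scheme, adjacency weights, residual aggregation, and the mixing coefficient `alpha` are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def issa(feature, num_subspaces=8):
    """Inductive Statistical Subspace Aggregation (illustrative sketch)."""
    d = feature.shape[0]
    # Partition the feature vector into equal contiguous subspaces
    # (hypothetical choice; the abstract does not fix the partition).
    subs = feature.reshape(num_subspaces, d // num_subspaces)
    # Fully connected intra-sample graph over subspaces:
    # uniform adjacency with no self-loops.
    n = num_subspaces
    adj = (np.ones((n, n)) - np.eye(n)) / (n - 1)
    # Statistical aggregation: each subspace absorbs the mean of its
    # neighbors, added residually (assumed aggregation rule).
    return subs + adj @ subs

def fsp(subs, steps=2, alpha=0.5):
    """Feature Subspace Propagation (illustrative sketch)."""
    n = subs.shape[0]
    # Global diffusion operator over all subspaces.
    adj = np.ones((n, n)) / n
    for _ in range(steps):
        # alpha preserves each subspace's individuality;
        # (1 - alpha) mixes in the globally diffused context.
        subs = alpha * subs + (1 - alpha) * (adj @ subs)
    # Flatten back into a single enriched embedding (a cache key).
    return subs.reshape(-1)

# Example: refine a CLIP-sized 512-d feature into an enriched cache key.
feature = np.random.default_rng(0).standard_normal(512)
cache_key = fsp(issa(feature))
```

Consistent with the abstract, such graph operations would run only during training to build refined cache keys; at inference, retrieval against the cached keys proceeds without any graph computation.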
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18130