No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

Walter Simoncini; Andrei Bursuc; Spyros Gidaris; Yuki M Asano

Published: 25 Sept 2024, Last Modified: 09 Jan 2025

Venue: NeurIPS 2024 poster

License: CC BY 4.0

Keywords: self-supervised, gradients, computer vision, transformers, k-nearest neighbor, classification, in-context learning, clustering, retrieval

TL;DR: We propose using self-supervised gradients to enhance pretrained embedding features and achieve significant improvements in k-nearest neighbor classification, in-context scene understanding, linear probing and clustering.

Abstract: This paper introduces FUNGI, **F**eatures from **UN**supervised **G**rad**I**ents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification, clustering and image retrieval, and that they significantly improve the retrieval-based in-context scene understanding abilities of pretrained models, for example improving upon DINO by +17% for semantic segmentation - without any training. Code is available at https://github.com/WalterSimoncini/fungivision.
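The three steps named in the abstract (per-input gradient, projection to a lower dimension, concatenation with the embedding) can be illustrated with a toy NumPy sketch. This is not the released `fungivision` code. The single linear layer standing in for a pretrained backbone, the two-noisy-views consistency loss, the Gaussian random projection, the L2 normalization, and the names `fungi_features`, `W`, `proj`, and `noise` are all assumptions made for illustration.

```python
import numpy as np


def fungi_features(x, W, proj, rng, noise=0.1):
    """Toy sketch: frozen embedding concatenated with a projected per-input gradient."""
    # Frozen "encoder": one linear map standing in for a pretrained backbone.
    embedding = W @ x

    # Stand-in self-supervised objective (an assumption, not the paper's losses):
    # two noisy views of the same input should share an embedding,
    # loss = 0.5 * ||W v1 - W v2||^2.
    v1 = x + noise * rng.standard_normal(x.shape)
    v2 = x + noise * rng.standard_normal(x.shape)
    diff = v1 - v2
    residual = W @ diff

    # Analytic gradient of the loss w.r.t. W is residual * diff^T.
    # The weights are never updated; the gradient itself is the feature.
    grad = np.outer(residual, diff).ravel()

    # Project the flattened gradient to a lower dimension.
    low_dim_grad = proj @ grad

    # L2-normalize both parts (an assumption) so neither dominates distances.
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-12)

    return np.concatenate([unit(embedding), unit(low_dim_grad)])


rng = np.random.default_rng(0)
d_in, d_out, d_proj = 16, 8, 4

W = rng.standard_normal((d_out, d_in))  # frozen weights
proj = rng.standard_normal((d_proj, d_out * d_in)) / np.sqrt(d_proj)  # fixed projection
x = rng.standard_normal(d_in)  # one input

features = fungi_features(x, W, proj, rng)
```

The resulting `features` vector has `d_out + d_proj` entries and could be passed to any nearest-neighbor index in place of the plain embedding. With a real backbone the gradient would come from automatic differentiation rather than a closed-form expression.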

Primary Area: Machine vision

Submission Number: 17520