Keywords: Attentive Probing, Evaluation, Efficiency, Foundation Models, Vision-Language Models, Self-supervised Learning
Domains: Vision and Learning, Language and Learning
TL;DR: A lightweight attention-based probing method (EP) that evaluates some vision and vision-language models more accurately than linear probing, using fewer parameters than existing attentive probing approaches.
External Link: https://openreview.net/pdf?id=PXo0gtT7Al
Abstract: As fine-tuning becomes impractical at scale, probing is emerging as the preferred evaluation protocol. However, standard linear probing can understate the capability of models whose pre-training optimizes local representations rather than an explicit global representation. This motivates attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite growing adoption, attentive probing is still underexplored: existing approaches are often overparameterized and computationally inefficient. In this work, we revisit attentive probing through the lens of the accuracy vs. parameter-efficiency trade-off. We present the first comprehensive study of existing methods, analyzing their design choices and benchmarking their performance. Building on these insights, we propose efficient probing (EP), a lightweight yet effective multi-query cross-attention mechanism that eliminates redundant projections and reduces the number of trainable parameters. Across multiple benchmarks and pre-training paradigms, EP consistently outperforms linear probing and previous attentive probing methods, and remains effective when combined with parameter-efficient fine-tuning. Beyond evaluation, our analysis uncovers emerging properties of EP, including complementary attention maps, which open new directions for leveraging probing beyond protocol design. Project page: https://vrg.fel.cvut.cz/ep/.
Submission Number: 140
Loading