Revealing Task-Dependent Layer Relevance via Attentive Multi-Layer Fusion

Published: 02 Mar 2026 · Last Modified: 02 Mar 2026 · Sci4DL 2026 · CC BY 4.0
Keywords: representation learning, attentive probe, intermediate representations, network hierarchy
TL;DR: Using an attentive probe, we show that intermediate layers contain information complementary to the final layer.
Abstract: Efficiently adapting large-scale foundation models to downstream tasks is a central challenge in modern deep learning. While linear probing is a standard and computationally efficient method, it typically operates exclusively on the final layer's representation. In this work, we present experimental evidence that this approach discards crucial task-relevant information distributed across other layers of the network. To investigate this, we introduce Attentive Layer Fusion (ALF), a probing mechanism that dynamically fuses representations from all layers of Vision Transformers. Used as an investigative tool, ALF reveals that the optimal representation depth is highly task-dependent: while tasks similar to the pre-training domain rely on the final layer, specialized domains (e.g., medical, satellite) benefit significantly from intermediate layers. Furthermore, by analyzing representational similarities, we show that intermediate layers often achieve high downstream performance despite having low similarity to the final layer, indicating that they encode distinct, complementary features. Across 19 diverse datasets and 9 foundation models, our hierarchical approach achieves consistent gains, offering a new lens into how foundation models organize information.
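
To illustrate the kind of mechanism the abstract describes, the following is a minimal sketch of an attentive multi-layer probe: a learned query attends over the per-layer features of a frozen Vision Transformer and feeds the fused vector to a linear head. The class name, hyperparameters, and the choice of a single learnable query are assumptions for illustration, not the authors' exact ALF implementation.

```python
import torch
import torch.nn as nn

class AttentiveLayerFusionProbe(nn.Module):
    """Illustrative probe (assumed design, not the paper's exact ALF):
    a learnable query attends over the stacked per-layer representations
    of a frozen backbone, and the fused vector is classified linearly."""

    def __init__(self, embed_dim: int, num_classes: int, num_heads: int = 8):
        super().__init__()
        # One learnable query token that pools across layers.
        self.query = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, layer_feats: torch.Tensor) -> torch.Tensor:
        # layer_feats: (batch, num_layers, embed_dim), e.g. the [CLS]
        # token taken from every block of a frozen ViT.
        batch = layer_feats.size(0)
        query = self.query.expand(batch, -1, -1)          # (batch, 1, embed_dim)
        fused, attn_weights = self.attn(query, layer_feats, layer_feats)
        # attn_weights (batch, 1, num_layers) indicate which layers
        # the probe relies on for this task.
        fused = self.norm(fused.squeeze(1))               # (batch, embed_dim)
        return self.head(fused)

# Hypothetical usage with ViT-B-sized features (12 layers, 768 dims):
probe = AttentiveLayerFusionProbe(embed_dim=768, num_classes=10)
feats = torch.randn(4, 12, 768)   # stacked per-layer features, backbone frozen
logits = probe(feats)             # (4, 10)
```

In this sketch, only the probe's parameters are trained, so the cost stays close to linear probing while the attention weights expose task-dependent layer relevance.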
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 105