Look in the Middle: Structural Anchor Pruning for Scalable Visual RAG Indexing

Look in the Middle: Structural Anchor Pruning for Scalable Visual RAG Indexing

ACL ARR 2026 January Submission6506 Authors

05 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: RAG, Visual Document Retrieval, Vision-Language Models, Efficient Indexing, Token Pruning, Multi-Vector Retrieval, Training-free Pruning, Attention Analysis

Abstract: Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive index vector size overheads. Training-free pruning solutions (e.g., EOS-attention based methods) can reduce index vector size by approximately 60\% without model adaptation, but often underperform random selection in high-compression scenarios ($\ge$ 80\%). Prior research (e.g., Light-ColPali) attributes this to the conclusion that visual token importance is inherently query-dependent, thereby questioning the feasibility of training-free pruning. In this work, we propose Structural Anchor Pruning (SAP), a training-free pruning method that identifies key visual patches from middle layers to achieve high performance compression. We also introduce Oracle Score Retention (OSR) protocol to evaluate how layer-wise information affects compression efficiency. Evaluations on the ViDoRe benchmark demonstrate that SAP reduces index vectors by over 90\% while maintaining robust retrieval fidelity, providing a highly scalable solution for Visual RAG. Furthermore, our OSR-based analysis reveals that semantic structural anchor patches persist in the middle layers, unlike traditional pruning solutions that focus on the final layer where structural signals dissipate.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: multimodality, image text matching, cross-modal application, cross-modal information extraction

Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency

Languages Studied: English

Submission Number: 6506

Loading