Efficient token pruning in Vision Transformers using an attention-based Multilayer Network

Published: 01 Jan 2025 · Last Modified: 25 Jul 2025 · Expert Syst. Appl. 2025 · CC BY-SA 4.0
Abstract:

Highlights:
• TRAM represents tokens using an attention-based multilayer network.
• TRAM reduces ViT computational demand without requiring fine-tuning.
• TRAM improves FPS and GFLOPs with near-vanilla-model accuracy.
• Visual analysis reveals TRAM’s token selection process.
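The highlights describe attention-driven token pruning applied to a pretrained ViT without fine-tuning. The abstract does not detail TRAM’s multilayer-network scoring, so the following is only a minimal sketch of the general technique it builds on: ranking patch tokens by attention and discarding the least-attended ones between blocks. The function name `prune_tokens`, the `keep_ratio` parameter, and the CLS-attention scoring rule are all illustrative assumptions, not the paper’s method.

```python
# Hedged sketch of generic attention-based token pruning in a ViT block.
# Not TRAM itself: the paper's multilayer-network token representation is
# not specified in the abstract. Scoring here uses mean CLS attention,
# a common importance proxy in the token-pruning literature.

import torch

def prune_tokens(tokens: torch.Tensor,
                 attn: torch.Tensor,
                 keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep the most-attended patch tokens, always retaining the CLS token.

    tokens: (B, N, D)    -- CLS token at index 0, followed by N-1 patch tokens
    attn:   (B, H, N, N) -- post-softmax attention from the current block
    """
    B, N, D = tokens.shape
    # Score each patch token by the attention the CLS token pays it,
    # averaged over heads.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)           # (B, N-1)
    num_keep = max(1, int(keep_ratio * (N - 1)))
    keep_idx = cls_attn.topk(num_keep, dim=1).indices  # (B, num_keep)
    # Gather the selected patch tokens and prepend the CLS token.
    patches = tokens[:, 1:, :]
    kept = patches.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return torch.cat([tokens[:, :1, :], kept], dim=1)  # (B, 1+num_keep, D)
```

Pruning between blocks shrinks the token sequence for all subsequent layers, which is why methods of this kind cut GFLOPs and raise FPS while, with a well-chosen keep ratio, staying close to the unpruned model’s accuracy, as the highlights claim for TRAM.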