Abstract: Highlights•TRAM represents tokens using an attention-based multilayer network.•TRAM reduces ViT computational demand without requiring fine-tuning.•TRAM improves FPS and GFlops with near-Vanilla model accuracy.•Visual analysis reveals TRAM’s token selection process.
Loading