Keywords: Vision Transformers, Test-time Scaling, Token-level Attention, Attention Budget Scheduling, Efficient Inference, CIFAR-100, CLEVR, Model Calibration, Robustness
TL;DR: Post-hoc token-level scaling for Vision Transformers improves accuracy and calibration with minimal extra computation.
Abstract: Test-time scaling enables vision models to improve inference performance without retraining by selectively allocating extra computation. Existing methods typically scale computation uniformly—via higher-resolution inputs, multi-crop ensembles, or extra sampling steps—and thus ignore spatial redundancy. We introduce Attention Budget Scheduling (ABS), a token-level test-time scaling method for Vision Transformers (ViTs) that reallocates attention computation toward uncertain or high-saliency tokens while leaving less informative regions unchanged. ABS operates post-hoc and requires no retraining. Evaluations on CIFAR-100 and CLEVR show modest but consistent improvements: ABS achieves up to 1.21\% higher accuracy on CIFAR-100 with only 10\% additional FLOPs, whereas resolution scaling requires 69\% more FLOPs for a 0.77\% gain; ABS also improves calibration. These results highlight token-level scaling as an efficient and practical approach for enhancing ViT inference.
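The sketch below illustrates the general idea described in the abstract: spend a small extra compute budget only on the most uncertain or salient tokens and leave the rest untouched. It is not the authors' implementation; the function name `refine_uncertain_tokens`, the `extra_block` module, and the 10\% budget are illustrative assumptions.

```python
# Minimal sketch of token-level test-time compute reallocation for a ViT.
# Hypothetical example; not the ABS implementation from the submission.
import torch
import torch.nn as nn

def refine_uncertain_tokens(tokens, saliency, extra_block, budget=0.10):
    """Re-process only the highest-saliency tokens with one extra
    transformer block, leaving the remaining tokens unchanged.

    tokens:      (B, N, D) token embeddings from the backbone
    saliency:    (B, N) per-token saliency / uncertainty scores
    extra_block: nn.Module mapping (B, k, D) -> (B, k, D)
    budget:      fraction of tokens that receive extra computation
    """
    B, N, D = tokens.shape
    k = max(1, int(budget * N))
    top_idx = saliency.topk(k, dim=1).indices              # (B, k) most salient tokens
    gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, D)   # (B, k, D)
    selected = tokens.gather(1, gather_idx)                # gather selected tokens
    refined = extra_block(selected)                        # extra compute spent here only
    out = tokens.clone()
    out.scatter_(1, gather_idx, refined)                   # write refined tokens back
    return out

# Hypothetical usage: one extra attention block as the additional budget.
extra_block = nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True)
tokens = torch.randn(8, 197, 192)    # e.g. ViT-Tiny: 14x14 patches + CLS token
saliency = torch.rand(8, 197)        # e.g. CLS-attention or predictive-entropy scores
refined = refine_uncertain_tokens(tokens, saliency, extra_block)
```

In this sketch the extra FLOPs scale with the budget fraction rather than with image resolution, which is the contrast the abstract draws against uniform resolution scaling.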
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 1