Understanding the Limits of Vision Test-Time Scaling: Path Redundancy, Instance Difficulty, and Adaptive Compute

Published: 12 May 2026, Last Modified: 12 May 2026, 2nd ViSCALE @ CVPR 2026 Poster, CC BY 4.0
Keywords: Vision Test-Time Scaling, Test-Time Compute, Multi-Path Inference, CLIP, Zero-Shot Classification, Adaptive Inference, Path Diversity, Inference Redundancy, Compute-Accuracy Trade-offs, Information Scaling
TL;DR: Vision test-time scaling improves accuracy only when additional inference paths provide diverse information; otherwise, high path redundancy causes rapid saturation.
Abstract: Test-time scaling has shown strong gains in language reasoning, yet its behavior in vision remains poorly understood. We present one of the first systematic studies of vision test-time scaling through CLIP-based multi-path inference, where computation is increased via prompt ensembles and test-time augmentations. Our results show that additional inference paths improve accuracy in early regimes but rapidly exhibit diminishing returns. Through correlation analysis, we demonstrate that strong path redundancy limits the effective value of additional computation. We further show that compute gains concentrate on high-uncertainty samples, motivating adaptive inference strategies. Although entropy-based adaptive stopping approaches favorable compute-accuracy trade-offs, our analysis reveals substantial remaining efficiency headroom. Overall, our findings suggest that the primary bottleneck of vision test-time scaling is not computation itself, but the lack of informational diversity across inference paths.
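To make the idea of entropy-based adaptive stopping concrete, the following is a minimal sketch, not the authors' implementation. It assumes each inference path (a prompt variant or an augmented view) has already produced a class-probability vector, averages those vectors one path at a time, and stops once the entropy of the running mean falls below a threshold. The function names, the threshold value of 0.5 nats, and the toy probabilities are all hypothetical and chosen only for illustration; the abstract does not specify the exact stopping rule.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_ensemble(path_probs, threshold=0.5, min_paths=1):
    """Average per-path class probabilities one path at a time and stop
    as soon as the entropy of the running mean drops below `threshold`.

    `threshold` and `min_paths` are illustrative assumptions.
    Returns (predicted_class, paths_used, mean_probs).
    """
    if not path_probs:
        raise ValueError("need at least one inference path")
    num_classes = len(path_probs[0])
    running_sum = [0.0] * num_classes
    mean = running_sum
    used = 0
    for probs in path_probs:
        used += 1
        running_sum = [s + p for s, p in zip(running_sum, probs)]
        mean = [s / used for s in running_sum]
        # Low entropy means the ensemble is already confident: stop early.
        if used >= min_paths and entropy(mean) < threshold:
            break
    predicted = max(range(num_classes), key=lambda c: mean[c])
    return predicted, used, mean


# Toy inputs (made-up numbers, not results from the paper).
easy = [[0.95, 0.03, 0.02], [0.90, 0.06, 0.04], [0.92, 0.05, 0.03]]
hard = [[0.40, 0.35, 0.25], [0.30, 0.45, 0.25], [0.35, 0.40, 0.25]]

easy_pred, easy_used, _ = adaptive_ensemble(easy)
hard_pred, hard_used, _ = adaptive_ensemble(hard)
print(easy_pred, easy_used)
print(hard_pred, hard_used)
```

In this toy setting the confident sample stops after its first path, while the uncertain sample consumes every available path, which mirrors the abstract's observation that compute gains concentrate on high-uncertainty samples.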
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 2