Keywords: test-time computing, fine-tuning, reflection, reasoning
Abstract: We ask: can focusing on the likely classes of a single, in-domain sample improve model predictions? Prior work argued no. We put forward a novel rationale for yes: features shared among classes indicate their reliability for a single sample. We aim for an affirmative answer without hand-engineered augmentations or auxiliary tasks. We propose two novel test-time fine-tuning methods to improve uncertain model predictions. Instead of greedily selecting the most likely class, we introduce an additional refinement step that focuses on the likely classes. When an initial forward pass indicates high uncertainty, we refine the prediction by applying a single gradient-descent step with a large learning rate. Our experimental evaluation demonstrates average accuracy gains for the method that emphasizes features shared among likely classes, confirmed across diverse text- and image-domain models.
Primary Area: optimization
Submission Number: 6320