Abstract: Machine learning models trained with offline data often suffer from distribution shifts in online environments and require fast adaptation to online data. The high volume of online data further stimulates the study of active adaptation approaches that achieve competitive adaptation performance by selectively annotating only 5%–10% of online data and using it to continuously train a model. Despite the reduction in data annotation cost, many prior active adaptation approaches assume a multi-round data annotation procedure during continuous training, which hinders timely adaptation. In this work, we study a single-round active adaptation problem that minimizes the data annotation turnaround time but requires the selected subset of data samples to support the entire continuous training procedure until convergence. In our theoretical analysis, we find that the prediction variability of each data sample throughout training is crucial, in addition to the conventional data diversity. The prediction variability measures how much a sample's prediction can change during the continuous training procedure. To this end, we introduce a novel approach called feature-norm scaled gradient embedding (FORGE), which incorporates prediction variability and improves single-round active adaptation performance when combined with standard data selection strategies (e.g., k-center greedy). In addition, we provide efficient implementations that construct our FORGE embedding analytically without explicitly backpropagating gradients. Empirical results further demonstrate that our approach consistently outperforms the random selection baseline by up to 1.26% across various vision and language tasks, while other competitors often underperform the random selection baseline.
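Based only on the description in the abstract, below is a minimal sketch of how a feature-norm scaled gradient embedding might be constructed analytically (without backpropagation) and paired with k-center greedy selection. The BADGE-style last-layer gradient construction, the pseudo-label choice, and all function names are illustrative assumptions, not the paper's actual FORGE implementation.

```python
import numpy as np

def forge_style_embedding(features, probs):
    """Sketch: analytic last-layer gradient embedding scaled by feature norm.

    features: (n, d) penultimate-layer features of candidate online samples
    probs:    (n, c) predicted class probabilities

    Assumption: the gradient w.r.t. the last linear layer under cross-entropy
    with pseudo-labels is (probs - one_hot) outer-producted with the feature,
    so it can be formed in closed form; the true FORGE construction may differ.
    """
    pseudo = probs.argmax(axis=1)                        # hypothetical pseudo-labels
    one_hot = np.eye(probs.shape[1])[pseudo]
    grad_out = probs - one_hot                           # d(loss)/d(logits) for cross-entropy
    # Flattened outer product grad_out ⊗ features, one row per sample.
    emb = (grad_out[:, :, None] * features[:, None, :]).reshape(len(features), -1)
    # Scale each row by its feature norm (the "feature-norm scaled" part, as assumed here).
    return emb * np.linalg.norm(features, axis=1, keepdims=True)

def k_center_greedy(emb, budget, seed=0):
    """Standard greedy k-center (farthest-point) selection over embeddings."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(emb)))]
    dist = np.linalg.norm(emb - emb[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(dist.argmax())                         # farthest point from current centers
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return selected

# Example usage with random stand-in data (5% annotation budget):
features = np.random.randn(1000, 64)
probs = np.random.dirichlet(np.ones(10), size=1000)
chosen = k_center_greedy(forge_style_embedding(features, probs), budget=50)
```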
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Soma_Biswas1
Submission Number: 4977