- Student First Author: Yes
- Keywords: Online learning, active learning, bandit algorithms
- Previously Published: Under submission to NeurIPS 2020.
- Abstract: Online machine learning systems need to adapt to domain shifts. Meanwhile, acquiring a label at every timestep is expensive. We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains. For online linear regression with oblivious adversaries, we provide a tight tradeoff that depends on the durations and dimensionalities of the hidden domains. Our algorithm can adaptively deal with interleaving spans of inputs from different domains. We also generalize our results to non-linear regression for hypothesis classes with bounded eluder dimension and to adaptive adversaries. Experiments on synthetic and realistic datasets demonstrate that our algorithm achieves lower regret than uniform queries and greedy queries under an equal labeling budget.
- TL;DR: We provide a tight tradeoff between the regret and the number of label queries in online learning settings where the data streams are from a mixture of hidden domains.