Adaptive Representation Selection in Contextual Bandit with Unlabeled History

Baihan Lin; Guillermo Cecchi; Djallel Bouneffouf; Irina Rish

12 Feb 2018 (modified: 05 May 2023) | ICLR 2018 Workshop Submission | Readers: Everyone

Abstract: We consider an extension of the contextual bandit setting, motivated by several practical applications, in which an unlabeled history of contexts may be available for pre-training before online decision-making begins. We propose an approach for improving the performance of a contextual bandit in such a setting via adaptive, dynamic representation learning, which combines offline pre-training on the unlabeled history of contexts with online selection and modification of embedding functions. Our experiments on a variety of datasets and in different nonstationary environments demonstrate clear advantages of our approach over the standard contextual bandit.

Keywords: Adaptive Representation, Embedding Selection, Machine Learning, Online Learning, Reinforcement Learning, Meta-Learning
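
The abstract describes two stages: offline pre-training on unlabeled contexts, then online selection among embedding functions. The following is a minimal sketch of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: PCA projections stand in for the pre-trained embeddings, an epsilon-greedy rule stands in for the embedding selection step, and LinUCB stands in for the contextual bandit. The names `pca_embedding`, `LinUCB`, and `EmbeddingSelector` are hypothetical, and the online modification of embeddings mentioned in the abstract is not modeled.

```python
import numpy as np


def pca_embedding(unlabeled, dim):
    """Offline stage (assumed): fit a PCA projection on unlabeled contexts."""
    mean = unlabeled.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    components = vt[:dim]
    return lambda x: (x - mean) @ components.T


class LinUCB:
    """Standard disjoint LinUCB over embedded contexts."""

    def __init__(self, dim, n_arms, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, z):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Estimated reward plus an upper-confidence exploration bonus.
            scores.append(theta @ z + self.alpha * np.sqrt(z @ A_inv @ z))
        return int(np.argmax(scores))

    def update(self, arm, z, reward):
        self.A[arm] += np.outer(z, z)
        self.b[arm] += reward * z


class EmbeddingSelector:
    """Online stage (assumed): epsilon-greedy choice among embeddings,
    each paired with its own LinUCB instance."""

    def __init__(self, embeddings, dims, n_arms, epsilon=0.1, seed=0):
        self.embeddings = embeddings
        self.bandits = [LinUCB(d, n_arms) for d in dims]
        self.counts = np.zeros(len(embeddings))
        self.means = np.zeros(len(embeddings))
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, x):
        if self.rng.random() < self.epsilon:
            k = int(self.rng.integers(len(self.embeddings)))
        else:
            k = int(np.argmax(self.means))
        z = self.embeddings[k](x)
        return k, z, self.bandits[k].select(z)

    def learn(self, k, z, arm, reward):
        self.bandits[k].update(arm, z, reward)
        self.counts[k] += 1
        # Running mean of the reward observed under embedding k.
        self.means[k] += (reward - self.means[k]) / self.counts[k]
```

In use, one would build several embeddings from the unlabeled history (for example `pca_embedding(history, d)` for a few values of `d`), then call `act` on each incoming context and `learn` once the reward is observed. A running mean is a poor fit for nonstationary rewards; a discounted or sliding-window estimate would be a natural substitute in that case.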
