Adaptive Representation Selection in Contextual Bandit with Unlabeled History

Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Irina Rish

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: We consider an extension of the contextual bandit setting, motivated by several practical applications, in which an unlabeled history of contexts can become available for pre-training before online decision-making begins. We propose an approach for improving the performance of contextual bandits in this setting via adaptive, dynamic representation learning, which combines offline pre-training on the unlabeled history of contexts with online selection and modification of embedding functions. Our experiments on a variety of datasets and in different nonstationary environments demonstrate clear advantages of our approach over the standard contextual bandit.
  • Keywords: Adaptive Representation, Embedding Selection, Machine Learning, Online Learning, Reinforcement Learning, Meta-Learning
