Non-stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Zheqing Zhu; Yueyang Liu; Kuang Xu; Benjamin Van Roy

24 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Keywords: Nonstationary Contextual Bandit, Neural Bandit Learning, Continual Learning, Exploration vs Exploitation

TL;DR: We introduce a non-stationary neural contextual bandit algorithm that scales with deep neural networks and prioritizes acquiring information that remains relevant for a long time under non-stationarity.

Abstract: Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they either explore excessively because they do not prioritize information of enduring value, or are designed in ways that do not scale to modern applications with high-dimensional user-specific features and large action sets, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, both of which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms state-of-the-art baselines.

Submission Number: 8538
