Online Limited Memory Neural-Linear Bandits

Tom Zahavy; Ofir Nabati; Leor Cohen; Shie Mannor

28 Sept 2020 (modified: 22 Jun 2025), ICLR 2021 Conference Blind Submission

Abstract: We study neural-linear bandits for problems in which both exploration and representation learning play an important role. Neural-linear bandits combine the representation power of deep neural networks with efficient exploration mechanisms designed for linear contextual bandits, applied on top of the last hidden layer. Because the representation is optimized during learning, information about exploration gathered with the “old” features is lost. We propose the first limited-memory neural-linear bandit that is resilient to this catastrophic forgetting phenomenon; it achieves this resilience by solving a semi-definite program. We then approximate the semi-definite program with stochastic gradient descent, which makes the algorithm practical and suitable for online use. We perform simulations on a variety of data sets, including regression, classification, and sentiment analysis. In addition, we evaluate our algorithm on a challenging uplink rate-control application, in which the bandit controls the transmission rates of data segments over cellular links to achieve optimal throughput. We observe that our algorithm achieves superior performance and shows resilience to catastrophic forgetting.
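The abstract combines two mechanisms: a linear contextual bandit run on the last hidden layer of a network, and a limited-memory step that re-estimates exploration statistics when that layer changes. The Python/NumPy sketch below illustrates both under assumptions that are ours, not the paper's. For the linear part it uses Thompson sampling with Bayesian linear regression, one common neural-linear choice that the abstract does not name. For the forgetting fix it assumes a small memory of past inputs evaluated under both the old and the new features, and fits a PSD matrix S so that the new predicted variances f_i^T S f_i match the old ones; the semi-definite program is replaced here by mini-batch projected SGD on the squared mismatch. This is one plausible reading, not the authors' exact formulation. All names (LinearThompson, quad_forms, variance_matching_loss, project_psd, refit_covariance) and all hyperparameters are hypothetical.

```python
import numpy as np


class LinearThompson:
    """Thompson sampling with Bayesian linear regression on fixed last-layer features."""

    def __init__(self, d, lam=1.0, noise_var=1.0):
        self.precision = lam * np.eye(d)  # Gaussian prior N(0, I / lam), stored as a precision matrix
        self.b = np.zeros(d)              # running sum of r * x / noise_var
        self.noise_var = noise_var

    def update(self, x, r):
        # Rank-one posterior update after observing reward r for feature vector x.
        self.precision += np.outer(x, x) / self.noise_var
        self.b += r * x / self.noise_var

    def posterior(self):
        cov = np.linalg.inv(self.precision)
        cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry before sampling
        return cov @ self.b, cov

    def select(self, phi, rng):
        # phi: (n_actions, d), one feature row per candidate action.
        mean, cov = self.posterior()
        w = rng.multivariate_normal(mean, cov)  # one posterior sample of the weights
        return int(np.argmax(phi @ w))          # act greedily with respect to the sample


def quad_forms(S, feats):
    # Row-wise f_i^T S f_i for an (m, d) feature matrix.
    return np.einsum("id,de,ie->i", feats, S, feats)


def variance_matching_loss(S, feats, targets):
    # Mean squared mismatch between predicted and target variances.
    return float(np.mean((quad_forms(S, feats) - targets) ** 2))


def project_psd(S):
    # Symmetrize, then clip negative eigenvalues: Euclidean projection onto the PSD cone.
    S = 0.5 * (S + S.T)
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T


def refit_covariance(feats_new, targets, lr=0.005, steps=4000, batch=32, seed=0):
    """Hypothetical stand-in for the SDP: fit a PSD matrix S so that f_i^T S f_i
    matches the stored target variances, via mini-batch projected SGD."""
    rng = np.random.default_rng(seed)
    m, d = feats_new.shape
    S = np.eye(d)
    for _ in range(steps):
        idx = rng.choice(m, size=min(batch, m), replace=False)
        f = feats_new[idx]
        err = quad_forms(S, f) - targets[idx]
        # Gradient of mean (f^T S f - t)^2 with respect to S is mean of 2 * err * f f^T.
        grad = 2.0 * np.einsum("i,id,ie->de", err, f, f) / len(idx)
        S = project_psd(S - lr * grad)
    return S
```

In use, targets would be computed on the stored memory from the old features and the old posterior covariance, for example `quad_forms(cov_old, F_old)`, and `refit_covariance(F_new, targets)` would be called after each representation update to obtain a covariance expressed in the new feature space. Projecting onto the PSD cone after every step keeps each iterate a valid covariance, which is the constraint a semi-definite program enforces exactly.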

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Community Implementations: 2 code implementations (CatalyzeX): https://www.catalyzex.com/paper/online-limited-memory-neural-linear-bandits/code

Reviewed Version (pdf): https://openreview.net/references/pdf?id=XGTzhXjYP7