Neural Network Bandit Learning by Last Layer Marginalization

Noah Weber; Janez Starc; Arpit Mittal; Roi Blanco; Lluis Marquez

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Withdrawn Submission | Readers: Everyone

Abstract: We propose a new method for training neural networks online in a bandit setting. Similar to prior work, we model the uncertainty only in the last layer of the network, treating the rest of the network as a feature extractor. This allows us to balance exploration and exploitation using the efficient, closed-form uncertainty estimates available for linear models. To train the rest of the network, we take advantage of the posterior we have over the last layer, optimizing over all values in the last-layer distribution, weighted by their probability. We derive a closed-form, differentiable approximation to this objective and show empirically, over various models and datasets, that training the rest of the network in this fashion leads to better online and offline performance than other methods.

Keywords: Bandit learning, online learning, contextual bandits, neural network learning in online settings

TL;DR: This paper proposes a new method for neural network learning in online bandit settings by marginalizing over the last layer
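
The closed-form last-layer uncertainty mentioned in the abstract can be illustrated with standard Bayesian linear regression on fixed features. The sketch below is not the authors' implementation and does not include their marginalized training objective for the rest of the network. It assumes a Gaussian prior, Gaussian reward noise, one independent linear head per arm, and Thompson sampling as the exploration rule; none of these choices is stated in the abstract. The names `LastLayerPosterior`, `thompson_select`, `prior_var`, and `noise_var` are hypothetical.

```python
import numpy as np


class LastLayerPosterior:
    """Gaussian posterior over the linear last-layer weights of one arm.

    Assumes (not from the source) a prior w ~ N(0, prior_var * I) and
    rewards r = w . phi + Gaussian noise with variance noise_var.
    """

    def __init__(self, dim, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var  # inverse covariance
        self.b = np.zeros(dim)                    # precision-weighted mean

    def update(self, phi, reward):
        # Closed-form conjugate update for a single (features, reward) pair.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * reward / self.noise_var

    def mean_and_cov(self):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0  # remove floating-point asymmetry
        return cov @ self.b, cov

    def sample(self, rng):
        # Draw one plausible weight vector from the posterior.
        mean, cov = self.mean_and_cov()
        return rng.multivariate_normal(mean, cov)


def thompson_select(posteriors, phi, rng):
    """Pick the arm whose sampled weights score the features highest."""
    scores = [p.sample(rng) @ phi for p in posteriors]
    return int(np.argmax(scores))
```

Here `phi` stands for the output of the feature-extracting part of the network for the current context. Each round would call `thompson_select`, observe the reward for the chosen arm, and call `update` on that arm's posterior only, since a bandit setting reveals the reward of the chosen action alone.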
