Reinforcement Learning for Bandits with Continuous Actions and Large Context Spaces

Anonymous

22 Sept 2022, 12:39 (modified: 10 Nov 2022, 10:57) · ICLR 2023 Conference Blind Submission · Readers: Everyone
Keywords: Contextual bandits, Continuous actions, Image context, reinforcement learning
TL;DR: We propose a reinforcement learning approach for the challenging contextual bandits scenario with continuous actions that, unlike the current literature, can generalise to large context spaces.
Abstract: We consider the challenging scenario of contextual bandits with continuous actions and large input ``context'' spaces, e.g. images. We posit that by modifying reinforcement learning (RL) algorithms for continuous control, we can outperform hand-crafted contextual bandit algorithms for continuous actions on standard benchmark datasets with vector contexts. We demonstrate that parametric policy networks outperform recently published tree-based policies in both average regret and costs on held-out samples. Furthermore, in contrast to previous work, we successfully demonstrate that RL algorithms can generalise contextual bandit problems with continuous actions to large context spaces. We obtain state-of-the-art performance using RL and significantly outperform previous methods on image contexts. Lastly, we introduce a new contextual bandits domain with a multi-dimensional continuous action space and image contexts.
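To make the setting concrete, the following is a minimal sketch (not the authors' method) of a contextual bandit with a one-dimensional continuous action, solved with a REINFORCE-style policy-gradient update on a linear-Gaussian policy. All names, the toy cost function, and the hyperparameters are illustrative assumptions; the paper's approach uses policy networks and continuous-control RL algorithms rather than this toy linear policy.

```python
import numpy as np

# Hypothetical toy problem: the optimal continuous action is a hidden
# linear function of the observed context (a stand-in for an image embedding).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.5])  # hidden context-to-optimal-action map (assumed)

w = np.zeros(2)    # mean parameters of the linear-Gaussian policy
sigma = 0.5        # fixed exploration standard deviation
lr = 0.02          # learning rate
baseline = 0.0     # running cost baseline to reduce gradient variance

costs = []
for t in range(3000):
    x = rng.normal(size=2)            # observed context
    mu = x @ w                        # policy mean for this context
    a = mu + sigma * rng.normal()     # sampled continuous action
    c = (a - x @ w_true) ** 2         # bandit feedback: cost of chosen action only
    # Score-function (REINFORCE) gradient estimate of the expected cost
    g = (c - baseline) * (a - mu) / sigma**2 * x
    g = np.clip(g, -5.0, 5.0)         # clip: the estimator has heavy-tailed variance
    w -= lr * g
    baseline = 0.99 * baseline + 0.01 * c
    costs.append(c)

early = float(np.mean(costs[:100]))
late = float(np.mean(costs[-500:]))
print(f"early avg cost {early:.2f} -> late avg cost {late:.2f}")
```

The only feedback used is the cost of the action actually taken, which is what distinguishes the bandit setting from full supervised regression; the same score-function update applies unchanged when the linear map is replaced by a neural policy network over image contexts.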
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
