Keywords: Exploration, Directional Statistics, Hyperspherical Embeddings, Reinforcement Learning, Scalability, von Mises-Fisher Distribution, Recommender Systems
TL;DR: When action sets are too large, sampling from a softmax distribution in real time becomes infeasible; we propose a scalable approach for sampling actions represented by embeddings and mathematically prove its ties to Boltzmann exploration.
Abstract: This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent actions. vMF-exp first samples a state embedding representation from a von Mises-Fisher distribution, then explores this representation's nearest neighbors, a procedure that scales to virtually unlimited numbers of candidate actions.
We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that nonetheless scales poorly because it requires computing softmax values for every action.
Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings.
In the final part of this paper, we further validate the empirical relevance of vMF-exp by discussing its successful deployment at scale on a music streaming service. On this service, vMF-exp has been employed for months to recommend playlists inspired by initial songs to millions of users, from millions of possible actions for each playlist.
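To make the two exploration schemes in the abstract concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes unit-norm state and action embeddings, a concentration parameter `kappa` that plays the role of an inverse temperature in both schemes, Wood's (1994) rejection sampler for the von Mises-Fisher draw, and a brute-force nearest-neighbor search. The function names `sample_vmf`, `vmf_exp`, and `b_exp` are hypothetical and chosen for illustration only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_vmf(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa) using Wood's rejection sampler.

    Assumes mu is unit-norm, len(mu) >= 2, and kappa >= 0.
    """
    d = len(mu)
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0**2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # w is the cosine between the sample and mu; now pick a uniformly random
    # direction orthogonal to mu and combine the two components.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = dot(g, mu)
    v = normalize([gi - proj * mi for gi, mi in zip(g, mu)])
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [w * mi + s * vi for mi, vi in zip(mu, v)]


def vmf_exp(state, actions, kappa, rng):
    """vMF-style exploration: perturb the state embedding, return its nearest action.

    The linear scan below is for illustration only; at scale it would
    presumably be replaced by an approximate nearest-neighbor index.
    """
    v = sample_vmf(state, kappa, rng)
    return max(range(len(actions)), key=lambda i: dot(v, actions[i]))


def b_exp(state, actions, kappa, rng):
    """Boltzmann exploration: sample from a softmax over kappa * <state, action>.

    Every action must be scored, which is the scalability bottleneck.
    """
    logits = [kappa * dot(state, a) for a in actions]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(actions)), weights=weights)[0]
```

Both functions return the index of the explored action. In this sketch, `b_exp` must score the whole action set before sampling, whereas `vmf_exp` draws one vector and then issues a single nearest-neighbor query, which is the step an index structure can accelerate.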
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7212