Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: When an action set is too large, sampling from a softmax distribution in real time is not feasible; we propose a new scalable approach to sample actions represented by embeddings and mathematically prove its ties to Boltzmann exploration.
Abstract: This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp involves initially sampling a state embedding representation using a von Mises-Fisher distribution, then exploring this representation's nearest neighbors, which scales to virtually unlimited numbers of candidate actions. We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that, nonetheless, suffers from scalability issues as it requires computing softmax values for each action. Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings. Experiments on simulated data, real-world public data, and the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service empirically validate the key properties of the proposed method.
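The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes unit-norm embeddings, uses Wood's (1994) rejection sampler as one standard way to draw from a von Mises-Fisher distribution, and uses a brute-force inner-product search as a stand-in for the nearest-neighbor index a large-scale system would use. The function names (`sample_vmf`, `vmf_exp_action`, `boltzmann_probs`) are hypothetical, and the Boltzmann logits are assumed to be the concentration `kappa` times the inner product between the state and action embeddings.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def sample_vmf(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa); mu must be unit-norm, len(mu) >= 2.

    Uses Wood's (1994) rejection sampler (an assumption of this sketch).
    """
    d = len(mu)
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0**2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)  # cosine with mu
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Uniform direction in the subspace orthogonal to mu.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = dot(g, mu)
    tangent = normalize([gi - proj * mi for gi, mi in zip(g, mu)])
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [w * mi + s * ti for mi, ti in zip(mu, tangent)]


def vmf_exp_action(state, actions, kappa, rng):
    """vMF-exp sketch: sample a vector around the state, return its nearest action index."""
    sampled = sample_vmf(state, kappa, rng)
    # Brute-force search; for unit vectors, max inner product = nearest neighbor.
    return max(range(len(actions)), key=lambda i: dot(sampled, actions[i]))


def boltzmann_probs(state, actions, kappa):
    """B-exp sketch: softmax over kappa * <state, action>, touching every action."""
    logits = [kappa * dot(state, a) for a in actions]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [wt / total for wt in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_actions, kappa = 8, 1000, 10.0
    actions = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_actions)]
    state = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    print("vMF-exp action:", vmf_exp_action(state, actions, kappa, rng))
    print("B-exp action:", rng.choices(range(n_actions), boltzmann_probs(state, actions, kappa))[0])
```

In this sketch, `boltzmann_probs` must score every action before one can be sampled, whereas `vmf_exp_action` needs only one vMF draw plus a nearest-neighbor lookup, which is the step an approximate index could accelerate.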
Lay Summary: This paper presents a new way to help algorithms make better decisions when they have many choices to consider. These decisions include an element of randomness, which is useful for exploring different options. The method, called von Mises-Fisher exploration (vMF-exp), is designed to work efficiently even when there are millions of possible choices, such as recommending songs to users on a music app. It does this by first randomly selecting a direction that seems promising and then exploring options that are similar to it, rather than checking every single choice. The researchers show that this method performs as well as older techniques but is much faster and more practical for large systems. They tested it with both simulated and real-world data, including on a major music streaming platform, and found that it works effectively at scale.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/deezer/vMF-exploration
Primary Area: Probabilistic Methods->Monte Carlo and Sampling Methods
Keywords: Efficient Sampling, Probability Theory, Large-Scale, Embeddings, Real-World Application, Reinforcement Learning
Flagged For Ethics Review: true
Submission Number: 12018