Adaptive Submodular Policy Optimization

Published: 09 May 2025, Last Modified: 28 May 2025 · RLC 2025 · CC BY 4.0
Keywords: submodularity, adaptive submodularity, policy gradients
TL;DR: We propose KL-regularized policy optimization for adaptive submodular maximization, a framework for decision-making under uncertainty with submodular rewards.
Abstract: We propose KL-regularized policy optimization for adaptive submodular maximization. Adaptive submodularity is a framework for decision-making under uncertainty with submodular rewards. The benefit of policy optimization is that we can learn controllers over large action spaces that leverage state-of-the-art large language model (LLM) priors. The benefit of submodularity is more efficient policy gradient updates, because the gradient associated with an action depends only on its immediate gain. When the reward model is correctly specified, we prove that our policies improve monotonically as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency, on both synthetic problems and problems with LLMs.
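The following is a minimal sketch, not the authors' implementation, of the kind of update the abstract describes: a score-function (REINFORCE-style) policy gradient for a softmax policy, where only the sampled action's immediate marginal gain enters the gradient, combined with an exact gradient of a KL penalty toward a fixed prior policy. All names (`marginal_gain`, `tau`, the toy coverage reward, the uniform prior standing in for an LLM prior) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def kl_pg_step(theta, prior, marginal_gain, selected, rng, tau=0.1, lr=0.5):
    """One KL-regularized policy-gradient step for pi = softmax(theta).

    Objective (maximized): E_{a~pi}[marginal_gain(a, selected)] - tau * KL(pi || prior).
    Only the sampled action's immediate marginal gain appears in the gradient,
    which is where submodularity keeps the update cheap.
    """
    pi = softmax(theta)
    a = rng.choice(len(pi), p=pi)
    gain = marginal_gain(a, selected)

    # Score-function gradient of the expected gain: gain * grad log pi(a) = gain * (e_a - pi).
    grad_gain = gain * (np.eye(len(pi))[a] - pi)

    # Exact gradient of KL(pi || prior) for a softmax-parameterized policy.
    logratio = np.log(pi) - np.log(prior)
    kl = np.sum(pi * logratio)
    grad_kl = pi * (logratio - kl)

    return theta + lr * (grad_gain - tau * grad_kl), a

# Toy coverage-style submodular reward: each item covers a subset of a universe,
# and the marginal gain of an item is the number of newly covered elements.
covers = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
def marginal_gain(a, selected):
    covered = set().union(*[covers[s] for s in selected]) if selected else set()
    return len(covers[a] - covered)

rng = np.random.default_rng(0)
theta = np.zeros(4)
prior = softmax(np.zeros(4))  # uniform prior; an LLM prior would go here
selected = []                 # in the adaptive setting, chosen items and their
for _ in range(200):          # observations would be appended to this state
    theta, a = kl_pg_step(theta, prior, marginal_gain, selected, rng)
```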
Submission Number: 341