We propose KL-regularized policy optimization for adaptive submodular maximization. Adaptive submodularity is a framework for decision making under uncertainty with submodular rewards. The benefit of policy optimization is that we can learn controllers for large action spaces that can leverage state-of-the-art large language model (LLM) priors. The benefit of submodularity is more efficient policy gradient updates, because the gradient associated with an action affects only its immediate gain. When the reward model is correctly specified, we prove that our policies improve monotonically as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency, both in synthetic problems and in problems with LLMs.
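As an informal illustration of the idea, not the paper's exact algorithm, the sketch below assumes a softmax policy over a small enumerable ground set, a toy coverage-style submodular reward, and a fixed uniform prior standing in for LLM prior probabilities; the names `marginal_gain`, `kl_regularized_step`, and the settings of `beta` and `lr` are hypothetical. It shows why crediting each action with only its immediate marginal gain keeps the per-action gradient local, and how the KL term pulls the policy toward the prior.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def marginal_gain(f, selected, action):
    """Immediate (marginal) gain of adding `action` to the current set."""
    return f(selected | {action}) - f(selected)

def kl_regularized_step(logits, gains, log_prior, beta, lr):
    """One exact gradient-ascent step on E_pi[gain] - beta * KL(pi || prior).

    Because each action is rewarded only by its immediate marginal gain,
    the gradient of a logit depends on that action's own gain (plus a
    shared baseline E_pi[gain]), not on rewards of later selections.
    """
    pi = softmax(logits)
    baseline = pi @ gains                       # E_pi[gain]
    kl = pi @ (np.log(pi) - log_prior)          # KL(pi || prior)
    grad = pi * ((gains - baseline) - beta * ((np.log(pi) - log_prior) - kl))
    return logits + lr * grad

# Toy coverage function: f(S) = number of distinct elements covered by S.
items = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1}}

def f(S):
    covered = set()
    for a in S:
        covered |= items[a]
    return len(covered)

selected = {0}
gains = np.array([marginal_gain(f, selected, a) for a in items], dtype=float)
log_prior = np.log(np.full(len(items), 1.0 / len(items)))  # stand-in for an LLM prior
logits = np.zeros(len(items))
for _ in range(200):
    logits = kl_regularized_step(logits, gains, log_prior, beta=0.1, lr=0.5)
print(softmax(logits))  # mass concentrates on the highest-gain actions as beta -> 0
```

With a small `beta`, the learned distribution approaches the greedy choice (here, the highest-gain actions); increasing `beta` keeps it closer to the prior, which is the trade-off the regularization controls.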