Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Xinjie Fan; Yizhe Zhang; Zhendong Wang; Mingyuan Zhou

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou

Published: 20 Dec 2019, Last Modified: 22 Jun 2025ICLR 2020 Conference Blind SubmissionReaders: Everyone

Abstract: Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.

Code: https://github.com/xinjiefan/ACMC_ICLR

Keywords: binary softmax, discrete variables, policy gradient, pseudo actions, reinforcement learning, variance reduction

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/adaptive-correlated-monte-carlo-for/code)

Original Pdf: pdf

7 Replies

Loading