Coupled Gradient Estimators for Discrete Latent Variables

Zhe Dong; Andriy Mnih; George Tucker

Published: 21 Dec 2020, Last Modified: 05 May 2023, Venue: AABI 2020

Abstract: Training models with discrete latent variables is challenging because their gradients are difficult to estimate accurately. Much of the recent progress has been achieved by taking advantage of continuous relaxations of the system, which are not always available or even possible. The recently introduced Augment-REINFORCE-Swap (ARS) and Augment-REINFORCE-Swap-Merge (ARSM) estimators (Yin and Zhou, 2019) provide a promising alternative to relaxation-based gradient estimators for discrete latent variables. Instead of relaxing the variables, ARS and ARSM reparameterize them as deterministic transformations of underlying continuous variables. The estimators leverage coupled samples and a careful construction relying on symmetries of the Dirichlet distribution and exponential racing. We observe, however, that the continuous augmentation, which is the first step in ARS and ARSM, increases the variance of the REINFORCE estimator. Inspired by recent work (Dong et al., 2020), we improve both estimators by analytically integrating out the unnecessary randomness introduced by the augmentation, which substantially reduces their variance. We show that the resulting estimators consistently outperform ARS and ARSM. However, we find that REINFORCE with a leave-one-out baseline (Kool et al., 2019) greatly outperforms ARS and ARSM in all cases and is competitive with or outperforms our improved estimators. Since it is also simpler to implement, we recommend it in practice.
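Two claims in the abstract can be illustrated on a toy problem: REINFORCE applied to a continuous augmentation is unbiased but noisier than REINFORCE on the discrete variable itself, and REINFORCE with a leave-one-out baseline is a simple low-variance alternative. The following is a minimal NumPy sketch under stated assumptions, not the authors' code. It uses exponential racing as the augmentation (a categorical z = argmin_i E_i with E_i ~ Exp(rate = exp(phi_i))); the actual ARS/ARSM construction with Dirichlet swaps is not reproduced. The logits `phi`, the reward table `f`, and all function names are hypothetical. Integrating E out given z turns the augmented score 1 - lambda * E into the categorical score onehot(z) - softmax(phi), which is the kind of variance reduction the abstract describes.

```python
import numpy as np

def softmax(phi):
    w = np.exp(phi - phi.max())
    return w / w.sum()

def exact_grad(phi, f):
    # d/dphi E_{z ~ softmax(phi)}[f(z)] = p * (f - E_p[f])
    p = softmax(phi)
    return p * (f - p @ f)

def augmented_reinforce(phi, f, num, rng):
    # Hypothetical augmentation via exponential racing: E_i ~ Exp(rate=lam_i),
    # z = argmin_i E_i, so P(z = i) = lam_i / sum(lam) = softmax(phi)_i.
    lam = np.exp(phi)
    e = rng.exponential(scale=1.0 / lam, size=(num, lam.size))
    z = e.argmin(axis=1)
    score = 1.0 - lam * e  # d/dphi log Exp(e; rate=exp(phi))
    return f[z][:, None] * score

def categorical_reinforce(phi, f, num, rng):
    # Score function of the categorical itself: d/dphi log softmax(phi)_z.
    p = softmax(phi)
    z = rng.choice(p.size, size=num, p=p)
    score = np.eye(p.size)[z] - p
    return f[z][:, None] * score

def rloo(phi, f, num, n, rng):
    # REINFORCE with a leave-one-out baseline over n samples per estimate:
    # (1 / (n - 1)) * sum_k (f(z_k) - mean_j f(z_j)) * score(z_k).
    p = softmax(phi)
    z = rng.choice(p.size, size=(num, n), p=p)
    fz = f[z]                                  # (num, n)
    centered = fz - fz.mean(axis=1, keepdims=True)
    score = np.eye(p.size)[z] - p              # (num, n, K)
    return (centered[:, :, None] * score).sum(axis=1) / (n - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = np.array([0.5, -0.3, 0.1])   # hypothetical logits
    f = np.array([10.0, 11.0, 9.0])    # hypothetical reward per category
    num, n = 200_000, 4
    print("exact       ", exact_grad(phi, f))
    for name, g in [
        ("augmented   ", augmented_reinforce(phi, f, num, rng)),
        ("categorical ", categorical_reinforce(phi, f, num, rng)),
        ("rloo (n=4)  ", rloo(phi, f, num, n, rng)),
    ]:
        print(name, g.mean(axis=0), "total variance:", g.var(axis=0).sum())
```

All three estimators agree with `exact_grad` on average. The augmented one has the largest variance, because given z each losing E_j still carries extra noise. The leave-one-out baseline removes most of the variance caused by the large shared offset in `f`.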
