Bridging Discrete and Backpropagation: Straight-Through and Beyond

Liyuan Liu; Chengyu Dong; Xiaodong Liu; Bin Yu; Jianfeng Gao

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 oral

Keywords: discrete random variables, back-propagation, straight through

TL;DR: We show that Straight-Through works as a first-order approximation of the gradient and propose ReinMax, which achieves second-order accuracy with negligible computational overhead.
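The TL;DR contrasts first-order and second-order accuracy. As a generic illustration of what "order" means (this is not the paper's gradient estimator), the sketch below integrates the toy ODE dy/dt = y from y(0) = 1 to t = 1 with forward Euler, a first-order method, and with Heun's method, a second-order method that averages the slope at both ends of each step. The names `euler`, `heun`, and `rhs` are hypothetical, chosen only for this example. Halving the step size should roughly halve Euler's error and roughly quarter Heun's.

```python
import math

def euler(f, y0, t0, t1, steps):
    # Forward Euler: follow the slope at the start of each step (first order).
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def heun(f, y0, t0, t1, steps):
    # Heun's method: average the slope at the start and at an Euler-predicted
    # end point (second order). Only first derivatives of y are evaluated.
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y = y + 0.5 * h * (k1 + k2)
        t += h
    return y

def rhs(t, y):
    # Toy ODE dy/dt = y, whose exact solution at t = 1 is e.
    return y

for steps in (10, 20, 40):
    err_euler = abs(euler(rhs, 1.0, 0.0, 1.0, steps) - math.e)
    err_heun = abs(heun(rhs, 1.0, 0.0, 1.0, steps) - math.e)
    print(f"steps={steps:3d}  Euler error={err_euler:.2e}  Heun error={err_heun:.2e}")
```

Heun's method gains its extra order of accuracy from one additional slope evaluation per step rather than from second derivatives, which parallels the claim that ReinMax needs no Hessian.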

Abstract: Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximating the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and show that it works as a first-order approximation of the gradient. Guided by this finding, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require the Hessian or any other second-order derivatives and thus adds negligible computational overhead. Extensive experiments on a variety of tasks show that ReinMax outperforms the state of the art.
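To make the first-order versus second-order claim concrete, the sketch below uses a categorical variable D ~ softmax(θ) represented as a one-hot vector and a differentiable objective f. It compares three quantities, each computed exactly by enumerating all outcomes rather than by sampling: the true gradient of E[f(D)], the expected Straight-Through estimate (backpropagating ∇f(D) through the softmax as if D were π), and the expected value of a ReinMax-style estimate. The ReinMax backward pass here is an assumption based on our reading of the method, not the authors' reference code: (2·J(π₁) − ½·J(π))·∇f(D) with midpoint π₁ = (D + π)/2, where J(p) = diag(p) − ppᵀ is the softmax Jacobian and the temperature is fixed at 1. All function names and the quadratic test objective are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jac_vec(p, w):
    # Softmax Jacobian at probabilities p, (diag(p) - p p^T), applied to w.
    # The matrix is symmetric, so this is also the vector-Jacobian product.
    dot = sum(pi * wi for pi, wi in zip(p, w))
    return [pi * (wi - dot) for pi, wi in zip(p, w)]

def one_hot(j, n):
    return [1.0 if i == j else 0.0 for i in range(n)]

def exact_gradient(theta, f):
    # d/dtheta_k E[f(D)] = pi_k * (f(e_k) - E[f(D)]) for D ~ Categorical(softmax(theta)).
    pi = softmax(theta)
    n = len(theta)
    vals = [f(one_hot(j, n)) for j in range(n)]
    mean = sum(p * v for p, v in zip(pi, vals))
    return [pi[k] * (vals[k] - mean) for k in range(n)]

def expected_st(theta, grad_f):
    # Straight-Through: J(pi) grad_f(D), averaged exactly over all n outcomes.
    pi = softmax(theta)
    n = len(theta)
    out = [0.0] * n
    for j in range(n):
        g = softmax_jac_vec(pi, grad_f(one_hot(j, n)))
        for k in range(n):
            out[k] += pi[j] * g[k]
    return out

def expected_reinmax(theta, grad_f):
    # ASSUMED ReinMax-style form (temperature fixed at 1):
    # (2 * J(pi1) - 0.5 * J(pi)) grad_f(D), with midpoint pi1 = (D + pi) / 2.
    pi = softmax(theta)
    n = len(theta)
    out = [0.0] * n
    for j in range(n):
        d = one_hot(j, n)
        w = grad_f(d)
        pi1 = [(di + p) / 2 for di, p in zip(d, pi)]
        jac_mid = softmax_jac_vec(pi1, w)
        jac_pi = softmax_jac_vec(pi, w)
        for k in range(n):
            out[k] += pi[j] * (2.0 * jac_mid[k] - 0.5 * jac_pi[k])
    return out

# Hypothetical quadratic objective f(x) = 0.5 * x^T A x + b^T x with symmetric A.
A = [[1.0, 0.5, -0.3], [0.5, 2.0, 0.1], [-0.3, 0.1, -1.0]]
b = [0.2, -0.4, 0.7]

def f(x):
    quad = sum(x[i] * A[i][l] * x[l] for i in range(3) for l in range(3))
    return 0.5 * quad + sum(bi * xi for bi, xi in zip(b, x))

def grad_f(x):
    # Gradient of f for symmetric A: A x + b.
    return [sum(A[i][l] * x[l] for l in range(3)) + b[i] for i in range(3)]

theta = [0.3, -1.2, 0.8]
print("exact   ", exact_gradient(theta, f))
print("ST      ", expected_st(theta, grad_f))
print("ReinMax ", expected_reinmax(theta, grad_f))
```

Under this assumed form, the expected ReinMax estimate works out to π_k Σ_j π_j · ½(∇f(e_k) + ∇f(e_j))·(e_k − e_j), a Heun (trapezoid) step between outcomes. For a quadratic f that step equals f(e_k) − f(e_j) exactly, so the ReinMax line should agree with the exact gradient up to floating-point rounding, whereas the ST line, which uses only ∇f(D), generally does not. For non-quadratic f both are approximations.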

Supplementary Material: zip

Submission Number: 3782
