The Differentiable Cross-Entropy Method

Brandon Amos; Denis Yarats

The Differentiable Cross-Entropy Method

Brandon Amos, Denis Yarats

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: DCEM learns latent domains for optimization problems and helps bridge the gap between model-based and model-free RL --- we create a differentiable controller and fine-tune parts of it with PPO

Abstract: We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant (DCEM) that enables us to differentiate the output of CEM with respect to the objective function's parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline in cases this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting we show on the simulated cheetah and walker tasks that we can embed their optimal action sequences with DCEM and then use policy optimization to fine-tune components of the controller as a step towards combining model-based and model-free RL.

Keywords: machine learning, differentiable optimization, control, reinforcement learning

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/the-differentiable-cross-entropy-method/code)

Original Pdf: pdf

13 Replies

Loading