The Differentiable Cross-Entropy Method

25 Sept 2019 (modified: 22 Oct 2023) · ICLR 2020 Conference Blind Submission
TL;DR: DCEM learns latent domains for optimization problems and helps bridge the gap between model-based and model-free RL: we create a differentiable controller and fine-tune parts of it with PPO.
Abstract: We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous, parameterized objective function and introduce a differentiable variant (DCEM) that enables us to differentiate the output of CEM with respect to the objective function's parameters. In the machine learning setting, this brings CEM inside the end-to-end learning pipeline in cases where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting, we show on the simulated cheetah and walker tasks that we can embed their optimal action sequences with DCEM and then use policy optimization to fine-tune components of the controller, as a step towards combining model-based and model-free RL.
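The core idea can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it replaces vanilla CEM's hard top-k elite selection with a temperature-controlled softmax weighting so that gradients flow from CEM's output back to the objective's parameters. The names `f_theta`, `mu`, `sigma`, and `temp` are illustrative, and the paper's actual relaxation of the top-k step may differ.

```python
import torch

def dcem(f_theta, mu, sigma, n_samples=100, n_iters=10, temp=1.0):
    """Differentiable CEM sketch (illustrative, not the paper's exact method).

    f_theta: parameterized objective mapping (n_samples, dim) -> (n_samples,) costs
    mu, sigma: mean and std of the Gaussian sampling distribution over the domain
    """
    for _ in range(n_iters):
        # Reparameterized sampling keeps samples differentiable in mu and sigma.
        eps = torch.randn(n_samples, mu.shape[-1])
        x = mu + sigma * eps
        costs = f_theta(x)                       # (n_samples,)
        # Soft "elite" weights: lower cost -> higher weight. A hard top-k here
        # would block gradients; the softmax is a differentiable surrogate.
        w = torch.softmax(-costs / temp, dim=0)  # (n_samples,)
        # Refit the Gaussian with weighted moments instead of hard elites.
        mu = (w.unsqueeze(-1) * x).sum(dim=0)
        sigma = ((w.unsqueeze(-1) * (x - mu) ** 2).sum(dim=0) + 1e-6).sqrt()
    # Approximate minimizer, differentiable w.r.t. f_theta's parameters.
    return mu
```

In the control setting described above, `f_theta` would be the unrolled model-based cost of a candidate action sequence, and `temp` trades off how closely the soft weighting approximates vanilla CEM's hard elite selection.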
Keywords: machine learning, differentiable optimization, control, reinforcement learning
Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/arxiv:1909.12830/code)