RL-DARTS: Differentiable Architecture Search for Reinforcement Learning

Yingjie Miao; Xingyou Song; Daiyi Peng; Summer Yue; John D Co-Reyes; Eugene Brevdo; Aleksandra Faust

Published: 28 Jan 2022, Last Modified: 22 Jun 2025, ICLR 2022 Submitted, Readers: Everyone

Keywords: darts, differentiable, architecture, search, neural, nas, rl, reinforcement, learning, procgen, supernet, softmax, variable, ppo, rainbow, off-policy, on-policy, convolutional, autorl, automated, one-shot, efficient

Abstract: Recently, Differentiable Architecture Search (DARTS) has become one of the most popular Neural Architecture Search (NAS) methods successfully applied in supervised learning (SL). However, its application in other domains, in particular reinforcement learning (RL), has seldom been studied. This is due in part to RL having a significantly different optimization paradigm from SL, especially with regard to replay data, which in RL is continually generated via inference. In this paper, we introduce RL-DARTS, one of the first applications of end-to-end DARTS in RL to search for convolutional cells, applied to the challenging, infinitely procedurally generated Procgen benchmark. We demonstrate that the benefits of DARTS are amplified when applied to RL: search is efficient in terms of time and compute, and integration with complex preexisting RL code is simple, requiring only that the image encoder be replaced with a DARTS supernet, which is compatible with both off-policy and on-policy RL algorithms. At the same time, via RL-DARTS we provide one of the first extensive studies of DARTS outside the standard fixed-dataset setting of SL. We show that throughout training, the supernet gradually learns better cells, leading to alternative architectures that can be highly competitive with manually designed policies, while also verifying previous design choices for RL policies.
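The abstract's core mechanism, a supernet whose candidate operations are blended by a softmax over architecture variables, can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate operations (`identity`, `zero`, `relu`), the names `mixed_op` and `discretize`, and the use of plain Python lists in place of convolutional feature maps are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of architecture logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations acting on a 1-D feature vector.
# A real DARTS cell would use convolutions, pooling, and skip connections.
def identity(x):
    return list(x)

def zero(x):
    return [0.0] * len(x)

def relu(x):
    return [max(0.0, v) for v in x]

CANDIDATE_OPS = {"identity": identity, "zero": zero, "relu": relu}

def mixed_op(x, alpha):
    # Supernet edge: softmax-weighted sum of every candidate op's output.
    weights = softmax(alpha)
    outputs = [op(x) for op in CANDIDATE_OPS.values()]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]

def discretize(alpha):
    # After search, keep the op with the largest architecture logit.
    # (Simplified: this sketch does not treat the zero op specially.)
    names = list(CANDIDATE_OPS)
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    return names[best]
```

In this sketch, `alpha` plays the role of the architecture variables that would be trained by gradient descent alongside the network weights, using whatever loss the RL algorithm already defines; `discretize` stands in for extracting the final cell once search ends.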

One-sentence Summary: Our work is one of the first to study DARTS optimization in reinforcement learning, a completely different paradigm than supervised learning, and we demonstrate DARTS's ability to find competitive and better architectures efficiently.

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/rl-darts-differentiable-architecture-search/code)
