Programmatic Reinforcement Learning without Oracles


Sep 29, 2021 (edited Oct 06, 2021) | ICLR 2022 Conference Blind Submission | Readers: Everyone
  • Keywords: Reinforcement Learning, Programmatic Reinforcement Learning, Compositional Reinforcement Learning, Program Synthesis, Differentiable Architecture Search
  • Abstract: Deep reinforcement learning (RL) has led to encouraging successes in many challenging control tasks. However, a deep RL model lacks interpretability because it is difficult to identify how the model's control logic relates to its network structure. Programmatic policies, which are structured in more interpretable representations, have emerged as a promising solution. Yet two shortcomings remain. First, synthesizing programmatic policies requires optimizing over the discrete and non-differentiable search space of program architectures. Previous works are suboptimal because they only enumerate program architectures greedily, guided by a pretrained RL oracle. Second, these works do not exploit compositionality, an important programming concept, to reuse and compose primitive functions into a complex function for new tasks. Our first contribution is a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming language grammar rules. Our algorithm allows policy architectures to be learned together with policy parameters via bilevel optimization using efficient policy-gradient methods, and thus does not require a pretrained oracle. Our second contribution is improving programmatic policies to support compositionality by integrating primitive functions, learned to grasp task-agnostic skills, into a composite program that solves novel RL problems. Experimental results demonstrate that our algorithm excels in discovering optimal programmatic policies that are highly interpretable.
  • One-sentence Summary: We present a differentiable program architecture search framework to synthesize interpretable, generalizable, and compositional programs for controlling reinforcement learning applications.
  • Supplementary Material: zip
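
The abstract's idea of a continuous relaxation of the program architecture space can be illustrated with a small sketch. This is not the authors' implementation; it only assumes that one program node must choose among a few candidate grammar productions, and that the discrete choice is replaced by a softmax-weighted mixture over all of them. The candidate set `CANDIDATES`, the scalar state, and the names `relaxed_node` and `discretize` are hypothetical and exist only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate productions for a single program node.
# Each maps a scalar state to a scalar action (an assumption for brevity).
CANDIDATES = {
    "constant": lambda s: 1.0,
    "linear": lambda s: 0.5 * s,
    "if_then_else": lambda s: 1.0 if s > 0 else -1.0,
}

def relaxed_node(state, alphas):
    """Relaxed node: softmax-weighted mixture of all candidate productions.

    `alphas` are the architecture weights, one per candidate. Because the
    output is a smooth function of `alphas`, the weights could be trained
    with gradient-based methods instead of discrete enumeration.
    """
    weights = softmax(alphas)
    return sum(w * f(state) for w, f in zip(weights, CANDIDATES.values()))

def discretize(alphas):
    """Recover a discrete program by keeping the highest-weighted production."""
    names = list(CANDIDATES)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]
```

With equal weights the node averages its candidates, and as one weight dominates the node approaches that single production, which `discretize` then selects. The sketch omits training entirely; in the setting the abstract describes, the architecture weights would be learned alongside the policy parameters via bilevel optimization with policy-gradient methods.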