Neuro-algorithmic Policies for Discrete PlanningDownload PDF

28 Sept 2020 (modified: 05 May 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone
Keywords: planning, reinforcement learning, combinatorial optimization, control, imitation learning
Abstract: Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to variations in the task are still unsatisfactory. Recent results suggest that generalization of standard architectures improves only after obtaining exhaustive amounts of data.We give evidence that the generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects. Further, we show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures. Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-depended shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variations in the environment already after seeing a few examples. https://sites.google.com/view/neuro-algorithmic
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We introduce neuro-algorithmic policies embedding shortest path solvers for discrete planning.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=xYc57TD8CWa
9 Replies

Loading