Combination of Supervised and Reinforcement Learning For Vision-Based Autonomous Control

Dmitry Kangin; Nicolas Pugeault

15 Feb 2018 (modified: 10 Feb 2022); ICLR 2018 Conference Blind Submission; Readers: Everyone

Abstract: Reinforcement learning methods have recently achieved impressive results on a wide range of control problems. However, especially with complex inputs, they still require an extensive amount of training data to converge to a meaningful solution. This limitation largely prohibits their use for complex input spaces such as video signals, and it is still impossible to use them for a number of complex problems in real-world environments, including many video-based control problems. Supervised learning, by contrast, can learn from a relatively small number of samples; however, it does not take reward-based control policies into account and cannot provide independent control policies. In this article we propose a model-free control method that combines reinforcement and supervised learning for autonomous control and paves the way towards policy-based control in real-world environments. We use the SpeedDreams/TORCS video game to demonstrate that our approach requires far fewer samples (hundreds of thousands versus millions or tens of millions) than state-of-the-art reinforcement learning techniques on similar data, and at the same time outperforms both supervised and reinforcement learning approaches in terms of quality. Additionally, we demonstrate the applicability of the method to MuJoCo control problems.

TL;DR: A new combination of reinforcement and supervised learning that dramatically decreases the number of samples required for training on video

Keywords: Reinforcement learning, deep learning, autonomous control

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco), [TORCS](https://paperswithcode.com/dataset/torcs)
