A Bio-inspired Reinforcement Learning Rule to Optimise Dynamical Neural Networks for Robot Control

Published: 01 Jan 2018 · Last Modified: 01 Oct 2024 · IROS 2018 · CC BY-SA 4.0
Abstract: Most approaches for optimisation of neural networks are based on variants of back-propagation. This requires the network to be time-invariant and differentiable; neural networks with dynamics are thus generally outside the scope of these methods. Biological neural circuits are highly dynamic yet clearly able to support learning. We propose a reinforcement learning approach inspired by the mechanisms and dynamics of biological synapses. The network weights undergo spontaneous fluctuations, and a reward signal modulates the centre and amplitude of these fluctuations so that the network converges to a desired behaviour. We test the new learning rule on a 2D bipedal walking simulation, using a control system that combines a recurrent neural network, a bio-inspired central pattern generator layer, and proportional-integral control, and demonstrate the first successful solution to this benchmark task.
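The abstract does not specify the update equations, so the following is only a minimal Python sketch of the general idea of reward-modulated weight fluctuations, not the authors' implementation: each weight fluctuates around a centre, and a scalar reward shifts the centre toward well-performing samples and shrinks the fluctuation amplitude. The function name `evaluate_reward` and the rates `eta_mu` and `eta_sigma` are assumptions for illustration.

```python
import numpy as np

def train(evaluate_reward, n_weights, n_episodes=1000,
          eta_mu=0.1, eta_sigma=0.05, seed=None):
    """Illustrative sketch: reward-modulated fluctuation of network weights.

    The exact rule used in the paper is not given in the abstract; the
    updates below are hypothetical placeholders for the same idea.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(n_weights)      # centres of the weight fluctuations
    sigma = np.ones(n_weights)    # amplitudes of the weight fluctuations
    baseline = 0.0                # slow-moving reward baseline

    for _ in range(n_episodes):
        # Spontaneous fluctuation: sample weights around the current centres.
        w = mu + sigma * rng.standard_normal(n_weights)
        reward = evaluate_reward(w)   # e.g. distance walked before falling

        # Reward modulation: better-than-baseline episodes pull the centres
        # toward the sampled weights and damp the fluctuation amplitudes.
        advantage = reward - baseline
        mu += eta_mu * advantage * (w - mu)
        sigma *= np.exp(-eta_sigma * advantage)

        baseline += 0.1 * (reward - baseline)

    return mu
```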