Biological Neurons vs Deep Reinforcement Learning: Sample efficiency in a simulated game-world

Forough Habibollahi; Moein Khajehnejad; Amitesh Gaurav; Brett Joseph Kagan

09 Oct 2022 (modified: 05 May 2023), LMRL 2022 Paper

Keywords: Deep Reinforcement Learning, In Vitro Neuronal Cultures, Sample efficiency, Neural Networks

TL;DR: We compare the learning curve and the performance of biological neurons against time-matched learning from DQN, A2C, and PPO algorithms in the simulated game environment of Pong.

Abstract: How do synthetic biological systems and artificial neural networks compare in their performance in a game environment? Reinforcement learning has advanced significantly but still lags behind biological neural intelligence in sample efficiency. Yet most biological systems are significantly more complicated than most algorithms. Here we compare the inherent intelligence of in vitro biological neuronal networks with state-of-the-art deep reinforcement learning algorithms in the arcade game 'Pong'. We employed DishBrain, a system that embodies in vitro neural networks with in silico computation using a high-density multielectrode array. We compared the learning curves and performance of these biological systems against time-matched learning from DQN, A2C, and PPO agents. All agents were implemented in a reward-based 'Pong' environment, and key learning characteristics of the deep reinforcement learning agents were compared with those of the biological neuronal cultures in the same game environment. We find that even these very simple biological cultures typically outperform the deep reinforcement learning systems on several game performance measures, such as average rally length, implying higher sample efficiency. Furthermore, the human cell cultures showed the highest overall relative improvement in the average number of hits per rally between the first 5 minutes and the last 15 minutes of each gameplay session.
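The abstract does not state how "relative improvement" is computed. As a minimal sketch, assuming it means (late mean - early mean) / early mean of hits per rally, and assuming each rally is assigned to a window by the time it ended, the comparison between the first 5 minutes and the last 15 minutes of a session could look like the code below. The `Rally` record, the `relative_improvement` helper, and its parameters are hypothetical illustrations, not the authors' analysis code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rally:
    # Hypothetical record: when the rally ended and how many paddle hits it had.
    end_time_s: float  # seconds since the start of the gameplay session
    hits: int          # number of paddle hits in this rally


def relative_improvement(rallies, session_length_s,
                         early_window_s=5 * 60, late_window_s=15 * 60):
    """Relative change in mean hits per rally, early window vs. late window.

    Assumption: relative improvement = (late_mean - early_mean) / early_mean.
    Rallies are binned by end time; the windows are disjoint only if
    session_length_s >= early_window_s + late_window_s.
    """
    # Rallies that ended within the first `early_window_s` seconds.
    early = [r.hits for r in rallies if r.end_time_s < early_window_s]
    # Rallies that ended within the final `late_window_s` seconds.
    late = [r.hits for r in rallies
            if r.end_time_s >= session_length_s - late_window_s]
    if not early or not late:
        raise ValueError("each window must contain at least one rally")
    early_mean = mean(early)
    if early_mean == 0:
        raise ValueError("early-window mean is zero; relative change undefined")
    return (mean(late) - early_mean) / early_mean


if __name__ == "__main__":
    # Illustrative, made-up rallies for a 20-minute session.
    rallies = [Rally(60, 1), Rally(200, 1), Rally(400, 2),
               Rally(900, 3), Rally(1100, 2)]
    print(relative_improvement(rallies, session_length_s=20 * 60))
```

With these made-up numbers the early mean is 1 hit and the late mean is 7/3 hits, giving a relative improvement of about 1.33. Normalizing by the early mean, rather than reporting a raw difference, is what lets groups with different baseline rally lengths be ranked against each other.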
