Distributional Reinforcement Learning in the Mammalian Brain
Keywords: reinforcement learning, dopamine, basal ganglia, striatum, population coding, probability
TL;DR: The parts of the mammalian brain involved in reward processing implement a form of distributional reinforcement learning
Abstract: Distributional reinforcement learning (dRL), which learns to predict not just the average return but the entire probability distribution of returns, has achieved impressive performance across a wide range of benchmark machine learning tasks. In vertebrates, the basal ganglia strongly encode mean value and have long been thought to implement RL, but little is known about whether, where, and how populations of neurons in this circuit encode information about higher-order moments of reward distributions. To fill this gap, we used Neuropixels probes to acutely record striatal activity from well-trained, water-restricted mice performing a classical conditioning task. Across several measures of representational distance, odors associated with the same reward distribution were encoded more similarly to one another than to odors associated with the same mean reward but different reward variance, as predicted by dRL but not traditional RL. Optogenetic manipulations and computational modeling suggested that genetically distinct populations of neurons encoded the left and right tails of these distributions. Together, these results reveal a remarkable degree of convergence between dRL and the mammalian brain and hint at further biological specializations of the same overarching algorithm.
Track: Extended Abstract Track
Submission Number: 17
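
Illustrative sketch (not part of the submission): the abstract contrasts dRL with traditional RL on cues that share a mean reward but differ in variance. The toy example below shows why the two make different predictions. It assumes an expectile-style update with asymmetric learning rates, which is one common dRL variant; the abstract does not state which variant the authors modeled. The function name `learn_values`, the asymmetry parameter `tau`, and the reward magnitudes are all hypothetical.

```python
import random


def learn_values(rewards, taus, lr=0.01, n_trials=20000, seed=0):
    """Learn one value estimate per unit from sampled rewards.

    Assumption (not from the abstract): each unit scales positive
    prediction errors by tau and negative ones by (1 - tau).
    tau = 0.5 gives a symmetric update that tracks the mean, as in
    traditional RL; other taus settle on different parts of the
    reward distribution.
    """
    rng = random.Random(seed)
    values = [0.0 for _ in taus]
    for _ in range(n_trials):
        r = rng.choice(rewards)  # equiprobable draw from the listed outcomes
        for i, tau in enumerate(taus):
            delta = r - values[i]  # reward prediction error
            scale = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * scale * delta
    return values


# Hypothetical reward magnitudes: both cues have mean 4, only variance differs.
taus = [0.1, 0.5, 0.9]
low_var = learn_values([4.0, 4.0], taus)   # always 4
high_var = learn_values([2.0, 6.0], taus)  # 2 or 6 with equal probability

print("low variance :", [round(v, 2) for v in low_var])
print("high variance:", [round(v, 2) for v in high_var])
```

Under these assumptions, the symmetric unit (tau = 0.5) ends up near the same value for both cues, so a mean-only code cannot tell them apart. The pessimistic and optimistic units (tau = 0.1 and 0.9) agree for the low-variance cue but spread apart for the high-variance one, which is the kind of population-level difference a representational distance measure could detect.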