A Model of Place Field Reorganization During Reward Maximization

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
TL;DR: Optimizing basis function parameters using the temporal difference (TD) error recapitulates several neural phenomena and improves the speed and flexibility of policy learning.
Abstract: When rodents learn to navigate in a novel environment, a high density of place fields emerges at reward locations, fields elongate against the trajectory, and individual fields change their spatial selectivity even while behavior remains stable. Why place fields exhibit these characteristic phenomena during learning remains elusive. We develop a normative framework using a reward maximization objective, in which the temporal difference (TD) error drives place field reorganization to improve policy learning. Place fields are modeled as Gaussian radial basis functions that represent states in an environment and project directly to an actor-critic network for policy learning. Each field's amplitude, center, and width, as well as the downstream weights, are updated online at each time step to maximize rewards. We demonstrate that this framework unifies three disparate phenomena observed in navigation experiments. Furthermore, we show that these place field phenomena improve policy convergence when learning to navigate to a single target and when relearning multiple new targets. In conclusion, we present a simple normative model that recapitulates several aspects of hippocampal place field learning dynamics, unifies their underlying mechanisms, and offers testable predictions for future experiments.
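Below is a minimal sketch, not the authors' reference implementation (see the linked repository for that), of the mechanism the abstract describes: Gaussian radial basis place fields feed an actor-critic, and the TD error updates both the downstream weights and each field's amplitude, center, and width online. The 1D track environment, two-action policy, field count, and learning rates are illustrative assumptions.

```python
# Sketch of TD-error-driven place field reorganization with an actor-critic.
# All hyperparameters and the toy 1D track are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_FIELDS, N_ACTIONS = 32, 2          # number of place fields; actions: step left / right
GAMMA = 0.95                         # discount factor (assumed)
LR_W, LR_FIELD = 0.1, 0.01           # learning rates for weights vs. field parameters (assumed)

# Place-field parameters (all plastic): amplitude a, center c, width s.
a = np.ones(N_FIELDS)
c = np.linspace(0.0, 1.0, N_FIELDS)
s = np.full(N_FIELDS, 0.05)

w_critic = np.zeros(N_FIELDS)              # value (critic) weights
w_actor = np.zeros((N_ACTIONS, N_FIELDS))  # policy (actor) weights

def features(x):
    """Gaussian radial-basis population activity at position x."""
    return a * np.exp(-(x - c) ** 2 / (2.0 * s ** 2))

def policy(phi):
    """Softmax policy over actions given place-field activity."""
    logits = w_actor @ phi
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step_env(x, action):
    """Toy 1D track: reward delivered when the agent reaches the right end."""
    x_new = float(np.clip(x + (0.05 if action == 1 else -0.05), 0.0, 1.0))
    reward = 1.0 if x_new >= 0.95 else 0.0
    return x_new, reward, reward > 0.0

for episode in range(200):
    x = 0.1
    for t in range(200):
        phi = features(x)
        p = policy(phi)
        action = rng.choice(N_ACTIONS, p=p)
        x_new, reward, done = step_env(x, action)

        # TD error from the critic's value estimates.
        v = w_critic @ phi
        v_new = 0.0 if done else w_critic @ features(x_new)
        delta = reward + GAMMA * v_new - v

        # Actor-critic weight updates.
        w_critic += LR_W * delta * phi
        grad_log_pi = -p[:, None] * phi          # d log pi / d w_actor, all actions
        grad_log_pi[action] += phi
        w_actor += LR_W * delta * grad_log_pi

        # Place-field reorganization: move each field's amplitude, center, and
        # width along the gradient of the value estimate, scaled by the TD error.
        dphi_da = phi / a
        dphi_dc = phi * (x - c) / s ** 2
        dphi_ds = phi * (x - c) ** 2 / s ** 3
        a = np.maximum(a + LR_FIELD * delta * w_critic * dphi_da, 1e-3)
        c = c + LR_FIELD * delta * w_critic * dphi_dc
        s = np.maximum(s + LR_FIELD * delta * w_critic * dphi_ds, 1e-3)

        x = x_new
        if done:
            break
```

Under these assumptions, fields near the rewarded end of the track tend to grow in amplitude and shift toward the reward, illustrating the reward-clustering effect the abstract describes; the paper's actual environments, update rules, and hyperparameters may differ.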
Lay Summary: A place field is a localized region of an environment in which a place cell in the hippocampus fires. The population activity of place cells allows an animal's location to be decoded, resembling a biological global positioning system. As animals learn to navigate a novel environment, their initially randomly distributed place fields change: they cluster near rewards, elongate along traveled paths, and change their spatial selectivity ("drift") even after behavior has stabilized. Why these changes occur in individual cells, and how they aid learning, remains unclear. We developed a simple computational model in which place fields adapt to maximize rewards. Using the feedback signal generated during trial-and-error learning, the model adjusts each field's properties, such as its location, size, and strength. This reorganization mimics how animals refine their understanding of the environment, e.g., the locations of the goal and home. Importantly, our model explains all three observed phenomena: reward clustering, field elongation, and drift. By linking individual place field changes to reward-driven learning, our work offers a unified theory for how cells in the brain optimize their encoding of the environment for navigation. Additionally, we show that adapting the encoding speeds up learning compared to using a fixed random representation. Furthermore, the model suggests that field drift, once thought to be random, may help animals adapt quickly to new goals. This could inspire more flexible AI navigation systems and guide experiments that test learning mechanisms in the brain.
Link To Code: https://github.com/Pehlevan-Group/placefield_reorg_agent
Primary Area: Applications->Neuroscience, Cognitive Science
Keywords: Reinforcement learning, Temporal Difference error, Hippocampus, Place cells, navigation
Submission Number: 8570