Keywords: computational neuroscience, offline learning, reinforcement learning, hippocampal replay, generative replay
TL;DR: We propose a framework in which a meta-controller learns to coordinate offline learning in 'sleep' phases to maximise reward in an 'awake' phase, choosing between actions that correspond to different types of offline process in the brain.
Abstract: Brains reorganise knowledge offline to improve future behaviour, with 'replay' involved in consolidating memories, abstracting patterns from experience, and simulating new scenarios. However, there are few models of how the brain might orchestrate these processes, and of when different types of replay might be useful. Here we propose a framework in which a meta-controller learns to coordinate offline learning of a lower-level agent or model in 'sleep' phases to maximise reward in an 'awake' phase. The meta-controller selects among several actions, such as learning from recent memories in a hippocampal store, abstracting patterns from memories into a 'world model', and learning from generated data. In addition, the meta-controller learns to estimate the value of each episode, enabling the prioritisation of past events in memory replay, or of new simulations in generative replay. Using image classification, maze solving, and relational inference tasks, we show that the meta-controller learns an adaptive curriculum for offline learning. This lays the groundwork for normative predictions about replay in a range of experimental neuroscience tasks.
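The abstract describes the meta-controller only at a high level, so the following is a minimal sketch of the general idea under assumed simplifications: the meta-controller is a simple epsilon-greedy bandit over three stubbed offline-learning actions, and the 'awake' phase is replaced by a noisy reward stand-in. All names (MetaController, OFFLINE_ACTIONS, awake_phase_reward) are hypothetical illustrations, not the paper's implementation.

```python
import random


class MetaController:
    """Epsilon-greedy bandit over offline-learning actions during 'sleep'."""

    def __init__(self, actions, epsilon=0.1, lr=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.lr = lr
        # Estimated benefit of each offline action for the next awake phase.
        self.value = {a: 0.0 for a in self.actions}

    def select_action(self):
        # Explore occasionally; otherwise pick the highest-valued offline action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=self.value.get)

    def update(self, action, reward_gain):
        # Move the value estimate toward the observed awake-phase reward.
        self.value[action] += self.lr * (reward_gain - self.value[action])


# Hypothetical offline actions; in the paper these would train a lower-level
# agent from a hippocampal memory store, abstract memories into a world model,
# or learn from generated (simulated) data.
OFFLINE_ACTIONS = ["replay_recent_memories", "train_world_model", "generative_replay"]


def awake_phase_reward(action):
    # Stand-in for evaluating the lower-level agent after offline learning;
    # each offline action here simply yields a different noisy payoff.
    base = {"replay_recent_memories": 0.5,
            "train_world_model": 0.7,
            "generative_replay": 0.9}[action]
    return base + random.gauss(0.0, 0.1)


if __name__ == "__main__":
    meta = MetaController(OFFLINE_ACTIONS)
    for sleep_cycle in range(200):
        action = meta.select_action()      # choose an offline process for this 'sleep'
        gain = awake_phase_reward(action)  # measure reward in the following 'awake' phase
        meta.update(action, gain)          # credit the chosen offline process
    print(meta.value)
```

In this toy setting the controller simply learns which single offline action pays off most; the adaptive curriculum reported in the paper would arise when the relative value of replay, abstraction, and generation changes as the lower-level agent learns.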
Primary Area: Neuroscience and cognitive science (e.g., neural coding, brain-computer interfaces)
Submission Number: 12241