Achieving Precise Control with Slow Hardware: Model-Based Reinforcement Learning for Action Sequence Learning
Keywords: Action Sequence Learning, Basal Ganglia, Prefrontal Cortex, Reinforcement Learning, Model Based
TL;DR: The paper introduces a biologically plausible model of sequence learning that achieves precise control by utilizing different temporal resolutions for training and control.
Abstract: Current reinforcement learning (RL) models are often claimed to explain animal behavior. However, they are designed for artificial agents that sense, think, and react much faster than the brain, and they tend to fail when operating under human-like sensory and reaction times. Despite using slow neurons, the brain achieves precise and low-latency control through a combination of predictive and sequence learning. The basal ganglia is hypothesized to learn compressed representations of action sequences, allowing the brain to produce a series of actions for a given input. We present the Hindsight-Sequence-Planner (HSP), a model of the basal ganglia and the prefrontal cortex that operates under "brain-like" conditions: slow information processing with quick sensing and actuation. Our "temporal recall" mechanism is inspired by the prefrontal cortex's role in sequence learning: the agent uses an environmental model to replay memories at a finer temporal resolution than its processing speed, while addressing the credit assignment problem caused by scalar rewards in sequence learning. HSP employs model-based training to achieve model-free control, resulting in precise and efficient behavior that appears low-latency despite running on slow hardware. We test HSP on various continuous control tasks, demonstrating that it achieves comparable performance at "human-like" frequencies while requiring significantly fewer observations and actor calls (lower actor sample complexity).
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 18062