Keywords: successor features, deep reinforcement learning, synaptic consolidation, plasticity, stability
Abstract: A hallmark of intelligence is the ability to adapt in non-stationary environments, yet deep Reinforcement Learning (RL) agents often struggle in such settings. Most prior studies introduce non-stationarity through abrupt shifts in features or dynamics, whereas real-world changes may be more gradual, reflecting naturalistic continual drift in the underlying dynamics. This may have important implications for studies of the "stability versus plasticity dilemma" in RL, since abrupt task changes may demand more plasticity than real-world situations actually do. To address these concerns, we modify existing 3D Miniworld and MuJoCo environments to incorporate naturalistic, continual non-stationary changes, and use them to determine whether poor performance in RL systems arises from a loss of plasticity or a loss of stability. We find that in these settings, methods that preserve stability, such as synaptic consolidation, outperform those focused on plasticity, such as resetting a subset of the parameters. Motivated by this finding, and by prior evidence that successor features (SFs) reduce interference in non-stationary settings, we investigate whether SFs provide a better target than Q-values for consolidation. Across both environments, we find that applying a neuro-inspired synaptic consolidation mechanism to SFs rather than to Q-values yields superior performance on the naturalistic, continually changing MuJoCo tasks. Furthermore, consolidation is most effective when SFs are stabilized across multiple timescales, as different timescales capture complementary aspects of the gradually changing environment. Together, these results show that stability may be more important than plasticity in continual learning settings without abrupt task changes, and that multi-timescale consolidation of predictive representations is an effective way to enhance it.
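The abstract does not spell out the consolidation mechanism. As a rough illustration only, the Python/PyTorch sketch below shows one way an EWC-style quadratic synaptic consolidation penalty could be applied to a successor-feature head at several EMA timescales; every name here (SFHead, make_anchors, update_anchors, consolidation_loss) and the penalty form itself are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class SFHead(nn.Module):
    """Illustrative head predicting successor features psi(s, a) from a state embedding z."""
    def __init__(self, embed_dim, num_actions, sf_dim):
        super().__init__()
        self.net = nn.Linear(embed_dim, num_actions * sf_dim)
        self.num_actions, self.sf_dim = num_actions, sf_dim

    def forward(self, z):
        return self.net(z).view(-1, self.num_actions, self.sf_dim)

def make_anchors(head, decays=(0.999, 0.99, 0.9)):
    """One consolidated copy of the SF parameters per timescale (slow to fast)."""
    anchors = [{n: p.detach().clone() for n, p in head.named_parameters()}
               for _ in decays]
    return anchors, list(decays)

@torch.no_grad()
def update_anchors(head, anchors, decays):
    """EMA updates: slower decays consolidate over longer timescales."""
    for anchor, d in zip(anchors, decays):
        for n, p in head.named_parameters():
            anchor[n].mul_(d).add_(p.detach(), alpha=1.0 - d)

def consolidation_loss(head, anchors, strength=1.0):
    """Quadratic pull of the current SF parameters toward each timescale's
    anchor (assumed penalty form, averaged over timescales)."""
    loss = torch.zeros(())
    for anchor in anchors:
        for n, p in head.named_parameters():
            loss = loss + ((p - anchor[n]) ** 2).sum()
    return strength * loss / len(anchors)

In use, consolidation_loss would be added to the agent's usual SF TD objective and update_anchors called periodically; the point of the sketch is simply that consolidation targets the SF parameters at multiple timescales rather than a Q-value head.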
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 22725