Abstract: We introduce a novel technique to address continual reinforcement learning (CRL), i.e., reinforcement learning (RL) in non-stationary environments. This setting requires agents to rapidly adapt their policies to new environment statistics while avoiding catastrophic forgetting (CF) of previous policies. In RL, CF is commonly circumvented by experience replay (ER) from a large buffer. As we show, this leads to slow policy updates, since new statistics must first be sufficiently represented in the buffer. In addition, non-stationarities can introduce contradictions that an agent must adapt to. Our approach, unlike traditional methods such as deep Q-networks (DQNs) with ER, enables fast reaction times under minimal memory requirements, making it suitable for real-world applications. It generalizes adiabatic replay (AR), a recently introduced generative replay (GR) method for continual learning (CL). For evaluation, we introduce two robotic simulation-based CRL benchmarks, partitioned into tasks by environment shifts, showcasing our approach's ability to retain previously acquired policies while learning novel skills.
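To illustrate the slow-adaptation argument the abstract makes against large ER buffers, the following minimal sketch (not the paper's implementation; buffer size, shift size, and field names are hypothetical) shows how transitions collected after an environment shift are diluted in a FIFO replay buffer, so uniformly sampled minibatches remain dominated by old-task data:

```python
# Hypothetical illustration: dilution of new-task data in a large FIFO ER buffer.
from collections import deque
import random

BUFFER_SIZE = 100_000   # assumed large replay buffer, as criticized in the abstract
NEW_SAMPLES = 5_000     # assumed number of transitions gathered after the shift
BATCH_SIZE = 64

buffer = deque(maxlen=BUFFER_SIZE)

# Phase 1: buffer is filled with transitions from the old environment statistics.
buffer.extend({"task": "old"} for _ in range(BUFFER_SIZE))

# Phase 2: the environment shifts; the agent appends some new-task transitions.
buffer.extend({"task": "new"} for _ in range(NEW_SAMPLES))

# A uniformly sampled minibatch still consists mostly of old-task transitions,
# so gradient updates reflect the new statistics only in proportion to their
# share of the buffer.
batch = random.sample(list(buffer), BATCH_SIZE)
observed = sum(t["task"] == "new" for t in batch) / BATCH_SIZE
print(f"expected new-task share ~= {NEW_SAMPLES / BUFFER_SIZE:.2%}, "
      f"observed in this batch: {observed:.2%}")
```

Under these assumed numbers, only about 5% of each minibatch reflects the post-shift environment, which is the dilution effect the proposed AR-based method is designed to avoid.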
External IDs: dblp:conf/is/KrawczykBDG24