Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

Qingfeng Lan; Yangchen Pan; Jun Luo; A. Rupam Mahmood

Published: 18 Apr 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.

Event Certifications: lifelong-ml.cc/CoLLAs/2023/Journal_Track

Abstract: Artificial neural networks are promising for general function approximation but challenging to train on non-independent or non-identically distributed data due to catastrophic forgetting. The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later. However, a large replay buffer results in a heavy memory burden, especially for onboard and edge devices with limited memory capacities. We propose memory-efficient reinforcement learning algorithms based on the deep Q-network algorithm to alleviate this problem. Our algorithms reduce forgetting and maintain high sample efficiency by consolidating knowledge from the target Q-network to the current Q-network. Compared to baseline methods, our algorithms achieve comparable or better performance in both feature-based and image-based tasks while easing the burden of large experience replay buffers.
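The abstract describes the core idea only at a high level. The current Q-network is trained on the usual DQN objective and is also pulled toward the outputs of the target Q-network, so that knowledge is not lost when the replay buffer is small. The sketch below shows one way such a combined loss could look. It assumes three things: a squared-error consolidation term summed over all actions, a fixed weight `lam`, and a batch of "consolidation states" whose sampling scheme is left open. `consolidation_loss`, `combined_loss`, and `lam` are hypothetical names. This is not the authors' implementation; the linked repository contains that. The sketch only evaluates loss values from precomputed Q-values. In a real agent the gradient would be taken with respect to the current network's parameters, with the target network held fixed.

```python
from typing import Sequence

Vector = Sequence[float]


def td_loss(q_values: Sequence[Vector], actions: Sequence[int],
            rewards: Sequence[float], next_q_target: Sequence[Vector],
            dones: Sequence[float], gamma: float = 0.99) -> float:
    """Mean squared TD error of standard DQN on one minibatch.

    q_values[i] holds the current network's Q(s_i, .) and next_q_target[i]
    holds the target network's Q(s'_i, .), one entry per action.
    """
    total = 0.0
    for q, a, r, nq, done in zip(q_values, actions, rewards, next_q_target, dones):
        # Bootstrap from the target network; terminal transitions (done=1) do not bootstrap.
        target = r + gamma * (1.0 - done) * max(nq)
        total += (target - q[a]) ** 2
    return total / len(q_values)


def consolidation_loss(q_current: Sequence[Vector], q_target: Sequence[Vector]) -> float:
    """Hypothetical consolidation term: squared gap between current and target
    Q-values over all actions, averaged over a batch of consolidation states."""
    total = 0.0
    for qc, qt in zip(q_current, q_target):
        total += sum((c - t) ** 2 for c, t in zip(qc, qt))
    return total / len(q_current)


def combined_loss(q_values, actions, rewards, next_q_target, dones,
                  q_current_c, q_target_c, lam: float = 1.0,
                  gamma: float = 0.99) -> float:
    """TD loss plus a lam-weighted consolidation term.

    Assumption: a fixed weight lam; how the consolidation states behind
    q_current_c / q_target_c are chosen is not specified here.
    """
    return (td_loss(q_values, actions, rewards, next_q_target, dones, gamma)
            + lam * consolidation_loss(q_current_c, q_target_c))
```

For example, with `gamma=0.5`, the two transitions `q_values=[[1.0, 2.0], [0.5, 0.0]]`, `actions=[1, 0]`, `rewards=[1.0, 0.0]`, `next_q_target=[[0.0, 1.0], [2.0, 0.0]]`, `dones=[0.0, 1.0]` give a TD loss of 0.25. The consolidation term is zero only when the current and target networks agree on every action at every consolidation state. This is the sense in which the target network's knowledge is "consolidated" into the current network.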

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: There are two major changes. 1. The title is improved. 2. The Atari results are moved from the appendix to the main paper to better support our claims. In addition, the claims are revised to be limited to value-based control tasks.

Code: https://github.com/qlan3/MeDQN

Supplementary Material: zip

Assigned Action Editor: ~Branislav_Kveton1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 850
