Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization

Published: 01 Jan 2023, Last Modified: 25 May 2024, AAMAS 2023, CC BY-SA 4.0
Abstract: Shared policy-value representations in traditional actor-critic architectures have been shown to limit the generalization capabilities of a reinforcement learning (RL) agent. Fully decoupled (separate) networks for the policy and value address this representation asymmetry and avoid overfitting, but introduce additional computational overhead. Partial separation has been shown to reduce this overhead while still achieving the same level of generalization. This raises the questions of whether two fully separate networks are actually needed and whether increasing the degree of separation in a partially separated network improves generalization. To investigate these questions, this paper compares four degrees of network separation (fully shared, early separation, late separation, and full separation) on the RL generalization benchmark Procgen. Our results indicate that, for environments without a distinct or explicit source of value estimation, partial late separation captures the necessary policy-value representation asymmetry and achieves better generalization in unseen scenarios than the other architectural options, whereas early separation fails to perform adequately. These findings also suggest a model-selection mechanism for the cases in which full separation performs best.
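To make the compared degrees of separation concrete, the sketch below shows a minimal PyTorch-style actor-critic whose policy-value split point is configurable. This is an illustrative assumption, not the paper's actual Procgen model (which would use a convolutional encoder); the layer sizes, MLP form, and exact split points are hypothetical. Moving the split earlier or later in the hidden stack corresponds loosely to the early/late partial-separation variants, while the two extremes correspond to the fully shared and fully separated architectures.

```python
import torch
import torch.nn as nn


def mlp(sizes, activate_last=False):
    """Build an MLP; ReLU between layers, optionally after the last one too."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2 or activate_last:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


class ActorCritic(nn.Module):
    """Actor-critic with a configurable policy-value split point.

    `n_shared` is the number of hidden layers shared by the policy and value paths:
      n_shared == len(hidden)    -> "fully shared" (only the output heads differ)
      0 < n_shared < len(hidden) -> partial separation (a later split means more sharing)
      n_shared == 0              -> "full separation" (two independent networks)
    """

    def __init__(self, obs_dim, n_actions, hidden=(64, 64, 64), n_shared=2):
        super().__init__()
        shared_sizes = (obs_dim,) + tuple(hidden[:n_shared])
        branch_sizes = (shared_sizes[-1],) + tuple(hidden[n_shared:])

        self.shared = mlp(shared_sizes, activate_last=True)    # empty when n_shared == 0
        self.policy_branch = mlp(branch_sizes + (n_actions,))  # logits over actions
        self.value_branch = mlp(branch_sizes + (1,))           # scalar state value

    def forward(self, obs):
        z = self.shared(obs)
        return self.policy_branch(z), self.value_branch(z)


# Example: a "late" split shares most of the hidden stack and separates near the output.
net = ActorCritic(obs_dim=8, n_actions=4, hidden=(64, 64, 64), n_shared=2)
logits, value = net(torch.randn(1, 8))
```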