Bridging the performance gap between target-free and target-based reinforcement learning

ICLR 2026 Conference Submission 7275 Authors

Published: 26 Jan 2026, Last Modified: 06 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: deep reinforcement learning, Q-learning, function approximation
TL;DR: We introduce iterated Shared Q-Network, a new algorithm that improves the sample efficiency of target-free algorithms to bridge the gap with target-based algorithms.
Abstract: The use of target networks in deep reinforcement learning is a popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification enables us to keep the low memory footprint of target-free methods while leveraging the target-based literature. We find that combining our approach with the concept of iterated $Q$-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the sample efficiency of target-free approaches. Our proposed method, iterated Shared $Q$-Learning (iS-QL), bridges the performance gap between target-free and target-based approaches across various problems while using a single $Q$-network, thus stepping towards resource-efficient reinforcement learning algorithms.
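The shared-target idea described in the abstract can be sketched roughly as follows in PyTorch. This is not the authors' implementation: the class, method, and hyperparameter names (`SharedQNetwork`, `q_online`, `q_target`, `sync_target`, hidden sizes, learning rate) are illustrative assumptions, and the iterated $Q$-learning component (fitting several consecutive Bellman updates in parallel) is omitted for brevity.

```python
# Minimal sketch (assumed, not the authors' code): the target is only a frozen
# copy of the final linear layer, while the feature trunk is shared with the
# up-to-date online network, keeping the extra memory cost small.
import copy
import torch
import torch.nn as nn


class SharedQNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        # Feature trunk: shared by the online and target computations.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Online head: the last linear layer, trained by semi-gradient TD.
        self.online_head = nn.Linear(hidden, num_actions)
        # Target head: a frozen copy of the online head only.
        self.target_head = copy.deepcopy(self.online_head)
        for p in self.target_head.parameters():
            p.requires_grad_(False)

    def q_online(self, obs: torch.Tensor) -> torch.Tensor:
        return self.online_head(self.trunk(obs))

    def q_target(self, obs: torch.Tensor) -> torch.Tensor:
        # The trunk is always up to date (shared); only the last layer is stale.
        with torch.no_grad():
            return self.target_head(self.trunk(obs))

    def sync_target(self) -> None:
        # Periodically refresh the target head from the online head.
        self.target_head.load_state_dict(self.online_head.state_dict())


# Usage sketch: one TD step on dummy data (terminal flags omitted for brevity).
net = SharedQNetwork(obs_dim=8, num_actions=4)
optim = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=3e-4)

obs, next_obs = torch.randn(32, 8), torch.randn(32, 8)
actions, rewards, gamma = torch.randint(0, 4, (32,)), torch.randn(32), 0.99

target = rewards + gamma * net.q_target(next_obs).max(dim=1).values
q_sa = net.q_online(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)
optim.zero_grad()
loss.backward()
optim.step()
```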
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7275