A Simple Scaling Model for Bootstrapped DQN

TMLR Paper6517 Authors

15 Nov 2025 (modified: 20 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: We present a large-scale empirical study of Bootstrapped DQN (BDQN) and Randomized-Prior BDQN (RP-BDQN) in the DeepSea environment, a benchmark designed to isolate and parameterize exploration difficulty. Our primary contribution is a simple scaling model that accurately captures the probability of reward discovery as a function of task hardness and ensemble size. The model is parameterized by a method-dependent effectiveness factor, $\psi$. Under this framework, RP-BDQN is substantially more effective ($\psi \approx 0.87$) than BDQN ($\psi \approx 0.80$), which enables it to solve more challenging tasks. Our analysis reveals that this advantage stems from RP-BDQN's sustained ensemble diversity, which mitigates the posterior collapse observed in BDQN. Interestingly, the model fits well even though it assumes that ensemble members are independent, which suggests that complex interactions between members may be a secondary factor in overall performance. Furthermore, we show how systematic deviations from this simple model can be used to diagnose subtler dynamics such as cooperation and diversity saturation. These results offer practical guidance for configuring ensembles and propose a methodological framework for future studies of deep exploration.
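The abstract does not state the model's functional form, so the following is only an illustrative sketch under explicit assumptions. It assumes, hypothetically, that each ensemble member independently discovers the reward with probability $\psi^N$ in a DeepSea task of size $N$. The exponential form and the function names `per_member_success`, `discovery_probability` and `ensemble_size_needed` are illustrative inventions, not the authors' model or code. Member independence then gives $P(\text{discover}) = 1 - (1 - \psi^N)^K$ for an ensemble of $K$ members, and inverting this expression gives the smallest ensemble size that reaches a target discovery rate.

```python
import math


def per_member_success(psi: float, depth: int) -> float:
    # HYPOTHETICAL form: each member reaches the reward with probability
    # psi**depth, where depth is the DeepSea size (task hardness).
    # Assumes 0 < psi < 1. psi = 0.5 corresponds roughly to uniformly random
    # exploration, where "right" must be chosen at about every step.
    return psi ** depth


def discovery_probability(psi: float, depth: int, ensemble_size: int) -> float:
    # Member independence: the ensemble fails only if every member fails.
    p = per_member_success(psi, depth)
    return 1.0 - (1.0 - p) ** ensemble_size


def ensemble_size_needed(psi: float, depth: int, target: float) -> int:
    # Smallest K with 1 - (1 - p)**K >= target, solved with logarithms.
    # Assumes 0 < target < 1.
    p = per_member_success(psi, depth)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # psi values taken from the abstract; depth and target are arbitrary.
    for name, psi in [("BDQN", 0.80), ("RP-BDQN", 0.87)]:
        k = ensemble_size_needed(psi, depth=20, target=0.9)
        print(name, "members needed at N=20 for 90% discovery:", k)
```

Under this assumed form, the per-member term decays exponentially in $N$, so a modest gap in $\psi$ (0.80 vs. 0.87) becomes a large gap in the ensemble size required at large $N$. This is one way to read the claim that RP-BDQN can solve harder tasks, though the paper's actual model may differ.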
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zheng_Wen1
Submission Number: 6517