Benchmarking Offline Reinforcement Learning in Factorisable Action Spaces

TMLR Paper 2503 Authors

10 Apr 2024 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Extending reinforcement learning (RL) to offline contexts is a promising prospect, particularly in sectors where data collection poses substantial challenges or risks. Pivotal to the success of transferring RL offline is mitigating overestimation bias in value estimates for state-action pairs absent from data. Whilst numerous approaches have been proposed in recent years, these tend to focus primarily on continuous or small-scale discrete action spaces. Factorised discrete action spaces, on the other hand, have received relatively little attention, despite many real-world problems naturally having factorisable actions. In this work, we undertake an initial formative investigation into offline reinforcement learning in factorisable action spaces. Using value-decomposition as formulated in DecQN as a foundation, we conduct an extensive empirical evaluation of several offline techniques adapted to the factorised setting. In the absence of established benchmarks, we introduce a suite of our own based on a discretised variant of the DeepMind Control Suite, comprising datasets of varying quality and task complexity. Advocating for reproducible research and innovation, we make all datasets available for public use, alongside our code base.
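The value-decomposition idea from DecQN that the abstract builds on can be illustrated with a minimal sketch: the Q-value of a joint factorised action is taken as the mean of per-dimension utilities, so the greedy joint action decomposes into independent per-dimension argmaxes. Function names and the NumPy-based representation are illustrative, not the paper's implementation.

```python
import numpy as np

def decqn_q_value(utilities, action):
    """DecQN-style decomposition: the joint Q-value of a factorised
    action is the mean of its per-dimension utility values.

    utilities: list of length N, where utilities[i] is a 1-D array of
               utility values over the choices in action dimension i.
    action:    tuple of length N of chosen indices, one per dimension.
    """
    return float(np.mean([u[a] for u, a in zip(utilities, action)]))

def greedy_action(utilities):
    """Under the mean decomposition, maximising the joint Q-value
    reduces to an independent argmax in each action dimension."""
    return tuple(int(np.argmax(u)) for u in utilities)

# Toy example: two action dimensions with two choices each.
utilities = [np.array([1.0, 3.0]), np.array([2.0, 0.0])]
a = greedy_action(utilities)          # (1, 0)
q = decqn_q_value(utilities, a)       # (3.0 + 2.0) / 2 = 2.5
```

The practical appeal of this decomposition is that action selection scales linearly with the number of action dimensions rather than exponentially with the size of the joint action space.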
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Following reviewer comments we have made the following changes:
- Added missing literature to Section 3.2
- Removed Section 4 (The case for factorisation and decomposition in offline-RL)
- Added extra details on the action discretisation procedure in Section 5
- Added clarity around factorised behavioural cloning in Section 6.1
- Added a paragraph on benchmark limitations in Section 7
- Removed proofs from the Appendix
- Added aggregated scores to Tables 6 and 7 in the Appendix
Assigned Action Editor: ~Marc_Lanctot1
Submission Number: 2503