Abstract: Recent reinforcement learning approaches have demonstrated the surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action-space discretizations often yield favorable exploration characteristics, and in the absence of action penalization, final performance does not visibly suffer, in line with optimal control theory. In robotics applications, however, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, yet regularizing policies via action costs can be detrimental to exploration. Our work aims to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution. We take advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our results indicate that an adaptive control resolution, in combination with value decomposition, yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
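The following is a minimal illustrative sketch (not the authors' implementation) of the two ideas named in the abstract: a per-dimension action discretization that grows from coarse bang-bang control to finer resolution, and a decoupled critic with one Q-head per action dimension so the greedy joint action factorizes into independent per-dimension argmax operations. All names (GrowingQNetwork, n_levels, grow, greedy_action) and the growth schedule are illustrative assumptions, not the paper's API.

    # Illustrative sketch only; assumes PyTorch and a normalized action range [-1, 1].
    import torch
    import torch.nn as nn

    class GrowingQNetwork(nn.Module):
        """Decoupled critic: one Q-head per action dimension over a fixed fine bin set;
        coarse control is obtained by masking the bins that are not yet active."""
        def __init__(self, obs_dim: int, act_dim: int, n_levels: int = 3, hidden: int = 256):
            super().__init__()
            self.n_levels = n_levels
            self.n_bins = 2 ** n_levels + 1                      # e.g. 3 levels -> 9 bins
            self.register_buffer("bins", torch.linspace(-1.0, 1.0, self.n_bins))
            self.trunk = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Value decomposition: independent Q-values per action dimension, so the
            # greedy joint action costs a per-dimension argmax instead of an
            # exponentially large joint argmax.
            self.heads = nn.ModuleList(nn.Linear(hidden, self.n_bins) for _ in range(act_dim))
            self.level = 0                                       # start bang-bang: {-1, +1}

        def grow(self):
            """Activate the next, finer set of bins (e.g. on a fixed schedule)."""
            self.level = min(self.level + 1, self.n_levels)

        def active_mask(self) -> torch.Tensor:
            stride = 2 ** (self.n_levels - self.level)
            mask = torch.zeros(self.n_bins, dtype=torch.bool)
            mask[::stride] = True                                # level 0 keeps only {-1, +1}
            return mask

        def greedy_action(self, obs: torch.Tensor) -> torch.Tensor:
            h = self.trunk(obs)
            mask = self.active_mask()
            actions = []
            for head in self.heads:
                q = head(h).masked_fill(~mask, float("-inf"))    # ignore inactive bins
                actions.append(self.bins[q.argmax(dim=-1)])
            return torch.stack(actions, dim=-1)                  # shape (batch, act_dim)

    # Usage sketch: coarse, exploration-friendly control first, then refine.
    net = GrowingQNetwork(obs_dim=10, act_dim=38)
    a = net.greedy_action(torch.randn(4, 10))                    # actions in {-1, +1}
    net.grow()                                                   # now {-1, 0, +1}, and so on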
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: Yes
Submission Number: 46