Hedge Your Actions: Flexible Reinforcement Learning for Complex Action Spaces

Anonymous

22 Sept 2022, 12:30 (modified: 26 Oct 2022, 13:54)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Keywords: Efficient Reinforcement Learning, Large Action Space, Listwise Action Retrieval
TL;DR: Flexible reinforcement learning under complex, innumerable action spaces via listwise action retrieval
Abstract: Real-world decision-making often involves large and complex action representations, which may even be ill-suited to the task. For instance, the items in a recommender system have generic representations that apply to each user differently, and the actuators of a household robot can be high-dimensional and noisy. Prior work in discrete and continuous action space reinforcement learning (RL) defines a retrieval-selection framework to deal with problems of scale: a retrieval actor produces outputs in the space of action representations to retrieve a few candidate actions, which a selection critic then evaluates. However, learning such retrieval actors becomes increasingly inefficient as the complexity of the action space rises. We therefore propose to treat retrieval as a listwise RL task, in which the agent proposes a list of action samples that enables the selection phase to maximize the environment reward. We show that, by hedging its action proposals, our agent is more flexible and sample-efficient than conventional approaches when learning under a complex action space. Results are also presented at https://sites.google.com/view/complexaction.
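The retrieval-selection idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the action embeddings, the nearest-neighbor retrieval rule, the hand-written `toy_critic`, and all function names below are assumptions made purely for illustration. It only shows why proposing a list of queries, rather than a single one, gives the selection step more to choose from.

```python
import math

# Hypothetical 2-D action embeddings (illustrative only, not from the paper).
ACTIONS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]


def toy_critic(embedding):
    """Stand-in for a learned selection critic: prefers actions near (0, 1)."""
    return -math.dist(embedding, (0.0, 1.0))


def nearest_action(query, action_embeddings):
    """Return the index of the action whose embedding is closest to the query."""
    return min(range(len(action_embeddings)),
               key=lambda i: math.dist(query, action_embeddings[i]))


def retrieve_then_select(queries, action_embeddings, critic):
    """Retrieve one candidate per query, then let the critic pick the best one."""
    candidates = []
    for query in queries:
        idx = nearest_action(query, action_embeddings)
        if idx not in candidates:  # keep each candidate action once
            candidates.append(idx)
    best = max(candidates, key=lambda i: critic(action_embeddings[i]))
    return best, candidates


# A single proposal commits to one region of the action space...
single_best, single_cands = retrieve_then_select([(0.9, 0.1)], ACTIONS, toy_critic)
# ...while a list of proposals hedges across regions before selection.
list_best, list_cands = retrieve_then_select(
    [(0.9, 0.1), (0.1, 0.8)], ACTIONS, toy_critic
)
```

In this toy setting the single query retrieves only action 1, whereas the two-query list retrieves actions 1 and 2, and the critic then selects action 2. In a learned system the queries would come from a trained actor and the critic from a trained value function; both are fixed by hand here.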
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
14 Replies
