Reward Adaptation Via Q-Manipulation

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Modular RL, Reusable RL, Action Pruning, Reward Shaping
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Modular RL approach to solve Reward Adaptation problem via action pruning using source behaviors.
Abstract: In this paper, we introduce reward adaptation (RA), the problem in which a learning agent adapts to a target reward function given one or more existing behaviors, each learned a priori under its corresponding source reward function, offering a new perspective on modular reinforcement learning. Reward adaptation has many applications, such as adapting an autonomous driving agent that can already drive either fast or safely to driving both fast and safely. Learning the target behavior from scratch is possible but inefficient given the available source behaviors. Assuming that the target reward function is a polynomial function of the source reward functions, we propose an approach to reward adaptation that manipulates variants of the Q functions of the source behaviors, which are assumed to be accessible, having been obtained while learning the source behaviors before learning the target behavior. The result is a novel method, named "Q-Manipulation", that enables action pruning before learning the target behavior. We formally prove that our pruning strategy, which improves sample complexity, does not affect the optimality of the returned policy. Comparisons with baselines in a variety of synthetic and simulation domains demonstrate the method's effectiveness and generalizability.
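Illustrative sketch (not from the submission): the abstract's claim that pruning preserves optimality suggests a rule of the form "discard an action whose upper-bounded Q-value falls below the best lower-bounded Q-value in the same state," since no optimal policy can select such an action. The sketch below assumes per-state-action upper and lower bounds (q_ub, q_lb) on the optimal target Q function, derived somehow from the source behaviors' Q functions; the names and the bound construction are hypothetical and do not reproduce the paper's exact Q-Manipulation formulation.

    import numpy as np

    def prune_actions(q_ub: np.ndarray, q_lb: np.ndarray) -> np.ndarray:
        """Return a boolean keep-mask of shape (n_states, n_actions).

        An action is pruned in a state when its upper bound is strictly
        below the best lower bound in that state, so it can never be
        optimal there; pruning is therefore optimality-preserving.
        """
        best_lb = q_lb.max(axis=1, keepdims=True)  # best lower bound per state
        return q_ub >= best_lb                     # keep only non-dominated actions

    # Toy example: 3 states, 4 actions, with bounds assumed to be given.
    rng = np.random.default_rng(0)
    q_lb = rng.normal(size=(3, 4))
    q_ub = q_lb + rng.uniform(0.1, 1.0, size=(3, 4))  # ensure ub >= lb
    mask = prune_actions(q_ub, q_lb)
    print(mask)  # False entries mark actions removed before learning the target

Under this reading, tighter bounds prune more actions and shrink the effective action space the target learner must explore, which is where the sample-complexity gain would come from.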
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3919