Action Shapley: A training data selection metric for high-performance and cost-efficient reinforcement learning

16 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: interpretability, control, environment model, training, reinforcement learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A metric for interpretable reinforcement learning
Abstract: Reinforcement learning (RL) deals with a goal-seeking agent that learns to achieve its goal through a sequence of trial-and-error decisions as it interacts with a stochastic environment. While RL achieves outstanding success in playing complex video games, where a large number of trials and errors is permissible, errors are always undesirable in the real world. To reduce errors, model-based RL first develops an environment model in which trial and error can take place without real-world costs. Different training actions produce different environment models, which in turn produce different RL agents. Superior interpretability demands a granular understanding of the differential impact of the training actions on the resulting RL agent's performance. To aid this understanding, we offer Action Shapley, an agnostic metric for the selection of training actions. For Action Shapley computation, we include an algorithm that avoids exponential complexity. We also show how Action Shapley can be used to select a high-performance training action set. We demonstrate the effectiveness of Action Shapley through four real-world case studies involving dynamic control of enterprise IT systems. First, the proposed Action Shapley computation algorithm saves more than 80\% of computational cycles compared to the corresponding brute-force exponential-time computation. Second, the proposed Action Shapley-based training action selection policy produces high-performance RL agents in most of the four case studies.
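To illustrate how the exponential cost of exact Shapley computation can be avoided, the sketch below estimates per-action Shapley values by Monte Carlo permutation sampling. This is a standard approximation offered here as an assumption, not the paper's specific algorithm, which the abstract does not detail. The value function `evaluate_agent` is hypothetical: it stands in for training an environment model on a subset of training actions, training an RL agent against that model, and returning the agent's performance.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence


def monte_carlo_shapley(
    actions: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate per-action Shapley values by sampling random permutations
    instead of enumerating all 2^n coalitions (exponential cost)."""
    rng = random.Random(seed)
    shapley = {a: 0.0 for a in actions}
    for _ in range(num_permutations):
        order = list(actions)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        v_prev = value(coalition)  # value of the empty training set
        for a in order:
            coalition = coalition | {a}
            v_new = value(coalition)
            shapley[a] += v_new - v_prev  # marginal contribution of action a
            v_prev = v_new
    # Average marginal contributions over sampled permutations
    return {a: s / num_permutations for a, s in shapley.items()}


def evaluate_agent(subset: FrozenSet[str]) -> float:
    """Hypothetical placeholder: train an environment model on `subset`,
    train an RL agent against it, and return the agent's performance."""
    return float(len(subset))  # stand-in value for the sketch


values = monte_carlo_shapley(["a1", "a2", "a3", "a4"], evaluate_agent)
```

Under this reading, the training actions with the highest estimated Shapley values would be retained as the high-performance training set, and the sampling budget `num_permutations` trades estimation accuracy against the number of agent evaluations.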
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 525