Policy Refinement with Human Feedback for Safe Reinforcement Learning

Published: 27 Apr 2023, Last Modified: 09 Jul 2023
Keywords: Safe RL, policy repair, human feedback
TL;DR: This position paper presents an approach to RL policy repair in safety-critical applications, combining Bayesian optimization, inverse RL, and human feedback to improve safety and performance.
Abstract: In this position paper, we discuss policy refinement in reinforcement learning (RL), focusing on safety-critical applications. We propose an integrated approach that combines Bayesian optimization, inverse RL, human feedback, and natural language processing to address challenges in policy refinement. We also examine the limitations of these methods and provide an outlook for the future of policy refinement in RL. Our aim is to contribute to the ongoing conversation and foster collaboration in this crucial area, driving the development of safe and responsible RL policies for real-world, safety-critical applications.
Submission Number: 8