TL;DR: This paper introduces a strategic exploration framework for Inverse Constrained Reinforcement Learning with guaranteed efficiency and theoretically tractable sample complexity.
Abstract: Optimizing objective functions subject to constraints is fundamental in many real-world applications. However, these constraints are often not readily defined and must be inferred from expert agent behaviors, a problem known as Inverse Constraint Inference. Inverse Constrained Reinforcement Learning (ICRL) is a common solver for recovering feasible constraints in complex environments, relying on training samples collected from interactive environments. However, the efficacy and efficiency of current sampling strategies remain unclear. To bridge this gap, we propose a strategic exploration framework for sampling with guaranteed efficiency. By defining the feasible cost set for ICRL problems, we analyze how estimation errors in transition dynamics and the expert policy influence the feasibility of inferred constraints. Based on this analysis, we introduce two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimations or 2) strategically constraining the exploration policy around plausibly optimal ones. Both algorithms are theoretically grounded with tractable sample complexity, and their performance is validated empirically across various environments.
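To make the first strategy concrete, below is a minimal, hypothetical sketch of uncertainty-driven exploration in a tabular setting: visit counts yield an empirical transition model and a count-based uncertainty bonus, and the agent plans an exploration policy that visits the state-action pairs whose cost estimates are least certain. The function and environment interface names (`plan_exploration_policy`, `env.step`, etc.) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical sketch of error-reduction-driven exploration for ICRL.
# Visit counts -> empirical transition model + count-based uncertainty bonus;
# the bonus is treated as a surrogate reward to plan an exploration policy.

def plan_exploration_policy(P_hat, bonus, gamma=0.95, n_iter=200):
    """Value iteration on the uncertainty bonus; returns a greedy policy."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = bonus + gamma * P_hat @ V            # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                      # deterministic exploration policy

def explore(env, S, A, n_episodes=100, horizon=50):
    counts = np.ones((S, A, S)) * 1e-6           # pseudo-counts to avoid div-by-zero
    for _ in range(n_episodes):
        N_sa = counts.sum(axis=2)
        P_hat = counts / N_sa[:, :, None]        # empirical transition model
        bonus = 1.0 / np.sqrt(np.maximum(N_sa, 1.0))  # count-based uncertainty
        policy = plan_exploration_policy(P_hat, bonus)
        s = env.reset()
        for _ in range(horizon):
            a = policy[s]
            s_next, done = env.step(a)           # assumed simple tabular interface
            counts[s, a, s_next] += 1.0
            s = s_next
            if done:
                break
    return counts                                # fed to downstream constraint inference
```

The collected counts would then be used to bound the aggregate estimation error of the inferred cost, in the spirit of the first algorithm; the second algorithm would instead restrict this planning step to policies that remain plausibly optimal under the current estimates.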
Lay Summary: Imagine teaching a robot to follow rules in a factory, but without ever telling it the rules directly. Instead, we show it how experts behave and ask it to "guess" what rules they’re following. This is a common challenge in AI called inverse constraint inference: figuring out the hidden rules behind smart behavior.
To do this, AI systems often explore their environment and learn from what happens. However, exploration takes time and resources, and not all exploration is equally useful. Our research develops smarter ways for AI to explore while learning these hidden rules, making the process faster and more reliable.
We created two new strategies that help AI focus its attention on the most useful parts of the environment or adjust its learning to reduce errors. Both strategies come with theoretical guarantees and work well in real-world-like test scenarios. This makes them promising tools for building more efficient and trustworthy AI systems that need to learn from expert behavior without explicit instructions.
Primary Area: Reinforcement Learning->Inverse
Keywords: Inverse Constrained Reinforcement Learning, Exploration Algorithm, Sample Efficiency
Submission Number: 855