Keywords: Inverse Constrained Reinforcement Learning, Inductive Logic Programming, Deep Q Learning
TL;DR: The proposed method integrates Deep Q-Learning and Inductive Logic Programming to learn and revise agent control policies with respect to domain constraints revealed through self-guided domain exploration and user-guided examples of expert behaviour.
Track: Main Track
Abstract: Inverse Constrained Reinforcement Learning (ICRL) is an established field of policy learning that augments reward-driven exploratory optimisation with example-driven constraint inference aimed at exploiting limited observations of expert behaviour.
This paper proposes a generalisation of ICRL that employs weighted constraints to better support lifelong learning and to handle domains with potentially conflicting social norms. We introduce a Neuro-Symbolic ICRL approach (NSICRL) with two key components:
a symbolic system based on Inductive Logic Programming (ILP) that infers first-order constraints which are human-interpretable and generalise across environment configurations; and a neural system based on Deep Q-Learning (DQL) that efficiently learns near-optimal policies subject to those constraints. By weighting the high-level ILP constraints (based on the order in which they are learnt) and encoding them as low-level state-action penalties in the DQL reward function, we allow earlier constraints to be overridden by later ones. Unlike prior work in ICRL, our approach continues to function when newly encountered expert behaviours reveal more nuanced exceptions to previously learnt constraints. We evaluate NSICRL in a simulated traffic domain, showing that it outperforms existing methods in efficiency and accuracy when learning hard constraints, and demonstrating the utility of learning defeasible norms in an ICRL context. To the best of our knowledge, this is the first approach that places equal emphasis on exploratory and imitative learning while also inferring defeasible norms in an interpretable way that scales to non-trivial examples.
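The core mechanism described in the abstract (ordered, weighted constraints encoded as state-action penalties in the reward, with later-learnt constraints overriding earlier ones) can be sketched as follows. This is a minimal illustration of the general idea only; all function and variable names are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: ordered, weighted symbolic constraints encoded as
# state-action penalties in a shaped reward, so that constraints learnt
# later can override (defeat) constraints learnt earlier.

def constraint_penalty(state, action, constraints):
    """Return the penalty imposed by the latest matching constraint.

    `constraints` is ordered oldest-first; each entry is a pair
    (predicate(state, action) -> bool, penalty_weight). Because later
    entries overwrite earlier matches, a more recently learnt, more
    nuanced constraint takes precedence over an older, cruder one.
    """
    penalty = 0.0
    for predicate, weight in constraints:
        if predicate(state, action):
            penalty = weight  # later matches override earlier ones
    return penalty

def shaped_reward(base_reward, state, action, constraints):
    """Base environment reward minus the active constraint penalty."""
    return base_reward - constraint_penalty(state, action, constraints)

# Illustration: an early constraint penalises entering cell (2, 2);
# a later constraint carves out an exception (zero penalty) when the
# agent is in an "emergency" mode.
constraints = [
    (lambda s, a: s["pos"] == (2, 2), 10.0),
    (lambda s, a: s["pos"] == (2, 2) and s["mode"] == "emergency", 0.0),
]

r_normal = shaped_reward(1.0, {"pos": (2, 2), "mode": "normal"}, "stay", constraints)
r_emergency = shaped_reward(1.0, {"pos": (2, 2), "mode": "emergency"}, "stay", constraints)
```

Here the early prohibition yields a penalised reward in the normal case, while the later exception restores the unpenalised reward in the emergency case, mirroring the defeasible-norm behaviour the abstract describes.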
Paper Type: Long Paper
Submission Number: 40