Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
Abstract: The goal of inverse reinforcement learning (IRL) is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. Most IRL algorithms with theoretical guarantees assume that the reward has a linear structure. In this work, we extend the study of the IRL problem to the setting where the reward is parametrized by a neural network. Moreover, conventional IRL algorithms usually adopt a nested (bi-level) structure and are therefore computationally inefficient, especially when the MDP is high-dimensional. We address this problem by proposing the first neural single-loop maximum likelihood algorithm. Due to the nonlinearity of the neural network approximation, the global convergence results established for linear rewards no longer apply. We provide a non-asymptotic convergence analysis of the proposed neural algorithm by exploiting the overparameterization of the underlying neural networks. It remains to ask whether the proposed neural algorithm identifies the globally optimal reward and the corresponding optimal policy; under suitable overparameterized neural network structures, we answer both questions affirmatively. To our knowledge, this is the first IRL algorithm with a non-asymptotic convergence guarantee that provably identifies the global optimum in the neural network setting.
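
To make the single-loop maximum-likelihood formulation concrete, the sketch below illustrates an update of this kind on a small tabular MDP: each iteration performs one soft Bellman backup for the policy and one likelihood-gradient step on a neural reward, instead of solving the inner RL problem to convergence. This is only an illustrative sketch under assumptions chosen here (a random tabular MDP, a softmax-optimal synthetic expert, entropy temperature tau = 1, and a small MLP reward), not the paper's exact algorithm or analysis.

```python
# Minimal single-loop maximum-likelihood IRL sketch with a neural reward.
# Assumptions (not from the paper): tabular MDP, soft-optimal synthetic expert, tau = 1.
import numpy as np
import torch
import torch.nn as nn

S, A, gamma, tau = 12, 4, 0.95, 1.0
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))            # transition kernel P[s, a, s']
mu0 = np.ones(S) / S                                   # initial state distribution
P_t = torch.as_tensor(P, dtype=torch.float32)

def soft_backup(Q, r):
    """One soft Bellman backup: Q <- r + gamma * P V, with V = tau * logsumexp(Q / tau)."""
    V = tau * torch.logsumexp(Q / tau, dim=1)          # (S,)
    return r + gamma * torch.einsum('sat,t->sa', P_t, V)

def occupancy(pi, iters=200):
    """Discounted state-action occupancy of policy pi (S x A), by fixed-point iteration."""
    d_s, occ = mu0.copy(), np.zeros((S, A))
    for t in range(iters):
        occ += (gamma ** t) * d_s[:, None] * pi
        d_s = np.einsum('s,sa,sat->t', d_s, pi, P)
    return occ * (1 - gamma)

# Neural reward r_theta(s, a) on one-hot state-action features.
feat = torch.eye(S * A)
reward_net = nn.Sequential(nn.Linear(S * A, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-2)

# Hypothetical "expert" occupancy: soft-optimal behaviour under a hidden reward.
true_r = torch.as_tensor(rng.normal(size=(S, A)), dtype=torch.float32)
Q_true = torch.zeros(S, A)
for _ in range(300):
    Q_true = soft_backup(Q_true, true_r)
d_E = occupancy(torch.softmax(Q_true / tau, dim=1).numpy())

Q = torch.zeros(S, A)                                  # warm-started learner Q
for it in range(500):
    r = reward_net(feat).view(S, A)
    # (1) One soft policy-evaluation/improvement step (not solved to convergence).
    with torch.no_grad():
        Q = soft_backup(Q, r)
        pi = torch.softmax(Q / tau, dim=1).numpy()
        d_pi = occupancy(pi)
    # (2) Likelihood-gradient step on the reward parameters.
    surrogate = (torch.as_tensor(d_pi - d_E, dtype=torch.float32) * r).sum()
    opt.zero_grad()
    surrogate.backward()
    opt.step()
```

The reward step above uses the standard maximum-likelihood identity for entropy-regularized IRL, grad L(theta) = E_expert[grad r_theta] - E_{pi_theta}[grad r_theta], implemented by descending the surrogate (d_pi - d_E) . r_theta with the occupancy measures treated as constants.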
Submission Number: 1017