Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
The goal of inverse reinforcement learning (IRL) is to recover the underlying reward function, and the corresponding optimal policy, from a set of expert demonstrations. Most IRL algorithms with theoretical guarantees assume that the reward has a linear structure. In this work, we extend the understanding of the IRL problem to the setting where the reward is parameterized by a neural network. Moreover, conventional IRL algorithms typically adopt a nested structure and are therefore computationally inefficient, especially when the MDP is high-dimensional. We address this issue by proposing the first neural single-loop maximum likelihood IRL algorithm. Because of the nonlinearity of the neural network approximation, the global convergence results previously established for linear rewards no longer apply. We provide a non-asymptotic convergence analysis of the proposed neural algorithm by exploiting the overparameterization of certain neural networks. It remains to be shown, however, whether the proposed algorithm identifies the globally optimal reward and recovers the corresponding optimal policy. Under certain overparameterized neural network structures, we answer both questions affirmatively. To our knowledge, this is the first IRL algorithm with a non-asymptotic convergence guarantee that provably identifies the global optimum in the neural network setting.
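To illustrate the single-loop structure described above, the following is a minimal, hypothetical sketch (not the paper's exact algorithm) of maximum likelihood IRL with a neural reward on a toy tabular MDP: each iteration interleaves one soft Bellman update of the policy with one likelihood-gradient step on the reward, whose gradient takes the standard form of an expert-versus-policy difference of reward gradients. All sizes, step sizes, and function names are illustrative assumptions.

```python
# Hypothetical sketch of single-loop maximum-likelihood IRL with a neural reward
# r_theta(s, a) on a toy tabular MDP; not the paper's algorithm or constants.
import torch

S, A, gamma = 8, 3, 0.9
P = torch.softmax(torch.randn(S, A, S), dim=-1)        # toy transition kernel
expert_s = torch.randint(0, S, (256,))                  # expert state samples
expert_a = torch.randint(0, A, (256,))                  # expert action samples

reward_net = torch.nn.Sequential(                       # (over)parameterized reward r_theta
    torch.nn.Linear(S + A, 128), torch.nn.ReLU(), torch.nn.Linear(128, 1))
opt = torch.optim.SGD(reward_net.parameters(), lr=1e-2)

def reward_table():
    """Evaluate r_theta on every one-hot (s, a) pair of the tabular MDP."""
    sa = torch.cat([torch.eye(S).repeat_interleave(A, 0),
                    torch.eye(A).repeat(S, 1)], dim=1)
    return reward_net(sa).view(S, A)

Q = torch.zeros(S, A)
for t in range(500):
    # (1) one soft policy-evaluation step under the current reward (lower level)
    with torch.no_grad():
        V = torch.logsumexp(Q, dim=1)                   # soft value function
        Q = reward_table() + gamma * P @ V              # single soft Bellman update
        pi = torch.softmax(Q, dim=1)                    # current soft-optimal policy

    # (2) one likelihood-gradient step on the reward (upper level), using the
    # standard gradient form E_expert[grad r_theta] - E_pi[grad r_theta]
    r = reward_table()
    policy_a = torch.multinomial(pi[expert_s], 1).squeeze(1)
    loss = -(r[expert_s, expert_a].mean() - r[expert_s, policy_a].mean())
    opt.zero_grad(); loss.backward(); opt.step()
```

In contrast, a nested (double-loop) algorithm would run the inner soft Bellman updates to (near) convergence before every reward update, which is what the single-loop scheme is intended to avoid.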