Federated Maximum Likelihood Inverse Reinforcement Learning with Convergence Guarantee

Guangyu Jiang; Mahdi Imani; Nathaniel D. Bastian; Tian Lan

27 Sept 2024 (modified: 05 Feb 2025); Submitted to ICLR 2025; License: CC BY 4.0

Keywords: Inverse Reinforcement Learning, Decentralized Learning.

TL;DR: The paper considers decentralized inverse reinforcement learning over distributed clients and data. It presents a rigorous convergence analysis of an algorithm built on novel dual aggregation and bi-level optimization.

Abstract: Inverse Reinforcement Learning (IRL) aims to recover the latent reward function and the corresponding optimal policy from observed demonstrations. Existing IRL research predominantly focuses on centralized learning, which is unsuitable for real-world problems with distributed data and privacy restrictions. To address this, this paper proposes a novel algorithm for federated maximum-likelihood IRL (F-ML-IRL) and provides a rigorous analysis of its convergence and time complexity. F-ML-IRL uses dual aggregation to update the shared global model and performs bi-level local updates: an upper-level task that optimizes the parameterized reward function by maximizing the discounted likelihood of observing the expert trajectories under the current policy, and a lower-level task that finds the optimal policy with respect to the entropy-regularized discounted cumulative reward under the current reward function. We show that the global model in F-ML-IRL converges to a stationary point for both the reward and policy parameters within finite time; that is, both the log-distance between the recovered policy and the optimal policy and the gradient of the likelihood objective converge to zero. Finally, evaluations on high-dimensional robotic control tasks in MuJoCo show that F-ML-IRL ensures convergence of the recovered reward in decentralized learning and even outperforms centralized baselines, owing to its ability to utilize distributed data.
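The bi-level, dual-aggregation scheme described in the abstract can be illustrated on a toy problem. What follows is a minimal sketch under stated assumptions, not the authors' F-ML-IRL implementation: a small tabular MDP, a linear state-indicator reward r_θ(s, a) = θ·φ(s, a), warm-started soft value iteration as the lower-level (entropy-regularized) policy step, the standard maximum-likelihood IRL gradient (expert minus model discounted feature expectations) as the upper-level step, and plain averaging of both the reward parameters θ and the soft Q-table as a stand-in for dual aggregation. All function names (`soft_bellman`, `federated_ml_irl`, etc.) and hyperparameters are hypothetical.

```python
import numpy as np


def soft_bellman(Q, r, P, gamma):
    """One entropy-regularized Bellman backup: Q <- r + gamma * P V, V(s) = logsumexp_a Q(s, a)."""
    m = Q.max(axis=1)
    V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))
    return r + gamma * P @ V


def policy_from_q(Q):
    """Softmax policy pi(a|s) proportional to exp(Q(s, a))."""
    Z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)


def feature_expectation(pi, P, mu, phi, gamma):
    """Discounted feature expectation sum_t gamma^t E[phi(s_t, a_t)] under policy pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    visit = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi.T, mu)  # discounted state visitation
    return np.einsum("s,sa,sad->d", visit, pi, phi)


def sample_demos(rng, pi, P, mu, phi, gamma, n_traj, horizon):
    """Roll out the expert; return its empirical discounted feature expectation and (s, a) pairs."""
    S, A = pi.shape
    feats, pairs = np.zeros(phi.shape[-1]), []
    for _ in range(n_traj):
        s = rng.choice(S, p=mu)
        for t in range(horizon):
            a = rng.choice(A, p=pi[s])
            feats += gamma**t * phi[s, a]
            pairs.append((s, a))
            s = rng.choice(S, p=P[s, a])
    return feats / n_traj, pairs


def federated_ml_irl(client_feats, P, mu, phi, gamma, rounds=100, local_steps=5, inner_steps=10, lr=0.03):
    """Hypothetical federated bi-level loop: local policy/reward updates, then averaging of both."""
    S, A, d = phi.shape
    theta, Q = np.zeros(d), np.zeros((S, A))  # shared global reward and policy parameters
    for _ in range(rounds):
        thetas, Qs = [], []
        for expert_feats in client_feats:  # each client uses only its own demonstrations
            th, q = theta.copy(), Q.copy()
            for _ in range(local_steps):
                for _ in range(inner_steps):  # lower level: warm-started soft value iteration
                    q = soft_bellman(q, phi @ th, P, gamma)
                model_feats = feature_expectation(policy_from_q(q), P, mu, phi, gamma)
                th = th + lr * (expert_feats - model_feats)  # upper level: likelihood gradient ascent
            thetas.append(th)
            Qs.append(q)
        theta, Q = np.mean(thetas, axis=0), np.mean(Qs, axis=0)  # aggregate reward AND policy params
    return theta, Q


def log_likelihood(pi, pairs):
    """Average log-probability of the demonstrated actions under pi."""
    return float(np.mean([np.log(pi[s, a]) for s, a in pairs]))


# Toy setup (assumed): random 6-state, 3-action MDP with a state-only linear reward.
rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel, shape (S, A, S)
mu = np.full(S, 1.0 / S)  # uniform initial-state distribution
phi = np.repeat(np.eye(S)[:, None, :], A, axis=1)  # state-indicator features, shape (S, A, S)
true_reward = phi @ rng.normal(scale=3.0, size=S)
Q_star = np.zeros((S, A))
for _ in range(500):
    Q_star = soft_bellman(Q_star, true_reward, P, gamma)
pi_expert = policy_from_q(Q_star)

client_feats, all_pairs = [], []
for _ in range(3):  # three clients, each holding private expert demonstrations
    feats, pairs = sample_demos(rng, pi_expert, P, mu, phi, gamma, n_traj=20, horizon=60)
    client_feats.append(feats)
    all_pairs += pairs

theta, Q = federated_ml_irl(client_feats, P, mu, phi, gamma)
pi_learned = policy_from_q(Q)
ll_uniform = log_likelihood(np.full((S, A), 1.0 / A), all_pairs)
ll_learned = log_likelihood(pi_learned, all_pairs)
print(f"log-likelihood: uniform {ll_uniform:.3f}, learned {ll_learned:.3f}")
```

In this sketch, the server averages the soft Q-table rather than action probabilities, so the global policy stays in the same softmax parameterization as the local ones; whether the paper aggregates policy parameters in this way is an assumption.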

Supplementary Material: pdf

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10244
