GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts

Returaj Burnwal; Anirban Santara; Nirav Pravinbhai Bhatt; Balaraman Ravindran; Gaurav Aggarwal


24 May 2023 (modified: 13 Jun 2023). Submitted to RSS-23 LTAMP.

Abstract: Model predictive control (MPC) is a popular approach to trajectory optimization in practical robotics applications because of its guarantees on safety, optimality, generalizability, interpretability, and explainability. Traditional MPC requires a hand-crafted cost function for trajectory optimization. However, some behaviors are complex, and hand-crafting a cost function for them is difficult and error-prone. A special class of MPC policies, called Learnable-MPC, addresses this difficulty by learning the cost function through imitation learning from expert demonstrations. A critical assumption made by Learnable-MPC is that the demonstrator and the imitator agents have identical state-action spaces and transition dynamics, an assumption that is hard to satisfy in many practical robotics applications. In this paper, we address this problem with a novel approach that uses a generative adversarial network (GAN) to match the state-trajectory distributions of the demonstrator and the imitator. We evaluate our approach on a variety of simulated robotics tasks from the DeepMind Control Suite and demonstrate its efficacy at learning the demonstrator's behavior without having to copy the demonstrator's actions.
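The key idea in the abstract is that the adversarial signal compares state trajectories only, so the imitator never needs the demonstrator's actions. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes a logistic-regression discriminator over flattened state trajectories, a hypothetical 1-D toy imitator with dynamics x_{t+1} = x_t + a_t, and a random-shooting MPC step. The title indicates that the MPC's parameterized cost function is what gets trained; this sketch skips that step and plans directly against the discriminator cost -log D, which is a simplification. All names (`StateTrajectoryDiscriminator`, `rollout`, `plan`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

State = Sequence[float]
Trajectory = List[State]


def features(traj: Trajectory) -> List[float]:
    """Flatten a state-only trajectory into a feature vector plus a bias term.

    Actions are deliberately excluded, so demonstrator and imitator may use
    different action spaces as long as they share a state representation.
    """
    return [x for state in traj for x in state] + [1.0]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class StateTrajectoryDiscriminator:
    """Hypothetical logistic-regression discriminator D(tau) = P(tau is from the demonstrator)."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.lr = lr

    def prob_expert(self, traj: Trajectory) -> float:
        return sigmoid(sum(w * f for w, f in zip(self.w, features(traj))))

    def update(self, expert: List[Trajectory], imitator: List[Trajectory]) -> None:
        # One gradient-ascent step on E_expert[log D] + E_imitator[log(1 - D)].
        labelled = [(t, 1.0) for t in expert] + [(t, 0.0) for t in imitator]
        grad = [0.0] * len(self.w)
        for traj, label in labelled:
            err = label - self.prob_expert(traj)
            for i, f in enumerate(features(traj)):
                grad[i] += err * f
        self.w = [w + self.lr * g / len(labelled) for w, g in zip(self.w, grad)]

    def imitation_cost(self, traj: Trajectory) -> float:
        # Imitator-side cost -log D(tau): low when its states look expert-like.
        return -math.log(max(self.prob_expert(traj), 1e-12))


def rollout(x0: float, actions: Sequence[float]) -> Trajectory:
    # Hypothetical imitator dynamics x_{t+1} = x_t + a_t with a 1-D state.
    traj, x = [], x0
    for a in actions:
        x = x + a
        traj.append([x])
    return traj


def plan(disc: StateTrajectoryDiscriminator, x0: float, horizon: int,
         n_samples: int, rng: random.Random) -> List[float]:
    """Random-shooting MPC step: pick the sampled action sequence whose
    predicted state trajectory has the lowest discriminator cost."""
    candidates = [[rng.choice([-0.2, 0.0, 0.2]) for _ in range(horizon)]
                  for _ in range(n_samples)]
    return min(candidates, key=lambda acts: disc.imitation_cost(rollout(x0, acts)))


# Usage: the demonstrator's states climb toward x = 1; its actions are never observed.
rng = random.Random(0)
H = 5
expert = [[[0.2 * (t + 1) + rng.gauss(0.0, 0.01)] for t in range(H)] for _ in range(20)]
# The imitator's initial behaviour stays at x = 0.
imitator = [rollout(0.0, [0.0] * H) for _ in range(20)]
disc = StateTrajectoryDiscriminator(dim=H + 1)
for _ in range(200):
    disc.update(expert, imitator)
best = plan(disc, 0.0, H, 200, rng)
```

Because the discriminator only sees states, the planner is free to reach expert-like states with whatever actions its own dynamics allow. In a fuller version, the discriminator and the imitator's cost parameters would be updated in alternation rather than trained once as here.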
