Single Step Policy Alignment for Imitation Learning with Auxiliary Imperfect Demonstration


TMLR Paper4869 Authors

16 May 2025 (modified: 14 Aug 2025) | Withdrawn by Authors | CC BY 4.0

Abstract: We propose Adversarial Density Regression (ADR), a novel one-step supervised imitation learning (IL) framework. ADR uses a single-step re-weighted behavioral cloning (BC) objective to correct a policy learned from data of unknown quality, aligning it with the expert distribution using demonstrations. Specifically, ADR is designed to address several limitations of previous IL algorithms. First, existing off-policy IL algorithms are based on the Bellman operator, which inevitably suffers from cumulative offsets from sub-optimal multi-step rewards; these off-policy frameworks also suffer from out-of-distribution (OOD) state-actions. Second, the conservative terms that help solve the OOD issue require delicate balancing. To address these limitations, we fully integrate a one-step density-weighted BC objective for IL with auxiliary imperfect demonstration. Theoretically, we demonstrate that this adaptation can effectively correct the distribution of policies trained on unknown-quality datasets to align with the expert policy's distribution. The difference between the empirical and the optimal value function is proportional to the upper bound of ADR's objective, indicating that minimizing ADR's objective is akin to approaching the optimal value. Empirically, we conduct extensive evaluations and find that ADR outperforms all of the selected IL algorithms on tasks from the Gym-Mujoco domain. Meanwhile, ADR achieves about 90% improvement over IQL when utilizing ground truth rewards on tasks from the Adroit and Kitchen domains.
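
To make the idea of a one-step, density-weighted BC objective concrete, here is a minimal sketch of a generic weighted behavioral cloning loss. It is not the ADR objective itself, which the abstract does not spell out. The function names, the clipped density-ratio weighting, and the squared-error form are all illustrative assumptions.

```python
import math


def density_ratio_weights(expert_logp, behavior_logp, max_weight=10.0):
    """Per-sample weights from two log-density estimates (hypothetical helper).

    Assumption: each sample is weighted by the ratio of an estimated expert
    density to an estimated behavior density, clipped for stability. The
    actual weighting used by ADR may differ.
    """
    return [
        min(math.exp(e - b), max_weight)
        for e, b in zip(expert_logp, behavior_logp)
    ]


def weighted_bc_loss(pred_actions, demo_actions, weights):
    """Weighted mean of squared action errors over a batch.

    Each action is a list of floats; samples with larger weights pull the
    policy harder toward the demonstrated action. No Bellman backup is
    involved, so the objective is single-step.
    """
    total = 0.0
    for pred, demo, w in zip(pred_actions, demo_actions, weights):
        sq_err = sum((p - d) ** 2 for p, d in zip(pred, demo))
        total += w * sq_err
    return total / len(weights)
```

In this sketch, a sample that the expert density model considers likely contributes more to the loss than one that only the behavior data supports, which is the general mechanism behind re-weighting imperfect demonstrations.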

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=STt4fLsY1g&noteId=81FTh5Srys

Changes Since Last Submission: Withdrew the previous submission to revise the author list.

Assigned Action Editor: ~Zheng_Wen1

Submission Number: 4869
