SynIL: Leveraging Synergy for Offline Imitation Learning from Imperfect Demonstration Datasets

19 Sept 2025 (modified: 12 Feb 2026), ICLR 2026 Conference Desk Rejected Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Imitation Learning, Imperfect Demonstrations, Offline Reinforcement Learning, Synergy, Demonstration Quality
TL;DR: This study proposes a method to estimate rewards based on synergy, a concept proposed in neuroscience, to improve imitation learning from imperfect demonstrations.
Abstract: Imitation learning has evolved significantly with the advent of deep learning. Deep neural networks can learn complex policies directly from demonstrations, without relying on handcrafted features. However, because the policies are trained directly on the data, deep learning-based imitation learning requires high-quality demonstrations. In practice, factors such as task difficulty and limited expert proficiency can contaminate datasets with non-optimal demonstrations, which degrades performance. Previous studies have explored methods for evaluating demonstration quality in order to identify optimal samples. However, these methods often require expert data to be hand-selected in advance, which becomes increasingly difficult as datasets grow. This study proposes a method for automatically evaluating demonstration quality based on synergy, a low-dimensional structure observed in biological systems. The proposed method quantifies the degree to which synergy is manifested and uses it as a reward for offline reinforcement learning. Since synergy is reported to relate to proficiency, we expect it to serve as an indicator of the motion quality of demonstrations. The results showed that the synergy-based rewards correlated with the true rewards, and that synergy-based imitation learning outperformed behavior cloning and, in some cases, even offline reinforcement learning with true rewards. These results offer a new framework for improving imitation learning from demonstrations in which not every sample is optimal. The proposed method may also contribute to computational neuroscience, as well as to robotics and machine learning.
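As a rough illustration of the idea of turning synergy into a reward signal, the sketch below scores each demonstration by how much of its action variance lies in a few principal components. The abstract does not state how synergy is quantified, so the PCA explained-variance measure, the function names `synergy_score` and `label_rewards`, and the parameter `n_synergies` are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def synergy_score(actions: np.ndarray, n_synergies: int = 2) -> float:
    """Fraction of action variance captured by the top principal components.

    ASSUMPTION: this PCA-based ratio is a stand-in for the paper's synergy
    measure, which the abstract does not define.
    `actions` has shape (timesteps, action_dim).
    """
    centered = actions - actions.mean(axis=0, keepdims=True)
    # Squared singular values of centered data are proportional to the
    # variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    total = variance.sum()
    if total == 0.0:
        return 0.0  # constant actions carry no structure to score
    return float(variance[:n_synergies].sum() / total)

def label_rewards(trajectories, n_synergies: int = 2):
    """Give every timestep of a trajectory that trajectory's synergy score.

    Hypothetical helper: the resulting pseudo-rewards could then be passed
    to any offline reinforcement learning algorithm in place of true rewards.
    """
    return [
        np.full(len(traj), synergy_score(traj, n_synergies))
        for traj in trajectories
    ]

# Toy data: 8-D actions driven by 2 latent signals vs. independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
coordinated = latent @ mixing
uncoordinated = rng.normal(size=(200, 8))

rewards = label_rewards([coordinated, uncoordinated])
```

In this toy setup the coordinated trajectory receives a higher pseudo-reward than the uncoordinated one, which is the kind of ordering a quality indicator would need to produce; whether such a score tracks true task reward on real demonstrations is the empirical question the paper addresses.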
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 17042