ConceptACT: Integrating High-Level Semantic Concepts into Transformer-Based Imitation Learning

Published: 17 Jul 2025 · Last Modified: 06 Sept 2025 · EWRL 2025 Poster · CC BY 4.0
Keywords: Imitation Learning, Concept Learning, Transformers, Robotics
TL;DR: We extend ACT (an imitation-learning architecture) to incorporate episode-level semantic concepts through a Concept Transformer module, achieving a significant reduction in optimality gap on robotic pick-and-place tasks during training.
Abstract: Imitation learning in robotics allows humans to teach complex tasks by demonstration. While this training regime is powerful, most current approaches rely only on directly recorded data, such as joint values and image inputs. In this work, we address this limitation by allowing humans to provide high-level annotations for each episode that carry additional semantic information. We incorporate this information into the training process through a Concept Transformer module, ensuring that the model can exploit these annotations during training. In an experiment involving "pick and place" with additional sorting constraints, we show that our extension of the ACT architecture, which we call ConceptACT, learns faster: it achieves a significantly smaller optimality gap than standard ACT, demonstrating that properly integrated semantic concepts can improve sample efficiency in robotic imitation learning.
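The abstract describes injecting episode-level concept annotations into a transformer-based policy. As a rough illustration only (the concept names, dimensions, and the single-head attention here are our assumptions, not details from the paper), one common way to condition an encoder on such annotations is to embed each concept label and prepend the resulting vectors as extra tokens to the observation token sequence:

```python
import numpy as np

# Hypothetical sketch: episode-level concept labels are embedded and
# prepended as extra tokens to the observation sequence, so attention can
# condition every downstream representation on the annotated concepts.
rng = np.random.default_rng(0)

D = 16  # model width (illustrative choice)
concept_vocab = {"sort_by_color": 0, "sort_by_shape": 1, "no_constraint": 2}
concept_emb = rng.normal(size=(len(concept_vocab), D))  # learned in practice

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over a (T, D) token sequence."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(q @ k.T / np.sqrt(D))
    return scores @ v

def encode_with_concepts(obs_tokens, episode_concepts):
    """Prepend concept-embedding tokens to the observation tokens."""
    ids = [concept_vocab[c] for c in episode_concepts]
    tokens = np.concatenate([concept_emb[ids], obs_tokens], axis=0)
    Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))  # random stand-ins
    return self_attention(tokens, Wq, Wk, Wv)

obs = rng.normal(size=(8, D))  # e.g. image/joint feature tokens
out = encode_with_concepts(obs, ["sort_by_color"])
print(out.shape)  # (9, 16): one concept token plus eight observation tokens
```

The actual ConceptACT architecture may integrate concepts differently (e.g. via a dedicated Concept Transformer head with concept-attention losses); this sketch only shows the general token-conditioning idea.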
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Jakob_Karalus1
Track: Regular Track: unpublished work
Submission Number: 119