DuCAS: a knowledge-enhanced dual-hand compositional action segmentation method for human-robot collaborative assembly

Published: 01 Jan 2024 · Last Modified: 20 May 2025 · IROS 2024 · CC BY-SA 4.0
Abstract: Recognising and tracking human actions from videos is crucial for human-robot collaborative assembly (HRCA). However, traditional action segmentation methods suffer from limited scene adaptability, partly because they conceptualise actions as unified verb-object entities with complete semantics. To overcome this, we propose a compositional action segmentation method. Following the human-robot shared assembly taxonomy, we deconstruct an assembly action into four elements: action verb, manipulated object, target object and tool. Our approach employs individual segmentation models for each action element, and then integrates general knowledge from large language models and domain-specific knowledge from predefined rules to form semantically complete actions. Our method's emphasis on general action elements and its modular design endow it with greater flexibility and adaptability than traditional approaches. Another attribute of our method is its ability to segment the actions of each hand concurrently, facilitating more nuanced HRCA. Comparative experiments validate the superiority of our method over traditional action segmentation methods. More details can be found at https://github.com/LISMS-AKL-NZ/DuCAS.
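To make the compositional idea concrete, the sketch below shows one way per-element predictions (verb, manipulated object, target object, tool) for each hand could be fused into full action labels, with predefined rules standing in for the domain-knowledge step. This is not the authors' implementation; the class names, the `DOMAIN_RULES` table, and all helper functions are illustrative assumptions.

```python
# Hypothetical sketch (not the DuCAS code): composing per-element, per-hand
# predictions into semantically complete assembly actions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FramePrediction:
    """Outputs of the four element-level segmentation models for one frame."""
    verb: str         # e.g. "pick", "insert", "screw"
    manipulated: str  # manipulated object, e.g. "bolt"
    target: str       # target object, e.g. "bracket"
    tool: str         # tool in use, e.g. "none" or "screwdriver"


# Example domain-specific rule (assumption): a "screw" verb performed without
# a tool is relabelled as "twist".
DOMAIN_RULES = {("screw", "none"): "twist"}


def compose_action(pred: FramePrediction) -> str:
    """Fuse the four element predictions into one action label."""
    verb = DOMAIN_RULES.get((pred.verb, pred.tool), pred.verb)
    return f"{verb}({pred.manipulated} -> {pred.target}, tool={pred.tool})"


def segment_dual_hand(
    left_stream: List[FramePrediction],
    right_stream: List[FramePrediction],
) -> Tuple[List[str], List[str]]:
    """Compose actions for both hands frame by frame, independently."""
    return (
        [compose_action(p) for p in left_stream],
        [compose_action(p) for p in right_stream],
    )


if __name__ == "__main__":
    left = [FramePrediction("screw", "bolt", "bracket", "none")]
    right = [FramePrediction("pick", "nut", "tray", "none")]
    print(segment_dual_hand(left, right))
```

In the paper's full pipeline, an LLM would additionally contribute general knowledge when composing the elements; the rule table here only illustrates the rule-based part of that fusion.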