Abstract: Human-robot collaboration (HRC) is crucial for enabling mass-personalised and human-centric manufacturing. Recent advancements in video understanding have enabled robots to interpret human actions from videos and discern the appropriate timing and nature of required robot assistance. However, current vision-based HRC systems mainly focus on capturing overall human actions while neglecting the detailed actions of each hand. This prevents robots from providing finer, more attentive assistance, such as delivering a required tool to a specific hand. Furthermore, these systems predominantly rely on classification-based action recognition techniques rather than reasoning-involved action segmentation techniques, constraining their practical application. To address the above limitations, this paper proposes a vision-based dual-hand action segmentation method for human-robot collaborative assembly (HRCA) scenarios. First, graph neural networks capture hand-object interactions for both hands in each frame; these features are then integrated with scene features; finally, temporal convolutional networks capture the temporal dynamics of the integrated features across frames. To better serve the application needs of HRCA, we refine the action segmentation to the atomic-action level according to the Human-Robot Shared Assembly Taxonomy, facilitating comprehensive collaboration. More details can be found at https://github.com/LISMS-AKL-NZ/DuHa
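The abstract describes a three-stage pipeline: per-frame hand-object graphs processed by a graph neural network for each hand, fusion with scene features, and temporal convolutional networks over frames. The sketch below is a minimal, hypothetical PyTorch illustration of that pipeline, not the authors' released implementation (see the repository link above); the module names, layer sizes, graph construction, and number of action classes are all assumptions made for illustration.

```python
# Minimal sketch of the described pipeline (not the authors' released code).
# Layer sizes, graph construction, and the number of action classes are assumptions.
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """One round of message passing over a per-frame hand-object interaction graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # node_feats: (T, N, in_dim); adj: (T, N, N) row-normalised adjacency
        return torch.relu(self.linear(torch.bmm(adj, node_feats)))


class DilatedTCN(nn.Module):
    """Stack of dilated 1D convolutions capturing temporal dynamics over frames."""
    def __init__(self, dim, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)
        ])

    def forward(self, x):  # x: (T, dim)
        x = x.t().unsqueeze(0)            # (1, dim, T)
        for conv in self.layers:
            x = x + torch.relu(conv(x))   # residual connection per layer
        return x.squeeze(0).t()           # (T, dim)


class DualHandSegmenter(nn.Module):
    """GNN per hand -> fuse with scene features -> TCN -> per-frame, per-hand logits."""
    def __init__(self, node_dim=64, scene_dim=128, hidden=128, num_classes=10):
        super().__init__()
        self.gnn = SimpleGCNLayer(node_dim, hidden)
        self.fuse = nn.Linear(2 * hidden + scene_dim, hidden)
        self.tcn = DilatedTCN(hidden)
        self.left_head = nn.Linear(hidden, num_classes)
        self.right_head = nn.Linear(hidden, num_classes)

    def forward(self, left_nodes, left_adj, right_nodes, right_adj, scene_feats):
        # Pool graph node embeddings per frame for each hand, then fuse with scene features.
        left = self.gnn(left_nodes, left_adj).mean(dim=1)      # (T, hidden)
        right = self.gnn(right_nodes, right_adj).mean(dim=1)   # (T, hidden)
        fused = torch.relu(self.fuse(torch.cat([left, right, scene_feats], dim=-1)))
        temporal = self.tcn(fused)                              # (T, hidden)
        return self.left_head(temporal), self.right_head(temporal)


if __name__ == "__main__":
    T, N = 32, 5  # frames; graph nodes (one hand node plus several object nodes)
    model = DualHandSegmenter()
    adj = torch.softmax(torch.rand(T, N, N), dim=-1)            # toy normalised adjacency
    left_logits, right_logits = model(
        torch.rand(T, N, 64), adj, torch.rand(T, N, 64), adj, torch.rand(T, 128)
    )
    print(left_logits.shape, right_logits.shape)                # (32, 10) for each hand
```

In this sketch the per-frame logits for each hand would be trained with a frame-wise classification loss, yielding dual-hand action segmentation by assigning an atomic-action label to every frame for the left and right hand independently.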