Abstract: In action segmentation, the goal is to partition a long, untrimmed video into a sequence of action segments. Recently, Transformer-based methods have surpassed the previously dominant temporal convolutional networks (TCNs) in overall performance. However, both TCNs and Transformers suffer from over-segmentation. Prior approaches often relied on post-processing techniques to mitigate this issue, but such techniques are not applicable to every model and can even degrade performance. In this paper, we therefore propose a set of loss functions that enhance representation learning and adopt a multi-task learning approach that strengthens the model's ability to identify action boundaries. Through extensive experiments, we show that our method yields significant improvements, particularly in reducing over-segmentation.