U-Transformer-based multi-levels refinement for weakly supervised action segmentation

Published: 01 Jan 2024, Last Modified: 08 Apr 2025, Pattern Recognit. 2024, CC BY-SA 4.0
Abstract

Highlights
- We propose a novel multi-level U-Transformer structure that combines multi-scale information with short-term information between adjacent frames to compensate for the lack of training data in action segmentation.
- We propose neighbor attention based on a close-range matrix. The close-range matrix restricts attention to adjacent frames, exploiting local connectivity when processing long temporal sequences (a hedged sketch of such a mask follows the highlights).
- We propose a novel loss-function optimization strategy. It uses pair-wise similarity from deep feature learning to tackle over-segmentation, and it remains effective under timestamp supervision through a pseudo-label strategy that improves calibration and model training.
- Extensive evaluations show that our model achieves state-of-the-art results on three challenging datasets: 50Salads, GTEA, and Breakfast.
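To make the neighbor-attention idea concrete, below is a minimal sketch of self-attention restricted by a close-range (distance-based) mask over frame indices. The function name `neighbor_attention`, the `window` size, and the use of untrained identity projections are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def neighbor_attention(x, window=16, scale=None):
    """Self-attention restricted to temporally close frames.

    Sketch only: each frame attends to frames within `window` positions,
    so the attention matrix is dominated by local (adjacent-frame)
    connectivity. x: (T, D) frame features of one video.
    """
    T, D = x.shape
    scale = scale or D ** -0.5
    q, k, v = x, x, x                       # identity projections, for illustration
    scores = (q @ k.t()) * scale            # (T, T) pair-wise affinities

    # Close-range matrix: keep scores where |i - j| <= window, mask the rest.
    idx = torch.arange(T)
    close = (idx[None, :] - idx[:, None]).abs() <= window
    scores = scores.masked_fill(~close, float('-inf'))

    attn = F.softmax(scores, dim=-1)        # local attention weights
    return attn @ v                         # (T, D) locally refined features

# Example: 200 frames with 64-dim features, attending within +/-16 frames.
feats = torch.randn(200, 64)
refined = neighbor_attention(feats, window=16)
print(refined.shape)  # torch.Size([200, 64])
```

In practice the queries, keys, and values would come from learned projections inside the U-Transformer levels; the sketch only illustrates how a distance threshold turns full attention into neighbor attention.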