U-Transformer-based multi-levels refinement for weakly supervised action segmentation

Published: 01 Jan 2024, Last Modified: 08 Apr 2025, Pattern Recognit. 2024, CC BY-SA 4.0
Abstract

Highlights
- We propose a novel multi-level U-Transformer structure that combines multi-scale information with short-term information between adjacent frames to compensate for the lack of training data in action segmentation.
- We propose neighbor attention based on a close-range matrix. The close-range matrix restricts attention to adjacent frames, exploiting local connectivity when processing long temporal sequences (a hedged sketch of such a mask follows the highlights).
- We propose a novel loss-function optimization strategy. It uses pair-wise similarity from deep feature learning to tackle over-segmentation, and it remains effective under timestamp supervision through a pseudo-label strategy that improves calibration and model training.
- Extensive evaluations show that our model achieves state-of-the-art results on three challenging datasets: 50Salads, GTEA, and Breakfast.
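To make the neighbor-attention idea concrete, below is a minimal sketch of self-attention restricted by a close-range (distance-based) mask over frame indices. The function name `neighbor_attention`, the `window` size, and the use of untrained identity projections are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def neighbor_attention(x, window=16, scale=None):
    """Self-attention restricted to temporally close frames.

    Sketch only: each frame attends to frames within `window` positions,
    so the attention matrix is dominated by local (adjacent-frame)
    connectivity. x: (T, D) frame features of one video.
    """
    T, D = x.shape
    scale = scale or D ** -0.5
    q, k, v = x, x, x                       # identity projections, for illustration
    scores = (q @ k.t()) * scale            # (T, T) pair-wise affinities

    # Close-range matrix: keep scores where |i - j| <= window, mask the rest.
    idx = torch.arange(T)
    close = (idx[None, :] - idx[:, None]).abs() <= window
    scores = scores.masked_fill(~close, float('-inf'))

    attn = F.softmax(scores, dim=-1)        # local attention weights
    return attn @ v                         # (T, D) locally refined features

# Example: 200 frames with 64-dim features, attending within +/-16 frames.
feats = torch.randn(200, 64)
refined = neighbor_attention(feats, window=16)
print(refined.shape)  # torch.Size([200, 64])
```

In practice the queries, keys, and values would come from learned projections inside the U-Transformer levels; the sketch only illustrates how a distance threshold turns full attention into neighbor attention.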