Robust Temporal Action Localization With Meta Boundary Refinement

Jiahua Li, Kun Wei, Zhe Xu, Liejun Wang, Cheng Deng

Published: 2025, Last Modified: 25 Mar 2026. IEEE Trans. Multim. 2025. License: CC BY-SA 4.0
Abstract: Temporal Action Localization (TAL) aims to localize the start and end timestamps of actions with specific categories in untrimmed videos. Despite the great success of TAL methods, training annotations may contain noisy action boundary labels owing to the inherent subjectivity of manual annotation. Such labels can lead TAL models to learn inaccurate action boundaries during training, potentially impairing their localization performance. To systematically analyze and enhance the robustness of TAL models against noisy action boundary labels, we introduce a new task termed TAL with Noisy Label. We demonstrate that introducing even minimal random noise to the action boundary labels in the training data can substantially degrade the performance of leading TAL methods, underscoring their vulnerability to noisy action boundary labels. To address this, we propose a novel plug-and-play method called Energy-based Meta Boundary Refinement (EMBR), in which a meta-learning pipeline is employed to rectify noisy action boundary labels, reducing the extent to which noisy labels mislead model training. Under this meta-learning pipeline, EMBR uses an energy function to estimate the magnitude of label noise and re-weights samples, assigning lower weights to samples with higher noise and thereby alleviating the impact of noisy samples on model training. In addition, considering the energy difference between action and background segments, an energy-based loss function is proposed to enlarge the energy difference across the boundary, assisting boundary refinement. Experimental results on the THUMOS14, ActivityNet1.3, and HACS datasets demonstrate the effectiveness of EMBR in enhancing the robustness of TAL models.
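The two mechanisms named in the abstract, random noise on boundary labels and noise-dependent sample re-weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `perturb_boundaries` and `noise_to_weights`, the uniform noise proportional to segment length, and the softmax over negated noise scores are all assumptions made for illustration. The abstract does not specify the noise model, the energy function, or the weighting rule.

```python
import math
import random


def perturb_boundaries(segments, noise_ratio, video_duration, rng):
    """Add random noise to (start, end) action boundaries.

    Assumption: each boundary is shifted by a uniform offset of at most
    noise_ratio * segment_length. The paper's noise model may differ.
    """
    noisy = []
    for start, end in segments:
        max_shift = noise_ratio * (end - start)
        new_start = start + rng.uniform(-max_shift, max_shift)
        new_end = end + rng.uniform(-max_shift, max_shift)
        # Keep the perturbed boundaries inside the video.
        new_start = min(max(new_start, 0.0), video_duration)
        new_end = min(max(new_end, 0.0), video_duration)
        # Fall back to the clean label if the segment collapsed.
        if new_end <= new_start:
            new_start, new_end = start, end
        noisy.append((new_start, new_end))
    return noisy


def noise_to_weights(noise_scores, temperature=1.0):
    """Map per-sample noise scores to weights that sum to 1.

    Assumption: a softmax over negated scores, so a sample with a
    higher estimated noise magnitude receives a lower weight.
    """
    scaled = [-s / temperature for s in noise_scores]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [(10.0, 20.0), (35.0, 50.0)]
    print(perturb_boundaries(clean, noise_ratio=0.1, video_duration=60.0, rng=rng))
    print(noise_to_weights([0.1, 0.5, 2.0]))
```

In a training loop, weights of this kind would multiply the per-sample localization loss, so samples judged noisier contribute less to the gradient. How EMBR derives the noise scores from its energy function is described in the paper itself, not here.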