Abstract: Owing to boundary ambiguity and over-segmentation, labeling every frame in long untrimmed videos remains challenging. To address these problems, we present the Efficient Two-Step Network (ETSN)
with two components. The first step of ETSN is the Efficient Temporal Series Pyramid Network (ETSPNet), which captures both local and global frame-level features and yields accurate predictions of segmentation boundaries. The second step is a novel unsupervised approach, Local Burr Suppression (LBS), which
significantly reduces over-segmentation errors. Empirical evaluations on the 50Salads, GTEA, and Breakfast benchmarks demonstrate that ETSN outperforms current state-of-the-art methods by a large margin.
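The abstract does not spell out how LBS works, but the over-segmentation "burrs" it targets are typically spurious short runs inside a frame-wise prediction sequence. As a minimal illustrative sketch (not the paper's LBS algorithm), the following hypothetical post-processing absorbs any run of labels shorter than a threshold into the preceding segment:

```python
def suppress_short_segments(labels, min_len=3):
    """Hypothetical burr removal: absorb runs shorter than
    min_len into the neighboring segment's label.

    This is an illustration of suppressing over-segmentation in
    a frame-wise label sequence, not the paper's LBS method.
    """
    # Collapse the frame labels into (label, run_length) pairs.
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])

    # Rebuild the sequence, relabeling short runs ("burrs") with
    # the previous segment's label (or the next, for a leading burr).
    out = []
    for i, (lab, length) in enumerate(runs):
        if length < min_len and len(runs) > 1:
            lab = out[-1] if out else runs[i + 1][0]
        out.extend([lab] * length)
    return out
```

For example, a one-frame spike of class 1 inside a long run of class 0 is merged back into class 0, removing one spurious segment boundary pair.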