DBAM: Dense Boundary and Actionness Map for Action Localization in Videos via Sentence Query

2021 (modified: 10 Nov 2022) | ICIG (3) 2021 | Readers: Everyone
Abstract: Action localization in videos via sentence query remains very challenging because of semantic misalignment and structural misalignment. Observing that activities should be localized using both the local keywords of the query sentence and the global information of the whole video, we propose a novel method named Dense Boundary and Actionness Map (DBAM). The method trains a self-attention model to evaluate the importance of each word in the query sentence. After video encoding, it constructs a two-dimensional visual feature map for candidate moments. The visual feature map is concatenated with the semantic feature across modalities, and DBAM then applies convolution directly over the resulting feature map to predict a two-dimensional actionness map, starting map, and ending map for the candidate moments. The three maps are fused to generate proposals. We evaluate DBAM on two challenging public benchmarks, Charades-STA and TACoS, where it outperforms the state of the art by a large margin.
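
The two-dimensional map and the fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how clip features are pooled into a candidate moment or how the three maps are fused, so max pooling and an elementwise product are assumptions here, and the function names `build_moment_map`, `fuse_maps`, and `top_proposals` are hypothetical. The cross-modal concatenation and the convolutional predictor are omitted; the three maps are assumed to be given as arrays.

```python
import numpy as np


def build_moment_map(clip_feats):
    """Build a 2D map of candidate moments from per-clip features.

    clip_feats has shape (T, D). Entry [i, j] of the returned (T, T, D) map
    holds the feature of the moment spanning clips i..j (i <= j).
    Max pooling over the span is an assumption, not taken from the abstract.
    """
    T, D = clip_feats.shape
    fmap = np.zeros((T, T, D), dtype=clip_feats.dtype)
    valid = np.triu(np.ones((T, T), dtype=bool))  # only start <= end is valid
    for i in range(T):
        running = clip_feats[i].copy()
        for j in range(i, T):
            running = np.maximum(running, clip_feats[j])
            fmap[i, j] = running
    return fmap, valid


def fuse_maps(actionness, start, end, valid):
    """Fuse three (T, T) maps into one score map.

    The elementwise product is an assumed fusion rule. Invalid cells
    (start index after end index) are set to -inf so they are never selected.
    """
    score = actionness * start * end
    return np.where(valid, score, -np.inf)


def top_proposals(score, k=1):
    """Return the k highest-scoring (start_clip, end_clip) index pairs."""
    order = np.argsort(score, axis=None)[::-1][:k]
    return [tuple(int(x) for x in np.unravel_index(idx, score.shape))
            for idx in order]
```

With three clips, for example, the map has six valid cells (the upper triangle including the diagonal), and the proposal returned by `top_proposals` is the cell where the actionness, starting, and ending scores are jointly highest.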