Abstract: Highlights
• We propose a DAT-detector for spatiotemporal action detection that uses video-level class labels as weak supervision.
• The DAT-detector generates precise action tubes via the proposed attention and regression modules.
• We enhance tubelet proposal quality with our action tubelet proposal generation method.
• Our method significantly outperforms state-of-the-art action proposal methods.
• We achieve remarkable performance in spatiotemporal action detection across multiple benchmarks, competing effectively with fully supervised approaches.