Abstract: Highlights
• DPT extracts deep, specific, and global features from video frames.
• Features are extracted at multiple scales rather than from a single layer.
• PRB is designed to boost the frame representations before they are passed to CH.
• Progressive feature fusion refines the features before prediction.
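
The highlights mention multi-scale feature extraction followed by progressive fusion, but do not state the fusion operator. As a rough illustration only, the sketch below assumes a coarse-to-fine scheme in which each coarser map is upsampled by nearest-neighbour repetition and averaged with the next finer map. The names `upsample2x`, `fuse`, and `progressive_fusion` are hypothetical, the averaging rule is an assumption, and the DPT, PRB, and CH components are not modelled here.

```python
from typing import List

Grid = List[List[float]]  # single-channel feature map (hypothetical stand-in)


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def fuse(fine: Grid, coarse_up: Grid) -> Grid:
    """Element-wise average of two same-sized maps (assumed fusion rule)."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_f, row_c)]
        for row_f, row_c in zip(fine, coarse_up)
    ]


def progressive_fusion(scales: List[Grid]) -> Grid:
    """Fuse maps ordered fine -> coarse, starting from the coarsest.

    Assumes each scale has exactly half the resolution of the previous one.
    """
    fused = scales[-1]
    for finer in reversed(scales[:-1]):
        fused = fuse(finer, upsample2x(fused))
    return fused


# Toy multi-scale features: 4x4, 2x2, and 1x1 maps with constant values.
scales = [
    [[1.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
    [[7.0]],
]
result = progressive_fusion(scales)
print(len(result), len(result[0]))
```

In this toy setting the fused output keeps the finest resolution while mixing in information from every coarser scale. A learned model would typically replace the fixed average with trainable layers.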