Abstract: Highlights
• We introduce the MAB to enhance the motion-encoding ability of frame-based models.
• We enhance feature fusion by aligning heterogeneous features via cross-attention.
• Experiments demonstrate the efficacy of our method both qualitatively and quantitatively.