Bridging asymmetry between image and video: Cross-modality knowledge transfer based on learning from video
Abstract: Highlights
• Enhance action-semantic learning in the image modality to address the asymmetry between modalities.
• Transfer action semantics from video to the image feature learning model from both global and local perspectives.
• Our model performance improves by 3.9% and 4.2% compared to the SOTA model.
External IDs: dblp:journals/eswa/ZhouZCLDG25