Translating video into language by enhancing visual and language representations

Pengjie Tang, Yunlan Tan, Jinzhong Li, Bin Tan

Published: 2020, Last Modified: 11 Apr 2025. J. Vis. Commun. Image Represent. 2020. CC BY-SA 4.0

Abstract: Highlights
• A more effective two-stage pre-training strategy is used for video description.
• A visual and language representation enhancing method is proposed.
• A visual sequential mean pooling method is proposed to further improve performance.