Translating video into language by enhancing visual and language representations

Published: 01 Jan 2020, Last Modified: 11 Apr 2025
Venue: J. Vis. Commun. Image Represent. 2020
License: CC BY-SA 4.0
Abstract: Highlights
• A more effective two-stage pre-training strategy is used for video description.
• A visual and language representation enhancing method is proposed.
• A visual sequential mean pooling method is proposed to further improve performance.