Abstract: Highlights
• A Quality-Aware Recurrent Feedback Network (QARFNet) is constructed to refine video features.
• Predicted textual information is used to progressively enhance the video representation.
• We verify the superiority of QARFNet in utilizing the generated captions via a feedback mechanism.