Abstract: Highlights
• A new task, few-supervised video captioning, which uses only a single human-annotated sentence, is introduced.
• A pseudo-labeling strategy with lexical constraints is proposed to augment knowledge.
• A keyword-refined captioning module with video-text gated fusion is designed to generate high-quality sentences by modeling global context.
• Empirical studies demonstrate the satisfactory quality of the captions generated by the proposed method.