End-to-End Video Captioning Based on Multiview Semantic Alignment for Human-Machine Fusion.

Shuai Wu, Yubing Gao, Weidong Yang, Hongkai Li, Guangyu Zhu

30 Jul 2025 | IEEE Trans. Autom. Sci. Eng. 2025 | Everyone | CC BY-SA 4.0
