TIA2V: Video generation conditioned on triple modalities of text-image-audio

Published: 01 Jan 2025, Last Modified: 16 May 2025, Venue: Expert Syst. Appl. 2025, License: CC BY-SA 4.0
Abstract: Highlights
• First publicly proposed text–image–audio-to-video generation task.
• Different designs for interactions among visual-text-audio modalities.
• Better performance of the combination of the diffusion and GAN models.
• Creation of three triple-modality datasets as further reliable benchmarks.