VisualCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
Anonymous Submission
More Patch-level Showcases
Hover over a video to see its corresponding text prompt
- Condition (Frames 0, 60, 140), Generated Video
- Condition (Frames 0, 40, 76), Generated Video
- Condition (Frames 0, 80, 156), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 40, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frame 40), Generated Video
More Image-level Showcases
- Condition (Frames 0, 40, 76), Generated Video
- Condition (Frames 0, 40, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 76), Generated Video
- Condition (Frames 0, 40), Generated Video
- Condition (Frames 16, 40), Generated Video
- Condition (Frame 0), Generated Video
- Condition (Frame 40), Generated Video
- Condition (Frame 76), Generated Video
More Video-level Showcases
Video Inpainting
Source
Generated Video
Source
Generated Video
Video Outpainting
Source
Generated Video
Source
Generated Video
Video Transition
Source
Generated Video
Source
Generated Video
Source
Generated Video
Source
Generated Video
Source
Generated Video
Source
Generated Video
Comparisons
References
- Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, et al. LTX-Video: Realtime Video Latent Diffusion. https://arxiv.org/abs/2501.00103
- Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. HunyuanVideo: A Systematic Framework for Large Video Generative Models. https://arxiv.org/abs/2412.03603
- Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer. https://arxiv.org/abs/2408.06072
- Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, et al. Wan: Open and Advanced Large-Scale Video Generative Models. https://arxiv.org/abs/2503.20314