Keywords: video dataset, video generation
Abstract: The quality of a video dataset (image quality, resolution, and fine-grained captions) greatly influences the performance of video generation models. The growing demand for video applications places higher requirements on video generation models, for example, generating movie-level Ultra-High Definition (UHD) videos and creating 4K short-video content. However, existing public datasets cannot support such research and applications. In this paper, we first propose UltraVideo, a high-quality, open-source UHD-4K text-to-video dataset (22.4\% of its videos are 8K) that covers a wide range of topics (more than 100 kinds); each video has 9 structured captions plus one summarized caption (824 words on average). Specifically, we carefully design a highly automated, four-stage curation process to obtain the final high-quality dataset: ***i)*** collection of diverse, high-quality video clips; ***ii)*** statistical data filtering; ***iii)*** model-based data purification; ***iv)*** generation of comprehensive, structured captions. In addition, we extend Wan to UltraWan-1K/-4K, which can natively generate high-quality 1K/4K videos with more consistent text controllability, demonstrating the effectiveness of our data curation. We believe this work can make a significant contribution to future research on UHD video generation. The UltraVideo dataset and UltraWan models are available at https://xzc-zju.github.io/projects/UltraVideo.
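The four curation stages listed in the abstract can be pictured as a chain of filters. The following is a minimal sketch, not the authors' implementation: the thresholds (at least 24 fps, clips of 3 to 30 s, a model-score cutoff of 0.5) are hypothetical, and the callables `toy_score`, `toy_captioner`, and `toy_summarizer` are illustrative stand-ins for the quality and captioning models, which the abstract does not name. Only the UHD-4K (3840×2160) and 8K (7680×4320) resolution constants are standard values; `Clip`, `statistical_filter`, `model_purify`, `caption`, and `fraction_8k` are hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Iterable, List

# Standard UHD resolutions as (long side, short side) in pixels.
UHD_4K = (3840, 2160)
UHD_8K = (7680, 4320)


@dataclass
class Clip:
    clip_id: str
    width: int
    height: int
    fps: float
    duration_s: float
    structured_captions: List[str] = field(default_factory=list)
    summary_caption: str = ""


def is_at_least(clip: Clip, res: tuple) -> bool:
    """True if the clip meets a resolution in either orientation."""
    long_side = max(clip.width, clip.height)
    short_side = min(clip.width, clip.height)
    return long_side >= res[0] and short_side >= res[1]


def statistical_filter(clips: Iterable[Clip], min_fps: float = 24.0,
                       min_dur: float = 3.0, max_dur: float = 30.0) -> List[Clip]:
    """Stage ii: cheap metadata checks (thresholds are hypothetical)."""
    return [c for c in clips
            if is_at_least(c, UHD_4K)
            and c.fps >= min_fps
            and min_dur <= c.duration_s <= max_dur]


def model_purify(clips: Iterable[Clip], score_fn: Callable[[Clip], float],
                 threshold: float = 0.5) -> List[Clip]:
    """Stage iii: keep clips whose model score passes a (hypothetical) threshold."""
    return [c for c in clips if score_fn(c) >= threshold]


def caption(clips: Iterable[Clip],
            captioner: Callable[[Clip], Iterable[str]],
            summarizer: Callable[[List[str]], str]) -> List[Clip]:
    """Stage iv: attach structured captions and a summary, returning new Clip objects."""
    out = []
    for c in clips:
        caps = list(captioner(c))
        out.append(replace(c, structured_captions=caps,
                           summary_caption=summarizer(caps)))
    return out


def fraction_8k(clips: List[Clip]) -> float:
    """Share of clips that reach 8K; 0.0 for an empty list."""
    return sum(is_at_least(c, UHD_8K) for c in clips) / len(clips) if clips else 0.0


# Stand-ins: `raw` plays the role of the stage i output, and the three
# toy callables replace real quality and captioning models.
def toy_score(c: Clip) -> float:
    return 0.2 if c.clip_id == "e" else 0.9


def toy_captioner(c: Clip) -> List[str]:
    return [f"{c.clip_id}: aspect {i}" for i in range(9)]  # 9 structured captions


def toy_summarizer(caps: List[str]) -> str:
    return f"summary of {len(caps)} captions"


raw = [
    Clip("a", 3840, 2160, 30.0, 10.0),
    Clip("b", 1920, 1080, 30.0, 10.0),  # below 4K: dropped in stage ii
    Clip("c", 7680, 4320, 60.0, 12.0),
    Clip("d", 3840, 2160, 30.0, 1.0),   # too short: dropped in stage ii
    Clip("e", 4096, 2160, 24.0, 8.0),   # low toy score: dropped in stage iii
]
kept = caption(model_purify(statistical_filter(raw), toy_score),
               toy_captioner, toy_summarizer)
print([c.clip_id for c in kept], fraction_8k(kept))  # ['a', 'c'] 0.5
```

Placing the cheap metadata checks before model scoring means the costly models only see clips that already clear the resolution bar; this ordering simply follows the i) to iv) sequence in the abstract.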
Croissant File: json
Dataset URL: https://huggingface.co/datasets/APRIL-AIGC/UltraVideo
Code URL: https://xzc-zju.github.io/projects/UltraVideo/
Primary Area: Datasets & Benchmarks illustrating Different Deep learning Scenarios (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 187