UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)
License: CC BY 4.0
Keywords: video dataset, video generation
Abstract: The quality of a video dataset (image quality, resolution, and fine-grained captions) strongly influences the performance of video generation models. The growing demand for video applications, such as generating movie-level Ultra-High Definition (UHD) videos and creating 4K short-video content, places higher requirements on high-quality video generation models. However, existing public datasets cannot adequately support such research and applications. In this paper, we first propose UltraVideo, a high-quality, open-source UHD-4K text-to-video dataset (22.4% of its videos are 8K). It covers a wide range of topics (more than 100 kinds), and each video has 9 structured captions and one summarized caption (824 words on average). Specifically, we design a highly automated four-stage curation process to obtain the final high-quality dataset: (i) collection of diverse, high-quality video clips; (ii) statistical data filtering; (iii) model-based data purification; and (iv) generation of comprehensive, structured captions. In addition, we extend Wan to UltraWan-1K/-4K, which natively generates high-quality 1K/4K videos with more consistent text controllability, demonstrating the effectiveness of our data curation. We believe this work can make a significant contribution to future research on UHD video generation. The UltraVideo dataset and UltraWan models are available at https://xzc-zju.github.io/projects/UltraVideo.
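The four curation stages listed in the abstract can be pictured as a chain of filters followed by a captioning step. The Python sketch below is purely illustrative and is not the authors' pipeline. Only the stage order and the "9 structured captions + 1 summary" layout come from the abstract. Everything else is a hypothetical stand-in for the statistics and models the paper actually uses: the `Clip` record, the thresholds (short side at least 2160 px, duration at least 3 s, score at least 0.5), the `aspect_i` caption keys, and the `scorer`, `captioner`, and `summarizer` callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# From the abstract: 9 structured captions plus one summarized caption per video.
NUM_STRUCTURED = 9


@dataclass
class Clip:
    """Hypothetical clip record; field names are illustrative, not the dataset schema."""
    clip_id: str
    width: int
    height: int
    duration_s: float
    captions: Dict[str, str] = field(default_factory=dict)


def statistical_filter(clips: List[Clip], min_short_side: int = 2160,
                       min_duration_s: float = 3.0) -> List[Clip]:
    """Stage ii: drop clips below ASSUMED resolution/duration thresholds."""
    return [c for c in clips
            if min(c.width, c.height) >= min_short_side
            and c.duration_s >= min_duration_s]


def model_purify(clips: List[Clip], scorer: Callable[[Clip], float],
                 threshold: float = 0.5) -> List[Clip]:
    """Stage iii: keep clips whose score from a HYPOTHETICAL model passes a threshold."""
    return [c for c in clips if scorer(c) >= threshold]


def attach_captions(clips: List[Clip],
                    captioner: Callable[[Clip, int], str],
                    summarizer: Callable[[List[str]], str]) -> List[Clip]:
    """Stage iv: attach 9 structured captions (hypothetical keys) and one summary."""
    for c in clips:
        parts = [captioner(c, i) for i in range(NUM_STRUCTURED)]
        c.captions = {f"aspect_{i}": text for i, text in enumerate(parts)}
        c.captions["summary"] = summarizer(parts)
    return clips


def curate(raw: List[Clip],
           scorer: Callable[[Clip], float],
           captioner: Callable[[Clip, int], str],
           summarizer: Callable[[List[str]], str]) -> List[Clip]:
    """Stage i (collection) is assumed done: `raw` is the collected candidate pool."""
    kept = statistical_filter(raw)
    kept = model_purify(kept, scorer)
    return attach_captions(kept, captioner, summarizer)


# Toy usage with dummy models standing in for real ones.
raw = [
    Clip("a", 3840, 2160, 8.0),  # 4K and long enough: survives every stage
    Clip("b", 1920, 1080, 8.0),  # 1080p: dropped in stage ii
    Clip("c", 7680, 4320, 1.0),  # 8K but too short: dropped in stage ii
    Clip("d", 7680, 4320, 5.0),  # 8K, but low dummy score: dropped in stage iii
]
scores = {"a": 0.9, "d": 0.2}
out = curate(raw,
             scorer=lambda c: scores.get(c.clip_id, 0.0),
             captioner=lambda c, i: f"{c.clip_id}-part{i}",
             summarizer=lambda parts: " ".join(parts[:2]))
print([c.clip_id for c in out])  # → ['a']
print(len(out[0].captions))      # → 10
```

Passing the model-dependent steps in as callables keeps the control flow testable on its own. In a real pipeline they would presumably wrap a quality-assessment model and a vision-language captioner.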
Croissant File: json
Dataset URL: https://huggingface.co/datasets/APRIL-AIGC/UltraVideo
Code URL: https://xzc-zju.github.io/projects/UltraVideo/
Primary Area: Datasets & Benchmarks illustrating Different Deep learning Scenarios (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 187