Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Bojia Zi; Penghui Ruan; Marco Chen; Xianbiao Qi; Shaozhe Hao; Shihao Zhao; Youze Huang; Bin Liang; Rong Xiao; Kam-Fai Wong

Published: 18 Sept 2025, Last Modified: 01 Feb 2026. Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster). License: CC BY 4.0

Keywords: Instruction-based Video Editing Dataset

TL;DR: Instruction-based Video Editing Dataset

Abstract: Video content editing has a wide range of applications. With the advancement of diffusion-based generative models, video editing techniques have made remarkable progress, yet they remain far from practical use. Existing inversion-based video editing methods are time-consuming and struggle to maintain consistency in unedited regions. Instruction-based methods have high theoretical potential, but they face significant challenges in constructing high-quality training datasets: current datasets have problems with editing correctness, frame consistency, and sample diversity. To bridge these gaps, we introduce the **Señorita-2M** dataset, a large-scale, diverse, and high-quality video editing dataset. We systematically categorize editing tasks into 2 classes comprising 18 subcategories. To build this dataset, we design four new task specialists and employ or modify 14 existing task experts to generate data samples for each subcategory. In addition, we design a filtering pipeline that operates at both the visual content level and the instruction level to further improve data quality and the reliability of the constructed data. The resulting **Señorita-2M** dataset comprises 2 million high-fidelity samples with diverse resolutions and frame counts. We train multiple models on Señorita-2M using different base video models, namely Wan2.1 and CogVideoX-5B, and the results show that these models exhibit superior visual quality, robust frame-to-frame consistency, and strong instruction-following capability. More videos are available at: **https://senorita-2m-dataset.github.io**.
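
The abstract describes filtering at two levels, visual content and instruction, without giving details. As a rough illustration only, the sketch below keeps a sample when it passes both checks. The `EditSample` fields, the score ranges, and the 0.5 thresholds are hypothetical and are not taken from the paper or its code release.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EditSample:
    """One candidate editing sample (all fields are illustrative assumptions)."""
    instruction: str
    visual_score: float       # hypothetical visual-quality score in [0, 1]
    instruction_score: float  # hypothetical instruction-match score in [0, 1]


def keep_sample(sample: EditSample,
                min_visual: float = 0.5,
                min_instruction: float = 0.5) -> bool:
    """Return True only if the sample passes both filtering levels."""
    passes_visual = sample.visual_score >= min_visual
    passes_instruction = sample.instruction_score >= min_instruction
    return passes_visual and passes_instruction


def filter_samples(samples: Iterable[EditSample],
                   min_visual: float = 0.5,
                   min_instruction: float = 0.5) -> List[EditSample]:
    """Drop every sample that fails either the visual or the instruction check."""
    return [s for s in samples if keep_sample(s, min_visual, min_instruction)]


if __name__ == "__main__":
    candidates = [
        EditSample("remove the dog", 0.9, 0.8),
        EditSample("add a hat", 0.3, 0.9),      # fails the visual check
        EditSample("make it snowy", 0.8, 0.2),  # fails the instruction check
    ]
    for sample in filter_samples(candidates):
        print(sample.instruction)
```

In a real pipeline the two scores would come from separate models or heuristics; here they are plain numbers so the control flow is easy to follow.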

Croissant File: json

Dataset URL: https://huggingface.co/datasets/SENORITADATASET/Senorita

Code URL: https://github.com/zibojia/SENORITA

Supplementary Material: zip

Primary Area: Applications of Datasets & Benchmarks in Creative AI

Submission Number: 403
