Condensing Videos by Learning Where Motion Matters

16 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Video Dataset Condensation
TL;DR: Dynamic Frame Synthesis condenses video datasets by starting from a few key frames and adding synthetic frames only where gradient misalignment signals motion too complex for interpolation.
Abstract: Video dataset condensation aims to mitigate the immense computational cost of video processing, but it faces the unique challenge of preserving the complex interplay between spatial content and temporal dynamics. Prior work often disentangles these elements unnaturally, overlooking their essential interdependence. We introduce Dynamic Frame Synthesis (DFS), a novel approach that preserves this critical coupling. DFS begins with a minimal set of key frames and dynamically synthesizes new ones by detecting, via gradient misalignment, moments of high motion complexity where simple interpolation fails. This adaptive process allocates new frames only where such complexity exists, yielding highly efficient and temporally coherent synthetic datasets. Extensive experiments show that DFS outperforms prior methods on standard action recognition benchmarks, producing powerful representations with significantly less storage.
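To make the frame-allocation criterion concrete, the sketch below shows one plausible reading of the gradient-misalignment test: compare the training gradient induced by a real frame against the gradient induced by a linear interpolation of its neighboring key frames, and treat high misalignment as a signal to allocate a synthetic frame at that timestep. Everything here is an illustrative assumption, not the paper's implementation: the function names (`gradient_misalignment`, `grad_vec`), the use of midpoint interpolation, the cross-entropy loss, and the cosine-based score are all hypothetical choices.

```python
import torch
import torch.nn.functional as F


def gradient_misalignment(model, frames, labels, t):
    """Hypothetical DFS-style criterion (a sketch, not the paper's code).

    Compares the loss gradient induced by the real frame at index t with
    the gradient induced by a linear interpolation of its two neighbors.
    A high score suggests the local motion is too complex for simple
    interpolation, so a condensation method like DFS might allocate a
    synthetic frame here.

    frames: tensor of shape (T, C, H, W); model maps a (1, C, H, W)
    batch to class logits; labels: tensor of shape (1,).
    """
    real = frames[t : t + 1]
    # Assumed surrogate: midpoint interpolation of the adjacent frames.
    interp = 0.5 * (frames[t - 1 : t] + frames[t + 1 : t + 2])

    params = [p for p in model.parameters() if p.requires_grad]

    def grad_vec(x):
        loss = F.cross_entropy(model(x), labels)
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.flatten() for g in grads])

    g_real = grad_vec(real)
    g_interp = grad_vec(interp)
    # Misalignment = 1 - cosine similarity, in [0, 2]; larger means the
    # interpolated frame drives training in a different direction.
    return 1.0 - F.cosine_similarity(g_real, g_interp, dim=0).item()
```

Under this reading, a condensation loop would score every interior timestep, insert a learnable frame at the argmax whenever the score exceeds a threshold, and repeat until the budget is exhausted; the threshold and insertion schedule are further assumptions not specified by the abstract.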
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6510