High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion

Published: 01 Nov 2024, Last Modified: 11 Nov 2024. Venue: OpenReview Archive Direct Upload. Readers: Everyone. License: CC BY 4.0.
Abstract: We introduce HiFI, a patch-based cascaded pixel diffusion approach for High-resolution Frame Interpolation that generalizes across diverse resolutions up to 8K, a wide range of scene motions, and a broad spectrum of challenging scenes. HiFI is particularly effective on high-resolution content and on complex repeated textures that require global context. It achieves performance comparable to or better than the state of the art on multiple benchmarks (Vimeo, Xiph, X-Test, SEPE-8K), and on our newly introduced dataset, which focuses on particularly challenging cases, it significantly outperforms all other baselines. Diffusion models are powerful but computationally very expensive. To scale up to 8K resolution, we introduce a patch-based cascade model that always performs diffusion at the same resolution and upsamples by processing patches of the input frames and of the prior (lower-resolution) solution. We show that this technique drastically reduces memory usage at inference time and also allows us to use a single model at test time for both frame interpolation (the base model's task) and spatial upsampling, which saves training cost.
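To make the patch-based cascade idea concrete, here is a minimal sketch of the control flow only, under several assumptions that the abstract does not specify. The names `run_cascade`, `cascade_stage`, `upsample2x`, and `downsample2x` are hypothetical; nearest-neighbour upsampling and 2×2 average pooling stand in for whatever resampling the real system uses; patches are non-overlapping (no seam blending); and the base stage is fed a zero image as its "prior" so that one callable can serve both the interpolation and upsampling roles. The `model` argument stands in for the diffusion model. The point illustrated is that every model call sees a `patch × patch` input regardless of the final frame size, which is why memory stays bounded as resolution grows.

```python
from typing import Callable, List

Image = List[List[float]]  # single-channel image, row-major (toy representation)
# Hypothetical model interface: (frame0 patch, frame1 patch, prior patch) -> output patch
PatchModel = Callable[[Image, Image, Image], Image]


def downsample2x(img: Image) -> Image:
    """2x2 average pooling; stand-in for building the low-resolution inputs."""
    return [
        [(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
          + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
         for j in range(len(img[0]) // 2)]
        for i in range(len(img) // 2)
    ]


def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling of the previous stage's solution."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def crop(img: Image, y: int, x: int, size: int) -> Image:
    return [row[x:x + size] for row in img[y:y + size]]


def paste(dst: Image, patch: Image, y: int, x: int) -> None:
    for dy, row in enumerate(patch):
        dst[y + dy][x:x + len(row)] = row


def cascade_stage(f0: Image, f1: Image, prior: Image,
                  model: PatchModel, patch: int) -> Image:
    """One upsampling stage: tile inputs and upsampled prior, run model per tile."""
    h, w = len(f0), len(f0[0])
    assert h % patch == 0 and w % patch == 0, "sketch assumes exact tiling"
    up = upsample2x(prior)
    assert len(up) == h and len(up[0]) == w
    out = [[0.0] * w for _ in range(h)]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            # The model always sees patch x patch inputs, whatever the frame size.
            result = model(crop(f0, y, x, patch), crop(f1, y, x, patch),
                           crop(up, y, x, patch))
            paste(out, result, y, x)
    return out


def run_cascade(f0: Image, f1: Image, model: PatchModel,
                patch: int, num_stages: int) -> Image:
    """Base interpolation at patch resolution, then num_stages 2x upsampling stages."""
    pyr0, pyr1 = [f0], [f1]
    for _ in range(num_stages):
        pyr0.insert(0, downsample2x(pyr0[0]))
        pyr1.insert(0, downsample2x(pyr1[0]))
    base0, base1 = pyr0[0], pyr1[0]
    assert len(base0) == patch and len(base0[0]) == patch
    # Assumption: a zero prior lets the same model perform the base interpolation.
    zeros = [[0.0] * patch for _ in range(patch)]
    estimate = model(base0, base1, zeros)
    for s0, s1 in zip(pyr0[1:], pyr1[1:]):
        estimate = cascade_stage(s0, s1, estimate, model, patch)
    return estimate
```

With 8×8 frames, `patch=4`, and `num_stages=1`, the model is called once at the 4×4 base and four times on 4×4 tiles of the 8×8 stage; doubling the frame size again would add more tiles but never a larger model input.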