Generative Inbetweening: Adapting Image-to-Video Diffusion Models for Keyframe Interpolation
Supplementary Materials (submission 1189)

On this webpage, we present (1) video comparisons between our method and the baselines across diverse scenarios (Figure 3 in the paper); (2) video comparisons on complex articulated motions of animals and people, where our method outperforms the baselines but still struggles to produce natural kinematic motion because of limitations of Stable Video Diffusion (SVD) itself (Figure 5 in the paper); and (3) ablation study results (Figure 4 in the paper).


Baseline comparisons

[Video comparison grid, 11 examples. Columns: Input pairs, FILM, TRF, Ours.]

Baseline comparisons in articulated motions

[Video comparison grid, 3 examples. Columns: Input pairs, FILM, TRF, Ours.]

Ablation study

[Video comparison grid, 3 examples. Columns: Input pairs, Ours w/o RA, Ours w/o FT, Ours.]