Single image and video generation using a receptive diffusion model with convolutional spatiotemporal blocks
Abstract: Highlights
• Union unifies diffusion-based training for images and videos, avoiding the errors and artifacts of GAN-based methods.
• A receptive DDPM with ConvNeXt-based convolutional spatiotemporal blocks (CS-Blocks) captures local and global relationships in images and videos.
• Union supports diverse video generation, video extrapolation, and editing of real videos.
• Union leads in compute efficiency and quality, achieving the top LPIPS score on Places50 and outperforming the baselines.