Abstract: Highlights
• Denoising diffusion models for speech-driven video editing.
• We present a speech-conditioned diffusion model for this task.
• We demonstrate promising results on the GRID and CREMA-D datasets.
• An unstructured diffusion-based approach can generate high-quality image frames without complex loss functions.