Denoising Diffusion Gamma Models



Generative diffusion processes are an emerging and effective tool for image and speech generation. In existing methods, the underlying noise distribution of the diffusion process is Gaussian. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other noise distributions for the diffusion process. Specifically, we introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise drawn from a Gamma distribution yields improved results for image and speech generation. Our approach preserves the ability to efficiently sample states of the training diffusion process while using Gamma noise.
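To illustrate the "efficient sampling" property mentioned above, here is a minimal sketch of a closed-form forward step with Gamma noise: the noised state x_t is drawn directly from x_0 in one shot, using a centered Gamma variate whose variance is matched to the usual Gaussian schedule 1 − ᾱ_t. The parameterization below (scale θ_t = θ₀√ᾱ_t, shape chosen so k_t·θ_t² = 1 − ᾱ_t) is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def closed_form_sample(x0, t, alpha_bar, theta0=1.0, rng=None):
    """Sample x_t directly from x_0 (no step-by-step simulation needed).

    The Gaussian DDPM would use sqrt(1 - abar_t) * eps with eps ~ N(0, 1);
    here the noise is a zero-mean Gamma variate with the same variance.
    Schedule parameterization is illustrative, not the paper's exact one.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = alpha_bar[t]
    theta_t = theta0 * np.sqrt(a)        # Gamma scale, shrinking over time
    k_t = (1.0 - a) / theta_t ** 2       # Gamma shape: k * theta^2 = 1 - abar_t
    g = rng.gamma(k_t, theta_t, size=x0.shape)
    noise = g - k_t * theta_t            # subtract the mean -> zero-mean noise
    return np.sqrt(a) * x0 + noise
```

Because the noise term has mean zero and variance 1 − ᾱ_t by construction, this drop-in replaces the Gaussian closed-form sampler while changing the noise's higher moments (skewness in particular), which is the extra degree of freedom the Gamma family provides.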

Here are some samples from our model for you to listen to:

  • Ground Truth - the original sample from the LJ Speech dataset
  • WaveGrad - sample generated with WaveGrad (noise schedules: 6 iterations - grid search, 25 - Fibonacci, 100 and 1000 - linear)
  • Ours Gamma - sample generated with our proposed Gamma-distribution method



6 Iteration Samples

(audio players: Ground Truth | WaveGrad | Ours - Gamma)

25 Iteration Samples

(audio players: Ground Truth | WaveGrad | Ours - Gamma)

100 Iteration Samples

(audio players: Ground Truth | WaveGrad | Ours - Gamma)

1000 Iteration Samples

(audio players: Ground Truth | WaveGrad | Ours - Gamma)