Keywords: denoising diffusion generative models, score-based generative models, resource efficient ml, transformers, parameter sharing
TL;DR: This paper introduces a block sharing mechanism for Vision Transformers in denoising diffusion generative models, substantially reducing parameter count while maintaining or improving sample quality.
Abstract: The interplay between model depth, computational complexity, and parameter count remains an intricate aspect of neural network design. We propose a novel block sharing mechanism for denoising diffusion generative models, enabling us to maintain or even improve model quality while reducing parameter count. Our approach leverages the architectural homogeneity of Vision Transformers and demonstrates enhanced performance with less computational overhead on various datasets. We provide our code and pre-trained models to facilitate further research.
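The abstract's core idea, reusing one set of block weights across depth so parameter count decouples from the number of layers, can be illustrated with a minimal sketch. This is a generic cross-depth weight-sharing toy (NumPy, with a linear-plus-residual stand-in for a transformer block), not the paper's actual mechanism; the names and sizes here are hypothetical.

```python
import numpy as np

# Illustrative sketch only: generic cross-depth parameter sharing,
# NOT the paper's specific block sharing mechanism.
rng = np.random.default_rng(0)
d = 8  # hidden width (hypothetical)

# One shared set of block weights, reused at every depth.
W_shared = rng.standard_normal((d, d)) * 0.1

def shared_block(x, W):
    """One transformer-like block, simplified to nonlinearity + residual."""
    return x + np.tanh(x @ W)

def forward(x, depth):
    """Apply `depth` blocks that all reuse the same weights W_shared."""
    for _ in range(depth):
        x = shared_block(x, W_shared)
    return x

x = rng.standard_normal((2, d))
out = forward(x, depth=12)

# Parameter count is d*d regardless of depth, versus depth*d*d
# for an unshared 12-block stack.
shared_params = W_shared.size        # 64
unshared_params = 12 * W_shared.size  # 768
```

The point of the sketch is only the accounting: with sharing, depth adds compute but no parameters, which is the trade-off the abstract describes between model depth, computational complexity, and parameter count.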
Supplementary Material: zip
Submission Number: 156