DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

Published: 10 Jun 2025 · Last Modified: 01 Jul 2025 · TTODLer-FM @ ICML 2025 Poster · CC BY 4.0
Keywords: blockwise training, generative models, diffusion models
TL;DR: We propose DiffusionBlocks, a novel training framework that eliminates end-to-end backpropagation by interpreting neural network blocks as denoising operations in a continuous-time diffusion process.
Abstract: Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, limiting access to state-of-the-art AI research. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. By partitioning the network into independently trainable blocks and optimizing noise level assignments based on equal cumulative probability mass, our approach achieves both superior memory efficiency and improved performance compared to traditional backpropagation. Experiments on image generation and language modeling tasks demonstrate a 4$\times$ memory reduction during training while maintaining or improving performance. DiffusionBlocks provides a promising pathway for democratizing access to large-scale neural network training with limited computational resources.
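To make the "equal cumulative probability mass" noise-level assignment concrete, below is a minimal sketch. It assumes an EDM-style log-normal prior over noise levels $\sigma$ with illustrative parameters; the distribution choice, function names, and sampling routine are assumptions for exposition, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

# Illustrative sketch (not the paper's code): partition an assumed
# EDM-style log-normal prior over noise levels sigma into B intervals
# of equal cumulative probability mass, one interval per network block.
P_MEAN, P_STD = -1.2, 1.2  # assumed parameters of log(sigma) ~ N(P_MEAN, P_STD)

def block_sigma_boundaries(num_blocks):
    """Noise-level boundaries such that each block's interval holds 1/B of the mass."""
    probs = np.linspace(0.0, 1.0, num_blocks + 1)
    probs = np.clip(probs, 1e-6, 1.0 - 1e-6)  # avoid infinities at the tails
    return np.exp(norm.ppf(probs, loc=P_MEAN, scale=P_STD))

def sample_sigma_for_block(block_idx, num_blocks, rng):
    """Sample a noise level for one block by drawing a quantile inside its
    equal-mass interval [i/B, (i+1)/B] and applying the inverse CDF."""
    u = rng.uniform(block_idx / num_blocks, (block_idx + 1) / num_blocks)
    return float(np.exp(norm.ppf(u, loc=P_MEAN, scale=P_STD)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("sigma boundaries:", np.round(block_sigma_boundaries(num_blocks=4), 3))
    print("block 2 sample:  ", sample_sigma_for_block(2, 4, rng))
```

Under this reading, each block is trained only on noise levels drawn from its own interval, so no gradients need to flow between blocks, which is the source of the memory savings described in the abstract.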
Submission Number: 16