EdgeDiff: Energy-Efficient Multi-Modal Few-Step Diffusion Model Accelerator Using Mixed-Precision and Reordered Group Quantization

Sangjin Kim, Jungjun Oh, Jeonggyu So, Yuseon Choi, Sangyeob Kim, Dongseok Im, Gwangtae Park, Hoi-Jun Yoo

Published: 2026, Last Modified: 27 May 2026, IEEE J. Solid State Circuits 2026, CC BY-SA 4.0

Abstract: Recent advances in diffusion models (DMs), such as few-step denoising and multi-modal conditioning, have significantly improved computational efficiency and functional flexibility, but they also introduce new hardware challenges. In particular, the elimination of inter-timestep redundancy, increased encoder/decoder workload, and heightened sensitivity to quantization demand a new class of accelerator. We present EdgeDiff, the first processor to support end-to-end, few-step, and multi-modal DM inference. EdgeDiff introduces a unified solution named condition-aware reordered group mixed precision (CRMP) with several novel microarchitectures: compress-and-add (CAA) processing elements (PEs) with bit-shuffle trees (BSTs) for efficient low-bit multiply-accumulate (MAC), a tiered accumulation unit (TAU) to reduce floating-point (FP) accumulation energy, and a grid-based quantization unit (GQU) to eliminate expensive FP division. Fabricated in 28-nm CMOS, EdgeDiff achieves up to 34.4-TOPS/W energy efficiency and reduces generation energy to 418.4 mJ/image for one-step text-to-image (T2I) generation, $3.3\times$ lower than the prior state of the art. Despite aggressive quantization, EdgeDiff maintains output quality comparable to FP inference across Fréchet Inception Distance (FID), contrastive language–image pretraining (CLIP), and peak signal-to-noise ratio (PSNR) metrics, establishing it as a compelling solution for energy-efficient, real-time generative artificial intelligence (AI) on edge platforms.

External IDs: dblp:journals/jssc/KimOSCKIPY26