IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY-NC-SA 4.0
TL;DR: This paper presents integral low-rank adaptation, which adapts quantized diffusion models through integer multiplication or bit-shifting.
Abstract: Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune quantized weights using Low-rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient: additional post-training quantization (PTQ) of the tuned weights is required at deployment, which causes a noticeable performance drop at low bit-widths. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters so that inference efficiency is built in during tuning. Specifically, IntLoRA keeps the pre-trained weights quantized during training, enabling fine-tuning on consumer-level GPUs. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without sacrificing performance.
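To make the core idea concrete, below is a minimal NumPy sketch of the general principle the abstract describes: keep the pre-trained weight quantized, learn a low-rank update that lives on the same integer grid, and merge entirely in the integer domain so the downstream weight needs no PTQ pass. This is an illustrative simplification, not the authors' exact algorithm: the paper merges via integer multiplication or bit-shifting, whereas this sketch uses a simpler additive variant with a shared scale, and all names (`W_q`, `s_w`, `delta_q`) are hypothetical.

```python
# Toy sketch of "adapt quantized weights, merge without PTQ".
# Assumption: a simplified additive integer merge, not IntLoRA's
# actual int-multiplication / bit-shifting formulation.
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                        # weight dim and LoRA rank (toy sizes)
W = rng.standard_normal((d, d))    # pre-trained float weight

# Uniform symmetric quantization of the pre-trained weight to int8.
s_w = np.abs(W).max() / 127.0      # per-tensor scale
W_q = np.clip(np.round(W / s_w), -127, 127).astype(np.int8)

# Low-rank adapters; after (simulated) training, their product is
# quantized onto the *same* integer grid as the pre-trained weight,
# so the merge can stay integer-typed.
A = rng.standard_normal((d, r)) * 0.1
B = rng.standard_normal((r, d)) * 0.1
delta_q = np.clip(np.round((A @ B) / s_w), -127, 127).astype(np.int32)

# Merge in the integer domain: the downstream weight is already
# quantized, so deployment skips the extra PTQ step entirely.
W_down_q = np.clip(W_q.astype(np.int32) + delta_q, -127, 127).astype(np.int8)

# Dequantize only when running inference.
W_down = W_down_q.astype(np.float32) * s_w
print(np.abs(W_down - (W + A @ B)).max())  # small quantization error
```

The key property the sketch demonstrates is that `W_down_q` is produced directly as an int8 tensor, so no second quantization of the merged weights (and the accompanying accuracy loss at low bit-widths) is incurred.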
Lay Summary: Adapting large AI models to new tasks is often expensive and slow. Our method, IntLoRA, makes this process more efficient by fine-tuning the model with compact, low-precision parameters without sacrificing performance. Unlike existing approaches that require an extra conversion step after training, IntLoRA keeps everything efficient both during and after fine-tuning. This reduces the cost of running powerful models on personal devices while maintaining high-quality results.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/csguoh/IntLoRA
Primary Area: Applications->Computer Vision
Keywords: Low-rank Adaptation, Network Quantization, Diffusion Models
Submission Number: 1156