HOW TO TRAIN YOUR DIFFUSION MODELS FOR FEW-SHOT CLASSIFICATION

04 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion Classifier, Few-shot Learning
TL;DR: This work greatly accelerates the existing diffusion-classifier framework for few-shot learning while maintaining or even improving its performance.
Abstract: Generation and classification are two sides of the same coin: a strong generative model can be transformed into a powerful classifier. This is evident when diffusion models (DMs) outperform CLIP-based approaches in fine-grained or customized classification tasks, where a small few-shot training set defines the task on the fly. In this setting, the model is typically fine-tuned to reconstruct the training samples and, at inference, predicts the label with the lowest expected reconstruction error across diffusion time-steps. Although effective, this approach is computationally expensive, as it requires computing *average* reconstruction errors for *every class* over the *full range of time-steps* and *multiple sampled noises*. In this work, we study techniques to improve both the efficiency and accuracy of diffusion classifiers. To accelerate inference, we propose dynamic time-step selection to minimize unnecessary evaluations. To improve the estimation of reconstruction errors, we introduce class-object mask learning, which reduces variance and thereby requires fewer noise samples to achieve high precision. To further reduce the number of candidate classes, we explore candidate class selection. Together, these techniques speed up diffusion-based classifiers by over an order of magnitude while maintaining or even improving classification performance. Finally, we show that DMs and CLIP-based models are complementary, and that integrating the two achieves further gains, reinforcing the close connection between generation and classification.
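The inference procedure described above can be sketched as follows. This is a minimal, illustrative NumPy implementation of the generic diffusion-classifier scoring rule (average noise-prediction error per class, over sampled time-steps and noises), not the authors' accelerated method; the `eps_model` callable, the linear noise schedule, and all sample counts are assumptions for demonstration.

```python
import numpy as np

def diffusion_classifier_score(x, class_ids, eps_model, num_timesteps=1000,
                               n_t_samples=8, n_noise_samples=4, rng=None):
    """Score each candidate class by its average noise-prediction error.

    eps_model(x_t, t, c) is a stand-in for a class-conditional diffusion
    model's noise predictor; lower average error means a better class fit.
    """
    rng = np.random.default_rng(rng)
    # Toy linear beta schedule; alpha_bar[t] is the cumulative signal level.
    betas = np.linspace(1e-4, 0.02, num_timesteps)
    alpha_bar = np.cumprod(1.0 - betas)
    errors = {}
    for c in class_ids:
        errs = []
        for _ in range(n_t_samples):
            t = int(rng.integers(num_timesteps))
            for _ in range(n_noise_samples):
                eps = rng.standard_normal(x.shape)
                # Forward diffusion: noise the input to level t.
                x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps
                # Squared error between true and predicted noise.
                errs.append(np.mean((eps - eps_model(x_t, t, c)) ** 2))
        errors[c] = float(np.mean(errs))
    # Predicted label: the class with the lowest expected reconstruction error.
    return min(errors, key=errors.get), errors
```

The nested loops over classes, time-steps, and noises make the cost of the naive rule explicit, which is precisely the product the abstract's three techniques (time-step selection, variance reduction, candidate-class selection) each shrink along one axis.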
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 1952