PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference

16 Sept 2025 (modified: 19 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Personalized text-to-image generation
TL;DR: PersonalQ enables efficient serving of personalized diffusion models at scale through intelligent checkpoint selection and trigger-token-aware quantization that preserves personalization quality while reducing memory footprint.
Abstract: Personalized text-to-image generation enables users to create custom AI models that generate their unique concepts, such as specific objects or artistic styles, offering unprecedented creative control. However, deploying a large repository of personalized checkpoints faces two critical challenges: (1) ambiguous user prompts make it difficult to match the intended checkpoint in large repositories, and (2) standard post-training quantization methods degrade the image quality of personalized diffusion checkpoints. We analyze the importance of reasoning over checkpoint metadata and clarifying user prompts for intent-aligned checkpoint selection, and we find that the trigger tokens used by personalized diffusion models play a crucial role under quantization. To address these challenges, we propose PersonalQ, a unified system with two components: Check-in, which analyzes checkpoint repositories and clarifies user intent for intent-aligned selection, and TAQ (Trigger-Aware Quantization), which protects trigger-token-related representations so that the chosen checkpoint delivers high-quality inference under quantization. On our Repo-Prompts benchmark, PersonalQ achieves an 89% checkpoint-selection preference win rate and a 4.42/5 intent score. Across benchmarks, TAQ reduces inference memory by up to 75% while maintaining strong text-image alignment (CLIP score 0.297 vs. 0.315 at full precision) and image fidelity (FID 11.03 at W8A8 vs. 10.96 at full precision), enabling scalable deployment of personalized models without compromising quality.
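To make the trigger-aware quantization idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: weights are symmetrically quantized to int8 with one scale per output channel, and the text-encoder embedding rows corresponding to the personalization trigger tokens are then restored at full precision. The name trigger_token_ids, the embedding-table target, and the per-channel granularity are illustrative assumptions; the paper's actual protection mechanism may differ.

import torch

def quantize_per_channel_int8(w: torch.Tensor):
    # Symmetric int8 quantization with one scale per output channel (row).
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

def trigger_aware_quantize_embedding(emb: torch.nn.Embedding, trigger_token_ids):
    # Quantize-dequantize the whole embedding table, then restore the
    # trigger-token rows exactly so personalization is not degraded.
    w = emb.weight.data
    q, scale = quantize_per_channel_int8(w)
    w_hat = dequantize(q, scale)
    w_hat[trigger_token_ids] = w[trigger_token_ids]  # protect trigger tokens
    emb.weight.data = w_hat
    return emb

if __name__ == "__main__":
    torch.manual_seed(0)
    emb = torch.nn.Embedding(1000, 64)
    ref = emb.weight.data.clone()
    trigger_ids = torch.tensor([42])  # hypothetical id of a trigger token like <sks>
    trigger_aware_quantize_embedding(emb, trigger_ids)
    err_trigger = (emb.weight.data[trigger_ids] - ref[trigger_ids]).abs().max().item()
    err_other = (emb.weight.data - ref).abs().max().item()
    print(f"max error on trigger rows: {err_trigger:.2e}")  # exactly zero
    print(f"max error elsewhere:       {err_other:.2e}")    # small but nonzero

The key design choice the sketch illustrates is that protection is selective: the bulk of the parameters still pay the int8 quantization error, preserving the memory savings, while the handful of rows that encode the personalized concept are kept exact.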
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6759