TCA-DiT: Quantizing Diffusion Transformers via Temporal Channel Alignment

16 Sept 2025 (modified: 20 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Post-Training Quantization, Diffusion Models, Diffusion Transformers
TL;DR: We propose TCA-DiT, a post-training quantization framework for Diffusion Transformers via temporal channel alignment
Abstract: Diffusion Transformers (DiTs) have achieved remarkable success in generative modeling, but their deployment is hindered by massive model sizes and high inference costs. Post-Training Quantization (PTQ) offers a retraining-free compression paradigm, yet applying it to DiTs is particularly challenging due to channel-wise activation anomalies whose distribution shifts dynamically across timesteps. This temporal drift undermines existing rotation- or scaling-based PTQ methods, leaving residual misaligned anomaly channels that impair quantization fidelity. We propose **TCA-DiT**—**T**emporal **C**hannel **A**lignment for **Di**ffusion **T**ransformers—a PTQ framework designed to explicitly address such timestep-varying anomalies. Specifically, we first introduce *Anomaly-aware Rotation Calibration (ARC)*, a learnable rotation-scaling mechanism that jointly optimizes rotation matrices with reconstruction and anomaly alignment losses, thereby aligning anomaly channels across timesteps and enabling more precise per-channel scaling. To improve calibration efficiency, we further develop *Anomaly-guided Timestep Grouping (ATG)*, which clusters timesteps based on anomaly distributions, capturing full temporal dynamics with a compact set of representatives. Finally, we propose *Reordered Group Quantization (RGQ)*, which reorders channels before group quantization to reduce intra-group variance and minimize quantization error. On DiT-XL/2 with W4A4, TCA-DiT improves FID by **0.74** and **6.47** on ImageNet 256$\times$256 and 512$\times$512, respectively. On PixArt-$\alpha$, it achieves a substantial **3.74** FID improvement while reducing memory usage by **3.8$\times$** and accelerating inference by **3.5$\times$**. These results highlight the critical role of anomaly alignment in enabling both effective and efficient quantization of DiTs.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 7733