Rotation Symmetry in Vision Quantization: The Objective Function is the Bottleneck

Published: 24 May 2026, Last Modified: 28 May 2026, ICML 2026 Workshop WSS Poster, CC BY 4.0
Keywords: weight-space symmetry, post-training quantization, orthogonal rotation, knowledge distillation, Stiefel manifold
Abstract: Rotation-based post-training quantization (PTQ) is widely used to suppress activation outliers in large language models. In vision classification, however, rotations learned with the standard task loss perform worse than random rotations on ViT-family models. We evaluate rotation learning on the Stiefel manifold across five architectures (DeiT-S, Swin-T, ViT-S, ResNet-50, MobileNetV2). Replacing the task loss with Final KD or Block Output MSE eliminates this reversal in all five models. The gain is small on ViT-family models, at +0.3 to +0.7 pp. On MobileNetV2, where rotation is restricted to conv1×1 layers, accuracy rises from 18.79% under the task loss to 67.11±0.29% under KD (4 seeds), a gap of about +48 pp that exposes strong architectural heterogeneity. The result suggests that the gain from rotation symmetry is determined more by the choice of optimization objective than by the symmetry itself.
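As a rough illustration of the setting the abstract describes, the NumPy sketch below shows the rotation symmetry on a single toy linear layer and the three objectives it contrasts. Everything here is an assumption made for illustration: the function names (`random_orthogonal`, `fake_quantize`, `task_loss`, `final_kd_loss`, `block_output_mse`), the 4-bit symmetric per-tensor quantizer, and the layer sizes are hypothetical and are not taken from the paper. The sketch only evaluates the losses for one random rotation; it does not perform the Stiefel-manifold optimization.

```python
import numpy as np

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix yields an orthogonal Q, i.e. a point on St(n, n).
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # flip column signs; Q stays orthogonal

def fake_quantize(w, bits=4):
    # Symmetric per-tensor uniform quantizer (assumed here, not the paper's).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def task_loss(logits, labels):
    # Cross-entropy against ground-truth labels.
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()

def final_kd_loss(student_logits, teacher_logits):
    # KL(teacher || student) on the final logits.
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean()

def block_output_mse(student_out, teacher_out):
    # Mean squared error between quantized and full-precision block outputs.
    return np.mean((student_out - teacher_out) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))        # toy activations
w = rng.standard_normal((16, 10))       # toy weight matrix
labels = rng.integers(0, 10, size=8)    # toy labels
r = random_orthogonal(16, rng)

teacher = x @ w                              # full-precision output
rotated = (x @ r) @ (r.T @ w)                # same function, since r @ r.T = I
student = (x @ r) @ fake_quantize(r.T @ w)   # rotation changes what gets quantized

symmetry_gap = np.abs(teacher - rotated).max()
losses = {
    "task": task_loss(student, labels),
    "final_kd": final_kd_loss(student, teacher),
    "block_mse": block_output_mse(student, teacher),
}
print(symmetry_gap, losses)
```

In full precision the rotation leaves the output unchanged, so any of the three losses could in principle be used to choose `r`; they differ only in what signal they give about the quantized model. The KD and MSE objectives compare against the full-precision output directly, whereas the task loss depends on the labels alone.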
Submission Number: 24