Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
TL;DR: The study introduces GMPQ-ASGA, which links loss-landscape sharpness to the generalization of mixed-precision quantization (MPQ) policies and cuts search costs by searching on small proxy datasets while retaining accuracy at large scale.
Abstract: Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth for each layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive search for quantization strategies on large-scale datasets. To resolve this issue, we introduce a novel approach that first searches for quantization strategies on small datasets and then generalizes them to large-scale datasets. This approach simplifies the process, eliminating the need for large-scale quantization fine-tuning during the search and requiring only an adjustment of the model weights. Our method is characterized by three key techniques: sharpness-aware minimization for enhanced quantized model generalization, implicit gradient direction alignment to handle gradient conflicts among different optimization objectives, and an adaptive perturbation radius to accelerate optimization. It avoids intricate feature-map computations and achieves high search efficiency. Both theoretical analysis and experimental results validate our approach. Using the CIFAR-10 dataset (just 0.5\% of the size of the ImageNet training set) for MPQ policy search, we achieve comparable accuracy on ImageNet at a significantly lower computational cost, while improving search efficiency by up to 150\% over the baselines.
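To make the three techniques above concrete, the following is a minimal, hypothetical sketch of one optimization step combining a SAM-style perturbation, an adaptively scaled perturbation radius, and a projection-based resolution of gradient conflicts between two objectives. All names (asga_step, task_loss_fn, reg_loss_fn, rho schedule) are illustrative assumptions, not the authors' actual ASGA implementation; the paper's alignment is described as implicit, whereas this sketch uses an explicit PCGrad-style projection for clarity.

    import torch

    def flat_grad(loss, params):
        # Flatten the gradient of `loss` w.r.t. `params` into one vector.
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        return torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                          for g, p in zip(grads, params)])

    def asga_step(model, x, y, task_loss_fn, reg_loss_fn, opt,
                  rho=0.05, rho_min=0.01, rho_max=0.5):
        params = [p for p in model.parameters() if p.requires_grad]

        # (1) Gradient at the current weights, used to build the perturbation.
        g0 = flat_grad(task_loss_fn(model(x), y), params)
        g0_norm = g0.norm() + 1e-12

        # (2) Adaptive radius: one simple heuristic scaling rho with 1/||g||,
        #     clipped to a fixed range (the paper's schedule may differ).
        rho_t = float(torch.clamp(rho / g0_norm, rho_min, rho_max))

        # (3) Ascend to the perturbed point w + rho_t * g / ||g|| (SAM-style).
        offset = rho_t * g0 / g0_norm
        idx = 0
        with torch.no_grad():
            for p in params:
                n = p.numel()
                p.add_(offset[idx:idx + n].view_as(p))
                idx += n

        # (4) Gradients of both objectives at the perturbed point.
        g_task = flat_grad(task_loss_fn(model(x), y), params)
        g_reg = flat_grad(reg_loss_fn(model), params)

        # (5) Alignment: if the two gradients conflict (negative dot product),
        #     project the secondary gradient onto the task gradient's normal plane.
        if torch.dot(g_task, g_reg) < 0:
            g_reg = g_reg - torch.dot(g_task, g_reg) / (g_task.norm() ** 2 + 1e-12) * g_task
        g_final = g_task + g_reg

        # (6) Restore the original weights and apply the aligned gradient.
        idx = 0
        with torch.no_grad():
            for p in params:
                n = p.numel()
                p.sub_(offset[idx:idx + n].view_as(p))
                p.grad = g_final[idx:idx + n].view_as(p).clone()
                idx += n
        opt.step()

In an actual training loop, opt would be a standard optimizer such as SGD and reg_loss_fn would stand in for the bitwidth-policy objective; both are placeholders here.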
Lay Summary: We propose a method named Adaptive Sharpness-Aware Gradient Aligning (ASGA) to address inefficiencies in optimizing neural networks through mixed-precision quantization. Traditional methods incur high computational costs because they optimize on large datasets, whereas our method works on small proxy datasets using sharpness-aware minimization and adaptive gradient alignment. By stabilizing the training process, this approach achieves comparable accuracy on large-scale datasets like ImageNet, while improving search efficiency by up to 150% and reducing data requirements to 0.5% of the target dataset. This advancement enables faster and more accessible deployment of optimized AI models on edge devices and in resource-constrained scenarios.
Primary Area: Deep Learning->Algorithms
Keywords: model quantization, model compression, efficient neural network
Submission Number: 2783