Data-free quantization of neural networks is a practical necessity, as access to training data is often restricted due to privacy, proprietary concerns, or memory issues. We present a data-free weight rounding algorithm for Deep Neural Networks (DNNs) that does not require any training data, synthetic data generation, fine-tuning, or even batch norm statistics. Instead, our approach focuses on preserving the direction of weight vectors during quantization. We demonstrate that traditional weight rounding techniques, which round weights to the nearest quantized level, can result in large angles between the full-precision weight vectors and the quantized weight vectors, particularly under coarse quantization regimes. For a large class of high-dimensional weight vectors in DNNs, this angle error can approach 90 degrees. By minimizing this angle error, we significantly improve top-1 accuracy in quantized DNNs. We analytically derive the angle-minimizing rounding boundaries for ternary quantization under the assumption of Gaussian weights. Building on this, we propose a greedy data-free quantization method based on the cosine similarity between the full-precision weight vectors and the quantized weight vectors. Our approach consistently outperforms existing state-of-the-art data-free quantization techniques and, in several cases, surpasses even data-dependent methods on well-established models such as ResNet-18, VGG-16, and AlexNet with aggressive quantization levels of 3 to 6 bits on the ImageNet dataset.
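To make the idea concrete, the following is a minimal sketch of greedy, cosine-similarity-driven rounding of a single weight vector. It is not the authors' exact procedure: the level grid, the nearest-rounding initialization, the single-coordinate adjacent-level moves, and the ternary scale heuristic are all illustrative assumptions.

```python
# Hedged sketch: greedily adjust a nearest-rounded quantized vector q so that
# the cosine similarity (i.e., the angle) between the full-precision weight
# vector w and q improves. Assumptions: a fixed level grid, nearest-rounding
# initialization, and one-coordinate moves to adjacent levels per step.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b) / denom

def greedy_cosine_round(w, levels, max_passes=5):
    levels = np.sort(np.asarray(levels, dtype=w.dtype))
    # Start from standard nearest-level rounding.
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    q = levels[idx]
    best = cosine(w, q)
    for _ in range(max_passes):
        improved = False
        for i in range(len(w)):
            for step in (-1, 1):  # try moving coordinate i to an adjacent level
                j = idx[i] + step
                if 0 <= j < len(levels):
                    old = q[i]
                    q[i] = levels[j]
                    c = cosine(w, q)
                    if c > best:
                        best, idx[i], improved = c, j, True
                    else:
                        q[i] = old  # revert if the angle did not improve
        if not improved:
            break
    return q, best

# Example: ternary levels applied to a Gaussian weight vector.
rng = np.random.default_rng(0)
w = rng.standard_normal(512).astype(np.float32)
delta = 0.7 * np.abs(w).mean()  # an assumed heuristic ternary scale
q, cos_sim = greedy_cosine_round(w, [-delta, 0.0, delta])
print(f"cosine similarity after greedy rounding: {cos_sim:.4f}")
```

In this sketch, nearest rounding is only the starting point; each greedy move is accepted solely when it increases the cosine similarity with the full-precision vector, which is the direction-preserving criterion the abstract describes.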