Learning to Generate Diverse Data From a Temporal Perspective for Data-Free Quantization

Published: 01 Jan 2024, Last Modified: 05 Mar 2025 · IEEE Trans. Circuits Syst. Video Technol. 2024 · CC BY-SA 4.0
Abstract: Model quantization is a prevalent method for compressing and accelerating neural networks. Most existing quantization methods require access to real data to improve the performance of the quantized model, which is often infeasible in scenarios with privacy or security concerns. Recently, data-free quantization, which sidesteps the need for real data by generating synthetic data, has been widely studied; generator-based approaches form an important branch of this line of work. Previous generator-based methods focus on improving the performance of quantized models by optimizing the spatial distribution of the synthetic data, while ignoring how the synthetic data change over time, i.e., the temporal perspective. In this work, we reveal that generator-based data-free quantization methods typically suffer from homogeneous synthetic data in the mid-to-late stages of the generation process, caused by stagnation of the generator updates, which hinders further improvement of the quantized model. To address this issue, we propose introducing the discrepancy between the full-precision and quantized models as new supervision for updating the generator. Specifically, we propose a simple yet effective adversarial Gaussian-margin loss, which promotes continuous generator updates by supplying additional supervision when the discrepancy between the full-precision and quantized models is small, thereby producing heterogeneous synthetic data. Moreover, to further mitigate the homogeneity of the synthetic data, we augment them with linear interpolation. Our method can also boost the performance of other generator-based data-free quantization methods. Extensive experimental results show that our method achieves superior performance across various data-free quantization settings, especially ultra-low-bit settings such as 3-bit.
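The abstract does not give the exact form of the adversarial Gaussian-margin loss, so the following is a minimal PyTorch sketch of one plausible reading: the generator is trained adversarially to maximize the discrepancy between the full-precision and quantized models, and a Gaussian-shaped term injects extra gradient signal when that discrepancy is small, discouraging the generator from stagnating. The function name, the choice of KL divergence as the discrepancy measure, and the `sigma` bandwidth are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def adversarial_gaussian_margin_loss(fp_logits, q_logits, sigma=1.0):
    """Hypothetical sketch of an adversarial Gaussian-margin generator loss.

    The exact formulation in the paper may differ; this only illustrates
    the mechanism described in the abstract.
    """
    # Discrepancy between full-precision and quantized models on the
    # synthetic batch, here measured as a KL divergence (an assumption).
    d = F.kl_div(
        F.log_softmax(q_logits, dim=1),
        F.softmax(fp_logits, dim=1),
        reduction="batchmean",
    )
    # Gaussian-shaped term: largest when the discrepancy is near zero,
    # adding extra supervision exactly where the adversarial signal is weak.
    margin = torch.exp(-d.pow(2) / (2.0 * sigma ** 2))
    # The generator minimizes this, i.e., maximizes the discrepancy while
    # being pushed away from the small-discrepancy (stagnation) regime.
    return -d + margin
```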
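Similarly, the linear-interpolation augmentation of synthetic data admits a mixup-style sketch. The Beta-distributed mixing coefficient and the random pairing by permutation are assumptions about details the abstract leaves unspecified.

```python
import torch

def interpolate_synthetic(x, alpha=0.5):
    """Mixup-style linear interpolation between synthetic samples (a sketch;
    the paper's coefficient and pairing scheme may differ)."""
    # Sample a mixing coefficient in [0, 1].
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Pair each sample with a randomly chosen partner from the same batch.
    perm = torch.randperm(x.size(0), device=x.device)
    # Convex combination yields more heterogeneous synthetic inputs.
    return lam * x + (1.0 - lam) * x[perm]
```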