Dataset Quantization Augmentation: Improving Dataset Compression Through Complexity-Guided Sampling and Augmentation

Ziyang Li, Qin Liu, Fengshan Zhao, Yujie Wang, Takeshi Ikenaga

Published: 2025, Last Modified: 08 May 2026, Venue: ICME 2025, License: CC BY-SA 4.0
Abstract: Training deep neural networks (DNNs) typically requires large-scale datasets, which poses substantial challenges for computing resources and storage. Dataset Quantization (DQ) was introduced to compress large datasets into smaller subsets for training various neural networks. However, DQ does not consider sample importance or diversity, so it may exclude crucial samples or represent the dataset insufficiently, thereby limiting model generalization. To address this, we propose Dataset Quantization Augmentation (DQA), an enhanced framework that integrates image complexity into dataset compression and dynamically applies data augmentation techniques based on sample complexity. By selecting the least complex samples and applying data augmentation techniques (e.g., CutBlur, CutMix, Cutout, geometric transformations), DQA enhances dataset diversity and boosts model performance. Experimental results demonstrate that DQA outperforms the original DQ method and other dataset compression methods. On CIFAR-10, DQA achieves a 3.64% accuracy improvement over DQ with only 10% of the dataset. For HDR reconstruction using the HAT model on Flickr2K, DQA improves PSNR by 0.1761 and SSIM by 0.132 over DQ at 30% compression. These results highlight DQA’s versatility and effectiveness in both high-level and low-level vision tasks, making it a promising approach for dataset optimization.
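The select-then-augment idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how image complexity is measured or how augmentations are assigned, so everything here is an assumption made for illustration: the mean absolute intensity gradient stands in as a complexity proxy, Cutout stands in for the augmentation step, and the function names `gradient_complexity`, `select_least_complex`, and `cutout` are hypothetical rather than taken from the paper.

```python
import numpy as np


def gradient_complexity(image: np.ndarray) -> float:
    """Assumed complexity proxy (not from the paper): mean absolute gradient."""
    # Collapse color channels so the gradient is taken on a 2D intensity map.
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.mean(np.abs(gx) + np.abs(gy)))


def select_least_complex(images, fraction):
    """Return indices of the lowest-scoring `fraction` of images, plus all scores."""
    scores = np.array([gradient_complexity(img) for img in images])
    keep = max(1, int(round(fraction * len(images))))
    order = np.argsort(scores, kind="stable")  # ascending: least complex first
    return order[:keep], scores


def cutout(image: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Zero out one random size x size square; the input is left unmodified."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 20 random 32x32 RGB images standing in for a real dataset.
    data = [rng.random((32, 32, 3)) for _ in range(20)]
    kept, _ = select_least_complex(data, fraction=0.1)
    augmented = [cutout(data[i], size=8, rng=rng) for i in kept]
    print(len(kept), augmented[0].shape)
```

In this sketch the selected subset is augmented uniformly; a complexity-dependent policy, as the abstract describes, would choose the augmentation per sample from its score.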