Compressing the Activation Maps in Deep Convolutional Neural Networks and Its Regularizing Effect

Published: 19 Mar 2024, Last Modified: 19 Mar 2024. Accepted by TMLR.
Abstract: Deep learning has dramatically improved performance in various image analysis applications in the last few years. However, recent deep learning architectures can be very large, with up to hundreds of layers and millions or even billions of model parameters, which makes them impossible to fit into commodity graphics processing units. We propose a novel approach for compressing high-dimensional activation maps, the most memory-consuming part when training modern deep learning architectures. The proposed method can be used to compress the feature maps of a single layer, multiple layers, or the entire network according to specific needs. To this end, we evaluated three different methods to compress the activation maps: Wavelet Transform, Discrete Cosine Transform, and Simple Thresholding. We performed experiments on two classification tasks for natural images and two semantic segmentation tasks for medical images. Using the proposed method, we could reduce the memory usage for activation maps by up to 95%. Additionally, we show that the proposed method induces a regularization effect that acts on the layer weight gradients.
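As a rough illustration of the simplest of the three options named in the abstract, the sketch below applies magnitude thresholding to a stored activation map and then uses the reconstructed map to form the weight gradient of a linear layer. It is a minimal NumPy sketch under stated assumptions, not the authors' released implementation: the function names, the `keep_fraction` parameter, and the top-k selection rule are all hypothetical choices made for this example.

```python
import numpy as np


def compress_threshold(activation, keep_fraction=0.05):
    """Keep only the largest-magnitude entries of an activation map.

    `keep_fraction` is a hypothetical knob for this sketch; it is not a
    parameter taken from the paper.
    """
    flat = activation.ravel()
    k = max(1, int(round(keep_fraction * flat.size)))
    # Indices of the k largest-magnitude entries (their order is irrelevant).
    idx = np.argpartition(np.abs(flat), flat.size - k)[-k:]
    return idx.astype(np.int32), flat[idx].copy(), activation.shape


def decompress(idx, values, shape):
    """Rebuild a dense map; every discarded entry becomes zero."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[idx] = values
    return out.reshape(shape)


def weight_gradient(stored_input, grad_output):
    """Weight gradient of a linear layer y = x @ W, i.e. dL/dW = x^T @ g."""
    return stored_input.T @ grad_output


# Toy forward/backward pass: only the compressed input is kept for backward.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64)).astype(np.float32)   # layer input
g = rng.standard_normal((32, 16)).astype(np.float32)   # upstream gradient

idx, vals, shape = compress_threshold(x, keep_fraction=0.25)
x_hat = decompress(idx, vals, shape)

exact_grad = weight_gradient(x, g)
approx_grad = weight_gradient(x_hat, g)  # gradient from the lossy activation
```

Because the backward pass sees `x_hat` instead of `x`, the compression error shows up only in the weight gradient, which is where the abstract locates the regularization effect. Note that a sparse representation such as this one also has to store the indices, so the actual memory saving is smaller than the fraction of discarded values suggests.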
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: We have updated our paper based on the AE's suggestions:
- Cited "Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey" by Deng et al. in IEEE'20.
- Rotated Table 3 on page 26 for readability.
We have also added the authors' names and released the code. The documentation for the code will be updated accordingly.
Assigned Action Editor: ~Evan_G_Shelhamer1
Submission Number: 1551