Winograd Structured Pruning

15 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Pruning, Winograd Convolution, GPU
TL;DR: Adaptive Structured Pruning for fast Winograd Convolution
Abstract: Both Winograd convolution (\textit{WC}) and pruning can significantly reduce the computation of Convolutional Neural Networks (CNNs), but applying them simultaneously is challenging. For example, applying fine-grained pruning to \textit{WC} forfeits the computational benefit of sparsity, because the Winograd transformation destroys the sparse weight pattern. Conversely, combining \textit{WC} with filter pruning can degrade network accuracy because of the large pruning unit size. To address these challenges, this paper proposes Adaptive Balanced Winograd Structured Pruning (ABWSP), a method specifically designed to prune the weights of \textit{WC} networks executed on GPUs, the most widely used computing devices for CNNs. ABWSP accounts for three crucial factors: pruning unit size, workload balance, and layer importance. First, ABWSP efficiently utilizes GPU computing units by pruning grouped weights simultaneously; given the computational characteristics of \textit{WC} on GPUs, the group size can be minimized while a regular data pattern is maintained (i.e., WSP). Second, the General Matrix Multiplications (GEMMs) of a \textit{WC} layer execute concurrently on the GPU, so the layer's execution time is determined by the longest GEMM. To minimize this execution time, ABWSP maintains an equal pruning ratio across the matrices of \textit{WC} (i.e., BWSP). Last, applying BWSP to all layers results in a loss of accuracy: because the importance of each \textit{WC} layer varies, the accuracy loss and speedup obtained from BWSP differ across layers. To preserve accuracy while achieving high speedup, ABWSP jointly evaluates accuracy and speedup and automatically decides whether to apply BWSP or WSP to each layer. By considering these factors, ABWSP effectively utilizes GPU computing units, minimizes the execution time of each layer, and balances accuracy against speedup.
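To make the mechanism concrete, below is a minimal NumPy sketch (not the authors' code; the F(2x2, 3x3) weight transform is standard, but the function names, group size, and pruning ratio are illustrative assumptions). It shows how balanced structured pruning of Winograd-domain weights could look: each of the 16 transformed positions forms one GEMM operand, weight groups are pruned by magnitude, and the same pruning ratio is enforced at every position so the concurrent GEMMs stay balanced.

import numpy as np

# Weight transform for F(2x2, 3x3): U = G @ g @ G.T maps each 3x3
# kernel to a 4x4 Winograd-domain tile.
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

def winograd_weights(w):
    # w: spatial weights (K, C, 3, 3) -> Winograd domain (4, 4, K, C).
    # Each of the 16 (i, j) positions becomes one K x C GEMM operand,
    # and the 16 GEMMs run concurrently on the GPU.
    return np.einsum('ia,kcab,jb->ijkc', G, w, G)

def balanced_structured_masks(u, prune_ratio=0.5, group_size=4):
    # Prune grouped weights with the same ratio at every position.
    # A group is `group_size` consecutive input channels of one output
    # channel (a regular pattern a GEMM kernel can exploit); equal
    # ratios keep the concurrent GEMMs balanced, so no single matrix
    # dominates the layer's execution time (the BWSP idea sketched here).
    t, _, k, c = u.shape
    assert c % group_size == 0
    masks = np.ones_like(u)
    for i in range(t):
        for j in range(t):
            groups = u[i, j].reshape(k, c // group_size, group_size)
            norms = np.linalg.norm(groups, axis=-1)   # one score per group
            n_prune = int(prune_ratio * norms.size)
            cut = np.sort(norms, axis=None)[n_prune]  # same ratio per GEMM
            keep = norms >= cut
            masks[i, j] = np.repeat(keep, group_size, axis=1)
    return masks

# Toy usage: one conv layer with 8 output and 8 input channels.
w = np.random.randn(8, 8, 3, 3)
masks = balanced_structured_masks(winograd_weights(w))
print(masks.mean(axis=(2, 3)))  # density is equal at all 16 positions

Relaxing the equal-ratio constraint per layer (plain WSP) trades some of this balance for accuracy, which is the per-layer choice the abstract describes ABWSP making automatically.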
Supplementary Material: pdf
Primary Area: infrastructure, software libraries, hardware, etc.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 474