Restricted Random Pruning at Initialization for High Compression Range

Published: 03 May 2024, Last Modified: 03 May 2024. Accepted by TMLR.
Abstract: Pruning at Initialization (PaI) makes training overparameterized neural networks more efficient by reducing the overall computational cost, from training through inference. Recent PaI studies showed that random pruning is more effective than ranking-based pruning, which learns connectivity. However, the effectiveness of each pruning method depends on the presence of skip connections and on the compression ratio (the ratio of the parameter count before pruning to that after pruning). While random pruning performs better than ranking-based pruning on architectures with skip connections, on architectures without skip connections this superiority is reversed in the high compression range. This paper proposes Minimum Connection Assurance (MiCA), which achieves higher accuracy than conventional PaI methods on architectures both with and without skip connections, regardless of the compression ratio. MiCA preserves random connections between layers and maintains performance at high compression ratios without the costly connection learning that ranking-based pruning requires. Experiments on image classification with CIFAR-10 and CIFAR-100 and on node classification with OGBN-ArXiv show that MiCA improves the trade-off between compression ratio and accuracy compared to existing PaI methods. On VGG-16 with CIFAR-10, MiCA improves the accuracy of random pruning by $27.0\%$ at a $10^{4.7}\times$ compression ratio. Furthermore, experimental analysis reveals that increasing the utilization of the nodes through which information flows from the first layer is essential for maintaining high performance at high compression ratios.
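The core idea in the abstract — random pruning at initialization combined with a guarantee that information can still flow between layers — can be sketched as follows. This is a hypothetical illustration only: the paper's actual MiCA construction is not given on this page, and the function name, mask layout, and one-connection repair rule are all assumptions for exposition.

```python
import random

def random_prune_with_min_connection(n_out, n_in, compression, seed=0):
    """Build a random binary pruning mask at initialization, then repair it
    so every output unit keeps at least one incoming connection.

    Hypothetical sketch: the repair rule here (re-enable one random weight
    per dead unit) is an assumption, not the paper's MiCA algorithm.
    """
    rng = random.Random(seed)
    total = n_out * n_in
    # Keep roughly 1/compression of the weights, at least one overall.
    n_keep = max(1, round(total / compression))
    kept = set(rng.sample(range(total), n_keep))
    mask = [[(i * n_in + j) in kept for j in range(n_in)]
            for i in range(n_out)]
    # Minimum-connection repair: a unit whose entire row was pruned can
    # never pass information forward, so give it one random connection.
    for row in mask:
        if not any(row):
            row[rng.randrange(n_in)] = True
    return mask
```

At extreme compression ratios such as the $10^{4.7}\times$ setting reported in the abstract, plain random pruning would leave most units with every incoming weight removed; this disconnection failure mode is what a minimum-connection guarantee is meant to prevent.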
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the requested changes, we have made the following revisions to the paper:

> 1. Address robustness over randomness with more repetition.

We have increased the number of experiments from three to five. The revised results are largely consistent with the original results. We have added our thoughts on the accuracy variation to the limitations section, and we have added error bars for the compression ratio to show that our method is robust to the corrected compression ratio.

> 2. Empirically demonstrate MiCA's effectiveness on the compression-accuracy trade-off beyond convolutional networks.
> 3. Demonstrate the method on more general tasks: beyond CIFAR-10/100 (two very similar tasks), and ideally beyond image classification.

As an additional experiment, we have evaluated the node classification task, a major task for graph neural networks (GNNs). GNNs are based on MLPs and do not use convolutional structures. We employ the graph convolutional network (GCN) and graph isomorphism network (GIN) architectures. (Please note that although the name of GCN suggests a convolutional structure, it does not have the convolutional structure found in CNNs.) Section 4.9 discusses the results.

> 4. Demonstrate scalability, e.g., effectiveness on ImageNet for architectures both with and without skip connections. (Note: as reviewers pointed out, the current ImageNet experiments do not seem to convincingly support the claimed improvements.)
> 5. Better justify why MiCA's improvement at high compression ratios is important despite the still-low absolute performance.

We have addressed these two requested revisions in the additional section, "Limitations and Implications of This Work."

[May 3, 2024 UTC] Publication month change: 4/2024 -> 5/2024
Supplementary Material: zip
Assigned Action Editor: ~Jaehoon_Lee2
Submission Number: 2038