Width transfer: on the (in)variance of width optimization

Rudy Chin; Diana Marculescu; Ari S. Morcos

Width transfer: on the (in)variance of width optimization

Rudy Chin, Diana Marculescu, Ari S. Morcos

28 Sept 2020 (modified: 22 Jun 2025)ICLR 2021 Conference Withdrawn SubmissionReaders: Everyone

Keywords: Channel Optimization, Channel Pruning, Neural Architecture Search, Convolutional Neural Network, Image Classification

Abstract: Optimizing the channel counts for different layers of a convolutional neural net- work (CNN) to achieve better accuracy without increasing the number of floating- point operations (FLOPs) required during the forward pass at test time is known as CNN width optimization. Prior work on width optimization has cast it as a hyperparameter optimization problem, which introduces large computational overhead (e.g., an additional 2× FLOPs of standard training). Minimizing this overhead could therefore significantly speed up training. With that in mind, this paper sets out to empirically understand width optimization by sensitivity analysis. Specifically, we consider the following research question: “Do similar training configurations for a width optimization algorithm also share similar optimized widths?” If this in fact is the case, it suggests that one can find a proxy training configuration requiring fewer FLOPs to reduce the width optimization overhead. Scientifically, it also suggests that similar training configurations share common architectural structure, which may be harnessed to build better methods. To this end, we control the training configurations, i.e., network architectures and training data, for three existing width optimization algorithms and find that the optimized widths are largely transferable across settings. Per our analysis, we can achieve up to 320× reduction in width optimization overhead without compromising the top-1 accuracy on ImageNet. Our findings not only suggest an efficient way to conduct width optimization, but also highlight that the widths that lead to better accuracy are invariant to various aspects of network architectures and training data.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/width-transfer-on-the-variance-of-width/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=15tlrG-bGN

5 Replies

Loading