## Rethinking Convolution: Towards an Optimal Efficiency

28 Sept 2020, 15:48 (modified: 05 Mar 2021, 23:09)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Reviewed Version (pdf): https://openreview.net/references/pdf?id=FIyhzRRks
Abstract: In this paper, we present our recent research on the computational efficiency of convolution. The convolution operation is the most critical component in the recent surge of deep learning research. Conventional 2D convolution takes $O(C^{2}K^{2}HW)$ operations to compute, where $C$ is the channel size, $K$ is the kernel size, and $H$ and $W$ are the output height and width. This computation has become costly as these parameters have grown over the past few years to meet the needs of demanding applications. Among the various implementations of convolution, separable convolution has proven to be more efficient in reducing the computational demand. For example, depth separable convolution reduces the complexity to $O(CHW\cdot(C+K^{2}))$, while spatial separable convolution reduces the complexity to $O(C^{2}KHW)$. However, these are ad hoc designs that cannot guarantee optimal separation in general. In this research, we propose a novel operator called *optimal separable convolution*, which can be computed in $O(C^{\frac{3}{2}}KHW)$ by optimally choosing the internal number of groups and kernel sizes for general separable convolutions. When there is no restriction on the number of separated convolutions, an even lower complexity of $O(CHW\cdot\log(CK^{2}))$ can be achieved. Experimental results demonstrate that the proposed optimal separable convolution achieves improved accuracy-FLOPs and accuracy-#Params trade-offs over both conventional and depth/spatial separable convolutions.
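To make the complexity figures in the abstract concrete, the following is a minimal sketch that counts multiply-accumulate operations (MACs) for the three existing convolution variants. It is not the authors' code. The function names are hypothetical, and it assumes equal input and output channel counts, a square $K \times K$ kernel, and a spatial separable convolution built from a $K \times 1$ convolution followed by a $1 \times K$ convolution, each with full channel mixing. For the proposed operator, only the leading term $C^{3/2}KHW$ stated in the abstract is evaluated; its constant factor is not given in the abstract and is assumed here to be 1.

```python
def conventional_macs(c, k, h, w):
    """MACs for a standard 2D convolution: C^2 * K^2 * H * W."""
    return c * c * k * k * h * w


def depth_separable_macs(c, k, h, w):
    """MACs for a depthwise K x K conv followed by a pointwise 1 x 1 conv."""
    depthwise = c * k * k * h * w  # one K x K filter per channel
    pointwise = c * c * h * w      # 1 x 1 conv mixing all channels
    return depthwise + pointwise   # equals C * H * W * (C + K^2)


def spatial_separable_macs(c, k, h, w):
    """MACs for a K x 1 conv followed by a 1 x K conv (assumed layout)."""
    return 2 * c * c * k * h * w


def proposed_leading_term(c, k, h, w):
    """Leading term C^(3/2) * K * H * W from the abstract.

    The constant factor is an assumption (taken as 1), so this value is
    only an order-of-magnitude indicator, not an exact operation count.
    """
    return c ** 1.5 * k * h * w


if __name__ == "__main__":
    # Illustrative layer size, chosen arbitrarily for this sketch.
    c, k, h, w = 64, 3, 56, 56
    print("conventional     :", conventional_macs(c, k, h, w))
    print("depth separable  :", depth_separable_macs(c, k, h, w))
    print("spatial separable:", spatial_separable_macs(c, k, h, w))
    print("proposed (order) :", proposed_leading_term(c, k, h, w))
```

Because the counts ignore bias terms, padding effects, and memory traffic, they are best read as a rough comparison of how each variant scales with $C$ and $K$, not as measured FLOPs.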
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics