- TL;DR: We demonstrate how residual blocks can be viewed as Gauss-Newton steps; we propose a new residual block that exploits second order information.
- Abstract: A plethora of computer vision tasks, such as optical flow and image alignment, can be formulated as non-linear optimization problems. Before the resurgence of deep learning, the dominant family for solving such optimization problems was numerical optimization, e.g, Gauss-Newton (GN). More recently, several attempts were made to formulate learnable GN steps as cascade regression architectures. In this paper, we investigate recent machine learning architectures, such as deep neural networks with residual connections, under the above perspective. To this end, we first demonstrate how residual blocks (when considered as discretization of ODEs) can be viewed as GN steps. Then, we go a step further and propose a new residual block, that is reminiscent of Newton's method in numerical optimization and exhibits faster convergence. We thoroughly evaluate the proposed Newton-ResNet by conducting experiments on image and speech classification and image generation, using 4 datasets. All the experiments demonstrate that Newton-ResNet requires less parameters to achieve the same performance with the original ResNet.
- Keywords: Residual learning, Resnet, Newton