Workshop Submission: Towards Making Untrainable Networks Trainable

Published: 10 Oct 2024, Last Modified: 28 Oct 2024 · UniReps · CC BY 4.0
Track: Extended Abstract Track
Keywords: Neural network design, representational alignment
TL;DR: We introduce a method that makes networks that are difficult to train easier to train using representational alignment.
Abstract: We demonstrate that architectures traditionally considered ill-suited for a task can be trained using inductive biases from another architecture. Networks are considered untrainable when they overfit, underfit, or converge to poor results even when their hyperparameters are tuned. For example, plain fully connected networks overfit on object recognition while deep convolutional networks without residual connections underfit. The traditional answer is to change the architecture to impose some inductive bias, although what that bias is remains unknown. We introduce guidance, where a guide network guides a target network using a neural distance function. The target is optimized both to perform well and to match its internal representations, layer-by-layer, to those of the guide; the guide itself is unchanged. If the guide is trained, this transfers part of the guide's architectural prior and knowledge to the target. If the guide is untrained, this transfers only part of the guide's architectural prior. In this manner, we can investigate what kinds of priors different architectures impose on untrainable networks such as fully connected networks. We demonstrate that this method overcomes the immediate overfitting of fully connected networks on vision tasks and makes plain CNNs competitive with ResNets.
Submission Number: 54
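
The abstract above describes the guidance objective only in words. As a minimal illustration, the following PyTorch-style sketch shows one way such a layer-wise alignment loss could look. The concrete choices here are assumptions, not the submission's method: the neural distance function is left unspecified in the abstract, so linear CKA is used below as a dimension-agnostic stand-in, and the `(logits, activations)` model interface and the weighting `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def linear_cka(x, y, eps=1e-8):
    """Linear CKA similarity between two activation batches.

    x: (batch, d1), y: (batch, d2). The measure is agnostic to d1/d2,
    so target and guide layers need not share widths.
    """
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    num = (y.T @ x).norm() ** 2
    den = (x.T @ x).norm() * (y.T @ y).norm()
    return num / (den + eps)

def guidance_step(target, guide, x, y, lam=1.0):
    """One training step: task loss plus layer-wise representational alignment.

    Assumes both networks return (logits, [per-layer activations]) and that
    the lists pair up layer-by-layer. Only the target is optimized; the
    guide is held fixed, so its forward pass runs without gradients.
    """
    with torch.no_grad():
        _, guide_acts = guide(x)
    logits, target_acts = target(x)
    # Alignment term: 1 - CKA per layer, summed across paired layers.
    align = sum(1.0 - linear_cka(t.flatten(1), g.flatten(1))
                for t, g in zip(target_acts, guide_acts))
    return F.cross_entropy(logits, y) + lam * align
```

Using a similarity-matrix-based distance such as CKA sidesteps the fact that a fully connected target and, say, a ResNet guide have differently shaped layers; a distance computed directly on activations would instead need a learned projection between the two spaces.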