Track: Proceedings Track
Keywords: neural networks, modular deep learning, model stitching, model compression
TL;DR: We empirically investigate whether structurally very different layers can be stitched together, and propose new techniques for doing so.
Abstract: Model stitching is a technique for assembling new neural networks from the parts of existing networks, without having to re-train or fine-tune the existing weights. It has shown promise for new forms of neural architecture search, decentralized training, and transfer learning. But what are the upper bounds on this technique? Little investigation has gone into determining exactly what types of blocks can (or cannot) be stitched together, and how. In this work, we investigate the feasibility of adapting very low layers to very high layers, and of stitching across different architectures, in the context of image classification models. We develop several modifications to the original stitching methods that make it possible to achieve good performance when stitching such disparate layers: (1) we interpolate the spatial dimensions of the input; (2) we propose adapters with more complex, nonlinear transformations; and (3) we propose the use of bottleneck adapters for computational efficiency. With these modifications, we are able to stitch, for example, the lower layers of a ResNet-50 to the upper layers of a Swin-Tiny, achieving ImageNet test accuracy close to that of the original models.
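To illustrate how the three abstract-level modifications could fit together, here is a minimal sketch of a stitching adapter in PyTorch. This is not the authors' implementation: the class name, split points, channel counts, and hyperparameters are all illustrative assumptions.

```python
# Sketch (not the authors' code) of a stitching adapter combining the three
# modifications described in the abstract: spatial interpolation, a nonlinear
# transformation, and a bottleneck for efficiency. All names, shapes, and
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckStitchAdapter(nn.Module):
    """Maps features from a frozen 'front' network to the input space of a frozen 'back' network."""

    def __init__(self, in_channels, out_channels, out_size, bottleneck_channels=64):
        super().__init__()
        self.out_size = out_size  # target spatial resolution (H, W) expected by the upper layers
        # Bottleneck: project channels down, apply a nonlinearity, project back up.
        self.adapter = nn.Sequential(
            nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, out_channels, kernel_size=1),
        )

    def forward(self, x):
        # (1) Interpolate spatial dimensions to match what the upper layers expect.
        x = F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False)
        # (2) + (3) Nonlinear bottleneck transformation of the channel dimension.
        return self.adapter(x)


# Usage sketch: stitching frozen ResNet-50 lower layers to frozen Swin-Tiny upper layers.
# The split point and feature shapes here are assumed, not taken from the paper:
#   front = nn.Sequential(*list(torchvision.models.resnet50(weights="DEFAULT").children())[:5])
#   adapter = BottleneckStitchAdapter(in_channels=256, out_channels=96, out_size=(56, 56))
# Only the adapter's parameters would be trained; the front and back networks stay frozen.
```

In this sketch, only the adapter introduces trainable parameters, which mirrors the premise of model stitching that the existing weights are neither re-trained nor fine-tuned.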
Submission Number: 153