Revisiting Model Stitching to Compare Neural Representations

Yamini Bansal; Preetum Nakkiran; Boaz Barak

Revisiting Model Stitching to Compare Neural Representations

Yamini Bansal, Preetum Nakkiran, Boaz Barak

Published: 09 Nov 2021, Last Modified: 26 May 2025NeurIPS 2021 PosterReaders: Everyone

Keywords: representations, deep learning, neural network dynamics

TL;DR: We revisit model stitching to compare neural representations and provide quantitative evidence for various intuitions

Abstract: We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (eg: supervised vs. self-supervised learning), can be stitched to each other without drop in performance. We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance. Finally, our experiments reveal a new structural property of SGD which we call "stitching connectivity", akin to mode-connectivity: typical minima reached by SGD are all "stitching-connected" to each other.

Supplementary Material: pdf

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/revisiting-model-stitching-to-compare-neural/code)

10 Replies

Loading