If You've Trained One You've Trained Them All: Inter-Architecture Similarity Increases With Robustness

Published: 20 May 2022, Last Modified: 05 May 2023, UAI 2022 Oral
Keywords: CKA, Robustness, Similarity, Inversion
TL;DR: Robust neural networks converge to similar representations regardless of architecture; this effect is especially pronounced when we control for the effect of feature correlations.
Abstract: Previous work has shown that commonly used metrics for comparing representations between neural networks overestimate similarity due to correlations between data points. We show that intra-example feature correlations also cause significant overestimation of network similarity, and we propose an image inversion technique to analyze only the features used by a network. With this technique, we find that similarity across architectures is significantly lower than commonly understood, but, surprisingly, that similarity between models with different architectures increases as the adversarial robustness of the models increases. Our findings indicate that robust networks tend towards a universal set of representations, regardless of architecture, and that the robust training criterion is a strong prior constraint on the functions that can be learned by diverse modern architectures. We also find that the representations learned by a robust network of any architecture have an asymmetric overlap with non-robust networks of many architectures, indicating that the representations used by robust neural networks are highly entangled with those used by non-robust networks.
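The similarity comparisons described above are built on CKA (listed in the keywords). As a minimal sketch, assuming the standard linear-kernel form of CKA applied to activation matrices X and Y recorded from two networks on the same batch of inputs, the score is CKA(X, Y) = ‖YᵀX‖²_F / (‖XᵀX‖_F ‖YᵀY‖_F) after centering each feature column. This is not the authors' code: the function name `linear_cka` and the random data are hypothetical, and the sketch does not implement the paper's image inversion technique or any correction for feature correlations.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representation matrices (hypothetical helper).

    X: (n_examples, d1) activations from network A
    Y: (n_examples, d2) activations from network B on the same inputs
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must contain the same number of examples")
    # Center each feature (column) so mean offsets do not count as similarity.
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Cross-covariance term and self-similarity terms with linear kernels.
    cross = np.linalg.norm(Yc.T @ Xc, ord="fro") ** 2
    norm_x = np.linalg.norm(Xc.T @ Xc, ord="fro")
    norm_y = np.linalg.norm(Yc.T @ Yc, ord="fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
# A random orthogonal matrix: rotating the feature space leaves CKA at 1.
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
print(linear_cka(X, X @ Q))
# Independent random features: CKA is well below 1.
print(linear_cka(X, rng.normal(size=(500, 64))))
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, two networks can score as highly similar even when their individual units differ; the abstract's point is that such scores can also be inflated by correlations in the data, which this sketch does not account for.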