Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning

Aakash Kaku; Sahana Upadhya; Narges Razavian

Published: 09 Nov 2021, Last Modified: 26 May 2025

NeurIPS 2021 Poster

Readers: Everyone

Keywords: Deep learning, Self-supervised learning, MoCo, Momentum contrastive self-supervised learning, Histopathology, Chest X-ray, Diabetic Retinopathy, Medical imaging

TL;DR: We improve momentum contrastive self-supervised learning on medical imaging datasets by adding loss terms that bring the intermediate-layer representations of two augmented versions of an image closer together.

Abstract: We show that, in self-supervised learning, bringing the intermediate-layer representations of two augmented versions of an image closer together improves the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we either minimize the mean squared error between the intermediate-layer representations or push their cross-correlation matrix toward the identity matrix. Both objectives either outperform standard MoCo or perform on par with it on three diverse medical imaging datasets: NIH Chest X-rays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in the low-labeled-data regime (e.g., 1% labeled data), with an average gain of 5% across the three datasets. We analyze the models trained with our approach via feature similarity analysis and layer-wise probing. This analysis reveals that they exhibit more feature reuse than standard MoCo models and learn informative features earlier in the network. Finally, we compare the output probability distributions of models fine-tuned on small versus large labeled datasets and find that our pre-training method yields a lower Kolmogorov–Smirnov distance between these distributions than standard MoCo does. This provides additional evidence that our method learns more informative features during pre-training, which can be leveraged in the low-labeled-data regime.
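
The two auxiliary objectives and the KS comparison described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository). It assumes each intermediate layer's output has already been pooled into a batch × feature matrix per augmented view; the cross-correlation term follows the Barlow Twins formulation, with a hypothetical off-diagonal weight of 5e-3; and the helper names (`mse_loss`, `cross_correlation_loss`, `total_loss`, `ks_distance`), the per-layer summation, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import bisect
import math
from typing import Callable, Dict, List, Sequence

# Minimal sketch; helper names, defaults, and weighting are hypothetical, not from the paper's code.
Matrix = List[List[float]]  # one row per image in the batch, one column per pooled feature


def mse_loss(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two equally shaped feature matrices."""
    n_entries = len(a) * len(a[0])
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)) / n_entries


def _standardize(z: Matrix, eps: float = 1e-5) -> Matrix:
    """Shift and scale each feature column to zero mean and unit variance over the batch."""
    n = len(z)
    cols = list(zip(*z))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n + eps) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in z]


def cross_correlation_loss(a: Matrix, b: Matrix, off_diag_weight: float = 5e-3) -> float:
    """Barlow Twins-style term: push the feature cross-correlation matrix toward identity."""
    n, d = len(a), len(a[0])
    za, zb = _standardize(a), _standardize(b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(za[k][i] * zb[k][j] for k in range(n)) / n
            # Diagonal entries should be 1 (views agree); off-diagonal entries 0 (decorrelated).
            loss += (1.0 - c_ij) ** 2 if i == j else off_diag_weight * c_ij ** 2
    return loss


LAYER_TERMS: Dict[str, Callable[[Matrix, Matrix], float]] = {
    "mse": mse_loss,
    "xcorr": cross_correlation_loss,
}


def total_loss(contrastive: float, layers_view1: Sequence[Matrix], layers_view2: Sequence[Matrix],
               mode: str = "mse", weight: float = 1.0) -> float:
    """Contrastive loss plus a weighted intermediate-layer term summed over layers."""
    term = LAYER_TERMS[mode]
    return contrastive + weight * sum(term(a, b) for a, b in zip(layers_view1, layers_view2))


def ks_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    p_sorted, q_sorted = sorted(p), sorted(q)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those points suffices.
    for x in sorted(set(p_sorted) | set(q_sorted)):
        cdf_p = bisect.bisect_right(p_sorted, x) / len(p_sorted)
        cdf_q = bisect.bisect_right(q_sorted, x) / len(q_sorted)
        gap = max(gap, abs(cdf_p - cdf_q))
    return gap
```

In a MoCo training loop, `layers_view1` might hold pooled activations from the query encoder and `layers_view2` those from the momentum key encoder for the same images. Which encoder outputs are paired, how features are pooled, and how the extra term is weighted are details to check against the authors' repository. For the KS comparison, `p` and `q` would be the predicted probabilities of a model fine-tuned on little labeled data and one fine-tuned on a large labeled set; a smaller value means the two output distributions are closer.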

Supplementary Material: pdf

Code: https://github.com/aakashrkaku/intermdiate_layer_matter_ssl