How deep convolutional neural networks lose spatial information with training

Published: 01 Feb 2023, Last Modified: 12 Mar 2024. Submitted to ICLR 2023.
Keywords: Deep Learning Theory, Convolutional Neural Networks, Curse of Dimensionality, Representation Learning, Feature Learning, Computer Vision, Pooling, Stability, Diffeomorphisms, Gaussian noise, Image Classification, Learning Invariants
TL;DR: Deep nets perform image classification by aggregating information over space. We investigate the mechanisms by which this is achieved and propose a theory for an artificial scale-detection task.
Abstract: A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the net. This loss of sensitivity correlates with performance and, surprisingly, also correlates with a gain of sensitivity to white noise acquired during training. These facts are unexplained and, as we demonstrate, still hold when white noise is added to the images of the training set. Here, we (i) show empirically for various architectures that stability to image diffeomorphisms is achieved by spatial pooling in the first half of the net and by channel pooling in the second half, (ii) introduce a scale-detection task for a simple model of data where pooling is learnt during training, which captures all empirical observations above, and (iii) compute in this model how stability to diffeomorphisms and noise scales with depth. The scalings are found to depend on the presence of strides in the net architecture. We find that the increased sensitivity to noise arises because the perturbing noise piles up during pooling, after the ReLU non-linearity rectifies it in the internal layers.
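
The layer-wise comparison described in the abstract can be illustrated with a rough sketch. The code below is not the authors' code: the ResNet-18 backbone, the low-frequency random warp used as a stand-in for image diffeomorphisms, and all amplitudes are assumptions made only to show the idea of comparing, block by block, how much the activations move under a smooth deformation versus under Gaussian noise of matched pixel-space norm.

```python
# Hypothetical sketch: layer-wise sensitivity of a CNN to a smooth image
# deformation vs. Gaussian noise of the same pixel-space norm.
# Backbone, warp model, and amplitudes are illustrative assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as models

torch.manual_seed(0)
model = models.resnet18(weights=None).eval()  # any CNN would do

def smooth_deformation(x, amplitude=2.0):
    """Warp images with a low-frequency random displacement field
    (a simple stand-in for the diffeomorphisms studied in the paper)."""
    b, _, h, w = x.shape
    disp = torch.randn(b, 2, 4, 4) * amplitude / h          # coarse random shifts
    disp = F.interpolate(disp, size=(h, w), mode="bilinear",
                         align_corners=False)               # smooth displacement field
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0) + disp.permute(0, 2, 3, 1)
    return F.grid_sample(x, grid, align_corners=False)

def layer_activations(x):
    """Collect (flattened) activations after each top-level block."""
    acts, h = [], x
    for name, module in model.named_children():
        if name == "fc":
            h = torch.flatten(h, 1)
        h = module(h)
        acts.append(h.flatten(1))
    return acts

x = torch.randn(8, 3, 224, 224)          # stand-in images
x_diffeo = smooth_deformation(x)
eta = torch.randn_like(x)
# scale the noise so its norm matches the deformation's pixel-space norm
scale = (x_diffeo - x).flatten(1).norm(dim=1) / eta.flatten(1).norm(dim=1)
x_noise = x + eta * scale[:, None, None, None]

with torch.no_grad():
    a0 = layer_activations(x)
    ad = layer_activations(x_diffeo)
    an = layer_activations(x_noise)

for k, (f0, fd, fn) in enumerate(zip(a0, ad, an)):
    s_diffeo = (fd - f0).norm(dim=1).mean()
    s_noise = (fn - f0).norm(dim=1).mean()
    print(f"block {k}: diffeo sensitivity {s_diffeo:.3f}, "
          f"noise sensitivity {s_noise:.3f}, ratio {s_diffeo / s_noise:.3f}")
```

In a trained network the paper reports that the ratio of deformation to noise sensitivity decreases with depth, driven by spatial pooling early on and channel pooling later; with the untrained weights used in this sketch the numbers only illustrate the measurement, not that result.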
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representation learning
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2210.01506/code)