Keywords: ventral stream, circuit mechanisms, interpretability, deep learning, visual system, excitation inhibition, neuroscience, closed-loop optimization, ablation
TL;DR: Neural networks trained on ImageNet segregate the object/foreground features of their output layer into the positive input weights.
Abstract: One of the main organizational principles of artificial and biological intelligence systems is their reliance on signed inputs: positive and negative weights in artificial networks, and excitatory and inhibitory synapses in the brain. However, little is known about the role of inhibitory activity in high-level visual cortex such as inferotemporal cortex, or how artificial neural networks (ANNs) trained for object recognition segregate their learned representations into positive and negative weights.
Here, we dissected high-level visual mechanisms in ANNs trained on ImageNet. Using ablation experiments and feature visualization, we investigated how the learned representations of ANN classification units depended on their positive or negative inputs. We found that unit representations changed more when ablating positive rather than negative inputs. Ablating positive inputs abolished object-related features while preserving background textures. This effect was more pronounced in adversarially trained (robust) networks. The segregation persisted in networks trained with unsupervised learning, but was absent in a ResNet18 trained with Tanh instead of ReLU activations.
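The ablation logic can be sketched for a single output unit, assuming the unit is linear in its inputs (logit = w · x + b), as a last layer before softmax is. This is a hypothetical, simplified illustration: the names (`ablate`, `response_change`) and the toy weights and inputs are made up, and mean absolute change in the logit is only a stand-in for the paper's comparison of feature visualizations.

```python
# Hypothetical sketch: zero out the positive or negative input weights of one
# linear output unit and measure how much its response changes.
# Toy numbers below are illustrative only, not results from the paper.

def unit_response(weights, bias, features):
    """Pre-softmax logit of a single linear output unit."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def ablate(weights, sign):
    """Zero out the inputs of the given sign ('positive' or 'negative')."""
    if sign == "positive":
        return [0.0 if w > 0 else w for w in weights]
    if sign == "negative":
        return [0.0 if w < 0 else w for w in weights]
    raise ValueError("sign must be 'positive' or 'negative'")

def response_change(weights, bias, inputs, sign):
    """Mean absolute change in the unit's logit across a set of inputs."""
    ablated = ablate(weights, sign)
    diffs = [abs(unit_response(weights, bias, x) - unit_response(ablated, bias, x))
             for x in inputs]
    return sum(diffs) / len(diffs)

# Toy usage: non-negative (e.g. post-ReLU) penultimate features for two inputs.
toy_weights = [2.0, 1.5, -0.5, -0.25]
toy_inputs = [[1.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.0, 1.0]]
pos_change = response_change(toy_weights, 0.0, toy_inputs, "positive")
neg_change = response_change(toy_weights, 0.0, toy_inputs, "negative")
print(pos_change, neg_change)
```

With these made-up numbers the positive-ablation change (3.75) exceeds the negative-ablation change (0.5) simply because the toy positive weights are larger; in a trained network the comparison is an empirical question.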
We found a consistent functional segregation in models trained to replicate the activity of neurons across the monkey ventral stream (V1, V4, and IT). Feature visualization of these neuron models produced images containing local features preferred by the actual neurons. As with units trained for classification, the learned representations of units trained to simulate neurons changed more upon ablating positive than negative inputs. We conclude that ANNs trained for classification segregate object or foreground information into the positive weights, and background or contextual information into the negative weights, in their last layer before softmax. These results hint at the relevance of signal rectification and inhibition in shaping feature selectivity in the primate ventral stream, a hypothesis we are testing in vivo.
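One reason rectification may matter can be shown with a toy example (an illustration, not the authors' analysis): when a unit's inputs pass through ReLU they are non-negative, so each contribution w·f(z) has the same sign as its weight (or is zero), and positive weights can only excite while negative weights can only inhibit. With Tanh, inputs can be negative, so a negatively weighted input can raise the logit. The helper names and numbers are hypothetical.

```python
import math

# Toy illustration: does the sign of each weight determine the sign of its
# contribution to a linear output unit? Numbers are made up.

def relu(z):
    return max(0.0, z)

def contributions(weights, pre_activations, nonlinearity):
    """Per-input contributions w_i * f(z_i) to a linear output unit."""
    return [w * nonlinearity(z) for w, z in zip(weights, pre_activations)]

def sign_consistent(weights, contribs):
    """True if every non-zero contribution has the same sign as its weight."""
    return all(c == 0 or (c > 0) == (w > 0) for w, c in zip(weights, contribs))

weights = [1.0, -1.0]   # one "excitatory" and one "inhibitory" input
pre = [0.8, -0.8]       # hypothetical pre-activations of the two inputs

relu_contrib = contributions(weights, pre, relu)       # negative weight adds 0
tanh_contrib = contributions(weights, pre, math.tanh)  # negative weight adds > 0
print(sign_consistent(weights, relu_contrib), sign_consistent(weights, tanh_contrib))
```

Under ReLU the check holds; under Tanh the negatively weighted input contributes positively, so weight sign no longer maps cleanly onto excitation versus inhibition.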
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10580