Keywords: ventral stream, circuit mechanisms, interpretability, deep learning, visual system, excitation-inhibition, neuroscience, closed-loop optimization, ablation
TL;DR: Neural networks trained on ImageNet segregate the object/foreground features of their output layer to the positive input weights, with similar behavior in visual neurons.
Abstract: Signed connections are central to both artificial and biological intelligence: positive and negative weights in artificial networks, and excitatory and inhibitory synapses in the brain. Yet their representational role remains unclear.
Here, we investigate how signed weights shape visual representations in artificial and biological systems involved in object recognition.
Using sign consistency as a proxy for the biological Dale's law, which requires each neuron to send exclusively excitatory or exclusively inhibitory outputs, we found that the accuracy of ImageNet-trained networks positively correlates with the Dale index of their output layer.
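A sign-consistency score of this kind can be sketched as follows; the per-unit definition here (fraction of a unit's outgoing weights sharing the dominant sign, averaged across units) is an illustrative assumption, not necessarily the paper's exact formula:

```python
import numpy as np

def dale_index(W):
    # W: (n_outputs, n_inputs) weight matrix of a linear layer.
    # For each input unit (column), measure how sign-consistent its
    # outgoing weights are: 1.0 means all weights share one sign
    # (perfectly Dale-like), 0.5 means an even positive/negative split.
    # NOTE: illustrative definition, assumed rather than taken from the paper.
    pos_frac = (W > 0).mean(axis=0)          # fraction of positive outgoing weights per unit
    per_unit = np.maximum(pos_frac, 1.0 - pos_frac)
    return per_unit.mean()
```

Under this definition, a layer whose units all project with a single sign scores 1.0, and a layer with fully mixed signs scores near 0.5.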
Ablation and feature visualization reveal a functional segregation: removing positive inputs disrupts object-related, low-frequency structure, while removing negative inputs mainly alters background textures. This segregation is more pronounced in adversarially robust models, persists under unsupervised learning, and vanishes with non-rectified activations.
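The sign-ablation operation described above can be sketched minimally: zero out the weights of one sign and keep the rest (a simplified stand-in for the paper's ablation procedure, whose exact details are not given here):

```python
import numpy as np

def ablate_by_sign(W, sign="positive"):
    # Zero out the weights of the given sign ("positive" or "negative"),
    # leaving all other weights untouched. Applying the ablated matrix in
    # place of W removes the corresponding signed inputs to each unit.
    mask = (W > 0) if sign == "positive" else (W < 0)
    return np.where(mask, 0.0, W)
```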
In intermediate layers, the most positive Dale‑like channels encoded localized, object‑like features, whereas the most negative ones captured dispersed, background features.
We next performed $\textit{in vivo}$ feature visualization in monkey ventral visual cortex (V1, V4, and IT) and fitted linear models to the recorded neurons using the layer that serves as input to the networks' classification units. These models reproduced features similar to those preferred by the biological neurons. In the model neurons, removing positive inputs altered representations more than removing negative ones.
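Fitting such a linear readout from network features to recorded responses can be sketched with ordinary least squares; the paper's actual fitting procedure (e.g. any regularization or cross-validation) is not specified here:

```python
import numpy as np

def fit_linear_model(features, responses):
    # features: (n_images, n_units) activations from the layer feeding
    # the classifier; responses: (n_images,) recorded neural responses.
    # Returns the linear weights and bias of a least-squares readout.
    X = np.column_stack([features, np.ones(len(features))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, responses, rcond=None)
    return coef[:-1], coef[-1]  # (weights, bias)
```

The fitted weight vector then plays the role of the "model neuron": its positive and negative entries can be ablated separately to compare their contributions.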
The most Dale-like positively projecting units exhibited localized features, while the negatively projecting units showed larger, more dispersed features, suited to carrying contextual input. Consistent with this, clearing the background around each neuron's preferred feature enhanced its response, likely by reducing inhibitory drive, supporting inhibition as a contextual modulation of the excitatory feature.
Our results demonstrate that both artificial and biological vision systems segregate features by weight sign: positive weights emphasize objects and low frequencies, while negative weights encode context. This convergence of representational strategies in brains and machines yields testable predictions for visual neuroscience.
Primary Area: interpretability and explainable AI
Submission Number: 21502