Decoding Semantics: A Multi-Modal CNN as a Model for Human Literacy Acquisition

Published: 14 May 2025 · Last Modified: 13 Jul 2025 · CCN 2025 Proceedings · Poster · CC BY 4.0
Abstract: While visually presented objects (e.g. a picture of a rat) and written words (e.g. the word *rat*) are perceptually very different, they evoke similar semantic activations in the human brain. A key question in understanding human reading acquisition is how semantic representations emerge such that visual object representations and written words become meaningfully linked. We trained a convolutional neural network (CNN) such that both object images and written word stimuli activate the same output unit. Our findings indicate that cross-modal semantic representations emerge gradually across layers. Using representational similarity analysis of the layer activations, we were further able to show that incongruent information degrades the network's performance via interfering projections into a high-dimensional space. This suggests that the acquisition of literacy can be modelled as the projection of object and word features, processed via the same neuronal substrate (the visual cortex), into a shared semantic space. Our approach offers a new avenue for uncovering the neuronal substrate of human literacy acquisition by using representational similarity analysis to link representations in the CNN to brain imaging data.
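The abstract names two technical components that a compact sketch can make concrete: a CNN whose shared trunk maps both object images and written-word stimuli onto the same semantic output units, and representational similarity analysis (RSA) of the layer activations. The Python sketch below is a minimal, hypothetical rendering of that setup under assumed details, not the authors' implementation; the architecture, layer sizes, and helper names (`SharedSemanticCNN`, `rdm`, `rsa_score`) are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Hypothetical multi-modal CNN: object images and rendered word stimuli
# pass through the same convolutional trunk and are classified by the same
# output units, so a picture of a rat and the word "rat" share one target.
class SharedSemanticCNN(nn.Module):
    def __init__(self, n_semantic_classes: int):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(128, n_semantic_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.trunk(x)  # layer activations of interest for RSA
        return self.classifier(features)

def rdm(activations: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix:
    1 - Pearson correlation between every pair of stimulus patterns."""
    return pdist(activations, metric="correlation")

def rsa_score(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Second-order similarity: Spearman correlation between two RDMs."""
    rho, _ = spearmanr(rdm(acts_a), rdm(acts_b))
    return rho

if __name__ == "__main__":
    model = SharedSemanticCNN(n_semantic_classes=100).eval()
    images = torch.randn(20, 3, 64, 64)  # 20 object-image stimuli (random here)
    words = torch.randn(20, 3, 64, 64)   # 20 matched written-word stimuli
    with torch.no_grad():
        acts_img = model.trunk(images).numpy()
        acts_word = model.trunk(words).numpy()
    # High cross-modal RDM correlation at late layers would indicate that
    # both modalities have been projected into a shared semantic space.
    print(f"image/word RDM correlation: {rsa_score(acts_img, acts_word):.3f}")
```

In the analysis the abstract describes, per-layer activations would be extracted (e.g. via forward hooks) and the resulting RDMs compared across layers, modalities, and brain imaging data; the random inputs above only demonstrate the mechanics of the second-order comparison.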
Submission Number: 35