Categories vs Semantic Features: What shapes the similarities people discern in photographs of objects?

Published: 02 Mar 2024, Last Modified: 30 Mar 2024 · ICLR 2024 Workshop Re-Align Poster · CC BY 4.0
Track: short paper (up to 5 pages)
Keywords: representational learning, cognitive models, semantic representations, vision
TL;DR: We examine how different CNN training objectives affect human-model alignment.
Abstract: In visual cognitive neuroscience, there are two main theories about the function of the ventral visual stream. One suggests that it serves to classify objects (classification hypothesis); the other suggests that it generates intermediate representations from which people can produce verbal descriptions, actions, and other kinds of information (distributed semantic hypothesis). To adjudicate between these, we trained two deep convolutional AlexNet models on 330,000 images belonging to 86 categories, representing the intersection of Ecoset images and the semantic norms collected by the Leuven group. One model was trained to produce category labels (classification hypothesis), the other to generate all of an item’s semantic features (distributed semantic hypothesis). The two models learned very different representational geometries throughout the network. We also estimated the human semantic structure of the 86 classes using a triadic comparison task. The representations acquired by the feature-generating model aligned better with human-perceived similarities amongst images and better predicted human judgments in the triadic comparison task. The results thus support the distributed semantic hypothesis.
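The sketch below illustrates, under stated assumptions, the contrast the abstract describes: the same AlexNet backbone trained either to output one of 86 category labels (softmax cross-entropy) or to generate an item's full set of semantic features (per-feature binary cross-entropy), plus a simple odd-one-out readout for comparing model representations to triadic judgments. This is not the authors' code; NUM_FEATURES, the readout layer, the cosine-similarity metric, and all hyperparameters are illustrative assumptions not specified in the abstract.

```python
# Minimal sketch (assumptions flagged inline), using torchvision's AlexNet.
import torch
import torch.nn as nn
from torchvision.models import alexnet

NUM_CATEGORIES = 86     # categories shared by Ecoset and the Leuven norms (from the abstract)
NUM_FEATURES = 2000     # placeholder: size of the semantic-feature vocabulary (assumed)

def make_model(num_outputs: int) -> nn.Module:
    """AlexNet with its final layer resized to the desired output dimension."""
    model = alexnet(weights=None)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_outputs)
    return model

# Classification hypothesis: one-hot category targets, softmax cross-entropy.
clf_model = make_model(NUM_CATEGORIES)
clf_loss = nn.CrossEntropyLoss()

# Distributed semantic hypothesis: multi-hot feature targets, per-feature
# binary cross-entropy (an image can express many semantic features at once).
feat_model = make_model(NUM_FEATURES)
feat_loss = nn.BCEWithLogitsLoss()

def training_step(model, loss_fn, images, targets, optimizer):
    """One optimization step; identical for both objectives, only the loss differs."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Alignment readout (assumed): for a triad of images, predict the "odd one out"
# as the item least similar to the other two in the penultimate-layer activations,
# then compare these predictions against human triadic choices.
def penultimate_embedding(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    feats = model.features(images)
    feats = model.avgpool(feats).flatten(1)
    return model.classifier[:6](feats)   # activations just before the output layer

def predict_odd_one_out(emb: torch.Tensor) -> int:
    """emb: (3, D) embeddings for one triad; returns index of the predicted odd item."""
    sim = nn.functional.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
    totals = sim.sum(dim=1) - sim.diagonal()   # similarity of each item to the other two
    return int(totals.argmin())
```

The design choice being probed is only the output target; architecture, data, and optimization are held constant so that differences in representational geometry can be attributed to the training objective.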
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73