Categories vs Semantic Features: What shapes the similarities people discern in photographs of objects?
Track: short paper (up to 5 pages)
Keywords: representational learning, cognitive models, semantic representations, vision
TL;DR: We examine how different CNN training objectives affect human-model alignment.
Abstract: In visual cognitive neuroscience, there are two main theories about the function of the ventral visual stream. One suggests that it serves to classify objects (the classification hypothesis); the other suggests that it generates intermediate representations from which people can generate verbal descriptions, actions, and other kinds of information (the distributed semantic hypothesis). To adjudicate between these hypotheses, we trained two deep convolutional AlexNet models on 330,000 images belonging to 86 categories, representing the intersection of Ecoset images and the semantic norms collected by the Leuven group. One model was trained to produce category labels (classification hypothesis), the other to generate all of an item's semantic features (distributed semantic hypothesis). The two models learned very different representational geometries throughout the network. We also estimated the human semantic structure of the 86 classes using a triadic comparison task. The representations acquired by the feature-generating model aligned better with human-perceived similarities amongst images and better predicted human judgments in the triadic comparison task. The results thus support the distributed semantic hypothesis.
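To make the contrast between the two training objectives concrete, here is a minimal sketch (not the authors' code) of how the two AlexNet variants might differ only in their output head and loss: a softmax cross-entropy head over the 86 category labels versus a multi-label sigmoid head that predicts an item's full semantic-feature vector. The feature dimensionality `NUM_FEATURES` is an assumed placeholder, not a figure from the paper.

```python
# Hedged sketch of the two training objectives described in the abstract.
# Assumes PyTorch/torchvision; NUM_FEATURES is illustrative, not from the paper.
import torch
import torch.nn as nn
from torchvision.models import alexnet

NUM_CATEGORIES = 86     # category labels (classification hypothesis)
NUM_FEATURES = 2500     # assumed size of the semantic-feature vector (placeholder)

def make_model(output_dim: int) -> nn.Module:
    """AlexNet backbone with a task-specific final layer."""
    model = alexnet(weights=None)
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, output_dim)
    return model

# Classification model: one category label per image, softmax cross-entropy.
clf_model = make_model(NUM_CATEGORIES)
clf_loss = nn.CrossEntropyLoss()

# Feature-generation model: predict all of an item's semantic features at once,
# treated as independent binary targets (multi-label sigmoid + BCE).
feat_model = make_model(NUM_FEATURES)
feat_loss = nn.BCEWithLogitsLoss()

def training_step(model, loss_fn, images, targets, optimizer):
    """One optimization step; targets are class indices (long) for the
    classifier or binary feature vectors (float) for the feature model."""
    optimizer.zero_grad()
    logits = model(images)
    loss = loss_fn(logits, targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, the two models share an identical backbone, so any difference in representational geometry is attributable to the objective rather than the architecture.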
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73