Abstract: Understanding vision requires capturing the vast diversity of the visual world we experience. How can we sample this diversity in a manner that supports robust, generalizable inferences? Widely used, massive neuroimaging datasets have contributed substantially to our understanding of brain function, yet their ability to comprehensively capture the diversity of visual and semantic experiences remains largely untested. More broadly, the factors required to build diverse, generalizable datasets have remained unknown. To address these gaps, we introduce LAION-natural, a curated subset of 120 million natural photographs filtered from LAION-2B, and use it as a proxy for the breadth of our visual experience when assessing visual-semantic coverage. Our analysis of CLIP embeddings of these images reveals significant representational gaps in existing datasets, demonstrating that they cover only a restricted subset of the space spanned by LAION-natural. Simulations and analyses of functional MRI data further show that these gaps impair out-of-distribution generalization. Importantly, our results reveal that even moderately sized stimulus sets can achieve strong generalization if they are sampled from a diverse stimulus pool, and that this diversity matters more than the specific sampling strategy employed. These findings not only highlight the limitations of existing datasets for generalizability and model comparison, but also provide clear strategies for future studies to support the development of stronger computational models of the visual system and more generalizable inferences.
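The abstract does not specify the coverage metric, so the following is only a minimal sketch of one plausible way to assess visual-semantic coverage with CLIP embeddings: embed a stimulus set and a sample from a large natural-image pool, then measure how far each pool image lies from its nearest stimulus in embedding space. The checkpoint, the file lists, and the nearest-neighbor gap metric are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: CLIP-embedding coverage analysis (illustrative, not the paper's method).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_images(paths, batch_size=32):
    """Return unit-normalized CLIP image embeddings, one row per image."""
    chunks = []
    for i in range(0, len(paths), batch_size):
        images = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=images, return_tensors="pt")
        feats = model.get_image_features(**inputs)
        chunks.append(torch.nn.functional.normalize(feats, dim=-1))
    return torch.cat(chunks)

def coverage_gaps(pool_emb, stimulus_emb):
    """Cosine distance from each pool image to its nearest stimulus.

    Large values mark regions of the pool left uncovered by the stimulus
    set; the distribution of gaps summarizes representational coverage.
    """
    sims = pool_emb @ stimulus_emb.T        # (n_pool, n_stimuli), unit vectors
    return 1.0 - sims.max(dim=1).values     # per-pool-image gap

# Hypothetical usage: pool_paths would index a LAION-natural-like sample,
# stimulus_paths an fMRI stimulus set.
# gaps = coverage_gaps(embed_images(pool_paths), embed_images(stimulus_paths))
```

Under this framing, a dataset with broad coverage yields uniformly small gaps, while one confined to a restricted subset of the pool's embedding space leaves a heavy tail of large gaps.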
Submission Number: 37