Keywords: Novel Discovery, World Models, Analogical Reasoning, Cognitive Development, Embodied Cognition, Neuro-symbolic Architecture
TL;DR: LLMs fail at novel discovery because they lack embodied schemas of the world and robust analogical reasoning. We propose spatially grounded training and neuro-symbolic architectures to bridge the gap.
Abstract: Large language models (LLMs) have been trained on vast data spanning nearly every scientific discipline, yet they have not produced a single novel discovery. Human polymaths such as John von Neumann routinely generated breakthroughs across disparate fields---from game theory to quantum mechanics to the very architecture of the modern computer---by connecting insights across domains. We argue this gap reflects a structural limitation of the LLM paradigm rather than a problem of scale. Drawing on Piaget's theory of cognitive development and Gentner's structure-mapping theory, we contend that novel discovery depends on two core processes: constructing nuanced internal schemas of the external world and flexibly redeploying them via analogical mapping. Without embodied data or exploration, LLMs form shallow world models, and because their architectures optimize for statistical efficiency, they struggle to extend analogies out of distribution in ways that capture relational structure across domains. Without rethinking training environments and architectures, LLMs will remain constrained to weak abstraction rather than the deep reasoning required for scientific innovation.
Submission Number: 103