Track: Track 1: Technical Foundations for a Post-AGI World
Keywords: large language models, computational creativity, scientific discovery, AI for science
TL;DR: We formalize research creativity as coherence vs. cognitive availability, decompose papers into reusable "idea atoms," and sample "alien" directions that are viable but non-obvious to the community.
Abstract: Large language models are adept at synthesizing and recombining familiar material, yet they often fail at a specific kind of creativity that matters most in research: producing ideas that are both \emph{coherent} and \emph{non-obvious} to the current community. We formalize this gap through \emph{cognitive availability}, the likelihood that a research direction would be naturally proposed by a typical researcher given what they have worked on. We introduce a pipeline that (i) decomposes papers into granular conceptual units, (ii) clusters recurring units into a shared vocabulary of \emph{idea atoms}, and (iii) learns two complementary models: a \emph{coherence} model that scores whether a set of atoms constitutes a viable direction, and an \emph{availability} model that scores how likely that direction is to be generated by researchers drawn from the community. We then sample ``alien'' directions that score high on coherence but low on availability. On a corpus of $\sim$7,500 recent LLM papers from NeurIPS, ICLR and ICML, we validate that (a) conceptual units preserve paper content under reconstruction, (b) idea atoms generalize across papers rather than memorizing paper-specific phrasing, and (c) the Alien sampler produces research directions that are more diverse than LLM baselines while maintaining coherence.
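Illustrative sketch (not the authors' implementation): the sampling step described in the abstract can be pictured as filtering candidate atom sets by a coherence threshold and then preferring those with the lowest availability. Everything in the block below is an assumption made for illustration: the names `pair_fraction` and `rank_alien`, the toy atom vocabulary, the pair-counting scorers that stand in for the learned coherence and availability models, and the filter-then-sort selection rule.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

AtomSet = FrozenSet[str]
Scorer = Callable[[AtomSet], float]


def pair_fraction(atoms: AtomSet, pairs: Set[FrozenSet[str]]) -> float:
    """Fraction of unordered atom pairs in `atoms` that appear in `pairs`."""
    all_pairs = [frozenset(p) for p in combinations(sorted(atoms), 2)]
    if not all_pairs:
        return 0.0
    return sum(p in pairs for p in all_pairs) / len(all_pairs)


def rank_alien(
    candidates: Iterable[AtomSet],
    coherence: Scorer,
    availability: Scorer,
    min_coherence: float = 0.5,
    top_k: int = 2,
) -> List[Tuple[AtomSet, float, float]]:
    """Keep coherent candidates, then order by lowest availability first."""
    kept = []
    for atoms in candidates:
        c = coherence(atoms)
        if c < min_coherence:
            continue  # not viable enough to count as a direction
        kept.append((atoms, c, availability(atoms)))
    # Low availability first; ties broken by higher coherence.
    kept.sort(key=lambda item: (item[2], -item[1]))
    return kept[:top_k]


# Hypothetical toy data: which atom pairs "fit together" and which pairs
# the community has already combined. Real models would be learned.
COMPATIBLE = {
    frozenset({"retrieval", "tool use"}),
    frozenset({"retrieval", "curriculum"}),
    frozenset({"tool use", "curriculum"}),
    frozenset({"retrieval", "rlhf"}),
}
ALREADY_SEEN = {
    frozenset({"retrieval", "rlhf"}),
    frozenset({"retrieval", "tool use"}),
}

candidates = [
    frozenset({"retrieval", "tool use"}),
    frozenset({"retrieval", "curriculum"}),
    frozenset({"tool use", "rlhf"}),
    frozenset({"retrieval", "tool use", "curriculum"}),
]

ranked = rank_alien(
    candidates,
    coherence=lambda atoms: pair_fraction(atoms, COMPATIBLE),
    availability=lambda atoms: pair_fraction(atoms, ALREADY_SEEN),
)

for atoms, c, a in ranked:
    print(sorted(atoms), f"coherence={c:.2f}", f"availability={a:.2f}")
```

In this toy setting, a candidate whose atoms are all mutually compatible but have never been combined ranks first, which is the "viable but non-obvious" behavior the abstract describes. A weighted difference of the two scores would be an equally plausible selection rule.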
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Alejandro_Hernandez1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 6