An Evolutionary Epistemology of Post-Training
Keywords: epistemology of AI, evolutionary epistemology, post-training, RLVR, GRPO
TL;DR: Campbell's Blind Variation and Selective Retention (BV-SR) provides a framework for analyzing RLVR post-training, with implications for the debate over whether frontier models can generate knowledge and for where the approach is likely to generalize.
Abstract: Whether frontier models can produce new knowledge is actively contested, and recent results, including GPT-5.4 Pro's solutions to several Erdős problems, have drawn greater attention to the debate. Past work has tended to grant novelty only in a thin sense, attributing model outputs either to recombination of latent training-data elements or to reversion to modal outcomes within the training distribution. We argue that Campbell's evolutionary epistemology, and in particular Blind Variation and Selective Retention (BV-SR), supports a stronger claim: post-training techniques such as Reinforcement Learning with Verifiable Rewards (RLVR) instantiate a genuine selectionist process, and the resulting knowledge is properly attributed to the system rather than redistributed across the authors of the pre-training corpus. We further consider where this framing predicts current approaches will succeed, where they will stall, and what it implies for extending these methods beyond mathematics into the wider sciences.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 88