Keywords: Large Language Models (LLMs), Multi-agent Systems, Self-Correction, LLM-As-A-Judge
Abstract: Large language models have demonstrated remarkable capabilities across diverse tasks, yet a fundamental question remains: can these models genuinely rediscover complex scientific insights, or do they merely recite memorized information? We present AInstein, a novel framework for evaluating whether language models can derive established scientific concepts from first principles when stripped of domain-specific terminology. Rather than testing the recall of scientific facts, we reformulate landmark discoveries as conceptual puzzles, challenging models to reconstruct the underlying technical solutions independently.
Submission Number: 324