Keywords: reasoning, knowledge tracing/discovering/inducing, applications
TL;DR: We conduct a systematic study of the synergy between reasoning and knowledge in scientific problem-solving. We show that reasoning LLMs can be bottlenecked by domain knowledge, while reasoning fine-tuning can help models surface relevant knowledge.
Abstract: Scientific reasoning tasks pose unique challenges for LLMs, requiring both deep domain knowledge and the ability to apply that knowledge through complex reasoning. While automated scientific reasoners hold great promise for assisting human scientists, there is currently neither a holistic dataset for evaluating scientific reasoning nor a method for disentangling the distinct roles of reasoning and knowledge in these tasks. To address these gaps, we introduce **SciReas**, a diverse suite of existing benchmarks for scientific reasoning tasks, and **SciReas-Pro**, a selective subset that requires more complex reasoning. We then propose **KRUX**, an evaluation framework that probes the distinct roles of reasoning and knowledge in scientific tasks.
Combining the two, we conduct in-depth analysis that yields several key findings:
(1) Retrieving task-relevant knowledge from parameters is a critical bottleneck for LLMs when carrying out scientific reasoning;
(2) Stronger reasoning models consistently benefit from external knowledge added in-context (a minimal probe of this setup is sketched below); and
(3) Enhancing verbalized reasoning improves LLMs' ability to recall task-relevant knowledge.
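To make the knowledge-injection setup behind findings (1) and (2) concrete, here is a minimal sketch of that kind of probe: score the same model with and without task-relevant knowledge prepended to the prompt, and read the gap as a knowledge bottleneck. This is a sketch under stated assumptions, not the paper's actual KRUX implementation; all names here (`query_model`, the example layout) are hypothetical placeholders.

```python
# Minimal sketch (not the paper's KRUX code) of an in-context
# knowledge-injection probe: evaluate the same model with and without
# task-relevant knowledge prepended to the prompt. The gap between the
# two conditions estimates how much performance is limited by knowledge
# recall rather than by reasoning.

def query_model(prompt: str) -> str:
    """Hypothetical placeholder: replace with a call to the LLM under test."""
    raise NotImplementedError

def build_prompt(question: str, knowledge: list[str] | None = None) -> str:
    """Optionally prepend verbalized knowledge items to the question."""
    if knowledge:
        facts = "\n".join(f"- {k}" for k in knowledge)
        return f"Relevant knowledge:\n{facts}\n\nQuestion: {question}\nAnswer:"
    return f"Question: {question}\nAnswer:"

def accuracy(examples: list[dict], with_knowledge: bool) -> float:
    """Fraction answered correctly; each example is a dict with
    'question', 'answer', and 'knowledge' (a list of knowledge items)."""
    correct = 0
    for ex in examples:
        prompt = build_prompt(ex["question"],
                              ex["knowledge"] if with_knowledge else None)
        correct += int(query_model(prompt).strip() == ex["answer"])
    return correct / len(examples)

# knowledge_gain = accuracy(data, True) - accuracy(data, False)
```

Under this reading, a positive gap even for a strong reasoning model is consistent with finding (2): reasoning ability alone does not compensate for missing domain knowledge.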
Submission Number: 135