Keywords: reasoning, knowledge tracing/discovering/inducing, applications
TL;DR: We conduct a systematic study of the synergy between reasoning and knowledge in scientific problem-solving. We show that reasoning LLMs can be bottlenecked by domain knowledge, while reasoning fine-tuning can help models surface relevant knowledge.
Abstract: Scientific reasoning tasks pose unique challenges for LLMs, requiring both deep domain knowledge and the ability to apply that knowledge through complex reasoning. While automated scientific reasoners hold great promise for assisting human scientists, there is currently neither a holistic dataset for evaluating scientific reasoning nor a method for disentangling the distinct roles of reasoning and knowledge in these tasks. To address these gaps, we introduce **SciReas**, a diverse suite of existing benchmarks for scientific reasoning tasks, and **SciReas-Pro**, a selective subset that requires more complex reasoning. We then propose **KRUX**, an evaluation framework that probes the distinct roles of reasoning and knowledge in scientific tasks.
Combining the two, we conduct in-depth analysis that yields several key findings:
(1) Retrieving task-relevant knowledge from parameters is a critical bottleneck for LLMs when carrying out scientific reasoning;
(2) Stronger reasoning models consistently benefit from external knowledge added in-context (a minimal probe of this setup is sketched below); and
(3) Enhancing verbalized reasoning improves LLMs' ability to recall task-relevant knowledge.
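To make the knowledge-injection setup behind findings (1) and (2) concrete, here is a minimal sketch of that kind of probe: score the same model with and without task-relevant knowledge prepended to the prompt, and read the gap as a knowledge bottleneck. This is a sketch under stated assumptions, not the paper's actual KRUX implementation; all names here (`query_model`, the example layout) are hypothetical placeholders.

```python
# Minimal sketch (not the paper's KRUX code) of an in-context
# knowledge-injection probe: evaluate the same model with and without
# task-relevant knowledge prepended to the prompt. The gap between the
# two conditions estimates how much performance is limited by knowledge
# recall rather than by reasoning.

def query_model(prompt: str) -> str:
    """Hypothetical placeholder: replace with a call to the LLM under test."""
    raise NotImplementedError

def build_prompt(question: str, knowledge: list[str] | None = None) -> str:
    """Optionally prepend verbalized knowledge items to the question."""
    if knowledge:
        facts = "\n".join(f"- {k}" for k in knowledge)
        return f"Relevant knowledge:\n{facts}\n\nQuestion: {question}\nAnswer:"
    return f"Question: {question}\nAnswer:"

def accuracy(examples: list[dict], with_knowledge: bool) -> float:
    """Fraction answered correctly; each example is a dict with
    'question', 'answer', and 'knowledge' (a list of knowledge items)."""
    correct = 0
    for ex in examples:
        prompt = build_prompt(ex["question"],
                              ex["knowledge"] if with_knowledge else None)
        correct += int(query_model(prompt).strip() == ex["answer"])
    return correct / len(examples)

# knowledge_gain = accuracy(data, True) - accuracy(data, False)
```

Under this reading, a positive gap even for a strong reasoning model is consistent with finding (2): reasoning ability alone does not compensate for missing domain knowledge.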
Submission Number: 135