Keywords: Computational argumentation, Scientific AI assistants, Large language models
TL;DR: Frontier LLMs struggle with scientific argumentation despite strong mathematical reasoning; our reasoning-aware training enables 3B models to match much larger systems, showing that scientific discourse requires explicit supervision rather than scale alone.
Abstract: Scientific discourse depends on argumentative reasoning: identifying claims, evaluating evidence, and constructing coherent responses. While recent advances in reasoning-capable language models have demonstrated strong performance on mathematical and logical benchmarks, their ability to engage in scientific argumentation remains unclear. We present the first systematic evaluation of language models across eight tasks spanning argument mining, rebuttal generation, and discourse-level reasoning using research papers, peer reviews, and grant proposals. Our study reveals that even frontier models with strong general reasoning skills struggle with domain-specific argumentative tasks, highlighting a fundamental capability gap. To address this, we introduce a training framework that explicitly scaffolds the argumentative reasoning process in language models, substantially improving their competence in scientific discourse. The resulting compact models approach or exceed the performance of much larger proprietary systems and generalize to unseen conversational settings, demonstrating reasoning transfer beyond task-specific supervision. These findings underscore that effective scientific argumentation is not an emergent property of scale, but requires explicit reasoning-aware training, and they point toward practical pathways for building AI systems that can contribute meaningfully to scientific discourse.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18838