Modular Refinement of Small Language Models for Physics Reasoning via Localized Error Feedback

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: large language models, physics multi-step reasoning, RLHF, LLM agents
TL;DR: We introduce a refinement agent and use LoRA-based RLHF with a step-level reward model to improve physics reasoning in small language models.
Abstract: Large Language Models (LLMs) excel at many reasoning tasks but struggle with scientific domains like physics, which demand precise mathematical calculation alongside deep conceptual and factual understanding. In complex physics problem solving, LLMs commonly falter due to three core issues: misunderstanding the problem, incorrect application of concepts, and calculation mistakes. These challenges are more pronounced in small LLMs, whose limited capacity makes them more prone to such failures. To address these limitations, we propose a modular reinforcement learning refinement framework tailored to small LLMs that localizes the first erroneous reasoning step and corrects it through a reinforcement-learning-guided feedback mechanism. We also introduce PhysicsQA, a diverse benchmark of 370 physics problems designed to evaluate LLM reasoning across the aforementioned dimensions. Experimental results demonstrate improvements of up to 10% in final-answer accuracy for small language models over existing approaches.
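As a rough illustration of the setup the TL;DR describes, the sketch below (Python, using the Hugging Face transformers and peft libraries) attaches LoRA adapters to a small causal LM and localizes the first low-reward reasoning step with a stand-in step-level reward model. The base model name, the reward stub, the threshold, and the step-splitting heuristic are illustrative assumptions, not the paper's actual implementation.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Assumed backbone; the submission does not name its base model here.
    BASE = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(BASE)
    model = AutoModelForCausalLM.from_pretrained(BASE)

    # LoRA keeps the trainable parameter count small, which is what makes
    # RLHF-style refinement practical on a small LLM.
    lora_cfg = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)

    def localize_first_error(steps, step_reward_model, threshold=0.5):
        """Return the index of the first reasoning step whose step-level
        reward falls below the threshold, or None if all steps pass.
        `step_reward_model` is a hypothetical callable, not a real API."""
        for i, step in enumerate(steps):
            if step_reward_model(step) < threshold:
                return i
        return None

    # Toy usage: split a chain-of-thought solution into steps and find
    # the first faulty step for the refinement agent to correct.
    solution = "Step 1: ...\nStep 2: ...\nStep 3: ..."
    steps = solution.split("\n")
    bad_step = localize_first_error(steps, step_reward_model=lambda s: 1.0)

Localizing only the first erroneous step, rather than scoring the full trajectory, is what lets the correction feedback stay targeted; everything downstream of that step is regenerated rather than individually repaired.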
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 10563