Modular Refinement of Small Language Models for Physics Reasoning via Localized Error Feedback

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: large language models, physics multi-step reasoning, RLHF, LLM agents
TL;DR: We introduce a refinement agent and use LoRA-based RLHF with a step-level reward model to improve physics reasoning in small language models.
Abstract: Large Language Models (LLMs) excel at many reasoning tasks but struggle with scientific domains like physics, which demand precise mathematical calculation alongside deep conceptual and factual understanding. In complex physics problem solving, LLMs commonly falter due to three core issues: misunderstanding the problem, incorrect application of concepts, and calculation mistakes. These challenges are more pronounced in small LLMs, whose limited capacity makes them more prone to such failures. To address these limitations, we propose a modular reinforcement learning refinement framework tailored to small LLMs that localizes the first erroneous reasoning step and corrects it through a reinforcement-learning-guided feedback mechanism. We also introduce PhysicsQA, a diverse benchmark of 370 physics problems designed to evaluate LLM reasoning across the aforementioned dimensions. Experimental results demonstrate improvements of up to 10% in final-answer accuracy for small language models over existing approaches.
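As a rough illustration of the setup the TL;DR describes, the sketch below (Python, using the Hugging Face transformers and peft libraries) attaches LoRA adapters to a small causal LM and localizes the first low-reward reasoning step with a stand-in step-level reward model. The base model name, the reward stub, the threshold, and the step-splitting heuristic are illustrative assumptions, not the paper's actual implementation.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Assumed backbone; the submission does not name its base model here.
    BASE = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(BASE)
    model = AutoModelForCausalLM.from_pretrained(BASE)

    # LoRA keeps the trainable parameter count small, which is what makes
    # RLHF-style refinement practical on a small LLM.
    lora_cfg = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)

    def localize_first_error(steps, step_reward_model, threshold=0.5):
        """Return the index of the first reasoning step whose step-level
        reward falls below the threshold, or None if all steps pass.
        `step_reward_model` is a hypothetical callable, not a real API."""
        for i, step in enumerate(steps):
            if step_reward_model(step) < threshold:
                return i
        return None

    # Toy usage: split a chain-of-thought solution into steps and find
    # the first faulty step for the refinement agent to correct.
    solution = "Step 1: ...\nStep 2: ...\nStep 3: ..."
    steps = solution.split("\n")
    bad_step = localize_first_error(steps, step_reward_model=lambda s: 1.0)

Localizing only the first erroneous step, rather than scoring the full trajectory, is what lets the correction feedback stay targeted; everything downstream of that step is regenerated rather than individually repaired.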
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 10563