Keywords: Large language models, Physics Multi-step Reasoning, RLHF, LLM Agent
TL;DR: We introduce a refinement agent and use LoRA-based RLHF with a step-level reward model to improve physics reasoning in small LLMs
Abstract: Large Language Models (LLMs) excel at many reasoning tasks but struggle with scientific domains like physics, which demand precise mathematical calculation alongside deep conceptual and factual understanding. In complex physics problem solving, LLMs commonly falter due to three core issues: misunderstanding the problem, incorrect application of concepts, and calculation mistakes. These challenges are more pronounced in small LLMs, whose limited capacity makes them more prone to such failures. To address these limitations, we propose a modular reinforcement learning refinement framework tailored for small LLMs, which first localizes errors and then corrects them through a reinforcement-learning-guided feedback mechanism. We also introduce PhysicsQA, a diverse benchmark of 370 physics problems designed to evaluate LLM reasoning across the aforementioned dimensions. Experimental results demonstrate improvements of up to 10% in final-answer accuracy with small language models over existing approaches.
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 10563