Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models

TMLR Paper 4826 Authors

10 May 2025 (modified: 14 May 2025) · Under review for TMLR · CC BY 4.0
Abstract: Recent advances in fine-tuning large language models (LLMs) with reinforcement learning (RL) have shown promising improvements on complex reasoning tasks, particularly when paired with chain-of-thought (CoT) prompting. However, these successes have largely been demonstrated on large-scale models with billions of parameters, where a strong pretraining foundation ensures effective initial exploration. In contrast, RL remains challenging for tiny LLMs with 1 billion parameters or fewer: they lack the pretraining strength needed to explore effectively and often settle into suboptimal reasoning patterns. This work introduces a novel intrinsic motivation approach that leverages episodic memory to address this challenge, improving tiny LLMs on CoT reasoning tasks. Inspired by human memory-driven learning, our method exploits successful reasoning patterns stored in memory while allowing controlled exploration to generate novel responses. Intrinsic rewards are computed efficiently with a kNN-based episodic memory, allowing the model to discover new reasoning strategies while quickly adapting to effective past solutions. Experiments on three reasoning datasets demonstrate that our approach significantly enhances smaller LLMs' reasoning performance and generalization, making RL-based reasoning improvements more accessible in low-resource settings.
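To make the kNN-based episodic memory concrete, here is a minimal sketch of how such an intrinsic reward could be computed. All specifics below are assumptions for illustration, not the paper's formulation: the class name EpisodicMemory, the use of L2 distance over trace embeddings, the mean-of-kNN novelty score with exponential squashing, and the FIFO eviction policy.

```python
import numpy as np

class EpisodicMemory:
    """Illustrative kNN-based episodic memory for intrinsic rewards.

    Stores embeddings of past successful reasoning traces; the novelty
    of a new trace is derived from its mean distance to the k nearest
    stored embeddings. Design choices here (L2 distance, mean-of-kNN
    score, fixed-capacity FIFO eviction) are assumptions, not the
    paper's exact method.
    """

    def __init__(self, k: int = 10, capacity: int = 10_000):
        self.k = k
        self.capacity = capacity
        self.embeddings: list[np.ndarray] = []

    def add(self, embedding: np.ndarray) -> None:
        # Evict the oldest entry once capacity is reached (FIFO).
        if len(self.embeddings) >= self.capacity:
            self.embeddings.pop(0)
        self.embeddings.append(embedding)

    def intrinsic_reward(self, embedding: np.ndarray) -> float:
        # With an empty memory, every trace is maximally novel.
        if not self.embeddings:
            return 1.0
        dists = np.linalg.norm(np.stack(self.embeddings) - embedding, axis=1)
        k = min(self.k, len(dists))
        knn = np.partition(dists, k - 1)[:k]
        # Squash the mean kNN distance into (0, 1): farther -> more novel.
        return float(1.0 - np.exp(-knn.mean()))

# Usage: store embeddings of successful CoT traces, then score new ones.
memory = EpisodicMemory(k=5)
rng = np.random.default_rng(0)
memory.add(rng.normal(size=128))
novelty = memory.intrinsic_reward(rng.normal(size=128))
```

In such a scheme, the novelty score would typically be added to the task reward during RL fine-tuning, trading off reuse of stored successful patterns against exploration of new ones; how the paper balances these two terms is not specified in the abstract.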
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=uz7Oq6W4iK
Changes Since Last Submission: Fixed a template issue from the last submission by adding the missing header.
Assigned Action Editor: ~Kamil_Ciosek1
Submission Number: 4826