Smart-Searcher: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

ACL ARR 2025 May Submission4730 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Large Language Models (LLMs) are powerful but prone to hallucinations because their knowledge is static. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the model's internal knowledge. In this paper, we introduce Smart-Searcher, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. Smart-Searcher employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that Smart-Searcher outperforms previous RAG and reasoning methods while retrieving efficiently. We will release all code, models, and data after review.
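
The abstract does not give the form of the RL reward, so the following is only a minimal sketch of how an outcome-supervised reward with an incentive for internal knowledge utilization might look. The function name `outcome_reward`, the exact-match correctness check, the `num_searches` counter, and the `internal_bonus` value are all assumptions made for illustration; they are not taken from the paper.

```python
def outcome_reward(predicted: str, gold: str, num_searches: int,
                   internal_bonus: float = 0.5) -> float:
    """Hypothetical outcome-only reward (illustrative, not the paper's definition).

    Only the final answer is scored, so intermediate reasoning and search
    steps are left unconstrained. A correct answer earns 1.0, plus an assumed
    bonus when it was produced without any external search call.
    """
    # Assumed correctness check: case-insensitive exact match.
    correct = predicted.strip().lower() == gold.strip().lower()
    if not correct:
        return 0.0
    # Assumed incentive for relying on internal knowledge alone.
    bonus = internal_bonus if num_searches == 0 else 0.0
    return 1.0 + bonus
```

Under these assumptions, a correct answer produced with no search calls scores higher than the same answer produced after retrieval, while a wrong answer scores zero either way.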

Paper Type: Long

Research Area: Language Modeling

Research Area Keywords: retrieval-augmented generation, chain-of-thought

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 4730
