KnItLM: Weaving Knowledge into Instruction-Tuned LLMs via Continual Pre-Training and Merging

ICLR 2026 Conference Submission 19631 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Knowledge Ingestion, RAG, LoRA Transfer
TL;DR: Knowledge Ingestion into Instruction-Tuned LLMs via Continual Pre-Training and Merging for Improved RAG Performance
Abstract: RAG has become the de facto method for incorporating new, corpus-specific knowledge into an instruction-following LLM (Instruct LLM). Although RAG-based prompting improves factual grounding, it fails when retrieval is incorrect or incomplete, leading to hallucinations. Fine-tuning methods such as RAFT and PA-RAG enhance RAG by ingesting new knowledge into the parameters of the model, but they require generating massive amounts of synthetic QA data covering the entire corpus. Continued Pre-Training (CPT) on the text corpus avoids comprehensive synthetic data generation but breaks the instruction-following capabilities of an Instruct LLM, necessitating instruction fine-tuning (IFT) after CPT. However, IFT is costly and may be infeasible when an instruction-tuning corpus is unavailable. In this work, we propose KnItLM - KNowledge IngesTion via LoRA Merging. Instead of performing CPT on the Instruct LLM, KnItLM performs CPT with Low-Rank Adapters (LoRA) on the corresponding base LLM to infuse new knowledge. The knowledge-infused LoRA weights are then merged into the Instruct LLM, imparting new knowledge without degrading its instruction-following capabilities and without requiring expensive instruction fine-tuning. Empirical results show that KnItLM significantly improves RAG performance, raising accuracy from $54.17$% to $79.26$% on retrieval-failure cases. In addition, the proposed method achieves superior performance to existing approaches while requiring substantially less training data.
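
To make the two-stage recipe concrete, the following is a minimal sketch using Hugging Face `transformers` and `peft`: train a LoRA adapter with a causal-LM (CPT) objective on the base model, then load and merge that adapter into the instruction-tuned counterpart. Model names, adapter paths, and hyperparameters are illustrative placeholders, not the paper's actual settings, and the authors' exact merging procedure may differ.

```python
# Sketch of a KnItLM-style pipeline (assumed setup, not the official code).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, PeftModel

BASE = "meta-llama/Llama-3.1-8B"               # base (non-instruct) LLM -- placeholder
INSTRUCT = "meta-llama/Llama-3.1-8B-Instruct"  # its instruction-tuned counterpart -- placeholder

# 1) Continued pre-training (CPT) with LoRA on the *base* model.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)
lora_cfg = LoraConfig(
    r=16, lora_alpha=32,                        # illustrative LoRA hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
cpt_model = get_peft_model(base_model, lora_cfg)
# ... run a standard causal-LM training loop over the new text corpus here ...
cpt_model.save_pretrained("knowledge_lora")     # knowledge-infused LoRA adapter

# 2) Transfer: attach the adapter to the *Instruct* model and merge it in,
#    infusing corpus knowledge without any further instruction fine-tuning.
instruct_model = AutoModelForCausalLM.from_pretrained(INSTRUCT)
merged = PeftModel.from_pretrained(instruct_model, "knowledge_lora")
merged = merged.merge_and_unload()              # fold LoRA deltas into the dense weights
merged.save_pretrained("knitlm_model")
```

The resulting merged checkpoint can then be used as the generator in a standard RAG pipeline.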
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 19631