Keywords: retrieval-augmented generation, LLMs, information retrieval
TL;DR: Base LLMs are more accurate for RAG than their instruct counterparts, but are less trustworthy.
Abstract: Retrieval-Augmented Generation (RAG) represents a significant advancement in artificial intelligence, combining a retrieval phase with a generative phase, the latter typically powered by Large Language Models (LLMs).
Common wisdom and practices in RAG involve using "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques.
However, contrary to this popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20\% on average under our experimental settings.
This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications.
Further investigations reveal a more complex situation, questioning fundamental aspects of RAG and suggesting the need for broader discussions on the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 6