Lost but Not Only in the Middle

Jan Hutter, David Rau, Maarten Marx, Jaap Kamps

Published: 01 Jan 2025 · Last Modified: 29 Nov 2025 · License: CC BY-SA 4.0
Abstract: Large language models (LLMs) are known to exhibit positional bias, the tendency of models to perform differently depending on where relevant information appears within the input context. Understanding this bias is important in a retrieval-augmented generation (RAG) setting, as it affects how retrieved passages are taken into account by the model. We systematically investigate positional bias in a RAG setting by evaluating four LLMs with three different types of distractor documents, assessing their ability to extract relevant information from the input context. Our findings reveal significant positional bias that depends on the type of context documents used and the total number of documents in the context. Furthermore, the results show that positional bias in state-of-the-art LLMs is not limited to information located in the middle of the input context. By analyzing the models' attention, we identify patterns relating a model's accuracy in answering questions to whether attention is correctly attributed to the relevant information in the context. Our code is available at https://github.com/Janhutter/LBNOITM.
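To make the evaluation setup concrete, the sketch below illustrates one way to probe positional bias in a RAG-style prompt: the gold document is inserted at every position among a set of distractors and the model's answer is checked at each position. This is a minimal illustration, not the released code; the `generate` callable, the prompt template, and the substring-based answer check are assumptions introduced here.

```python
# Minimal sketch of a positional-bias probe in a RAG-style setting.
# Assumptions (not taken from the paper's repository): `generate` is any
# callable mapping a prompt string to a model answer string, and the gold
# answer is a short string checked by case-insensitive substring match.
from typing import Callable, Dict, List


def build_prompt(question: str, docs: List[str]) -> str:
    """Format numbered context documents followed by the question."""
    numbered = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"


def accuracy_by_position(
    generate: Callable[[str], str],
    question: str,
    answer: str,
    gold_doc: str,
    distractors: List[str],
) -> Dict[int, bool]:
    """Place the gold document at each position among the distractors
    and record whether the model's answer contains the gold string."""
    results: Dict[int, bool] = {}
    for pos in range(len(distractors) + 1):
        docs = distractors[:pos] + [gold_doc] + distractors[pos:]
        prediction = generate(build_prompt(question, docs))
        results[pos] = answer.lower() in prediction.lower()
    return results
```

Averaging such per-position results over many questions yields an accuracy-versus-position curve, which is the kind of measurement used to detect whether performance drops only in the middle of the context or at other positions as well.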