Keywords: LLMs, Large Language Models, Positional Bias, Bandit Algorithms, Inference-Time Optimization
TL;DR: We propose a bandit-based algorithm that treats LLM positional bias as a signal, strategically reordering documents to find relevant information with up to 65% fewer model queries than random permutation baselines.
Abstract: Large language models exhibit a strong position bias in multi-document contexts, systematically prioritizing information based on location rather than relevance.
While existing approaches treat this bias as noise to be mitigated, we introduce GOLD PANNING BANDITS, a framework that leverages position bias as a diagnostic signal: by reordering documents and observing shifts in the model's responses, we can efficiently identify the most relevant content.
We frame the choice of reordering as a bipartite matching problem between documents and context positions.
While an optimal assignment can be computed at each iteration with the Hungarian algorithm in $O(N^3)$ time, we propose a greedy $O(N \log N)$ strategy that achieves comparable performance by prioritizing the placement of the most uncertain documents in the most informative positions.
Our approach identifies relevant documents using up to 65% fewer language model queries than random permutation baselines on knowledge-intensive NLP tasks, substantially reducing computational cost without model retraining.
This work demonstrates that inherent LLM biases can be transformed from liabilities into assets for efficient, inference-time optimization.
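A minimal sketch of the greedy $O(N \log N)$ assignment described in the abstract (not the authors' reference implementation): it assumes per-document uncertainty scores and per-position informativeness weights are already available, and simply pairs the most uncertain documents with the most informative positions.

```python
# Sketch only: `uncertainty` and `position_weight` are hypothetical inputs
# standing in for whatever scores the full method maintains.

def greedy_assignment(uncertainty, position_weight):
    """Place the most uncertain documents in the most informative positions.

    uncertainty[i]     -- assumed uncertainty score for document i
    position_weight[p] -- assumed informativeness of context position p
    Returns `order`, where order[p] is the document index placed at position p.
    Runs in O(N log N) due to the two sorts.
    """
    n = len(uncertainty)
    docs = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
    slots = sorted(range(n), key=lambda p: position_weight[p], reverse=True)
    order = [None] * n
    for doc, slot in zip(docs, slots):
        order[slot] = doc
    return order

# Example: document 2 is most uncertain and position 0 most informative,
# so document 2 is moved to the front of the reordered context.
print(greedy_assignment([0.1, 0.4, 0.9], [0.8, 0.5, 0.2]))  # -> [2, 1, 0]
```

The optimal assignment mentioned above could instead be computed from a full document-by-position score matrix with the Hungarian algorithm in $O(N^3)$; the greedy pairing avoids building that matrix.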
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22258