Entropy-Based Dynamic Hybrid Retrieval for Adaptive Query Weighting in RAG Pipelines

Published: 12 Jun 2025, Last Modified: 06 Jul 2025 · VecDB 2025 · CC BY 4.0
Keywords: retrieval augmented generation, hybrid retrieval, entropy
TL;DR: We propose a dynamic hybrid retrieval method that adaptively reweights sparse and dense scores using entropy-based confidence measures, significantly improving retrieval performance over static baselines on HotPotQA and TriviaQA.
Abstract: Traditional sparse and dense retrieval methods each exhibit critical limitations: sparse models offer high lexical precision but lack semantic flexibility, while dense models capture semantic similarity but may introduce false positives due to embedding generalization. Hybrid retrieval aims to unify their strengths, yet current methods typically use static weighting and fail to adapt to query-specific retrieval uncertainty. We propose a dynamic hybrid retrieval method that performs multi-round entropy-based reweighting to iteratively optimize the linear combination of sparse and dense scores. Leveraging normalized Shannon entropy as a proxy for retrieval confidence, we update the weight coefficients $w_s$ and $w_d$ across iterations until convergence or a predefined maximum number of iterations is reached. At each step, the top-$k$ documents are re-ranked using the fixed sparse and dense retrieval outputs, improving robustness without repeated querying. We implement our approach in a BM25-FAISS hybrid pipeline with MiniLM-L6-v2 embeddings and evaluate on HotPotQA and TriviaQA. Experimental results demonstrate that our dynamic hybrid model, under an optimal convergence threshold of $\epsilon = 0.10$, significantly outperforms both pure dense and fixed-weight hybrid baselines in LLM-as-a-Judge (LLMJ) scores across both datasets, with statistically significant gains on TriviaQA ($p < 0.01$) and marginal gains on HotPotQA ($p \approx 0.055$), confirming the efficacy of entropy-aware adaptive retrieval.
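To make the abstract's reweighting loop concrete, here is a minimal Python sketch of the idea: fixed sparse and dense score vectors are combined with weights $w_s$ and $w_d$, the normalized Shannon entropy of each retriever's scores over the current top-$k$ is mapped to a confidence, and the weights are updated until the change falls below $\epsilon$. The function names, the uniform weight initialization, the softmax score normalization, and the confidence mapping $c = 1 - H$ are illustrative assumptions on my part; the paper does not specify its exact update rule here.

```python
import numpy as np

def normalized_entropy(scores: np.ndarray) -> float:
    """Shannon entropy of softmax-normalized scores, scaled to [0, 1]."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    h = -(p * np.log(p + 1e-12)).sum()
    return float(h / np.log(len(p)))  # divide by max entropy log(k)

def dynamic_hybrid_rerank(sparse, dense, k=10, eps=0.10, max_iter=10):
    """Iteratively reweight fixed sparse/dense scores via entropy confidence.

    sparse, dense: score arrays over the same candidate pool, assumed
    min-max normalized per query (hypothetical preprocessing choice).
    Lower entropy over the current top-k => higher retriever confidence.
    """
    w_s = 0.5  # assumed uniform initialization; w_d = 1 - w_s throughout
    for _ in range(max_iter):
        combined = w_s * sparse + (1.0 - w_s) * dense
        topk = np.argsort(-combined)[:k]              # re-rank with current weights
        c_s = 1.0 - normalized_entropy(sparse[topk])  # sparse confidence
        c_d = 1.0 - normalized_entropy(dense[topk])   # dense confidence
        new_w_s = c_s / (c_s + c_d + 1e-12)
        converged = abs(new_w_s - w_s) < eps          # convergence threshold epsilon
        w_s = new_w_s
        if converged:
            break
    combined = w_s * sparse + (1.0 - w_s) * dense
    return np.argsort(-combined)[:k], (w_s, 1.0 - w_s)
```

In a BM25-FAISS pipeline, `sparse` and `dense` would hold the BM25 scores and the embedding similarities for a shared candidate pool retrieved once per query, so the loop re-ranks without issuing any new retrieval calls, as the abstract describes.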
Submission Number: 21