Enhancing Accuracy and Diversity in Retrieval-Augmented Generation through a Document-Structure-Aware Reranking Framework
Abstract: In specialized domains, Retrieval-Augmented Generation (RAG) systems often suffer from contextual redundancy and poor recognition of domain-specific entities, which degrades the quality and accuracy of responses generated by Large Language Models (LLMs). To address these challenges, we propose a document-structure-aware reranking framework that improves both relevance and informational diversity, making LLM outputs more comprehensive and reliable. Our approach has two key components: a multi-channel relevance scoring mechanism that combines thematic matching with entity-level signals, and a dynamic Maximal Marginal Relevance (MMR) algorithm that uses the documents' thematic structure to adjust the trade-off parameter between relevance and diversity, reducing semantic overlap among top-ranked passages. In a relevance evaluation on an internal benchmark dataset, our method significantly outperforms existing baselines on multiple core metrics, improving ranking accuracy by 10.6% over the internal baseline. By increasing the information density of the top-k document set, the framework also improves the quality of model-generated responses.
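The abstract does not specify the scoring channels or the rule that adjusts the trade-off parameter, so the following is only a minimal Python sketch under stated assumptions. Each passage carries a hypothetical `topic` label standing in for the thematic structure. Relevance is assumed to be a weighted sum of token overlap (a thematic proxy) and entity overlap, with illustrative weights `w_topic` and `w_entity`. The hypothetical dynamic rule lowers λ from `lam_new` to `lam_seen` when a candidate's topic is already covered by the selection. The selection loop is the standard greedy MMR step: pick the candidate maximizing λ·rel(d) - (1 - λ)·max_{s∈S} sim(d, s).

```python
from dataclasses import dataclass


@dataclass
class Passage:
    pid: str
    topic: str                              # hypothetical thematic label (e.g. section heading)
    tokens: frozenset = frozenset()         # bag of words used for similarity
    entities: frozenset = frozenset()       # domain entities found in the passage


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; returns 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(q_tokens: frozenset, q_entities: frozenset, p: Passage,
              w_topic: float = 0.6, w_entity: float = 0.4) -> float:
    """Assumed two-channel score: thematic (token) match plus entity overlap."""
    return w_topic * jaccard(q_tokens, p.tokens) + w_entity * jaccard(q_entities, p.entities)


def dynamic_mmr(q_tokens: frozenset, q_entities: frozenset, candidates: list,
                k: int, lam_new: float = 0.8, lam_seen: float = 0.4) -> list:
    """Greedy MMR in which lambda drops for candidates whose topic is already covered (assumed rule)."""
    rel = {p.pid: relevance(q_tokens, q_entities, p) for p in candidates}
    selected: list = []
    pool = list(candidates)
    while pool and len(selected) < k:
        covered = {s.topic for s in selected}

        def mmr_score(p: Passage) -> float:
            lam = lam_seen if p.topic in covered else lam_new
            # Redundancy: highest similarity to any passage already selected.
            redundancy = max((jaccard(p.tokens, s.tokens) for s in selected), default=0.0)
            return lam * rel[p.pid] - (1 - lam) * redundancy

        best = max(pool, key=mmr_score)  # ties resolve to the earliest candidate
        selected.append(best)
        pool.remove(best)
    return [p.pid for p in selected]


# Toy usage: "b" duplicates "a", "c" covers a different topic.
q_tokens = frozenset({"rag", "rerank", "entity"})
q_entities = frozenset({"BGE"})
a = Passage("a", "rerank", frozenset({"rag", "rerank", "entity"}), frozenset({"BGE"}))
b = Passage("b", "rerank", frozenset({"rag", "rerank", "entity"}), frozenset({"BGE"}))
c = Passage("c", "entity", frozenset({"entity", "linking"}), frozenset())
print(dynamic_mmr(q_tokens, q_entities, [a, b, c], k=2))
```

With this toy data, ranking by relevance alone would return `a` and `b`, which are duplicates; the sketch instead returns `a` and `c`, because `b` is penalized both by its full overlap with `a` and by the lower λ assigned to an already-covered topic.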
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Information Retrieval and Text Mining, Question Answering
Contribution Types: NLP engineering experiment
Languages Studied: Chinese, English
Submission Number: 3829