TreeRAG: Unleashing the Power of Hierarchical Storage for Enhanced Knowledge Retrieval in Long Documents

ACL ARR 2024 December Submission1747 Authors

16 Dec 2024 (modified: 05 Feb 2025), ACL ARR 2024 December Submission, CC BY 4.0
Abstract: In long-document information retrieval for Query-Focused Summarization (QFS), traditional Retrieval-Augmented Generation (RAG) frameworks struggle to retrieve all relevant knowledge points, and the chunking and retrieval strategies of existing frameworks may disrupt the connections between knowledge points and the integrity of the information. To address these issues, we propose $\textbf{TreeRAG}$, which employs $\textbf{Tree-Chunking}$ to chunk and embed documents in a tree-like structure, coupled with a "$\textbf{root-to-leaves}$" and "$\textbf{leaf-to-root}$" retrieval strategy named $\textbf{Bidirectional Traversal Retrieval}$. This approach preserves the hierarchical structure among knowledge points and significantly enhances retrieval ability while minimizing noise interference. Our experimental results on the $\textbf{Finance, Law, and Medical subsets of the Dragonball dataset}$ demonstrate that $\textbf{TreeRAG}$ achieves significant improvements in both recall quality and precision over traditional and popular existing methods, and achieves better performance on the corresponding question-answering tasks, marking a new breakthrough in long-document knowledge retrieval.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval, document representation, re-ranking
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: Chinese
Submission Number: 1747
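
The abstract names a tree-structured chunking step and a "root-to-leaves" plus "leaf-to-root" retrieval strategy but gives no implementation details. The Python sketch below is only an illustration of that general idea, not the authors' method. Everything in it is an assumption made for the example: the `Node` class, the token-overlap `score` function (standing in for embedding similarity), the pruning threshold, and the sample document tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/children
class Node:
    """Hypothetical tree chunk: a titled section with optional body text."""
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def score(query: str, node: Node) -> float:
    """Toy relevance (assumption): fraction of query tokens present in the node."""
    q = set(query.lower().split())
    doc = set((node.title + " " + node.text).lower().split())
    return len(q & doc) / len(q) if q else 0.0


def root_to_leaves(root: Node, query: str, threshold: float = 0.5) -> list[Node]:
    """Descend from the root, following only children that score >= threshold."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children:
            hits.append(node)  # reached a leaf chunk
            continue
        stack.extend(c for c in reversed(node.children)
                     if score(query, c) >= threshold)
    return hits


def leaf_to_root(leaf: Node) -> list[str]:
    """Walk parent links upward so a leaf hit keeps its hierarchical context."""
    path, node = [], leaf
    while node is not None:
        path.append(node.title)
        node = node.parent
    return path[::-1]  # root first


# Made-up example document tree.
root = Node("Annual report")
finance = root.add(Node("Finance", "revenue and profit overview"))
finance.add(Node("Revenue", "revenue grew in 2023"))
finance.add(Node("Costs", "staff costs fell"))
legal = root.add(Node("Legal", "pending litigation"))
legal.add(Node("Litigation", "one lawsuit pending"))

hits = root_to_leaves(root, "revenue growth")
paths = [leaf_to_root(leaf) for leaf in hits]
```

In this toy setup the downward pass prunes the "Legal" branch and the "Costs" leaf, and the upward pass attaches the section path to the remaining leaf, so the retrieved chunk is returned together with its position in the hierarchy.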