Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

Haozhen Zhang; Tao Feng; Jiaxuan You

Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

Haozhen Zhang, Tao Feng, Jiaxuan You

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Retrieval-Augmented Generation, Long-context Summarization, Graph Neural Networks, Large Language Models

TL;DR: We propose a method named graph of records (GoR), which utilizes LLM-generated historical responses in retrieval-augmented generation to enhance long-context summarization

Abstract: Retrieval-augmented generation (RAG) has revitalized Large Language Models (LLMs) by injecting non-parametric factual knowledge. Compared with long-context LLMs, RAG is considered an effective summarization tool in a more concise and lightweight manner, which can interact with LLMs multiple times using diverse queries to get comprehensive responses. However, the LLM-generated historical responses, which contain potentially insightful information, are largely neglected and discarded by existing approaches, leading to suboptimal results. In this paper, we propose \textit{graph of records} (\textbf{GoR}), which leverages historical responses generated by LLMs to enhance RAG for long-context global summarization. Inspired by the \textit{retrieve-then-generate} paradigm of RAG, we construct a graph by creating an edge between the retrieved text chunks and the corresponding LLM-generated response. To further uncover the sophisticated correlations between them, GoR further features a \textit{graph neural network} and an elaborately designed \textit{BERTScore}-based objective for self-supervised model training, enabling seamless supervision signal backpropagation between reference summaries and node embeddings. We comprehensively compare GoR with 12 baselines on four long-context summarization datasets, and the results indicate that our proposed method reaches the best performance. Extensive experiments further demonstrate the effectiveness of GoR.

Primary Area: learning on graphs and other geometries & topologies

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10834

Loading