AGE: Adaptive-masking for Graph Embedding in Graph Retrieval-Augmented Generation

Nguyen Huu Bao Long; Atsushi Hashimoto

16 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Self Supervised Learning, Large Language Models, Graph Retrieval-Augmented Generation, Reinforcement Learning, Knowledge Graphs

Abstract: GraphRAG extends retrieval-augmented generation (RAG) by supplying large language models (LLMs) with graph-structured data as external knowledge. Although graphs can in principle capture intricate relationships, GraphRAG often struggles to represent them in a form LLMs can use, particularly frozen LLMs, because graph-based and text-based latent features are misaligned. We address this issue with Adaptive-masking for Graph Embedding (AGE). AGE trains a Transformer with mask-based self-supervised learning (SSL), and we design its architecture to resemble text embedding encoders in order to address the latent feature misalignment. Unlike natural language text, graphs are concise representations that contain key nodes holding dominant contextual information; these nodes are hard to predict from their surroundings, so masking them makes SSL inefficient. AGE therefore uses a learnable node sampler to focus prediction on nodes other than the key nodes. Our experimental results indicate that AGE significantly improves approaches that use a non-parametric search component in GraphQA tasks, achieving superior accuracy on three benchmark datasets with distinct characteristics.
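The abstract describes a learnable node sampler that steers masking away from key nodes. The following is a minimal sketch of one way such a sampler could turn per-node scores into a mask. It rests on assumptions that are ours, not the paper's: the hypothetical per-node `key_scores` (which a learnable sampler would produce with a trainable module), the softmax over negated scores, the `mask_ratio` and `temperature` parameters, and the use of Gumbel-top-k sampling are all illustrative choices. The sketch does not show how AGE actually scores or trains its sampler.

```python
import math
import random


def log_masking_probs(key_scores, temperature=1.0):
    """Log-probabilities of masking each node (log-softmax over -score / T).

    `key_scores` is a hypothetical per-node score for how "key" a node is;
    a higher score gives a lower chance of being masked.
    """
    logits = [-s / temperature for s in key_scores]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def masking_probs(key_scores, temperature=1.0):
    """Masking probabilities; they sum to 1 over all nodes."""
    return [math.exp(lp) for lp in log_masking_probs(key_scores, temperature)]


def sample_mask(key_scores, mask_ratio=0.5, temperature=1.0, rng=None):
    """Choose which node indices to mask, avoiding high-scoring (key) nodes.

    Gumbel-top-k: adding Gumbel noise to the log-probabilities and keeping
    the k largest values samples k nodes without replacement.
    """
    rng = rng or random.Random()
    n = len(key_scores)
    k = min(n, max(1, round(mask_ratio * n)))
    perturbed = []
    for lp in log_masking_probs(key_scores, temperature):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        perturbed.append(lp - math.log(-math.log(u)))
    top = sorted(range(n), key=lambda i: perturbed[i], reverse=True)[:k]
    return sorted(top)
```

For example, `sample_mask([10.0, 0.0, 0.0, 0.0], mask_ratio=0.5, temperature=0.1)` returns two indices drawn from nodes 1 to 3; the high-scoring node 0 is effectively never chosen. Lowering `temperature` makes the sampler avoid key nodes more sharply, while raising it moves masking toward uniform random selection.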

Supplementary Material: pdf

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 6889