ORAG: Ontology-Guided Retrieval-Augmented Generation for Theme-Specific Entity Typing

Published: 10 Jul 2024, Last Modified: 26 Aug 2024 · COLM · CC BY 4.0
Research Area: LMs and the world
Keywords: Fine-grained entity typing, theme-specific entity typing, retrieval-augmented generation, ontology enrichment, ontology-based information retrieval
TL;DR: We propose ontology-guided retrieval-augmented generation (ORAG) and show that it mitigates LLM hallucinations in theme-specific entity typing.
Abstract: Large language models (LLMs) combined with retrieval-augmented generation (RAG) have shown strong performance on many NLP tasks, including fine-grained entity typing (FET). However, we observe that recent LLMs are prone to hallucination on highly specialized and fast-evolving themes (e.g., redox-active organic electrode materials), especially in two cases: (1) unseen entities, where an entity never appears in the pre-training corpora of LLMs; and (2) misleading semantics, where the context of an entity can mislead an entity typing algorithm if the relevant knowledge is not correctly retrieved and utilized. To address these challenges, this paper proposes an Ontology-Guided Retrieval-Augmented Generation (ORAG) approach that incorporates ontology structures into RAG for the theme-specific entity typing task. ORAG first enriches the label ontology with external knowledge and constructs a structured knowledge unit for each node. It then retrieves the relevant nodes via dense passage retrieval and expands the retrieved results along the ontological structure, so that more supporting knowledge can be packed into the limited input length of LLMs for entity typing. For evaluation, we construct a dataset with two themes for theme-specific entity typing, focusing on unseen entities and misleading semantics. We observe notable hallucinations when vanilla RAG is applied to Llama-3, GPT-3.5, and GPT-4, while ORAG effectively mitigates such hallucinations and improves the results.
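The abstract does not include the implementation, but the retrieve-then-expand step it describes can be sketched roughly as follows. This is a minimal illustration assuming precomputed embeddings for each node's knowledge unit; all names here (`OntologyNode`, `retrieve_and_expand`, `build_prompt`) are hypothetical and do not reflect the authors' actual code.

```python
# Minimal sketch of ontology-guided retrieval with structural expansion.
# Hypothetical names and data layout; not the paper's actual implementation.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class OntologyNode:
    """A label-ontology node with its enriched knowledge unit."""
    label: str
    knowledge: str                      # external knowledge attached to this type
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def retrieve_and_expand(query_emb, nodes, node_embs, top_k=3):
    """Dense retrieval over ontology nodes, then expansion along the ontology.

    query_emb: (d,) embedding of the entity mention in context.
    node_embs: (n, d) embeddings of each node's knowledge unit.
    Returns the top-k nodes plus their ontological neighbors, deduplicated.
    """
    # Cosine similarity between the query and every knowledge unit.
    sims = node_embs @ query_emb / (
        np.linalg.norm(node_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9
    )
    hits = [nodes[i] for i in np.argsort(-sims)[:top_k]]

    # Expand along parent/child edges so related types also reach the prompt.
    expanded, seen = [], set()
    for node in hits:
        for cand in [node, *node.parents, *node.children]:
            if cand.label not in seen:
                seen.add(cand.label)
                expanded.append(cand)
    return expanded


def build_prompt(mention, context, retrieved):
    """Pack the retrieved knowledge units into an entity-typing prompt."""
    knowledge = "\n".join(f"- {n.label}: {n.knowledge}" for n in retrieved)
    return (
        f"Candidate types and supporting knowledge:\n{knowledge}\n\n"
        f"Context: {context}\nWhich type best fits \"{mention}\"?"
    )
```

The expansion step is what distinguishes this from vanilla RAG: even when dense retrieval misses the correct type node (e.g., for an unseen entity), a structurally adjacent node may pull it into the prompt.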
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 951