Ontology-Retrieval Augmented Generation for Scientific Discovery

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: ontology, rag, retrieval, llm, science, ai4science, chemistry, biomedical, reasoning
TL;DR: We propose methods for generating ontologies and for performing RAG by retrieving context from relevant ontologies, which reduces hallucinations.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, sparking increasing interest in their application to science. However, in scientific domains, their utility is often limited by hallucinations that violate established relationships between concepts or ignore their meaning; problems that are not entirely eliminated by Retrieval Augmented Generation (RAG) techniques. A key feature of science is the use of niche concepts, abbreviations, and implicit relationships, which may render RAG approaches less effective due to a lack of conceptual understanding, especially in emerging and lesser-known fields. Ontologies, as structured frameworks for organizing knowledge and establishing relationships between concepts, offer a potential solution to this challenge. In this work, we introduce OntoRAG, a novel approach that enhances RAG by retrieving taxonomical knowledge from ontologies. We evaluate the performance of this method on three common biomedical benchmarks. To extend the value of OntoRAG to emerging fields, where ontologies have not yet been developed, we also present OntoGen, a methodology for generating ontologies from a set of documents. We apply the combined OntoGen+OntoRAG pipeline to a novel benchmark of scientific discovery in the emerging field of single-atom catalysis. Our results demonstrate the promise of this method for improving reasoning and suppressing hallucinations in LLMs, potentially accelerating scientific discovery across various domains.
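The abstract describes retrieving taxonomical knowledge from an ontology and supplying it as context to the LLM. As a rough illustration only (not the authors' implementation), the following minimal sketch shows how such taxonomic context could be retrieved and prepended to a prompt; the toy ontology, the `retrieve_taxonomic_context` helper, and the prompt format are all assumptions made for this example.

```python
# Minimal sketch (assumed example, not the paper's code): augment an LLM
# prompt with parent/child concepts retrieved from a small ontology graph.
import networkx as nx

# Toy ontology: edges point from parent concept to child concept.
onto = nx.DiGraph()
onto.add_edges_from([
    ("catalyst", "heterogeneous catalyst"),
    ("heterogeneous catalyst", "single-atom catalyst"),
    ("support material", "metal oxide support"),
])

def retrieve_taxonomic_context(query: str, ontology: nx.DiGraph) -> str:
    """Collect parent and child concepts for ontology terms mentioned in the query."""
    lines = []
    for concept in ontology.nodes:
        if concept in query.lower():
            parents = sorted(nx.ancestors(ontology, concept))
            children = sorted(nx.descendants(ontology, concept))
            lines.append(f"{concept}: parents={parents}; children={children}")
    return "\n".join(lines)

query = "How does a single-atom catalyst differ from other heterogeneous catalysts?"
context = retrieve_taxonomic_context(query, onto)
prompt = f"Taxonomic context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt would then be passed to the LLM of choice
```

In this sketch the ontology is a hand-written graph; in practice the paper's OntoGen component would produce the ontology from a document collection, and the retrieved taxonomic context would supplement, not replace, standard document retrieval.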
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14062