LLM OntologyRAG - Extending a Food-Agent with a Description Logic Knowledge Representation
Keywords: Large Language Model, Ontology, Neuro Symbolic Reasoning, Retrieval Augmented Generation
TL;DR: Using an ontology as a Description Logic context for RAGs, extending food agents with the Food ontology to improve LLM responses and reasoning related to food and nutrition queries
Abstract: Large Language Models (LLMs) are known to hallucinate (Zhang et al., 2023; Gao et al., 2024). Updating
information or facts in a model generally requires costly retraining, and the options for adding private or
personal information to a model are limited. Retrieval-Augmented Generation (RAG)
(Lewis et al., 2020) has been proposed to reduce hallucinations in LLMs, to provide updated
facts and knowledge, and to give LLMs access to private and personal information.
RAGs typically provide an extended context to an LLM given a specific user query. In RAG systems, the
user query is intercepted before it is handed over to the LLM. A similarity search based on the query
identifies related text segments in a database. These segments are forwarded to the LLM as context
along with the query, so that the LLM can give priority to the retrieved context when generating a
response. Such an architecture can reduce hallucinations and give an LLM access to updated
information or knowledge. It also allows users to make their private texts available to an LLM without
having to include them in its training corpus.
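The retrieval step described above can be sketched in a few lines. This is only an illustration under stated assumptions: production RAG systems usually rank segments with dense embeddings, whereas this sketch substitutes a bag-of-words cosine similarity so that it runs with the standard library alone. The sample segments, the function names, and the prompt layout are hypothetical and not taken from the submission.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, segments, k=2):
    """Return the k segments most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        segments,
        key=lambda s: cosine(q, Counter(tokenize(s))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context):
    """Place the retrieved segments ahead of the user query."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


# Hypothetical text segments standing in for the database.
SEGMENTS = [
    "Spinach is a leafy vegetable that is rich in iron and vitamin K.",
    "Cheddar cheese is a dairy product high in calcium and fat.",
    "Brown rice is a whole grain that provides fiber and magnesium.",
]

query = "Which vegetable is rich in iron?"
top = retrieve(query, SEGMENTS, k=1)
prompt = build_prompt(query, top)
```

The resulting `prompt` string is what would be handed to the LLM in place of the bare query.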
In our approach, we extend the RAG architecture by adding a formal ontology, i.e., a formal definition of a
specific domain in a Description Logic (DL) formalism (Baader et al., 2007), here the Web Ontology
Language (OWL). As a domain, we picked food and nutrition, focusing on the USDA FoodData Central
databases. The USDA food data (USDA FoodData, 2026) is provided as relational databases and tables.
We converted the relational database into an OWL format reflecting hierarchical relations in the concept
taxonomy (or class hierarchy). The concept hierarchy includes general food items and specific products, as
well as all the nutrients associated with the food items in the USDA database. The food items, ingredients,
and nutrients are arranged in a hierarchy of hypernyms. Relations between the concepts (OWL classes) are
represented as formal relations with specified domains and ranges. This structure facilitates Description
Logic-based reasoning and allows users to identify food items by nutrition types or ingredient groups,
and to see the related food items. The ontology can be queried using SPARQL, and it can be used partially
or completely as the context for LLMs in a RAG-type architecture.
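To make the conversion from relational rows to an OWL class hierarchy more concrete, the following sketch emits Turtle from two tiny in-memory tables. Everything specific in it is an assumption for illustration: the namespace `http://example.org/food#`, the row layout, the class names, and the property name `hasNutrient` are hypothetical and do not reflect the actual USDA schema or the authors' ontology. Modeling "a food contains a nutrient" as an existential restriction is one common DL pattern, not necessarily the one used in the paper.

```python
BASE = "http://example.org/food#"  # hypothetical namespace

PREFIXES = (
    f"@prefix : <{BASE}> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

# Hypothetical rows: (class name, parent class name or None).
CLASS_ROWS = [
    ("Food", None),
    ("Nutrient", None),
    ("Vegetable", "Food"),
    ("Spinach", "Vegetable"),
    ("Mineral", "Nutrient"),
    ("Iron", "Mineral"),
]

# Hypothetical rows: (food class, nutrient class).
FOOD_NUTRIENT_ROWS = [("Spinach", "Iron")]


def class_axioms(rows):
    """Declare each class and link it to its hypernym via rdfs:subClassOf."""
    lines = []
    for name, parent in rows:
        lines.append(f":{name} a owl:Class .")
        if parent is not None:
            lines.append(f":{name} rdfs:subClassOf :{parent} .")
    return lines


def property_axiom(name, domain, range_):
    """Declare an object property with an explicit domain and range."""
    return (
        f":{name} a owl:ObjectProperty ; "
        f"rdfs:domain :{domain} ; rdfs:range :{range_} ."
    )


def restriction_axioms(rows, prop):
    """State that every instance of a food has some value of the nutrient class."""
    return [
        f":{food} rdfs:subClassOf [ a owl:Restriction ; "
        f"owl:onProperty :{prop} ; owl:someValuesFrom :{nutrient} ] ."
        for food, nutrient in rows
    ]


def to_turtle():
    """Assemble prefixes, class hierarchy, property, and restrictions."""
    lines = class_axioms(CLASS_ROWS)
    lines.append(property_axiom("hasNutrient", "Food", "Nutrient"))
    lines.extend(restriction_axioms(FOOD_NUTRIENT_ROWS, "hasNutrient"))
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


turtle = to_turtle()
```

The `turtle` string could then be loaded into any RDF store and queried with SPARQL, or passed to an LLM as context.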
Hierarchical and graph-based arrangement of knowledge has been shown to be beneficial in RAGs
(Peng et al., 2024; Huang et al., 2025). However, using ontologies and DL-based RAGs provides
additional advantages. We provide results for experiments using: a.) LLMs that generate SPARQL
queries from user queries to pull triple sets from the Food ontology; b.) a subgraph selected from the
Food ontology as a context for a user query in a RAG architecture; c.) LLMs that consume the entire
Food ontology as a context to respond to a user query.
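Setting b.), selecting a subgraph as context, can be illustrated with a small breadth-first traversal over triples. This is a sketch under assumptions, not the selection method used in the experiments: the triples are hypothetical, the seed entities are found by naive word matching rather than proper entity linking, and the hop limit `max_hops` is an illustrative parameter.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object) standing in for the ontology.
TRIPLES = [
    ("Spinach", "subClassOf", "Vegetable"),
    ("Vegetable", "subClassOf", "Food"),
    ("Spinach", "hasNutrient", "Iron"),
    ("Iron", "subClassOf", "Mineral"),
    ("Mineral", "subClassOf", "Nutrient"),
    ("Cheddar", "subClassOf", "Dairy"),
    ("Dairy", "subClassOf", "Food"),
    ("Cheddar", "hasNutrient", "Calcium"),
]


def find_seeds(query, triples):
    """Match query words against node names (a naive stand-in for entity linking)."""
    nodes = {n for s, _, o in triples for n in (s, o)}
    words = {w.strip("?.,!").lower() for w in query.split()}
    return sorted(n for n in nodes if n.lower() in words)


def select_subgraph(triples, seeds, max_hops=1):
    """Collect triples within max_hops of the seeds, following edges in both directions."""
    neighbors = {}
    for triple in triples:
        s, _, o = triple
        neighbors.setdefault(s, []).append((o, triple))
        neighbors.setdefault(o, []).append((s, triple))

    seen = set(seeds)
    selected = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes at the hop limit
        for other, triple in neighbors.get(node, []):
            if triple not in selected:
                selected.append(triple)
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return selected


seeds = find_seeds("What nutrients does spinach provide?", TRIPLES)
subgraph = select_subgraph(TRIPLES, seeds, max_hops=2)
```

With a hop limit of two, the selection keeps the spinach-related hypernyms and nutrients and leaves out the unrelated dairy branch, which keeps the context short.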
State-of-the-art LLMs can process structured RDF in Turtle format directly and generate responses
from ontologies. Subgraphs and triple sets can be provided as raw triple sets in RDF format, or
verbalized as sets of short sentences built from subject-predicate-object tuples. We show that an
OntologyRAG not only yields much better results than baseline text-based RAGs, but also offers useful
reasoning capabilities that enable significantly better tools, for example, for summarization using
hypernyms and semantic properties.
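The two ideas in this paragraph, rendering triples as short sentences and summarizing with hypernyms, can be sketched as follows. The triples, the predicate-to-phrase mapping `PHRASES`, and the function names are hypothetical, and the sketch assumes each class has a single parent, which a real ontology need not satisfy.

```python
import re

# Hypothetical triples and predicate-to-phrase mapping.
TRIPLES = [
    ("Spinach", "subClassOf", "LeafyVegetable"),
    ("Kale", "subClassOf", "LeafyVegetable"),
    ("LeafyVegetable", "subClassOf", "Vegetable"),
    ("Carrot", "subClassOf", "Vegetable"),
    ("Vegetable", "subClassOf", "Food"),
    ("Spinach", "hasNutrient", "Iron"),
]
PHRASES = {"subClassOf": "is a kind of", "hasNutrient": "contains"}


def split_camel(name):
    """Turn a CamelCase identifier into lowercase words."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name).lower()


def verbalize(triple):
    """Render one subject-predicate-object triple as a short sentence."""
    s, p, o = triple
    phrase = PHRASES.get(p, split_camel(p))
    return f"{split_camel(s).capitalize()} {phrase} {split_camel(o)}."


def hypernyms(cls, triples):
    """Return the ancestors of cls along subClassOf, nearest first (single parent assumed)."""
    parents = {s: o for s, p, o in triples if p == "subClassOf"}
    chain = []
    while cls in parents and parents[cls] not in chain:  # guard against cycles
        cls = parents[cls]
        chain.append(cls)
    return chain


def common_hypernym(items, triples):
    """Return the nearest hypernym shared by all items, or None."""
    chains = [hypernyms(i, triples) for i in items]
    for candidate in chains[0]:
        if all(candidate in c for c in chains[1:]):
            return candidate
    return None


sentences = [verbalize(t) for t in TRIPLES]
leafy = common_hypernym(["Spinach", "Kale"], TRIPLES)
mixed = common_hypernym(["Spinach", "Kale", "Carrot"], TRIPLES)
```

A summarizer could use the shared hypernym to replace a list of specific items with one more general term, and the verbalized sentences could serve as the textual form of the context.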
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36