Towards Ontology-Driven Multi-Hop Benchmark for Corporate GraphRAG Systems

Ivan Kirpichev; Eldar Kurmanaliev; Fedor Bushmelev; Maxim Abramov; Anastasiia Korepanova; Anna Kalyuzhnaya; Nikolay Nikitin

Published: 10 Jun 2026, Last Modified: 10 Jun 2026

Venue: IJCAI-ECAI 2026 Joint Workshop on GENAIK and NORA

License: CC BY 4.0

Track: Research

Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.

Student Paper: Yes

Generative AI Compliance And Declaration: I confirm that this submission complies with GenAIK-NORA Generative AI Policy. The manuscript includes a mandatory declaration section explicitly stating whether Generative AI was used and, if applicable, describing the extent of its usage.

Keywords: GraphRAG, Multi-Hop Question Answering, Knowledge Graph, Ontology-driven RAG, Financial-domain Knowledge Graph

TL;DR: We introduce an ontology-driven benchmark showing that current RAG and GraphRAG systems fail on complex multi-hop reasoning because they cannot preserve the strict hierarchical structure of the original domain.

Abstract: Existing GraphRAG benchmarks suffer from two evaluation problems: static corpora that modern LLMs have already memorized, and synthetically generated questions whose answers may not be grounded in the data. We present an automated framework that constructs a dynamic, news-enriched corporate knowledge graph and generates benchmark questions whose ground truth is validated directly against the graph database. The graph combines a strict ontology of S&P 500 companies with executives, funds, products, resources, geographic entities, and a stream of recent news. Questions are produced by extracting real paths from local subgraphs, with an LLM used only to translate the verified queries into natural language. The resulting dataset of 4,998 question–answer pairs across six complexity levels is used to compare Vanilla RAG, LightRAG, MS GraphRAG, and HippoRAG2.
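The question-generation idea in the abstract (sample a real path from a local subgraph, execute the corresponding query to obtain the ground truth, and only then verbalize it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy triples, relation names, and functions (`build_index`, `sample_path`, `execute_chain`, `verbalize`, `make_question`) are all hypothetical, an in-memory dictionary stands in for the graph database, a string template stands in for the LLM, and hop count is used only as an illustrative proxy for the paper's complexity levels, which the abstract does not define.

```python
import random
from collections import defaultdict

# Hypothetical toy graph: (subject, relation, object) triples. All names are
# placeholders, not data from the benchmark.
TRIPLES = [
    ("FundA", "HOLDS_SHARES_OF", "CompanyX"),
    ("FundA", "HOLDS_SHARES_OF", "CompanyY"),
    ("CompanyX", "HAS_CEO", "PersonP"),
    ("CompanyY", "HAS_CEO", "PersonQ"),
    ("CompanyX", "PRODUCES", "ProductM"),
    ("ProductM", "USES_RESOURCE", "ResourceR"),
    ("ResourceR", "SOURCED_FROM", "CountryC"),
]


def build_index(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((r, o))
    return index


def sample_path(index, seed, hops, rng):
    """Random walk of exactly `hops` real edges; None if it hits a dead end."""
    relations, node = [], seed
    for _ in range(hops):
        edges = index.get(node)
        if not edges:
            return None
        rel, node = rng.choice(edges)
        relations.append(rel)
    return relations


def execute_chain(index, seed, relations):
    """Run the relation chain as a query: all entities reachable from seed."""
    frontier = {seed}
    for rel in relations:
        frontier = {o for n in frontier for r, o in index.get(n, []) if r == rel}
    return frontier


def verbalize(seed, relations):
    """Template stand-in for the LLM that rewrites a verified query as text."""
    return f"Starting from {seed}, which entities are reached via {' -> '.join(relations)}?"


def make_question(index, seed, hops, rng):
    relations = sample_path(index, seed, hops, rng)
    if relations is None:
        return None
    # The answer set comes from executing the query, not from the LLM, and is
    # non-empty because the sampled walk itself ends at a matching entity.
    answers = execute_chain(index, seed, relations)
    return {"question": verbalize(seed, relations), "hops": hops,
            "answers": sorted(answers)}


if __name__ == "__main__":
    index = build_index(TRIPLES)
    print(sorted(execute_chain(index, "FundA", ["HOLDS_SHARES_OF", "HAS_CEO"])))
    # prints ['PersonP', 'PersonQ']
    print(make_question(index, "FundA", 2, random.Random(0)))
```

In this sketch the ground truth is fixed by executing the relation chain, so the language-model step (here, `verbalize`) can only change the wording of a question, never its answer set.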

Submission Number: 11
