Is Complex Query Answering Really Complex?

Cosimo Gregucci; Bo Xiong; Daniel Hernández; Lorenzo Loconte; Pasquale Minervini; Steffen Staab; Antonio Vergari

Published: 01 May 2025, Last Modified: 18 Jun 2025; ICML 2025 spotlight poster; CC BY-NC-SA 4.0

TL;DR: We highlight how common benchmarks for complex query answering with neural models are skewed towards "simple" queries and propose new, more challenging benchmarks that address this issue.

Abstract: Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as *complex* as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that *require* models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current CQA methods leave much to be desired.
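
To make the notion of a query "reducing" to link prediction concrete, here is a minimal sketch on a toy graph. It is an illustration under stated assumptions, not the authors' code: the entity and relation names, the helper names `build_index` and `min_missing_links`, and the restriction to two-hop path queries are all hypothetical. For each answer of a two-hop query, it counts the smallest number of links on any reasoning path that are absent from the training graph. A count of 1 means a single predicted link suffices, while a count of 2 means both hops must be inferred.

```python
from collections import defaultdict


def build_index(triples):
    """Map (head, relation) -> set of tails for fast lookups."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index


def min_missing_links(anchor, r1, r2, answer, train, full):
    """Smallest number of non-training links on any 2-hop path
    anchor -r1-> mid -r2-> answer in the full graph.
    Returns None if `answer` is not reachable at all."""
    train_idx = build_index(train)
    full_idx = build_index(full)
    best = None
    for mid in full_idx.get((anchor, r1), set()):
        if answer not in full_idx.get((mid, r2), set()):
            continue  # this intermediate node does not lead to the answer
        missing = int(mid not in train_idx.get((anchor, r1), set()))
        missing += int(answer not in train_idx.get((mid, r2), set()))
        best = missing if best is None else min(best, missing)
    return best


# Toy data (hypothetical): links seen in training vs. links held out.
train = {("a", "r1", "b"), ("b", "r2", "c"), ("d", "r2", "e")}
held_out = {("b", "r2", "f"), ("a", "r1", "d"), ("a", "r1", "g"), ("g", "r2", "h")}
full = train | held_out

# Classify every answer of the query a -r1-> ? -r2-> ?
result = {
    ans: min_missing_links("a", "r1", "r2", ans, train, full)
    for ans in ["c", "f", "e", "h"]
}
print(result)
```

In this toy example, answers `f` and `e` each need only one predicted link, so they behave like link prediction, whereas `h` needs both hops to be inferred. A benchmark in which most answers fall in the first group would overstate multi-hop reasoning ability.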

Lay Summary: Knowledge graphs are collections of facts that can be queried to find specific information. However, when the knowledge graph is incomplete, retrieving all answers to a query becomes challenging, a task known as *complex query answering* (CQA). While benchmarks exist to evaluate CQA, we find that most answers to the queries can be found by predicting just a single missing link in the graph, making them easier than intended. To fix this, we introduce new, more challenging benchmarks containing queries that require multi-hop reasoning and offer a more balanced level of hardness. Our experiments show that all existing models perform significantly worse on the new benchmarks, and no single method stands out as clearly superior. This highlights that CQA remains an open challenge and calls for the development of more sophisticated approaches.

Link To Code: https://github.com/april-tools/is-cqa-complex

Primary Area: General Machine Learning->Evaluation

Keywords: complex query answering, knowledge graphs, multi-hop reasoning, neuro-symbolic

Submission Number: 10624
