Keywords: knowledge graph, complex query answering
TL;DR: We construct $\text{EFO}_k$-CQA, a dataset for answering complex logical queries with multiple variables that consists of 741 query types for empirical evaluation, providing new insights into how query hardness affects the outcomes.
Abstract: Answering complex queries on knowledge graphs requires logical reasoning over incomplete knowledge, which calls for learning-based methods because they can generalize to unobserved knowledge. Consequently, an appropriate dataset is fundamental to both training and evaluating such methods under this paradigm. In this paper, we propose a comprehensive framework for data generation, model training, and method evaluation that covers the combinatorial space of Existential First-order Queries with multiple variables ($\text{EFO}_k$). The combinatorial query space in our framework significantly extends those defined by set operations in the existing literature. Additionally, we construct a dataset, $\text{EFO}_k$-CQA, with 741 query types for empirical evaluation, and our benchmark results provide new insights into how query hardness affects the results. Furthermore, we demonstrate that the existing dataset construction process is systematically biased and hinders the proper development of query-answering methods, highlighting the importance of our work. Our code and data are available at https://anonymous.4open.science/r/EFOK-CQA/README.md.
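To make "existential first-order queries with multiple variables" concrete, here is a minimal sketch, assuming a tiny, fully observed toy graph. The triples, relation names, the `answer_efo2` helper, and the query itself are all hypothetical illustrations, not taken from $\text{EFO}_k$-CQA or the authors' framework. The point it shows is that an $\text{EFO}_2$ query has two free variables, so each answer is a pair of entities rather than a single entity, and that the existentially quantified variable is bound but never returned. Brute-force enumeration like this is only exact because the toy graph is complete; on an incomplete knowledge graph it would miss answers, which is why the abstract argues for learning-based methods.

```python
from itertools import product

# Hypothetical toy knowledge graph, assumed complete: (head, relation, tail) triples.
TRIPLES = {
    ("alice", "friend", "bob"),
    ("alice", "friend", "carol"),
    ("bob", "lives_in", "paris"),
    ("carol", "lives_in", "rome"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "acme"),
    ("acme", "located_in", "paris"),
}

# Every entity that appears as a head or tail of some triple.
ENTITIES = sorted({h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES})


def holds(head: str, relation: str, tail: str) -> bool:
    """An atomic formula r(h, t) is true iff the triple is in the graph."""
    return (head, relation, tail) in TRIPLES


def answer_efo2() -> set[tuple[str, str]]:
    """Brute-force answers to the illustrative EFO_2 query

        (y1, y2) such that  exists x: friend(alice, x) and lives_in(x, y1)
                            and works_at(x, y2) and not located_in(y2, y1)

    y1 and y2 are free variables; x is existentially quantified.
    """
    answers = set()
    # Enumerate every assignment to the two free variables.
    for y1, y2 in product(ENTITIES, repeat=2):
        # The existential quantifier becomes "some entity x satisfies the body".
        if any(
            holds("alice", "friend", x)
            and holds(x, "lives_in", y1)
            and holds(x, "works_at", y2)
            and not holds(y2, "located_in", y1)
            for x in ENTITIES
        ):
            answers.add((y1, y2))
    return answers


if __name__ == "__main__":
    print(answer_efo2())
```

With x = bob the negated atom fails because acme is located in paris, so only x = carol contributes, and the single answer tuple is ("rome", "acme").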
Submission Number: 976