GeoKG-Bench: Evaluating LLMs for Geospatial Domain-Specific Knowledge Graph Query Generation

ACL ARR 2026 January Submission 9698 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Knowledge graph, Nebula Graph, LLM, code generation
Abstract: We study the problem of translating natural language (NL) questions into Nebula Graph Query Language (nGQL) using large language models (LLMs). To systematically evaluate robustness, we introduce a benchmark of 105 NL–nGQL pairs covering single-hop and multi-hop queries, as well as easy and hard domain-specific anomaly questions that require implicit domain knowledge and semantic reasoning. In particular, maritime queries such as identifying loitering or rendezvous events cannot be answered through literal keyword filtering and instead require reasoning over domain-defined conditions and graph structure. We evaluate LLMs on two categories of NL questions: (i) knowledge-graph schema-dependent direct questions, and (ii) domain-concept anomaly event questions. Performance is measured using token-level schema linking accuracy, constraint/filter match accuracy, projection precision, and execution accuracy. Our results show that while existing LLMs generate accurate nGQL queries for simple questions, their performance degrades significantly on domain-specific questions, highlighting fundamental limitations in domain-aware reasoning for reliable property-graph query generation.
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: knowledge graphs, benchmarking, code generation and understanding
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9698