Keywords: Knowledge graph, NebulaGraph, LLM, code generation
Abstract: We study the problem of translating natural language (NL) questions into Nebula Graph Query Language (nGQL) using large language models (LLMs).
%
To systematically evaluate robustness, we introduce a benchmark of 105 NL–nGQL pairs covering single-hop and multi-hop queries, as well as easy and hard domain-specific anomaly questions that require implicit domain knowledge and semantic reasoning.
%
In particular, maritime queries such as identifying loitering or rendezvous events cannot be answered through literal keyword filtering and instead require reasoning over domain-defined conditions and graph structure.
We evaluate LLMs on two categories of NL questions: (i) knowledge-graph schema-dependent direct questions, and (ii) domain-concept anomaly event questions.
%
Performance is measured using token-level schema linking accuracy, constraint/filter match accuracy, projection precision, and execution accuracy.
%
Our results show that while existing LLMs generate accurate nGQL queries for simple questions, their performance degrades significantly on domain-specific questions, highlighting fundamental limitations in domain-aware reasoning for reliable property-graph query generation.
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: knowledge graphs, benchmarking, code generation and understanding
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9698