Despite the promising results, the proposed approach has several limitations that point to directions for future work.
First, the reasoning scope is limited to single-hop queries: KoRe operates on 1-hop star subgraphs and cannot natively perform multi-hop reasoning, although the zero-shot WebQSP results suggest that such an extension may be feasible.
Second, since the datasets used in this work lack reasoning traces, we run all evaluations with reasoning disabled, even though the adopted LLM backbone (Qwen3-8B) supports switching reasoning on and off (``thinking mode''). We leave the analysis of the impact of reasoning to future work. In terms of interpretability, tracing which triples influenced a given generation is not straightforward; auxiliary decoders could be added to this end in future versions of the architecture. In terms of language coverage, the presented experiments are English-centric; multilingual extensions, which require significant data-collection effort, are currently in progress.
Finally, our experimental setup currently relies on a single Wikidata KG; we plan to extend the evaluation to unseen knowledge graph schemas.

