Abstract: Knowledge graphs and ontologies represent symbolic and factual information that can offer structured and interpretable
knowledge. Extracting and manipulating this type of information is a crucial step in complex processes. While large
language models (LLMs) are known to be useful for extracting and enriching knowledge graphs and ontologies, previous
work has largely focused on comparing models of a single architecture (e.g. encoder-decoder only) across benchmarks
from similar domains. In this work, we provide a large-scale comparison of the performance of key LLM features
(e.g. model architecture and size) and task learning methods (fine-tuning vs. in-context learning (iCL)) on text-to-graph
benchmarks in two domains, the general and the biomedical. Experiments suggest that, in the general domain,
small fine-tuned encoder-decoder models and mid-sized decoder-only models used with iCL reach comparable overall
performance, with high entity and relation recognition and moderate yet encouraging graph completion. Our results
also suggest that, regardless of other factors, biomedical knowledge graphs are notably harder to learn and are better
modelled by small fine-tuned encoder-decoder architectures. Pertaining to iCL, we analyse hallucination behaviour related
to sub-optimal prompt design, and suggest an efficient alternative to prompt engineering and prompt tuning for tasks with
structured model output.