Automatic and Semi-Automatic Methods for Domain Knowledge-Graph Construction and Ontology Expansion

Andrey Khalov, Olga Ataeva, Natalia Tuchkova

Published: 10 Nov 2025, Last Modified: 08 May 2026. AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, Vol. 59, Suppl. 6. License: CC BY 4.0

Abstract: We present a combined pipeline for knowledge-graph construction and ontology expansion. This approach creates a BIO-tagged corpus via fully automatic LLM-based pseudo-annotation and introduces dedicated UNK reserve categories to capture previously unseen classes and relations. A specialized NER/RE model is trained on a 3-million-token dataset with 92 labels. This model exhibits a conservative quality profile (high precision with moderate recall) suited for safe graph enrichment: integrating the extracted facts expands the graph to ~0.98 million triples, while the expansion ratio (total inferred facts to explicit triples) increases from 2.65 to 3.52, with logical consistency preserved. UNK label pools are converted into stable synsets, enabling semi-automatic ontology expansion; 12 new classes derived from unstructured texts were added. We also demonstrate practical value for querying and analytics using an LLM + SPARQL setup.
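
As a rough illustration of two ideas named in the abstract, the sketch below shows how annotated spans might be turned into BIO tags with a reserve UNK category, and how the expansion ratio is computed from its stated definition (total inferred facts divided by explicit triples). This is not the authors' implementation. The function names, the single `UNK` label, the example tokens, the label names, and the triple counts are all hypothetical; the abstract mentions several UNK reserve categories, which this sketch collapses into one for brevity.

```python
from typing import List, Set, Tuple

# A span is (start token index, end token index exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(
    n_tokens: int,
    spans: List[Span],
    known_labels: Set[str],
    unk_label: str = "UNK",  # hypothetical name for the reserve category
) -> List[str]:
    """Convert labeled spans to BIO tags, routing unseen labels to UNK."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        # Labels outside the known inventory fall into the reserve category.
        if label not in known_labels:
            label = unk_label
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def expansion_ratio(total_inferred: int, explicit: int) -> float:
    """Ratio of total inferred facts to explicit triples."""
    return total_inferred / explicit


# Hypothetical example: "Method" is a known label, "Dataset" is not.
tokens = ["gradient", "descent", "is", "evaluated", "on", "test", "corpus"]
spans = [(0, 2, "Method"), (5, 7, "Dataset")]
bio_tags = spans_to_bio(len(tokens), spans, known_labels={"Method"})

# Hypothetical counts chosen only to reproduce a ratio of 3.52.
ratio = expansion_ratio(total_inferred=352, explicit=100)
```

Spans tagged `B-UNK`/`I-UNK` in this way could later be pooled and grouped, which is the role the abstract assigns to the UNK label pools before they are converted into synsets.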