Abstract: We present a combined pipeline for knowledge-graph construction and ontology expansion. This
approach creates a BIO-tagged corpus via fully automatic LLM-based pseudo-annotation and introduces
dedicated UNK reserve categories to capture previously unseen classes and relations. A specialized NER/RE
model is trained on a 3-million-token dataset with 92 labels. This model exhibits a conservative quality
profile (high precision with moderate recall) suited for safe graph enrichment: integrating the extracted facts
expands the graph to ~0.98 million triples, while the expansion ratio (total inferred facts to explicit triples)
increases from 2.65 to 3.52, with logical consistency preserved. UNK label pools are converted into stable
synsets, enabling semi-automatic ontology expansion; 12 new classes derived from unstructured texts were
added. We also demonstrate practical value for querying and analytics using an LLM + SPARQL setup.
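
As an illustration of how BIO tags with an UNK reserve category can feed a label pool, the following is a minimal sketch, not the authors' implementation. The label name `UNK_CLASS`, the helper names `decode_bio` and `pool_unk`, and the lowercasing step are assumptions made for the example; the actual label scheme and the conversion of pools into synsets are not specified in this abstract.

```python
from collections import defaultdict


def decode_bio(tokens, tags):
    """Convert parallel token and BIO-tag lists into (label, text) spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag continues the open span only if the label matches.
            words.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open span.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def pool_unk(spans, prefix="UNK"):
    """Group mentions whose label starts with the (hypothetical) UNK prefix."""
    pool = defaultdict(list)
    for label, text in spans:
        if label.startswith(prefix):
            pool[label].append(text.lower())
    return dict(pool)


# Toy input, invented for illustration only.
tokens = ["Acme", "Corp", "uses", "quantum", "annealer", "daily"]
tags = ["B-ORG", "I-ORG", "O", "B-UNK_CLASS", "I-UNK_CLASS", "O"]

spans = decode_bio(tokens, tags)
unk_pool = pool_unk(spans)
```

In this sketch, mentions collected under an UNK label are the raw material that a later step (manual review or clustering, not shown) would turn into candidate classes.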