A scalable platform to build the data layer of knowledge graph AI

Published: 23 Sept 2025, Last Modified: 28 Oct 2025, NPGML Poster, CC BY 4.0
Keywords: knowledge graphs, graph AI, graph construction, data integration, reproducibility
TL;DR: We introduce Optimus, a platform for building large-scale KGs with an emphasis on reproducibility and extensibility.
Abstract: Knowledge graphs (KGs) underpin modern graph AI, from retrieval-augmented generation to large graph-language models. However, pipelines that construct and maintain KGs remain difficult to reproduce and to scale. We introduce Optimus, an opinionated platform for building large-scale KGs with an emphasis on reproducibility and extensibility. Optimus adopts a medallion architecture inspired by data lakes, enforces schema contracts and identifier harmonization, and produces machine-learning-ready KG exports. In benchmarking experiments, Optimus constructed a biomedical KG with 192,307 nodes, 21.5M edges, and 88.6M properties from 47 heterogeneous datasets. Parallelized execution reduced wall-clock build time by 56.5% relative to sequential execution (62.4 s vs. 143.6 s), and per-edge throughput improved as the graph scaled. These results demonstrate that Optimus enables efficient, reproducible, and scalable KG construction, strengthening the data layer of knowledge-grounded AI.
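To make the abstract's moving parts concrete, here is a minimal Python sketch of a medallion-style KG build. It assumes the conventional bronze/silver/gold layer naming (raw ingest, validated and harmonized, merged and analysis-ready), which the abstract does not spell out. Every name in it (`EDGE_CONTRACT`, `ID_MAP`, `to_bronze`, `to_silver`, `to_gold`, `build`, `percent_reduction`) is hypothetical and is not Optimus's API. A thread pool stands in for whatever parallel executor Optimus actually uses, and `DISEASE:0001` is a placeholder identifier. The last helper simply reproduces the abstract's 56.5% figure from the two reported build times.

```python
"""Toy medallion-style KG build: bronze -> silver -> gold.

All names (EDGE_CONTRACT, ID_MAP, the layer functions) are hypothetical
illustrations, not Optimus's actual API.
"""
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema contract: every silver-layer edge row must carry
# these fields with these types.
EDGE_CONTRACT = {"source": str, "target": str, "relation": str}

# Hypothetical identifier-harmonization table: two dataset-local IDs for
# the gene TP53 (a symbol and an HGNC CURIE) map onto one NCBI Gene ID.
ID_MAP = {"TP53": "NCBIGene:7157", "HGNC:11998": "NCBIGene:7157"}


def to_bronze(raw_rows):
    """Bronze layer: keep raw records as ingested (copied, not modified)."""
    return [dict(row) for row in raw_rows]


def check_contract(row):
    """Raise ValueError if a row violates the edge schema contract."""
    for field, ftype in EDGE_CONTRACT.items():
        if not isinstance(row.get(field), ftype):
            raise ValueError(f"contract violation on {field!r}: {row}")


def to_silver(bronze_rows):
    """Silver layer: validate each row, then harmonize its identifiers."""
    silver_rows = []
    for row in bronze_rows:
        check_contract(row)
        silver_rows.append({
            "source": ID_MAP.get(row["source"], row["source"]),
            "target": ID_MAP.get(row["target"], row["target"]),
            "relation": row["relation"],
        })
    return silver_rows


def to_gold(silver_tables):
    """Gold layer: merge per-dataset tables into deduplicated node/edge sets."""
    edges = sorted({(r["source"], r["relation"], r["target"])
                    for table in silver_tables for r in table})
    nodes = sorted({n for s, _, t in edges for n in (s, t)})
    return nodes, edges


def build(datasets, workers=4):
    """Run bronze->silver per dataset in parallel, then merge into gold."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        silver_tables = list(pool.map(lambda d: to_silver(to_bronze(d)), datasets))
    return to_gold(silver_tables)


def percent_reduction(sequential_s, parallel_s):
    """Relative wall-clock reduction of a parallel build vs. a sequential one."""
    return 100 * (sequential_s - parallel_s) / sequential_s


if __name__ == "__main__":
    # Two toy "datasets" that name the same gene with different local IDs;
    # DISEASE:0001 is a placeholder, not a real ontology term.
    datasets = [
        [{"source": "TP53", "target": "DISEASE:0001", "relation": "associated_with"}],
        [{"source": "HGNC:11998", "target": "DISEASE:0001", "relation": "associated_with"}],
    ]
    nodes, edges = build(datasets)
    print(nodes)  # ['DISEASE:0001', 'NCBIGene:7157']
    print(edges)  # [('NCBIGene:7157', 'associated_with', 'DISEASE:0001')]
    print(round(percent_reduction(143.6, 62.4), 1))  # 56.5
```

Because harmonization happens in the silver layer, the two source records collapse into a single gold edge. Keeping each layer a pure function of the one before it is one simple way to make rebuilds reproducible.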
Submission Number: 105