SMVKGC: A Runtime Plug-in for Streaming Knowledge Graph Construction via Inductive Multi-View Clustering

02 Mar 2026 (modified: 03 Mar 2026), ESWC 2026 Workshop KGCW Submission, CC BY 4.0
Keywords: Knowledge Graph Construction, Streaming Data, Multi-View Graph Clustering, Entity Assignment
Abstract: Knowledge graph (KG) construction from streaming data poses significant challenges, particularly in efficiently integrating incoming entities that often arrive without known connections to existing nodes. This dynamic setting complicates critical graph maintenance tasks such as entity resolution and community detection, as new entities must be appropriately placed within the existing graph structure. Most current methods are designed for static graphs and rely on complete graph structure, requiring full model retraining when new entities arrive and resulting in prohibitive computational costs for real-time applications. To address these limitations, we propose Streaming Multi-View Knowledge Graph Clustering (SMVKGC), a novel framework designed as a runtime plug-in for KG construction pipelines that leverages multi-view graph representations to efficiently assign streaming entities to existing clusters without requiring graph structure information or model retraining. Our approach employs view-specific Graph Neural Networks (GNNs) to capture local neighborhood structures within each view, where a view is defined as a distinct relation type between nodes, and integrates these representations using a Transformer-based encoder with contrastive learning objectives to produce discriminative embeddings. Crucially, a lightweight projector network approximates the full GNN-encoder pipeline using only node features, enabling rapid inference for streaming entities. Once embeddings are generated, the system either assigns incoming entities to existing clusters or performs lightweight re-clustering over the expanded node set, achieving substantial runtime savings since clustering is computationally negligible relative to model retraining. 
We evaluate SMVKGC on six benchmark datasets (ACM, DBLP, IMDB, Texas, Chameleon, and Wisconsin), where it achieves competitive clustering performance across standard metrics (NMI, ARI, ACC, F1) while reducing inference time by orders of magnitude compared to retraining-based baselines.
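
The inference path described in the abstract (a projector that maps node features to embeddings, followed by assignment to existing clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not confirmed by the abstract: the projector is taken to be linear and fitted by ridge regression, assignment is by nearest centroid in Euclidean distance, and the "teacher" embeddings are a synthetic stand-in for the output of the GNN and Transformer encoder. All function names are hypothetical.

```python
import numpy as np

def fit_linear_projector(features, teacher_embeddings, ridge=1e-3):
    """Fit W so that features @ W approximates the teacher embeddings.

    Assumption: a linear map fitted by ridge regression stands in for the
    paper's projector network, whose real architecture is not given here.
    """
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ teacher_embeddings)

def cluster_centroids(embeddings, labels, n_clusters):
    """Mean embedding of each existing cluster."""
    return np.stack([embeddings[labels == k].mean(axis=0)
                     for k in range(n_clusters)])

def assign_streaming(new_features, projector, centroids):
    """Embed new entities from features only, then pick the nearest centroid.

    Assumption: nearest-centroid assignment; the abstract does not specify
    the assignment rule.
    """
    emb = new_features @ projector
    dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Synthetic demo: two well-separated clusters in a 4-dimensional feature space.
rng = np.random.default_rng(0)
train_features = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 4)),   # cluster 0
    rng.normal(5.0, 0.3, size=(100, 4)),   # cluster 1
])
train_labels = np.repeat([0, 1], 100)

# Stand-in for the teacher (GNN + encoder) output: a fixed linear map.
teacher_map = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
teacher_embeddings = train_features @ teacher_map

projector = fit_linear_projector(train_features, teacher_embeddings)
centroids = cluster_centroids(teacher_embeddings, train_labels, n_clusters=2)

# Streaming entities arrive with features only, no edges.
new_features = np.vstack([
    rng.normal(0.0, 0.3, size=(3, 4)),
    rng.normal(5.0, 0.3, size=(3, 4)),
])
assigned = assign_streaming(new_features, projector, centroids)
```

In this sketch no graph structure is consulted at inference time and nothing is retrained: the cost of handling a new entity is one matrix product and one distance computation per cluster, which is the kind of saving the abstract attributes to the projector.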
Submission Number: 12