KIKA: dual-stage domain knowledge injection-adaptation for multi-domain entity matching with contextual prompt enhancement

Dongyu Yang, Mengfei Xiong, Derong Shen, Tiezheng Nie, Yue Kou

Published: 2026 · Last Modified: 03 Apr 2026 · J. Supercomput. 2026 · CC BY-SA 4.0
Abstract: Entity matching (EM) is critical for data integration but faces challenges in real-world multi-domain scenarios, including high annotation costs, structural data defects (dirty or textual instances), domain-specific semantic ambiguity, and limited multi-domain generalization. To address these issues, we propose KIKA, the first BERT-based multi-domain EM framework built on a contextual prompt enhancement (CPE) preprocessor and dual-stage training. KIKA injects multi-domain knowledge into domain adapters via masked-token prediction and strengthens multi-domain semantic understanding through a contrastive loss. Fine-tuning then adaptively adjusts the domain knowledge weights, maintaining matching precision while substantially reducing the number of labeled training samples required. Notably, multi-domain EM inherently involves processing massive heterogeneous data across diverse fields, which demands substantial computational power for efficient model training, large-scale knowledge injection, and real-time inference; supercomputing capabilities are therefore indispensable. Experiments on 20 datasets show that KIKA raises state-of-the-art (SOTA) F1 scores by 1.77 points on average in multi-domain scenarios and secures six first-place and two second-place results across eight domain-shift tasks, establishing a robust solution for multi-domain entity matching.
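The stage-one objectives sketched in the abstract (masked-token prediction to inject domain knowledge into adapters, plus a contrastive loss for multi-domain semantics) can be illustrated in PyTorch. The adapter shape, loss weighting, and the supervised-contrastive formulation below are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAdapter(nn.Module):
    """Residual bottleneck adapter inserted into a (frozen) BERT encoder.
    Sizes are illustrative; KIKA's actual adapter design may differ."""
    def __init__(self, hidden=128, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h):
        # Residual connection preserves the frozen encoder's representation.
        return h + self.up(F.relu(self.down(h)))

def mlm_loss(logits, labels):
    """Masked-token prediction: cross-entropy on masked positions only
    (unmasked positions carry the ignore label -100)."""
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1), ignore_index=-100)

def domain_contrastive_loss(embeddings, domain_ids, temperature=0.1):
    """One plausible instantiation of the contrastive objective: pull
    embeddings from the same domain together, push other domains apart."""
    z = F.normalize(embeddings, dim=-1)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.t() / temperature).masked_fill(self_mask, float("-inf"))
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (domain_ids.unsqueeze(0) == domain_ids.unsqueeze(1)) & ~self_mask
    # Average log-probability over each anchor's same-domain positives.
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.any(1)].mean()

# Tiny demo of a combined stage-one loss on random stand-in tensors.
torch.manual_seed(0)
adapter = DomainAdapter()
hidden = adapter(torch.randn(6, 128))        # adapted pooled states
logits = torch.randn(6, 16, 1000)            # stand-in MLM head output
labels = torch.full((6, 16), -100)
labels[:, 3] = torch.randint(0, 1000, (6,))  # one masked position per row
domains = torch.tensor([0, 0, 1, 1, 2, 2])
total = mlm_loss(logits, labels) + 0.5 * domain_contrastive_loss(hidden, domains)
```

In stage two, the adapter (and a lightweight classification head) would be fine-tuned on labeled pairs, which is where the reduced annotation requirement comes in.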