Web table data integration based on smart campus scenarios to resolve name disambiguation of scientific research personnel

Abstract: Name ambiguity results from the similarity of many common Chinese names. With the development of artificial intelligence, disambiguation models based on machine learning (ML) have achieved better disambiguation effects and have been widely used in universities. However, continually improving the disambiguation effect remains a major challenge. Smart campuses based on the Internet of Things are developing rapidly, and a large number of scattered web tables with missing data values now exist, while the attributes available to disambiguation models remain limited. To overcome these challenges, this study proposes a name disambiguation model based on web table data integration (NDWT) for smart campuses. The model first recognises the label mapping in a webpage table using four types of label matchers and then designs an instance comparator based on the obtained label mapping. The web tables are integrated according to the instance mapping relationship, and two datasets are obtained: one before integration (BWT) and one after integration (AWT). Relevant features are then extracted from these two datasets and used for training. Finally, the NDWT model is used for disambiguation experiments. Comparative experiments, conducted using seven different types of ML models, show that the performance of the NDWT model improves significantly after the integration of web tables; in particular, the pairwise F1 of the K-means model increases by 43.23%, and the pairwise F1 of the remaining models increases by approximately 10%. The experimental evaluation demonstrates the feasibility of the NDWT model proposed in this study, confirming that it can achieve a higher distribution quality compared to conventional name disambiguation methods.
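
The abstract reports results as pairwise F1 but does not define it. As a minimal sketch, assuming the standard clustering definition (precision and recall over pairs of records placed in the same cluster), the metric could be computed as below. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose records share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise F1 between a ground-truth and a predicted clustering.

    Assumes the common definition: a pair counts as correct when both
    clusterings place its two records in the same cluster.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    true_pairs = same_cluster_pairs(true_labels)
    predicted_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & predicted_pairs)

    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: five records sharing one name, written by two people.
true_authors = ["a", "a", "a", "b", "b"]
predicted_clusters = [0, 0, 1, 1, 1]
print(pairwise_f1(true_authors, predicted_clusters))
```

In this toy case each clustering contains four same-cluster pairs and two of them agree, so precision and recall are both 0.5.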