EDCleaner: Data Cleaning for Entity Information in Social Network

Published: 2019, Last Modified: 15 Jan 2026 | ICC 2019 | CC BY-SA 4.0
Abstract: The widespread use of social networks has produced a large amount of entity data in different formats, accompanied by problems such as data offset and missing attributes. Existing research falls short both in handling multiple data scenarios and in accuracy. To address this data cleaning problem, EDCleaner is proposed to transform entity information into structured data with attribute labels. EDCleaner introduces a method of attribute recognition and data normalization for semi-structured data, which efficiently identifies the attribute-tag relationships in the data and produces structured data with uniform specifications. Furthermore, a data cleaning model with an active learning extension is established: a machine learning classifier is used to further improve the accuracy of attribute recognition, yielding an efficient and accurate data cleaning method. Experimental results show that EDCleaner improves the cleaning accuracy and other performance indicators for entity information and outperforms the state of the art.
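To make the pipeline described in the abstract more concrete, the following is a minimal sketch of rule-based attribute recognition followed by normalization, with ambiguous or unrecognized values set aside for later labeling. It is an illustration under stated assumptions, not EDCleaner's actual implementation: the patterns in `RULES`, the attribute names, and the functions `recognize`, `normalize`, and `clean` are all hypothetical, and the abstract does not specify which rules or classifier the authors use.

```python
import re

# Hypothetical recognition rules (attribute label -> pattern).
# These are illustrative assumptions, not rules taken from the paper.
RULES = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{6,}\d$"),
    "birthday": re.compile(r"^\d{4}[-/.]\d{1,2}[-/.]\d{1,2}$"),
}


def recognize(value):
    """Return every attribute label whose pattern matches the value."""
    v = value.strip()
    return [label for label, pattern in RULES.items() if pattern.match(v)]


def normalize(label, value):
    """Bring a recognized value into one uniform format per attribute."""
    v = value.strip()
    if label == "email":
        return v.lower()
    if label == "phone":
        return re.sub(r"[\s-]", "", v)  # drop spaces and hyphens
    if label == "birthday":
        year, month, day = re.split(r"[-/.]", v)
        return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"
    return v


def clean(values):
    """Label and normalize unordered field values of one entity.

    Values matching exactly one rule go into the structured record.
    Values matching zero or several rules are returned separately; in an
    active learning setup these would be candidates for manual labeling
    or for a trained classifier.
    """
    record, uncertain = {}, []
    for value in values:
        labels = recognize(value)
        if len(labels) == 1:
            record[labels[0]] = normalize(labels[0], value)
        else:
            uncertain.append(value)
    return record, uncertain


if __name__ == "__main__":
    raw = [" Alice@Example.COM ", "+86 138-0013-8000", "1990/7/4", "likes hiking"]
    print(clean(raw))
```

Because the values are recognized by content rather than by position, the sketch tolerates fields that appear in the wrong column or order. Note that a value such as `1990-07-04` matches both the hypothetical phone and birthday rules here, so it lands in the uncertain list; resolving such cases is the kind of task the abstract assigns to the machine learning classifier.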