Abstract: Because many people share the same name, picking out the pages relevant to one specific person from a large set of search results that mix several namesakes is tedious and error-prone. This paper proposes a method based on context co-occurrence to address the name discrimination problem. First, to capture both the global and the local properties of a document, we extract important terms from the result collection by combining a weight model with a window model. Second, based on those terms, we compute a co-occurrence score for every pair of terms and obtain a context co-occurrence matrix. Third, using this matrix, we merge related context terms to form several decision vectors. Finally, we compute the similarity between each document vector and each decision vector using the vector space model (VSM). For a person-name search task, the proposed method automatically divides the result documents into several groups. Experimental results show that the method discriminates accurately and effectively between different persons who share the same name.
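The four steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so every concrete choice below is an assumption made for illustration. The weight model is taken to be TF-IDF, the window model is a score bonus for terms within a fixed distance of the name, the co-occurrence score is the Jaccard overlap of the documents containing each term, and decision vectors are the connected components of the thresholded matrix. All function names, parameters, and sample texts are hypothetical.

```python
import math
from collections import Counter

STOPWORDS = {"a", "an", "the", "in", "at", "of", "and"}  # tiny illustrative list


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def important_terms(docs, name, window=2, top_k=5, min_df=2):
    """Step 1 (assumed form): TF-IDF weight, doubled for terms near the name."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = Counter()
    for d in docs:
        near = set()
        for i, tok in enumerate(d):
            if tok in name:
                near.update(d[max(0, i - window): i + window + 1])
        for t, tf in Counter(d).items():
            if t in name or df[t] < min_df:
                continue
            idf = math.log(n / df[t]) + 1.0
            scores[t] += tf * idf * (2.0 if t in near else 1.0)
    return [t for t, _ in scores.most_common(top_k)]


def cooccurrence(docs, terms):
    """Step 2 (assumed form): Jaccard overlap of the documents containing each term."""
    where = {t: {i for i, d in enumerate(docs) if t in d} for t in terms}
    return [[len(where[a] & where[b]) / len(where[a] | where[b]) for b in terms]
            for a in terms]


def group_terms(terms, matrix, threshold=0.5):
    """Step 3 (assumed form): connected components of the thresholded matrix."""
    groups, seen = [], set()
    for start in range(len(terms)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(len(terms)):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(component)
    return groups


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def discriminate(texts, name):
    """Step 4: assign each document to the most similar decision vector."""
    docs = [tokenize(t) for t in texts]
    terms = important_terms(docs, name)
    groups = group_terms(terms, cooccurrence(docs, terms))
    decision = [[1.0 if i in g else 0.0 for i in range(len(terms))] for g in groups]
    labels = []
    for d in docs:
        tf = Counter(d)
        vec = [float(tf[t]) for t in terms]
        sims = [cosine(vec, dv) for dv in decision]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Hypothetical search results mixing two namesakes.
texts = [
    "john smith plays guitar in a rock band",
    "the rock band of john smith released a guitar album",
    "professor john smith teaches physics at the university",
    "john smith published a physics paper at the university",
]
labels = discriminate(texts, name={"john", "smith"})
print(labels)
```

In this toy collection the musician pages and the physicist pages should receive different group labels. The threshold, window size, and `min_df` cutoff are arbitrary values chosen for the example, not parameters reported by the paper.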
External IDs:dblp:conf/fskd/ChenLZ11