Abstract: Wikipedia is the largest online encyclopedia, in which articles form knowledgeable and semantic resources. Links within Wikipedia indicate that the two texts of a link origin and destination are related about their semantic topics. Existing link detection methods focus on article titles because most of links in Wikipedia point to article titles. But there are a number of links in Wikipedia pointing to corresponding segments, because the whole article is too general and it is hard for readers to obtain the intention of the link. We propose a method to automatically predict whether a link target is a specific segment and provide which segment is most relevant. We propose a combination method of Latent Dirichlet Allocation (LDA) and Maximum Likelihood Estimation (MLE) to represent every segment as a vector, then we obtain similarity of each segment pair, finally we utilize variance, standard deviation and other statistical features to predict the results. Through evaluations on Wikipedia articles, our method performs better result than existing methods.
Loading