Abstract: A lightweight method distinguishes articles within Wikipedia that are classes (Novel, Book) from other articles (Three Men in a Boat, Diary of a Pilgrimage). It exploits clues available within the article text and within categories associated with articles in Wikipedia, while not requiring any linguistic preprocessing tools such as part-of-speech taggers, named-entity recognizers or syntactic parsers. Experimental results show that classes can be identified among Wikipedia articles in multiple languages, at aggregate precision and recall generally above 0.9 and 0.6, respectively.