Keyword Extraction Based on PageRank

Jinghua Wang, Jianyi Liu, Cong Wang

Published: 2007, Last Modified: 19 Feb 2025, PAKDD 2007, CC BY-SA 4.0

Abstract: Keywords are the words that represent the topic and content of a whole text. Keyword extraction is an important technology in many areas of document processing, such as text clustering, text summarization, and text retrieval. This paper presents a keyword extraction algorithm based on WordNet and PageRank. First, a text is represented with WordNet as a rough undirected weighted semantic graph, in which synsets are the vertices, relations between synsets are the edges, and each edge is weighted by the relatedness of the synsets it connects. We then apply UW-PageRank to the rough graph to perform word sense disambiguation, prune the graph, and finally apply UW-PageRank again to the pruned graph to extract keywords. The experimental results show that our algorithm is practical and effective.
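The ranking step described in the abstract can be illustrated with a minimal sketch of PageRank on an undirected weighted graph. The abstract does not give the UW-PageRank formula, so the update rule below (the common weighted form, in which each vertex passes its score to its neighbours in proportion to edge weight) is an assumption, as are the names `weighted_pagerank` and `top_keywords`, the damping factor 0.85, and the fixed iteration count. Edge weights are assumed to be strictly positive; building the graph from WordNet, word sense disambiguation, and pruning are not shown.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Rank vertices of an undirected weighted graph.

    edges: iterable of (u, v, weight) triples with weight > 0.
    This is an assumed update rule, not necessarily the paper's UW-PageRank.
    """
    # Store each undirected edge in both directions.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    nodes = list(adj)
    n = len(nodes)
    # Total edge weight attached to each vertex, used to normalise its votes.
    strength = {u: sum(adj[u].values()) for u in nodes}
    score = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        new_score = {}
        for v in nodes:
            # Each neighbour u contributes its score, split by edge weight.
            incoming = sum(score[u] * w / strength[u] for u, w in adj[v].items())
            new_score[v] = (1 - damping) / n + damping * incoming
        score = new_score
    return score


def top_keywords(edges, k=2):
    """Return the k highest-scoring vertices (hypothetical helper)."""
    scores = weighted_pagerank(edges)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

As a usage example, calling `top_keywords` on a small hand-made graph such as `[("graph", "vertex", 0.9), ("graph", "edge", 0.8), ("vertex", "edge", 0.5), ("graph", "banana", 0.1)]` ranks the most strongly connected vertex first. In the paper's setting the vertices would be synsets and the weights their relatedness, which this toy input only imitates.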