A Supervised Keyphrase Extraction System Based on Graph Representation Learning

Corina Florescu, Wei Jin

Published: 2019, Last Modified: 30 Jul 2025, Venue: ECIR (1) 2019, License: CC BY-SA 4.0

Abstract: Current supervised approaches to keyphrase extraction represent each candidate phrase with a set of hand-crafted features, and machine learning algorithms are then trained to discriminate keyphrases from non-keyphrases. Although manually designed features have been shown to work well in practice, feature engineering is a labor-intensive process that requires expert knowledge and typically does not generalize well. To address this, we present SurfKE, an approach that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that SurfKE, which uses its self-discovered features in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
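
The abstract does not say how the word graph is constructed or how features are learned from it. As a rough illustration of the first step only, the sketch below assumes a sliding-window co-occurrence graph, a construction commonly used in graph-based keyphrase extraction (for example, TextRank). The function name `build_word_graph`, the `window` parameter, and the simple regex tokenizer are hypothetical choices for this example, not details taken from SurfKE.

```python
import re
from collections import defaultdict


def build_word_graph(text, window=2):
    """Build an undirected, weighted word co-occurrence graph.

    Assumption (not from the paper): two distinct words are linked when
    they occur at most `window` tokens apart, and the edge weight counts
    how many times that happens.
    """
    # Naive tokenizer: lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Look ahead at the next `window` tokens.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


doc = "graph learning improves keyphrase extraction; graph learning helps"
word_graph = build_word_graph(doc, window=1)
for node in sorted(word_graph):
    print(node, dict(word_graph[node]))
```

A graph like this would then be the input to whatever representation learning step produces the node features; that step is not specified in the abstract and is not sketched here.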

External IDs: dblp:conf/ecir/FlorescuJ19