Abstract: Keyphrase extraction automatically identifies the phrases that best represent a document's core content. Recent advances have improved unsupervised methods, which operate without labeled training data. As documents grow longer, effective keyphrase extraction becomes increasingly important for summarization and retrieval. However, existing methods often struggle with long documents because they have difficulty capturing global context and semantic relationships across extended text. In this paper, we propose UFORank, a unified framework for unsupervised keyphrase extraction designed specifically for long documents. UFORank integrates three components: 1) topic importance derived from clustering semantically similar phrases, 2) position-biased weights that account for both the relative positions and the frequencies of phrases within the document structure, and 3) phrase-to-topic similarity measures for relevance scoring. In addition, UFORank employs Glow, a flow-based generative model, to improve the quality of the semantic representations of both phrases and documents in the embedding space. Experiments on three benchmark datasets for long-document keyphrase extraction show that UFORank performs competitively with existing state-of-the-art methods, including PromptRank and Attention-Seeker, achieving F1-scores of 17.66%, 20.73%, and 13.17% on the respective datasets. Ablation studies validate the contribution of each framework component to the overall performance.
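To make the three components concrete, the following is a minimal sketch of how topic importance, a position-biased weight, and phrase-to-topic similarity might be combined into one phrase score. It is not the scoring function used by UFORank, which the abstract does not specify. Everything here is an assumption made for illustration: the greedy threshold clustering, the inverse-position weight, the multiplicative combination, the function names (`greedy_cluster`, `position_weight`, `rank_phrases`), and the toy two-dimensional embeddings. The Glow-based refinement of the embeddings is not modeled; the vectors are taken as given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def greedy_cluster(embeddings, threshold=0.8):
    """Hypothetical clustering step: put each phrase into the first
    cluster whose centroid is at least `threshold` similar to it."""
    clusters = []
    for phrase, vector in embeddings.items():
        for members in clusters:
            centroid = mean_vector([embeddings[m] for m in members])
            if cosine(vector, centroid) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


def position_weight(positions):
    """Assumed position bias: sum of inverse 1-based positions, so
    earlier and more frequent phrases receive a larger weight."""
    return sum(1.0 / p for p in positions)


def rank_phrases(embeddings, positions, doc_embedding, threshold=0.8):
    """Score = topic importance * position weight * phrase-to-topic
    similarity (an assumed combination), returned best first."""
    scores = {}
    for members in greedy_cluster(embeddings, threshold):
        centroid = mean_vector([embeddings[m] for m in members])
        # Topic importance: how close the topic is to the whole document.
        topic_importance = max(cosine(centroid, doc_embedding), 0.0)
        for phrase in members:
            similarity = max(cosine(embeddings[phrase], centroid), 0.0)
            scores[phrase] = (
                topic_importance * position_weight(positions[phrase]) * similarity
            )
    return sorted(scores, key=scores.get, reverse=True)


# Toy inputs (made up for illustration only).
embeddings = {
    "keyphrase extraction": [1.0, 0.1],
    "phrase ranking": [0.9, 0.2],
    "coffee break": [0.0, 1.0],
}
positions = {
    "keyphrase extraction": [1, 10],
    "phrase ranking": [5],
    "coffee break": [50],
}
doc_embedding = [1.0, 0.2]

ranking = rank_phrases(embeddings, positions, doc_embedding)
print(ranking)
```

In this toy input, the two on-topic phrases fall into one cluster that lies close to the document vector, while the off-topic phrase forms its own low-importance topic and also appears late, so it ranks last. A multiplicative combination is only one possible choice; a weighted sum would behave differently when one signal is near zero.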
External IDs: doi:10.1109/access.2026.3654120