A Survey of Keyphrase Generation

ACL ARR 2024 June Submission332 Authors

10 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Keyphrase generation is the task of producing a set of words or phrases that summarises the content of a document. Continuous efforts have been dedicated to this task over the past few years, spread across multiple lines of research such as model architectures, data resources, and use-case scenarios. Yet the current state of keyphrase generation remains unclear, as there has been no attempt to review and analyse previous work. This survey bridges that gap and provides a comprehensive overview of recent progress, limitations and open challenges in keyphrase generation. Our analysis of over 40 research papers yields new insights, for example that 1) commonly used datasets are so similar that there is no practical benefit in using them together for evaluation, and that 2) the performance of many models was significantly overestimated because normalisation procedures were applied to the ground truth. This paper not only surveys the literature but also addresses some of these concerns by training, documenting and releasing a strong PLM-based model for keyphrase generation, along with an evaluation framework, to facilitate future research.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: benchmarking, metrics, reproducibility, analysis
Contribution Types: Reproduction study, Publicly available software and/or pre-trained models, Surveys
Languages Studied: English
Submission Number: 332