GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

ACL ARR 2024 June Submission2770 Authors

15 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: Heterogeneous graph neural networks have recently gained attention for long document summarization, modeling extraction as a node classification task. Although effective, these models often require external tools or additional machine learning models to define graph components, which produces highly complex and less intuitive structures. We present GraphLSS, a heterogeneous graph for long document extractive summarization that incorporates Lexical, Structural, and Semantic features. It defines two levels of information (words and sentences) and four types of edges (sentence semantic similarity, sentence occurrence order, word in sentence, and word semantic similarity) without requiring auxiliary learning models. Experiments on two benchmark datasets show that GraphLSS is competitive with top-performing graph-based methods and outperforms recent non-graph models. We release our code at \url{<anonymized>}.
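
The graph layout described in the abstract (two node types, four edge types) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_graph`, the thresholds, whitespace tokenization, and both similarity measures are assumptions made for illustration. In particular, word-set Jaccard overlap and character-trigram overlap are toy stand-ins for whatever semantic similarity the paper actually uses.

```python
from itertools import combinations


def jaccard(a, b):
    """Set-overlap score in [0, 1]; a toy stand-in for semantic similarity."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def char_trigrams(word):
    """Character trigrams of a padded word (hypothetical word-similarity proxy)."""
    padded = f"##{word}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_graph(sentences, sent_threshold=0.2, word_threshold=0.5):
    """Build a heterogeneous graph with word and sentence nodes.

    Thresholds and similarity functions are illustrative assumptions,
    not values taken from the paper.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    edges = {"sent_order": [], "word_in_sent": [], "sent_sim": [], "word_sim": []}

    # Structural: link each sentence to the one that follows it.
    for i in range(len(sentences) - 1):
        edges["sent_order"].append((i, i + 1))

    # Lexical: link each distinct word to the sentences containing it.
    for i, toks in enumerate(tokenized):
        for w in sorted(set(toks)):
            edges["word_in_sent"].append((w, i))

    # Semantic (sentence level): link sentence pairs above the threshold.
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(tokenized[i], tokenized[j]) >= sent_threshold:
            edges["sent_sim"].append((i, j))

    # Semantic (word level): link word pairs above the threshold.
    for u, v in combinations(vocab, 2):
        if jaccard(char_trigrams(u), char_trigrams(v)) >= word_threshold:
            edges["word_sim"].append((u, v))

    return {"sentences": list(range(len(sentences))), "words": vocab, "edges": edges}


if __name__ == "__main__":
    doc = ["graphs model documents", "graphs model sentences", "summaries are short"]
    graph = build_graph(doc)
    for edge_type, pairs in graph["edges"].items():
        print(edge_type, pairs)
```

In a node classification setup such as the one the abstract mentions, each sentence node would then receive a binary label indicating whether it belongs in the summary; that step is omitted here.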

Paper Type: Short

Research Area: Machine Learning for NLP

Research Area Keywords: extractive summarisation, graph-based methods, document representation

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 2770