Longformer-Enhanced Hierarchical Attention for High-Quality Extractive Document Summarization

Twinkle Joshi

Published: 05 Dec 2025, Last Modified: 15 Apr 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: In recent years, document summarization has become an important tool for processing large volumes of text efficiently, enabling quick comprehension of lengthy documents. However, the traditional architecture that combines Bidirectional Encoder Representations from Transformers (BERT) with an attention-based Bidirectional Gated Recurrent Unit (BiGRU) relies on vanilla BERT, whose fixed 512-token limit leads to context fragmentation, and its flat attention mechanism lacks explicit hierarchical modelling of word and sentence importance. To address these limitations, a Longformer-based Hierarchical Attention Network (HAN) framework, referred to as LHAN, is proposed. First, data is sourced from the arXiv and PubMed datasets and preprocessed using sentence segmentation, tokenization with sentence span mapping, and greedy ROUGE-1-based label generation. Longformer then encodes full-length documents while preserving global context, and HAN word-level attention captures key terms within sentences. Sentence-level attention then ranks the importance of sentences, and a binary classification layer selects the top-ranked sentences for the final summary. Experimental results show consistent improvements over the traditional method, with ROUGE-1 of 45.35, ROUGE-2 of 19.48, and ROUGE-L of 41.82, indicating enhanced summarization quality.
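The greedy ROUGE-1-based label generation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption: the function names `rouge1_f1` and `greedy_oracle_labels`, the `max_sentences` cap, the regex tokenizer, and the use of a simplified unigram-overlap F1 (no stemming or stop-word handling) in place of a full ROUGE implementation. It shows one common way such binary sentence labels are derived from a reference summary, not necessarily the author's method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between two token lists (simplified ROUGE-1)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Clipped unigram matches: multiset intersection of the two token bags.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle_labels(sentences, reference, max_sentences=3):
    """Return one 0/1 label per sentence.

    At each step, add the sentence that most increases ROUGE-1 F1 of the
    selected set against the reference; stop when nothing improves the score
    or when max_sentences (a hypothetical cap) is reached.
    """
    ref_tokens = tokenize(reference)
    sent_tokens = [tokenize(s) for s in sentences]
    selected, best_score = [], 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the candidate summary in original document order.
            candidate = [t for j in sorted(selected + [i]) for t in sent_tokens[j]]
            score = rouge1_f1(candidate, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


sentences = [
    "Longformer encodes long documents.",
    "The weather was pleasant.",
    "Hierarchical attention ranks sentences.",
]
reference = "Longformer encodes long documents and hierarchical attention ranks sentences."
labels = greedy_oracle_labels(sentences, reference)
print(labels)
```

Labels of this kind would serve as targets for the binary classification layer; the off-topic second sentence receives label 0 because adding it lowers the unigram F1 of the selected set.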