FastLexRank: Efficient Lexical Ranking for Structuring Social Media Posts

ACL ARR 2024 June Submission1865 Authors

15 Jun 2024 (modified: 16 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: We present FastLexRank, a computationally efficient adaptation of LexRank, an unsupervised algorithm that ranks texts by graph-based centrality scoring of sentences. To address the computational and memory costs of the original LexRank, FastLexRank uses a new algorithm to approximate the stationary distribution of the sentence graph, improving efficiency while maintaining summarization quality. FastLexRank's centrality scores correlate almost perfectly with the original LexRank scores, and the Kendall rank correlation between the rankings produced by the two methods is similarly high. The paper details these algorithmic modifications and their effect on the size of the data sets that can be processed, such as large social media corpora. Empirical results confirm that FastLexRank effectively generates centrality scores for sentences in large social media corpora, which makes it suitable for real-time analysis in various applications. We further suggest that FastLexRank can act as a ranker that identifies the most central tweet, which can then be passed to more advanced NLP technologies, such as Large Language Models, for further analysis. This research contributes to Natural Language Processing by offering a scalable solution for text centrality calculation, which is critical for managing the ever-increasing volume of digital content.
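The abstract does not spell out the approximation, so the following is only an illustrative sketch of why the stationary distribution of a sentence graph can be obtained without building the full graph. It is not necessarily the paper's algorithm. It assumes unit-normalized sentence vectors with non-negative cosine similarities and no similarity threshold. Under those assumptions the random walk runs on a symmetric weighted graph, whose stationary distribution is proportional to each node's total similarity (its weighted degree). That degree vector can be computed as E(Eᵀ1) without forming the n × n similarity matrix. The function names and the random test data are hypothetical.

```python
import numpy as np

def lexrank_power_iteration(E, iters=200, tol=1e-10):
    """Baseline: build the full n x n similarity matrix and run power iteration."""
    S = E @ E.T                             # O(n^2) memory
    P = S / S.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    p = np.full(S.shape[0], 1.0 / S.shape[0])
    for _ in range(iters):
        p_next = P.T @ p
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

def degree_based_scores(E):
    """Sketch: weighted degrees S @ 1 computed as E @ (E.T @ 1), never forming S."""
    z = E.sum(axis=0)        # E^T 1, a single d-dimensional vector
    scores = E @ z           # equals the row sums of E @ E.T
    return scores / scores.sum()

# Hypothetical data: 50 "sentences" with non-negative 16-dimensional vectors.
rng = np.random.default_rng(0)
E = np.abs(rng.normal(size=(50, 16)))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit length, so dot product = cosine

full = lexrank_power_iteration(E)
fast = degree_based_scores(E)
print("max abs difference:", np.abs(full - fast).max())
```

The second function needs memory and time linear in the number of sentences, whereas the baseline is quadratic. With thresholded similarities or embeddings that yield negative cosines, the equivalence used here no longer holds exactly.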
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, NLP Applications, Computational Social Science and Cultural Analytics, Summarization
Contribution Types: Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Submission Number: 1865