A method for Automatic Text Summarization based on Rhetorical Analysis and Topic Modeling

Published: 01 Jan 2020, Last Modified: 12 Aug 2024Int. J. Comput. 2020EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: This article describes the original method of automatic summarization of scientific and technical texts based on rhetorical analysis and using topic modeling. The proposed method combines the use of a linguistic knowledge base and machine learning. For the detection of key terms, we used topic modeling. First, unigram topic models containing only one-word terms are constructed. Further, these models are extended by adding multiword terms. The most significant fragments of the original document are determined in the process of rhetorical analysis with the help of discursive markers. When evaluating the importance of text fragments, keywords, multiword terms, and scientific lexicon characterizing scientific and technical texts are also taken into account. A linguistic knowledge base has been created to store information about the markers and scientific lexicon. The experiments showed that this method is effective, needs a comparatively small amount of training data and can be adapted to processing texts of different subject fields in other languages.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview