Abstract: As the rate of web data creation and storage continues to rise, so does the need for effective multi-document summarization techniques. Computer-generated summaries that faithfully reflect the original data in a readable format reduce the need for manual human labor. In the field of online product reviews specifically, a single product can have hundreds to thousands of reviews, yet an average shopper is likely to read only a handful of them. With effective summarization tools, these shoppers could be given a single generated summary that condenses the content of the review set into a paragraph or two. Many efforts have been made to fulfill this need for multi-document summarization; however, most require complex data graphs, structures, or language models. Our research stresses simplicity in the extractive algorithm, making the process easier to understand and implement. In this paper, we propose four versions of multi-document extractive summarizers based on KL-Divergence, TF-IDF, and Diversity scoring. These extractive summarizers are then measured against each other, and against several state-of-the-art summarizers, in terms of effectiveness at expressing relevant content and linguistic quality. The results of these tests show a significant advantage for our summarizers, promoting them as a powerful yet simple approach ready for use in product review summarization.
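To make the kind of KL-Divergence-based extractive selection mentioned above concrete, the following is a minimal sketch of a generic KLSum-style summarizer: it greedily adds the sentence that makes the summary's unigram distribution closest (in KL divergence) to the document's. This is an illustrative assumption about the general technique, not the paper's actual algorithm; all function names (`klsum`, `word_dist`, `kl_divergence`) are hypothetical.

```python
import math
from collections import Counter

def word_dist(tokens, vocab, smoothing=1e-9):
    """Smoothed unigram probability over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts.values())
    denom = total + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / denom for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) over a shared vocabulary; smoothing keeps q nonzero."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def klsum(sentences, max_sentences=2):
    """Greedily pick sentences minimizing KL(doc || summary)."""
    doc_tokens = [w for s in sentences for w in s.lower().split()]
    vocab = set(doc_tokens)
    p_doc = word_dist(doc_tokens, vocab)
    summary, summary_tokens = [], []
    candidates = list(sentences)
    while candidates and len(summary) < max_sentences:
        best, best_score = None, float("inf")
        for s in candidates:
            # Score the summary as it would look with this sentence added.
            q = word_dist(summary_tokens + s.lower().split(), vocab)
            score = kl_divergence(p_doc, q)
            if score < best_score:
                best, best_score = s, score
        summary.append(best)
        summary_tokens += best.lower().split()
        candidates.remove(best)
    return summary
```

The greedy loop is what keeps the method simple: each step is a plain distribution comparison, with no graph construction or trained language model required.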
External IDs: dblp:conf/webi/BenhamGN24