A crowdsourcing approach to evaluate the quality of query-based extractive text summaries

Neslihan Iskender, Aleksandra Gabryszak, Tim Polzehl, Leonhard Hennig

13 Feb 2022, OpenReview Archive Direct Upload

Abstract: High cost and time consumption are concurrent barriers to research on and application of automated summarization. To explore options for overcoming these barriers, we analyze the feasibility and appropriateness of micro-task crowdsourcing for evaluating different summary quality characteristics, and we report on ongoing work on the crowdsourced evaluation of query-based extractive text summaries. To do so, we assess a number of linguistic quality factors such as grammaticality, non-redundancy, referential clarity, focus, and structure & coherence. Our first results imply that referential clarity, focus, and structure & coherence are the main factors affecting the summary quality perceived by crowdworkers. Further, we compare these results with an initial set of expert annotations that is currently being collected, as well as with an initial set of scores from ROUGE, an automatic summary evaluation metric. Preliminary results show that ROUGE does not correlate with the linguistic quality factors, regardless of whether they are assessed by the crowd or by experts. Finally, crowd and expert ratings show the highest degree of correlation when assessing low-quality summaries; the assessments increasingly diverge for summaries judged to be of high quality.
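The abstract compares crowd ratings, expert ratings, and ROUGE scores by correlation, and contrasts agreement on low- versus high-quality summaries, but it does not name the correlation coefficient or how quality levels are split. The sketch below assumes Spearman's rank correlation over per-summary ratings and a simple threshold on the expert rating to separate "low" from "high" quality. The functions `average_ranks`, `spearman`, and `split_by_quality` and the `threshold` parameter are hypothetical illustrations, not the authors' analysis code, and the example inputs are toy values rather than data from the study.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def split_by_quality(crowd, expert, threshold):
    """Hypothetical helper: correlate crowd vs. expert ratings separately for
    summaries whose expert rating is below / at-or-above `threshold`.
    A subset with fewer than 2 items or constant ratings maps to None."""
    groups = {"low": [], "high": []}
    for c, e in zip(crowd, expert):
        groups["low" if e < threshold else "high"].append((c, e))
    result = {}
    for name, pairs in groups.items():
        try:
            cs, es = zip(*pairs) if pairs else ((), ())
            result[name] = spearman(list(cs), list(es))
        except ValueError:
            result[name] = None
    return result
```

For example, `split_by_quality(crowd, expert, threshold=4)` on two lists of per-summary mean ratings returns one coefficient per quality band, which is one way to check whether agreement drops for the higher-rated summaries. Rank correlation is used here because Likert-style ratings are ordinal; a Pearson or Kendall coefficient would be an equally plausible reading of the abstract.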
