Argument Summarization and its Evaluation in the Era of Large Language Models

ACL ARR 2025 February Submission6357 Authors

16 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Large Language Models (LLMs) have revolutionized various Natural Language Generation (NLG) tasks, including Argument Summarization (ArgSum), a key subfield of Argument Mining (AM). This paper investigates the integration of state-of-the-art LLMs into ArgSum, addressing the challenges of traditional evaluation metrics, which do not align well with human judgment. We propose a novel prompt-based evaluation scheme, and validate it through a novel human benchmark dataset. Our work makes three key contributions: the integration of LLMs into existing ArgSum frameworks, the development of a new ArgSum system benchmarked against prior methods, and the introduction of an advanced LLM-based evaluation scheme. We demonstrate that the use of LLMs substantially improves both the generation and evaluation of argument summaries, achieving state-of-the-art results and advancing the field of ArgSum.
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: argument mining, argument generation, applications
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 6357