Argument Summarization and its Evaluation in the Era of Large Language Models

ACL ARR 2025 February Submission6357 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Large Language Models (LLMs) have revolutionized various Natural Language Generation (NLG) tasks, including Argument Summarization (ArgSum), a key subfield of Argument Mining (AM). This paper investigates the integration of state-of-the-art LLMs into ArgSum, addressing the challenges of traditional evaluation metrics, which do not align well with human judgment. We propose a novel prompt-based evaluation scheme, and validate it through a novel human benchmark dataset. Our work makes three key contributions: the integration of LLMs into existing ArgSum frameworks, the development of a new ArgSum system benchmarked against prior methods, and the introduction of an advanced LLM-based evaluation scheme. We demonstrate that the use of LLMs substantially improves both the generation and evaluation of argument summaries, achieving state-of-the-art results and advancing the field of ArgSum.

Paper Type: Long

Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining

Research Area Keywords: argument mining, argument generation, applications

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 6357
