Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style

ACL ARR 2024 June Submission3966 Authors

16 Jun 2024 (modified: 07 Jul 2024) · License: CC BY 4.0
Abstract: The ability of large language models (LLMs) to generate evidence-based and stylistic counter-arguments is crucial for enhancing online discussions. However, there is a research gap in evaluating these models' practical effectiveness in real-world applications, and previous studies often overlook the balance between evidentiality and stylistic elements necessary for persuasive arguments. We created and audited Counterfire, a new dataset of 32,000 counter-arguments generated by finetuned and non-finetuned LLMs under varying prompts for evidence use and argumentative style. We audited models including GPT-3.5, PaLM 2, and Koala, evaluating their rhetorical quality and persuasive ability. Our findings show that while GPT-3.5 Turbo excelled in argument quality and style adherence, it still fell short of human standards, underscoring the need for further refinement of LLM outputs.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Quantitative analyses of news and/or social media
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 3966