Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments

ACL ARR 2025 February Submission7256 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Identifying arguments is a necessary prerequisite for various tasks in automated discourse analysis, particularly in contexts such as political debates, online discussions, and scientific reasoning. Alongside theoretical insights into the structural constitution of arguments, a significant amount of research has focused on the practical extraction of arguments, leading to a growing number of publicly available datasets on which classic BERT-like transformers prevail and consistently attain highly competitive benchmark performance. This has fostered the general assumption that argument mining is reliable and applicable in a variety of contexts. Our findings indicate that this apparent progress often arises from data limitations and labeling choices rather than from the inherent capabilities of these models. Experiments show that these transformers learn the specifics of datasets rather than the composition of arguments: they perform excellently on individual benchmarks but have difficulty generalizing when tested on other datasets. Crucially, we demonstrate that task-specific pre-training for structurally embedding argument components can indeed improve generalization. At the same time, we stress the need for common methodologies that unify different perspectives on how arguments are constituted, in order to transform argument mining into a universally applicable research paradigm.
Paper Type: Long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Research Area Keywords: Argument Mining, Benchmarks, Transformers, Generalization, Limitations
Contribution Types: Model analysis & interpretability, Reproduction study, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers, Surveys
Languages Studied: English
Submission Number: 7256