DEBAGREEMENT: A comment-reply dataset for (dis)agreement detection in online debates

John Pougué-Biyong; Valentina Semenova; Alexandre Matton; Rachel Han; Aerin Kim; Renaud Lambiotte; Doyne Farmer

Published: 11 Oct 2021, Last Modified: 23 May 2023

Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: stance detection, (dis)agreement detection, pre-trained language models, graph representation learning

TL;DR: This paper presents a comment-reply dataset collected from Reddit which unveils opportunities to combine pre-trained language models and graph representation learning methods for (dis)agreement detection.

Abstract: In this paper, we introduce DEBAGREEMENT, a dataset of 42,894 comment-reply pairs from the popular discussion website Reddit, annotated with agree, neutral or disagree labels. We collect data from five forums on Reddit: r/BlackLivesMatter, r/Brexit, r/climate, r/democrats, r/Republican. For each forum, we select comment pairs so that together they form a user interaction graph. DEBAGREEMENT presents a challenge for Natural Language Processing (NLP) systems, as it contains the slang, sarcasm and topic-specific jokes often present in online exchanges. We evaluate the performance of state-of-the-art language models on a (dis)agreement detection task, and investigate the use of the available contextual information (graph, authorship, and temporal information). Since recent research has shown that context, such as social context or knowledge graph information, enables language models to perform better on downstream NLP tasks, DEBAGREEMENT provides novel opportunities for combining graph-based and text-based machine learning techniques to detect (dis)agreements online.
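The idea of comment-reply pairs forming a user interaction graph can be illustrated with a minimal Python sketch. The record layout `(parent_author, reply_author, label)`, the author names, and the mapping of labels to signs (+1, 0, -1) are all assumptions made for illustration; they are not the dataset's actual schema or the authors' method.

```python
from collections import defaultdict

# Hypothetical records: (parent_author, reply_author, label).
# The real dataset's field names and format may differ.
PAIRS = [
    ("alice", "bob", "disagree"),
    ("bob", "alice", "agree"),
    ("alice", "bob", "disagree"),
    ("carol", "alice", "neutral"),
]

# Assumed encoding of the three labels as edge signs.
SIGN = {"agree": 1, "neutral": 0, "disagree": -1}


def build_interaction_graph(pairs):
    """Aggregate pairs into directed edges (replier -> parent author).

    Returns a dict mapping (replier, parent_author) to
    [number_of_replies, sum_of_signs].
    """
    edges = defaultdict(lambda: [0, 0])
    for parent_author, reply_author, label in pairs:
        edge = edges[(reply_author, parent_author)]
        edge[0] += 1            # count interactions on this edge
        edge[1] += SIGN[label]  # accumulate (dis)agreement sign
    return dict(edges)


graph = build_interaction_graph(PAIRS)
for (src, dst), (count, signed_sum) in sorted(graph.items()):
    print(f"{src} -> {dst}: replies={count}, signed_sum={signed_sum}")
```

A signed, weighted edge list of this kind is one possible input for graph-based methods that could be combined with text-based models; other encodings (for example, keeping one edge per comment with its timestamp) are equally plausible.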

Supplementary Material: pdf

URL: https://scale.com/open-datasets/oxford

Contribution Process Agreement: Yes

Dataset Url: https://scale.com/open-datasets/oxford

License: Creative Commons Attribution 4.0 International Public License (“CC BY 4.0”)

Author Statement: Yes
