Outcome-Constrained Large Language Models for Countering Hate Speech

ACL ARR 2024 June Submission1720 Authors

14 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Counter speech (CS) that challenges or counteracts harmful or discriminatory messages is an effective way to diminish the influence of hate speech (HS). Automatic CS generation methods have been developed to assist efforts to combat online HS. Existing research focuses on generating CS with desirable linguistic attributes, such as being polite, informative, and intent-driven. However, the real impact of CS in online environments is seldom considered. This study develops methods for generating CS constrained by conversation outcomes and evaluates their effectiveness. We experiment with large language models (LLMs) to incorporate two desired conversation outcomes into the text generation process: low conversation incivility and non-hateful hater reentry. Specifically, we experiment with instruction prompts, LLM fine-tuning, and LLM reinforcement learning. Evaluation results show that our methods effectively steer generation towards the desired outcomes, although our analyses reveal differences in the quality and style of the generated CS.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: sociolinguistics, task-oriented, text-to-text generation, applications, hate speech
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1720