HateModerate: Grounding and Benchmarking Hate Speech Detection with Content Policies

Anonymous

16 Oct 2023 · ACL ARR 2023 October Blind Submission
Abstract: Social media platforms greatly facilitate user communication, but they also open the door to unwanted content such as hate speech, misinformation, and pornography. To protect users from hateful content at a massive scale, existing work investigates machine learning solutions for training automated hate speech moderators. Nevertheless, we identify a gap: few existing hate speech datasets are associated with a list of moderation rules. Without clarifying the moderation criteria, a trained moderator may behave differently from users' expectations. This work seeks to bridge this gap by creating a hate speech dataset matched to a list of moderation rules. Using crowdsourcing, we search for and collect a dataset named HateModerate, grounded in Facebook's Community Standards guidelines for hate speech. We evaluate the performance of state-of-the-art hate speech detectors against HateModerate, revealing substantial discrepancies between these models and the content policies. By fine-tuning one model with HateModerate, we observe that fine-tuning can effectively improve the model's conformity to the policies. Our results highlight the necessity of developing rule-based datasets for hate speech detection. Our datasets and code can be found at: https://sites.google.com/view/content-moderation-project
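
The abstract describes evaluating off-the-shelf hate speech detectors against policy-grounded test cases and reporting conformity per moderation rule. Below is a minimal sketch of what such an evaluation could look like. The model name (a public HuggingFace hate speech classifier) and the CSV schema (text, gold label, and the policy guideline each example is grounded in) are assumptions for illustration; the paper's exact models and data format are not listed on this page.

```python
# Minimal sketch: score a pretrained hate speech classifier on
# policy-grounded test cases, reporting accuracy per moderation rule.
import csv
from transformers import pipeline

# Hypothetical dataset file: one example per row, with the text, its
# gold label (1 = hateful), and the policy guideline it is grounded in.
EXAMPLES_CSV = "hatemoderate_examples.csv"  # columns: text,label,policy_id

# Assumed model choice; this classifier outputs "hate" / "nothate" labels.
classifier = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
)

correct, total = 0, 0
per_policy = {}  # policy_id -> [correct, total]
with open(EXAMPLES_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pred = classifier(row["text"])[0]["label"]
        pred_label = 1 if pred == "hate" else 0
        hit = int(pred_label == int(row["label"]))
        correct += hit
        total += 1
        stats = per_policy.setdefault(row["policy_id"], [0, 0])
        stats[0] += hit
        stats[1] += 1

print(f"overall accuracy: {correct / total:.3f}")
# Per-guideline accuracy exposes which moderation rules the model violates.
for policy_id, (c, n) in sorted(per_policy.items()):
    print(f"policy {policy_id}: {c}/{n} = {c / n:.3f}")
```

Breaking accuracy down by policy guideline, rather than reporting a single aggregate score, is what allows discrepancies between a model and specific content policies to surface.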
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.