Toxic Speech and Speech Emotions: Investigations of Audio-based Modeling and Intercorrelations

Published: 01 Jan 2022, Last Modified: 13 Nov 2024, Venue: EUSIPCO 2022, License: CC BY-SA 4.0
Abstract: Content moderation (CM) systems have become essential following the monumental increase in multimodal and online social platforms. While a growing body of published work focuses on text-based solutions, work on audio-based methods remains limited. In this study, we explore relationships between speech emotions and toxic speech as part of a CM scenario. We first investigate an appropriate framework for combining speech emotion recognition (SER) and audio-based CM models. We then investigate which emotional aspects (i.e., attribute, sentiment, or attitude) could contribute the most to facilitating audio-based CM recognition platforms. Our experimental results indicate that conventional shared feature encoder approaches may fail to capture additional discriminative features for boosting audio-based CM tasks while utilizing SER learning. We further investigate performance trade-offs of late-fusion frameworks for combining SER and CM information. We argue that these observations could be attributed to an emotionally biased distribution in the CM scenario, concluding that SER could indeed play a role in content moderation frameworks, given added application-specific emotional information.
