An Enhanced Multimodal Negative Feedback Detection Framework with Target Retrieval in Thai Spoken Audio

Published: 01 Jan 2024, Last Modified: 19 Feb 2025, Venue: ICME Workshops 2024, License: CC BY-SA 4.0
Abstract: This research addresses the challenge of identifying negative feedback in spoken audio amid voluminous and complex user-generated content. The study introduces an integrated audio analytics framework designed to improve processing speed and accuracy. The framework combines Query-by-Example Spoken Term Detection (QbE-STD), Speaker Diarization (SD), and Automatic Speech Recognition (ASR) with text-based feedback analysis (sentiment, toxicity, and sarcasm detection). QbE-STD enables targeted retrieval of specific terms, which reduces processing time. In addition, applying transfer learning to under-resourced languages such as Thai yields significant improvements in the accuracy of both ASR and text-based feedback analysis. This research paves the way for future studies on large-scale analysis of audio-based negative feedback. It also highlights the potential for deploying efficient audio analytics in fields such as content moderation and decision support systems.
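The way the four components might fit together can be illustrated with a minimal sketch. This is not the authors' implementation: the stage ordering (term spotting first, then diarization, ASR, and text classification on the retrieved regions only), the padding around each hit, and every name below (`merge_segments`, `analyze`, `Finding`, and the four stage callables) are assumptions made for illustration. The stages are passed in as plain functions so that any real QbE-STD, SD, ASR, or text model could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), hypothetical representation


@dataclass
class Finding:
    segment: Segment
    speaker: str
    transcript: str
    labels: Dict[str, str]  # e.g. sentiment / toxicity / sarcasm outputs


def merge_segments(hits: List[Segment], pad: float, total: float) -> List[Segment]:
    """Pad each term hit, clip to [0, total], and merge overlapping regions."""
    padded = sorted((max(0.0, s - pad), min(total, e + pad)) for s, e in hits)
    merged: List[Segment] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def analyze(
    total_sec: float,
    spot_terms: Callable[[], List[Segment]],                   # stand-in for QbE-STD
    diarize: Callable[[Segment], List[Tuple[Segment, str]]],   # stand-in for SD
    transcribe: Callable[[Segment], str],                      # stand-in for ASR
    classify: Callable[[str], Dict[str, str]],                 # stand-in for text analysis
    pad: float = 2.0,                                          # assumed context window
) -> List[Finding]:
    """Run the expensive stages only on regions around spotted terms."""
    regions = merge_segments(spot_terms(), pad, total_sec)
    findings: List[Finding] = []
    for region in regions:
        for seg, speaker in diarize(region):
            text = transcribe(seg)
            findings.append(Finding(seg, speaker, text, classify(text)))
    return findings
```

Under these assumptions, the speed benefit comes from the fact that diarization, ASR, and classification see only the merged regions around term hits rather than the full recording.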