An Enhanced Multimodal Negative Feedback Detection Framework with Target Retrieval in Thai Spoken Audio

Pantid Chantangphol; Sattaya Singkul; Thanawat Lodkaew; Nattasit Maharattamalai; Atthakorn Petchsod; Theerat Sakdejayont; Tawunrat Chalothorn

Published: 01 Jan 2024, Last Modified: 19 Feb 2025. Venue: ICME Workshops 2024. License: CC BY-SA 4.0

Abstract: This research addresses the challenge of identifying negative feedback in spoken audio within voluminous and complex user-generated content. The study introduces an integrated audio analytics framework designed to improve processing speed and accuracy. The framework combines Query-by-Example Spoken Term Detection (QbE-STD), Speaker Diarization (SD), and Automatic Speech Recognition (ASR) with text-based feedback analysis (sentiment, toxicity, and sarcasm detection). By employing QbE-STD, the system retrieves only the audio that contains specific target terms, thereby reducing processing time. Additionally, applying transfer learning techniques to under-resourced languages, such as Thai, yields significant improvements in the accuracy of both ASR and text-based feedback analysis. This research paves the way for future studies in large-scale analysis of audio-based negative feedback. It also highlights the potential for deploying efficient audio analytics in various fields, including content moderation and decision support systems.
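The way the components named in the abstract might fit together can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the order of the stages or their interfaces, so the ordering here (term spotting first, then diarization and transcription of the retrieved segments only, then the text detectors) is an assumption. All names (`Feedback`, `analyze`, `spot_terms`, `diarize`, `transcribe`, `detectors`) are hypothetical, and the toy stand-ins use English keyword matching in place of real QbE-STD, SD, ASR, and classifier models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) within a recording


@dataclass
class Feedback:
    segment: Segment
    speaker: str
    transcript: str
    labels: Dict[str, bool] = field(default_factory=dict)

    @property
    def is_negative(self) -> bool:
        # Assumption: a segment counts as negative if any detector fires.
        return any(self.labels.values())


def analyze(
    audio: Any,
    query: Any,
    spot_terms: Callable[[Any, Any], List[Segment]],  # stand-in for QbE-STD
    diarize: Callable[[Any, Segment], str],           # stand-in for SD
    transcribe: Callable[[Any, Segment], str],        # stand-in for ASR
    detectors: Dict[str, Callable[[str], bool]],      # text-based detectors
) -> List[Feedback]:
    """Run the later stages only on segments retrieved by term spotting."""
    results = []
    for segment in spot_terms(audio, query):
        text = transcribe(audio, segment)
        labels = {name: detect(text) for name, detect in detectors.items()}
        results.append(Feedback(segment, diarize(audio, segment), text, labels))
    return results


# Toy stand-ins (hypothetical): the "audio" is a dict that already maps each
# segment to its speaker and transcript, so no real models are involved.
toy_audio = {
    (0.0, 2.5): ("agent", "hello how can I help"),
    (2.5, 6.0): ("customer", "the app is terrible"),
    (6.0, 9.0): ("customer", "oh great the app crashed again"),
}

results = analyze(
    audio=toy_audio,
    query="app",
    spot_terms=lambda audio, q: [s for s, (_, t) in audio.items() if q in t],
    diarize=lambda audio, seg: audio[seg][0],
    transcribe=lambda audio, seg: audio[seg][1],
    detectors={
        "sentiment": lambda text: "terrible" in text,
        "toxicity": lambda text: "idiot" in text,
        "sarcasm": lambda text: "oh great" in text,
    },
)

for item in results:
    print(item.segment, item.speaker, item.labels, item.is_negative)
```

In this sketch the first segment never reaches the transcription or detection stages because it does not contain the query term, which mirrors the abstract's point that targeted retrieval limits how much audio has to be processed.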
