Abstract: This study presents an approach for detecting Arabic dialects in opinion videos. It performs four-way dialect classification at the utterance level using spoken language (audio), written language (text), and their combination. It also analyzes the effect of speaker identity on dialect detection by comparing speaker-dependent and speaker-independent approaches. Word-embedding-based features represent the text modality, whereas a combination of time-domain and frequency-domain acoustic features represents the audio modality. In the speaker-independent setting, the text modality achieves significantly better results than the audio modality, and combining both modalities improves the results, yielding an F1 score of 63.61%. In the speaker-dependent setting, the individual audio and text modalities perform similarly, and the best performance is achieved by combining both modalities, with an F1 score of 85.52%.
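The difference between the two evaluation settings can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that "speaker-independent" means no speaker contributes utterances to both the training and test sets, that "speaker-dependent" allows such overlap, and that the modalities are combined by simple feature concatenation. The function names, the toy data, and the 25% test fraction are hypothetical and serve only as illustration.

```python
import random

def fuse(text_vec, audio_vec):
    """Assumed fusion scheme: concatenate text and audio feature vectors."""
    return list(text_vec) + list(audio_vec)

def speaker_independent_split(utterances, test_fraction=0.25, seed=0):
    """Hold out whole speakers, so no speaker appears in both sets."""
    speakers = sorted({u["speaker"] for u in utterances})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [u for u in utterances if u["speaker"] not in test_speakers]
    test = [u for u in utterances if u["speaker"] in test_speakers]
    return train, test

def speaker_dependent_split(utterances, test_fraction=0.25, seed=0):
    """Shuffle utterances freely, so a speaker may appear in both sets."""
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy data (hypothetical): 8 speakers with 3 utterances each.
utterances = [
    {
        "speaker": f"spk{s}",
        "features": fuse([0.1 * s, 0.2 * i], [float(s), float(i), 1.0]),
    }
    for s in range(8)
    for i in range(3)
]

si_train, si_test = speaker_independent_split(utterances)
sd_train, sd_test = speaker_dependent_split(utterances)
```

Under these assumptions, a classifier evaluated on `si_test` never sees a test speaker during training, whereas one evaluated on `sd_test` may have, which is one plausible reason speaker-dependent scores come out higher.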