Abstract: Drastic improvements in the quality of internet services worldwide have led to an explosion in content generation and consumption. As a result, an increasingly diverse audience wants to consume media in languages outside its own familiarity or preference. Hence, there is a growing need for real-time, fine-grained content analysis services, including language identification, content transcription, and analysis. Accurate and fine-grained spoken language detection is an essential first step for all subsequent content analysis. Current spoken language detection techniques may lack prediction accuracy, require large amounts of training data, or work with only a small number of languages. In this work, we present a real-time approach that detects spoken language from noisy, readily available data sources using a Capsule Network (CapsNet) architecture. Further, we show that the CapsNet can effectively detect when a data sample belongs to none of the languages on which the model was trained. We compare our results against a baseline that combines recurrent networks with an attention mechanism.