Language Recognition Using Triplet Neural Networks

2019 (modified: 06 Nov 2022), INTERSPEECH 2019
Abstract: In this paper, we propose a novel neural network back-end approach based on triplets for the language recognition task, motivated by its successful application in the related field of text-dependent speaker verification. A triplet is a training example constructed from three audio samples: two from the same class and one from a different class. Because each triplet presents two pairs of samples to the network, the triplet neural network learns to discriminate between pairs of samples from the same language and pairs from different languages. In contrast to other triplet loss functions proposed in the literature, our triplet-based training optimizes the Area Under the Curve (AUC). Optimizing the AUC as the cost function is appropriate for a detection task because it correlates directly with the end-use performance of the system. Moreover, we show the importance of defining an appropriate triplet selection method and how this choice affects system performance. When benchmarked on the LRE09 database, the new triplet back-end demonstrated superior performance compared to traditional back-ends used for language recognition. In addition, we evaluated the proposed systems on the LRE15 and LRE17 databases to check their generalization power.
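The triplet construction and AUC-oriented objective described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the helper names `make_triplets` and `soft_auc_loss`, the use of cosine similarity as the pair score, the sigmoid relaxation of the AUC with slope `alpha`, and the purely random triplet selection (the abstract stresses that the selection method matters, so random sampling is only a placeholder).

```python
import math
import random


def make_triplets(labels, n_triplets, seed=0):
    """Sample (anchor, positive, negative) index triplets at random.

    Hypothetical helper: anchor and positive share a label, the negative
    comes from a different label. Random selection is a placeholder only.
    """
    rng = random.Random(seed)
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    # Only classes with at least two samples can supply anchor + positive.
    anchor_classes = [lab for lab, idxs in by_label.items() if len(idxs) >= 2]
    if not anchor_classes or len(by_label) < 2:
        raise ValueError("need >= 2 classes, one of them with >= 2 samples")
    triplets = []
    for _ in range(n_triplets):
        lab = rng.choice(anchor_classes)
        a, p = rng.sample(by_label[lab], 2)  # two distinct same-class samples
        neg_lab = rng.choice([l for l in by_label if l != lab])
        n = rng.choice(by_label[neg_lab])
        triplets.append((a, p, n))
    return triplets


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def soft_auc_loss(embeddings, triplets, alpha=10.0):
    """Return 1 minus a sigmoid-smoothed AUC estimate over the triplets.

    The exact AUC counts how often a same-class pair scores higher than a
    different-class pair; the step function is replaced by a sigmoid here
    (an assumed relaxation) so that the quantity would be differentiable.
    """
    total = 0.0
    for a, p, n in triplets:
        s_pos = cosine(embeddings[a], embeddings[p])  # same-language pair
        s_neg = cosine(embeddings[a], embeddings[n])  # different-language pair
        total += 1.0 / (1.0 + math.exp(-alpha * (s_pos - s_neg)))
    return 1.0 - total / len(triplets)


# Toy usage with made-up 2-D "embeddings" for two languages.
labels = ["en", "en", "es", "es"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
triplets = make_triplets(labels, n_triplets=20)
loss = soft_auc_loss(embeddings, triplets)
```

In this sketch a loss near 0 means that same-language pairs almost always score higher than different-language pairs, which is the ranking behaviour the AUC measures. In a real system the embeddings would come from the neural network and the loss would be minimized by gradient descent.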