Speaker identification with deep neural networks

Alfredo Méndez; Juan Pablo Rodríguez

20 Jul 2019 (modified: 05 May 2023), RIIAA 2019 Conference Submission, Readers: Everyone

TL;DR: Text-independent speaker identification with a deep convolutional neural network, reaching 93% accuracy.

Keywords: speaker identification, neural networks, deep neural networks, convolutional neural networks, text independent

Abstract: Considerable research effort goes into medical solutions for Alzheimer’s disease, while significantly less goes into solutions for post-diagnosis caregiving. This project aims to give patients a tool for recognizing familiar people in everyday situations. The solution implements a text-independent speaker recognition system using deep learning. This paper compares the performance of a convolutional neural network (CNN) against a fully connected neural network on the speaker identification problem. The CNN includes one-dimensional convolutional layers, max pooling layers, batch normalization, regularization techniques, and a softmax output. The models are trained and tested on the freely available VCTK Corpus with 109 speakers. Our results show that the CNN outperforms the fully connected network, with an accuracy of 93.05% compared to 75.88%.
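
To make the layer types named in the abstract concrete, the following is a minimal sketch of a single forward pass through a one-dimensional convolution, batch normalization (inference form), max pooling, and a softmax over 109 speaker classes. It is not the authors' architecture: the layer sizes, the random weights, the raw-sample input, the global average over time, and every function name here are assumptions chosen for illustration, and regularization and training are omitted.

```python
import math
import random

NUM_SPEAKERS = 109  # speaker count stated in the abstract


def conv1d(signal, kernels, bias):
    """Valid 1-D convolution: signal is [channels][time], kernels is [out][in][width]."""
    length = len(signal[0])
    out = []
    for kernel, b in zip(kernels, bias):
        width = len(kernel[0])
        row = []
        for t in range(length - width + 1):
            acc = b
            for c, taps in enumerate(kernel):
                for k, w in enumerate(taps):
                    acc += w * signal[c][t + k]
            row.append(acc)
        out.append(row)
    return out


def batch_norm_inference(signal, mean, var, gamma, beta, eps=1e-5):
    """Per-channel normalization using stored statistics (inference form only)."""
    return [[g * (x - m) / math.sqrt(v + eps) + b for x in ch]
            for ch, m, v, g, b in zip(signal, mean, var, gamma, beta)]


def relu(signal):
    return [[max(0.0, x) for x in ch] for ch in signal]


def max_pool1d(signal, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in signal]


def dense(vector, weights, bias):
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(in_channels=1, hidden=4, kernel=5, seed=0):
    """Hypothetical, untrained parameters; sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        "conv_w": [[rand(kernel) for _ in range(in_channels)] for _ in range(hidden)],
        "conv_b": [0.0] * hidden,
        "bn_mean": [0.0] * hidden,
        "bn_var": [1.0] * hidden,
        "bn_gamma": [1.0] * hidden,
        "bn_beta": [0.0] * hidden,
        "fc_w": [rand(hidden) for _ in range(NUM_SPEAKERS)],
        "fc_b": [0.0] * NUM_SPEAKERS,
    }


def forward(signal, p):
    x = conv1d(signal, p["conv_w"], p["conv_b"])
    x = batch_norm_inference(x, p["bn_mean"], p["bn_var"], p["bn_gamma"], p["bn_beta"])
    x = max_pool1d(relu(x), 2)
    pooled = [sum(ch) / len(ch) for ch in x]  # assumed global average over time
    return softmax(dense(pooled, p["fc_w"], p["fc_b"]))


# Usage: a random 64-sample, single-channel input stands in for an utterance.
params = init_params()
audio_rng = random.Random(1)
audio = [[audio_rng.uniform(-1.0, 1.0) for _ in range(64)]]
probs = forward(audio, params)
predicted_speaker = max(range(NUM_SPEAKERS), key=probs.__getitem__)
```

With untrained weights the predicted index carries no meaning; the sketch only shows how an input sequence is reduced to a probability distribution over the 109 speaker classes.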
