Speaker identification with deep neural networks

Jul 20, 2019 Submission readers: everyone
  • TL;DR: Text-independent speaker identification with deep convolutional neural networks, achieving 93% accuracy
  • Keywords: speaker identification, neural networks, deep neural networks, convolutional neural networks, text independent
  • Abstract: Considerable research effort goes into medical treatments for Alzheimer’s disease, but significantly less into solutions for post-diagnosis caregiving. The motivation of this project is to provide patients with a tool that helps them recognize familiar people in everyday situations. The solution implements a text-independent speaker recognition system using deep learning. This paper compares the performance of a convolutional neural network (CNN) against a fully connected neural network on the speaker identification problem. The CNN includes one-dimensional convolutional layers, max pooling layers, batch normalization, regularization techniques and a softmax output. The models are trained and tested on the freely available VCTK Corpus dataset with 109 speakers. Our results show that the CNN outperforms the fully connected network, reaching an accuracy of 93.05% compared to 75.88%.
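The layer types listed in the abstract can be illustrated with a small, dependency-free forward pass. This is only a sketch under stated assumptions, not the authors' model: the single input channel, the layer sizes, the random weights, and all function and parameter names (`conv1d`, `batch_norm_inference`, `init_params`, and so on) are hypothetical, and only the 109-way softmax output comes from the abstract. Regularization such as dropout acts as the identity at inference time, so it appears only as a comment.

```python
import math
import random

NUM_SPEAKERS = 109  # speaker count stated in the abstract


def conv1d(signal, kernels, biases):
    """Valid 1-D convolution. signal: [in][time], kernels: [out][in][width]."""
    width = len(kernels[0][0])
    steps = len(signal[0]) - width + 1
    out = []
    for kernel, bias in zip(kernels, biases):
        row = []
        for t in range(steps):
            acc = bias
            for c, taps in enumerate(kernel):
                for k, w in enumerate(taps):
                    acc += w * signal[c][t + k]
            row.append(acc)
        out.append(row)
    return out


def batch_norm_inference(features, means, variances, eps=1e-5):
    """Inference-mode batch norm using stored per-channel statistics."""
    return [[(v - m) / math.sqrt(var + eps) for v in row]
            for row, m, var in zip(features, means, variances)]


def max_pool1d(features, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
            for row in features]


def dense(vector, weights, biases):
    """Fully connected layer. weights: [out][in]."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(rng, in_ch=1, out_ch=4, width=5, time_steps=32, pool=2):
    """Random placeholder weights; every size here is an assumption."""
    flat = out_ch * ((time_steps - width + 1) // pool)
    return {
        "kernels": [[[rng.gauss(0, 0.1) for _ in range(width)]
                     for _ in range(in_ch)] for _ in range(out_ch)],
        "conv_bias": [0.0] * out_ch,
        "bn_mean": [0.0] * out_ch,
        "bn_var": [1.0] * out_ch,
        "weights": [[rng.gauss(0, 0.1) for _ in range(flat)]
                    for _ in range(NUM_SPEAKERS)],
        "dense_bias": [0.0] * NUM_SPEAKERS,
        "pool": pool,
    }


def forward(signal, params):
    """Conv -> batch norm -> ReLU -> max pool -> dense -> softmax."""
    x = conv1d(signal, params["kernels"], params["conv_bias"])
    x = batch_norm_inference(x, params["bn_mean"], params["bn_var"])
    x = [[max(0.0, v) for v in row] for row in x]
    x = max_pool1d(x, params["pool"])
    # Dropout (one common regularizer) would be applied here during training;
    # at inference it is the identity, so nothing is done.
    flat = [v for row in x for v in row]
    return softmax(dense(flat, params["weights"], params["dense_bias"]))


rng = random.Random(0)
params = init_params(rng)
signal = [[rng.uniform(-1.0, 1.0) for _ in range(32)]]  # one channel, 32 frames
probs = forward(signal, params)
predicted = max(range(NUM_SPEAKERS), key=probs.__getitem__)
```

Here `probs` is a probability distribution over the 109 speakers and `predicted` is the index of the most likely one. With untrained random weights the prediction is meaningless; a real system would learn the weights from VCTK utterances, typically in a deep learning framework rather than in plain Python.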