TL;DR: Text-independent speaker identification with deep convolutional neural networks, achieving 93% accuracy
Keywords: speaker identification, neural networks, deep neural networks, convolutional neural networks, text independent
Abstract: Considerable research effort goes into medical solutions for Alzheimer’s
disease, but significantly less into solutions for caregiving after diagnosis.
The motivation of this project is to provide patients with a tool for recognizing
familiar people in common situations. The solution implements a text-independent
speaker recognition system using deep learning.
This paper compares the performance of a convolutional neural network (CNN)
against a fully connected neural network to address the speaker identification
problem. The CNN includes one-dimensional convolutional layers, max pooling layers,
batch normalization, regularization techniques and a softmax output. The models
are trained and tested on the freely available VCTK Corpus data set with 109
speakers. Our results show that the CNN surpasses the fully connected network,
with an accuracy of 93.05% compared to 75.88%.
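
As a rough illustration of the kind of pipeline the abstract describes, the sketch below runs one forward pass through a one-dimensional convolution, max pooling, and a softmax output over 109 speaker classes, in plain Python. It is not the authors' model: the frame length, kernel size, number of kernels, ReLU activation, and random weights are all assumptions chosen for illustration, and batch normalization and regularization are left out to keep the example short.

```python
import math
import random


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out


def max_pool1d(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(frame, kernels, dense_weights):
    """Conv + pool per kernel, concatenate, then a dense softmax layer."""
    features = []
    for kernel in kernels:
        features.extend(max_pool1d(conv1d(frame, kernel)))
    logits = [sum(w * f for w, f in zip(row, features))
              for row in dense_weights]
    return softmax(logits)


# Hypothetical sizes; only NUM_SPEAKERS comes from the abstract.
NUM_SPEAKERS = 109
FRAME_LEN = 64
KERNEL_SIZE = 5
NUM_KERNELS = 4

rng = random.Random(0)
frame = [rng.uniform(-1, 1) for _ in range(FRAME_LEN)]  # stand-in for audio features
kernels = [[rng.gauss(0, 0.1) for _ in range(KERNEL_SIZE)]
           for _ in range(NUM_KERNELS)]
feature_len = NUM_KERNELS * ((FRAME_LEN - KERNEL_SIZE + 1) // 2)
dense_weights = [[rng.gauss(0, 0.1) for _ in range(feature_len)]
                 for _ in range(NUM_SPEAKERS)]

probs = forward(frame, kernels, dense_weights)
predicted_speaker = max(range(NUM_SPEAKERS), key=probs.__getitem__)
```

With untrained random weights the predicted speaker is meaningless; the point is only the shape of the computation, in which each input frame is mapped to a probability distribution over the 109 speakers and the most probable class is taken as the identification.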