A Speaker Recognition System Based on Deep Learning

Haowei Li

doi:10.26689/jera.v3i6.1056

Published: 2020-12-31

Abstract

This paper lies in the field of digital signal processing. It presents a speaker recognition system that identifies different speakers based on deep learning. The method consists of the following steps. First, voice data are collected from different people. Second, the selected data are preprocessed by extracting their Mel Frequency Cepstral Coefficients (MFCC) and are randomly divided into a training set and a test set. Third, the training set is cut into batches and fed into a convolutional neural network consisting of convolutional layers, max pooling layers and fully connected layers. The parameters of the network, such as the learning rate, dropout rate and decay rate, are adjusted repeatedly until the model reaches its best performance. Finally, the test set is also cut into batches and fed into the trained neural network. The final recognition accuracy is 70.23%. In brief, the proposed system can recognize different speakers automatically and efficiently.
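
The data-handling steps described in the abstract (random division into training and test sets, cutting each set into batches, and scoring recognition accuracy) can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the 80/20 split, the batch size of 16, the seed, and the placeholder strings standing in for MFCC features are all assumptions made for illustration, since the abstract does not state these values.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists.

    test_fraction and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def make_batches(samples, batch_size):
    """Cut a list into consecutive batches; the last batch may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def accuracy(predicted, actual):
    """Fraction of predicted speaker labels that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical dataset: (feature placeholder, speaker id) pairs.
# In a real pipeline each placeholder would be an MFCC matrix for one recording.
dataset = [(f"mfcc_{i}", i % 5) for i in range(100)]

train_set, test_set = split_train_test(dataset, test_fraction=0.2, seed=42)
train_batches = make_batches(train_set, batch_size=16)
test_batches = make_batches(test_set, batch_size=16)
```

Fixing the seed makes the random split reproducible, which helps when comparing runs with different learning, dropout, or decay rates. The `accuracy` helper would be applied to the labels predicted by the trained network over all test batches.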