
Language Classification Using Neural Networks

In this project, a model was created that takes an audio sequence as input and classifies the spoken language as either English or French. The project focused on experimenting with different ways of processing audio files and on designing a neural network to maximize performance on the language classification task. Its purpose was to investigate the highest reachable accuracy and to examine what sample length would be appropriate for use in voice control applications. The signal processing part dealt mainly with how enveloping, Mel frequency cepstral coefficients (MFCC) and Mel frequency spectral coefficients (MFSC) could be used to improve the accuracy of the model. The neural network design focused on how the width and depth of the network and the use of dropout could be used to increase performance. The experiments resulted in a model with a maximum accuracy of 92.30% that could outperform humans for samples of approximately 1.2 seconds or shorter. A suitable sample length for use in other applications was concluded to be in the interval of 0.7 to 1.5 seconds.

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-385537
Date January 2019
Creators Lindgren, Andreas; Lind, Gustav
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
Relation TVE-F ; 19016
