Global ETD Search

Return to search

Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet

Recently, Deep Learning has revolutionized many fields, where one such area is Voice Activity Detection (VAD). This is of great interest to sectors of society concerned with detecting speech in sound signals. One such sector is the police, where criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNN) have recently become the state-of-the-art method of detecting speech in audio. But so far, understanding the impact of noise and sampling rates on such methods remains incomplete. Additionally, there are evaluation metrics from neighboring fields that remain unintegrated into VAD. We trained on four different sampling rates and found that changing the sampling rate could have dramatic effects on the results. As such, we recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed better performance by increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss how careful consideration is necessary when choosing a main evaluation metric, leading us to recommend Polyphonic Sound Detection Score (PSDS).

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-187196

voice activity detection

VAD

deep learning

machine learning

artificial intelligence

convolutional neural network

CNN

deep neural network

DNN

sound event detection

SED

mel spectrogram

audio processing

polyphonic sound detection score

PSDS

signal processing

signal to noise ratio

Datavetenskap (datalogi)

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-187196
Date	January 2022
Creators	Joborn, Ludvig, Beming, Mattias
Publisher	Linköpings universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0018 seconds

Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet

Description

Links & Downloads

Tags

Additional Fields