21

Voice Activity Detection

Ent, Petr January 2009
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various types of features, their extraction and processing, and identifies the combination of features that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between systems was then carried out on the database from the NIST06 Rich Test Evaluations.
22

Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet

Joborn, Ludvig, Beming, Mattias January 2022
Deep learning has recently revolutionized many fields, one of which is Voice Activity Detection (VAD). VAD is of great interest to sectors of society concerned with detecting speech in sound signals. One such sector is the police, where criminal investigations regularly involve analysis of audio material. Convolutional Neural Networks (CNN) have recently become the state-of-the-art method of detecting speech in audio. So far, however, understanding of how noise and sampling rates affect such methods remains incomplete. Additionally, there are evaluation metrics from neighboring fields that have not yet been integrated into VAD. We trained on four different sampling rates and found that changing the sampling rate could have dramatic effects on the results. As such, we recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed better performance when increasing the capacity of our Gated Recurrent Unit (GRU). Finally, we discuss how careful consideration is necessary when choosing a main evaluation metric, leading us to recommend Polyphonic Sound Detection Score (PSDS).
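
The abstract mentions evaluating under increasing amounts of white Gaussian noise but does not say how the noise was added. As a rough illustration only, the sketch below shows one common way to mix white Gaussian noise into a signal at a chosen signal-to-noise ratio (SNR). The function names, the 440 Hz test tone, the 16 kHz sampling rate, and the 10 dB target are all assumptions made for this example, not details taken from the thesis.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper for illustration; not the procedure used in the thesis.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested SNR: SNR_dB = 10 * log10(Ps / Pn).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Assumed test input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = add_white_gaussian_noise(clean, snr_db=10, rng=random.Random(0))
```

Calling `measured_snr_db(clean, noisy)` gives a quick check that the achieved SNR is close to the requested one; lowering `snr_db` corresponds to a noisier evaluation condition.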
