• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 18
  • 9
  • 4
  • 3
  • 1
  • 1
  • Tagged with
  • 40
  • 40
  • 38
  • 38
  • 26
  • 24
  • 23
  • 22
  • 20
  • 19
  • 13
  • 12
  • 8
  • 7
  • 7
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ / DEVELOPMENT OF ALGORITHMS FOR GUNSHOT DETECTION

Hrabina, Martin January 2019 (has links)
Táto práca sa zaoberá rozpoznávaním výstrelov a pridruženými problémami. Ako prvé je celá vec predstavená a rozdelená na menšie kroky. Ďalej je poskytnutý prehľad zvukových databáz, významné publikácie, akcie a súčasný stav veci spoločne s prehľadom možných aplikácií detekcie výstrelov. Druhá časť pozostáva z porovnávania príznakov pomocou rôznych metrík spoločne s porovnaním ich výkonu pri rozpoznávaní. Nasleduje porovnanie algoritmov rozpoznávania a sú uvedené nové príznaky použiteľné pri rozpoznávaní. Práca vrcholí návrhom dvojstupňového systému na rozpoznávanie výstrelov, monitorujúceho okolie v reálnom čase. V závere sú zhrnuté dosiahnuté výsledky a načrtnutý ďalší postup.
32

SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION

Jingzhao Dai (6642491) 11 June 2019 (has links)
<p>Speech recognition is widely applied to translation from speech to related text, voice driven commands, human machine interface and so on [1]-[8]. It has been increasingly proliferated to Human’s lives in the modern age. To improve the accuracy of speech recognition, various algorithms such as artificial neural network, hidden Markov model and so on have been developed [1], [2].</p> <p>In this thesis work, the tasks of speech recognition with various classifiers are investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel features extraction methods of sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9] are developed and proposed. In order to meet diversity of classification algorithms, one-dimensional (1D) and two-dimensional (2D) features are required to be obtained. The 1D features are the array of power coefficients in frequency bands, which are dedicated for training SVM, KNN and RF classifiers while the 2D features are formed both in frequency domain and temporal variations. In fact, the 2D feature consists of the power values in decomposed bands versus consecutive speech frames. Most importantly, the 2D feature with geometric transformation are adopted to train CNN.</p> <p>Speech recognition including males and females are from the recorded data set as well as the standard data set. Firstly, the recordings with little noise and clear pronunciation are applied with the proposed feature extraction methods. After many trials and experiments using this dataset, a high recognition accuracy is achieved. Then, these feature extraction methods are further applied to the standard recordings having random characteristics with ambient noise and unclear pronunciation. Many experiment results validate the effectiveness of the proposed feature extraction techniques.</p>
33

Channel Compensation for Speaker Recognition Systems

Neville, Katrina Lee, katrina.neville@rmit.edu.au January 2007 (has links)
This thesis attempts to address the problem of how best to remedy different types of channel distortions on speech when that speech is to be used in automatic speaker recognition and verification systems. Automatic speaker recognition is when a person's voice is analysed by a machine and the person's identity is worked out by the comparison of speech features to a known set of speech features. Automatic speaker verification is when a person claims an identity and the machine determines if that claimed identity is correct or whether that person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel whether that channel is a basic wired telephone channel or a wireless channel. The types of distortion that can corrupt the information include time-variant or time-invariant filtering of the information or the addition of 'thermal noise' to the information, both of these types of distortion can cause varying degrees of error in information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on the average speaker recognition rates and testing the effectiveness of various channel compensation algorithms designed to mitigate the effects of channel distortion. The speaker recognition system was represented by a basic recognition algorithm consisting of: speech analysis, extraction of feature vectors in the form of the Mel-Cepstral Coefficients, and a classification part based on the minimum distance rule. Two types of channel distortion were investigated: • Convolutional (or lowpass filtering) effects • Addition of white Gaussian noise Three different methods of channel compensation were tested: • Cepstral Mean Subtraction (CMS) • RelAtive SpecTrAl (RASTA) Processing • Constant Modulus Algorithm (CMA) The results from the experiments showed that for both CMS and RASTA processing that filtering at low cutoff frequencies, (3 or 4 kHz), produced improvements in the average speaker recognition rates compared to speech with no compensation. The levels of improvement due to RASTA processing were higher than the levels achieved due to the CMS method. Neither the CMS or RASTA methods were able to improve accuracy of the speaker recognition system for cutoff frequencies of 5 kHz, 6 kHz or 7 kHz. In the case of noisy speech all methods analysed were able to compensate for high SNR of 40 dB and 30 dB and only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNR of 20 dB and 10 dB).
34

Semantic Classification And Retrieval System For Environmental Sounds

Okuyucu, Cigdem 01 October 2012 (has links) (PDF)
The growth of multimedia content in recent years motivated the research on audio classification and content retrieval area. In this thesis, a general environmental audio classification and retrieval approach is proposed in which higher level semantic classes (outdoor, nature, meeting and violence) are obtained from lower level acoustic classes (emergency alarm, car horn, gun-shot, explosion, automobile, motorcycle, helicopter, wind, water, rain, applause, crowd and laughter). In order to classify an audio sample into acoustic classes, MPEG-7 audio features, Mel Frequency Cepstral Coefficients (MFCC) feature and Zero Crossing Rate (ZCR) feature are used with Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Additionally, a new classification method is proposed using Genetic Algorithm (GA) for classification of semantic classes. Query by Example (QBE) and keyword-based query capabilities are implemented for content retrieval.
35

Online detekce jednoduchých příkazů v audiosignálu / Online detection of simple voice commands in audiosignal

Zezula, Miroslav January 2011 (has links)
This thesis describes the development of voice module, that can recognize simple speech commands by comparation of input sound with recorded templates. The first part of thesis contains a description of used algorithm and a verification of its functionality. The algorithm is based on Mel-frequency cepstral coefficients and dynamic time warping. Thereafter the hardware of voice module is designed, containing signal controller 56F805 from Freescale. The signal from microphone is conditioned by operational amplifiers and digital filter. The third part deals with the development of software for the controller and describes the fixed point implementation of the algorithm, respecting limited capabilities of the controller. Final test proves the usability of voice module in low-noise environment.
36

Biometric Multi-modal User Authentication System based on Ensemble Classifier

Assaad, Firas Souhail January 2014 (has links)
No description available.
37

Automatické rozpoznávání logopedických vad v řečovém projevu / Automatic Recognition of Logopaedic Defect in Speech Utterances

Dušil, Lubomír January 2009 (has links)
The thesis is aimed at an analysis and automatic detection of logopaedic defects in speech utterance. Its objective is to facilitate and accelerate the work of logopaedists and to increase percentage of detected logopaedic defects in children of the youngest possible age followed by the most successful treatment. It presents methods of speech work, classification of the defects within individual stages of child development and appropriate words for identification of the speech defects and their subsequent remedy. After that there are analyses of methods of calculating coefficients which reflect human speech best. Also classifiers which are used to discern and determine whether it is a speech defect or not. Classifiers exploit coefficients for their work. Coefficients and classifiers are being tested and their best combination is being looked for in order to achieve the highest possible success rate of the automatic detection of the speech defects. All the programming and testing jobs has been conducted in the Matlab programme.
38

Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie

Larsson Alm, Kevin January 2019 (has links)
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect for such systems. For this thesis, automatic methods for assessing speech quality of the voice calls in Briteback’s UC application is studied, including a comparison of the researched methods. Three methods all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively is studied. The method based on HFCC feature extraction shows better performance in general compared to the two other methods, but all methods show comparatively low performance compared to literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools can make the field more popular and increase its use in the real world.
39

Automatická klasifikace výslovnosti hlásky R / Automatic classification of pronunciation of the letter „R“

Hrušovský, Enrik January 2018 (has links)
This diploma thesis deals with automatic clasification of vowel R. Purpose of this thesis is to made program for detection of pronounciation of speech defects at vowel R in children. In thesis are processed parts as speech creation, speech therapy, dyslalia and subsequently speech signal processing and analysis methods. In the last part is designed software for automatic detection of pronounciation of vowel R. For recognition of pronounciation is used algorithm MFCC for extracting features. This features are subsequently classified by neural network to the group of correct or incorrect pronounciation and is evaluated classification success.
40

Detekce logopedických vad v řeči / Detection of Logopaedic Defects in Speech

Pešek, Milan January 2009 (has links)
The thesis deals with a design and an implementation of software for a detection of logopaedia defects of speech. Due to the need of early logopaedia defects detecting, this software is aimed at a child’s age speaker. The introductory part describes the theory of speech realization, simulation of speech realization for numerical processing, phonetics, logopaedia and basic logopaedia defects of speech. There are also described used methods for feature extraction, for segmentation of words to speech sounds and for features classification into either correct or incorrect pronunciation class. In the next part of the thesis there are results of testing of selected methods presented. For logopaedia speech defects recognition algorithms are used in order to extract the features MFCC and PLP. The segmentation of words to speech sounds is performed on the base of Differential Function method. The extracted features of a sound are classified into either a correct or an incorrect pronunciation class with one of tested methods of pattern recognition. To classify the features, the k-NN, SVN, ANN, and GMM methods are tested.

Page generated in 0.0918 seconds