  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Automatic Speech Recognition Using Finite Inductive Sequences

Cherri, Mona Youssef, 1956- 08 1900
This dissertation addresses the general problem of recognizing acoustic signals, which may be derived from speech, sonar, or other acoustic phenomena. The specific problem of recognizing speech is the main focus of this research. The intention is to design a recognition system for a fixed number of discrete words. For this purpose, eight isolated words from the TIMIT database are selected. Four medium-length words, "greasy," "dark," "wash," and "water," are used. In addition, four short words are considered: "she," "had," "in," and "all." The recognition system addresses the following issues: filtering or preprocessing, training, and decision-making. The preprocessing phase uses linear predictive coding of order 12. Following the filtering process, a vector quantization method is used to further reduce the input data and generate a finite inductive sequence of symbols representative of each input signal. The sequences generated by the vector quantization process for the same word are factored, and a single ruling or reference template is generated and stored in a codebook. This system introduces a new modeling technique which relies heavily on the basic concept that all finite sequences are finitely inductive. This technique is used in the training stage. In order to accommodate the variabilities in speech, the training is performed casualty, and a large number of training speakers is used from eight different dialect regions. Hence, a speaker-independent recognition system is realized. The matching process compares the incoming speech with each of the stored templates, and a closeness ratio is computed. A ratio table is generated, and the matching word that corresponds to the smallest ratio (i.e., indicating that the ruling has removed most of the symbols) is selected. Promising results were obtained for isolated words, and the recognition rates ranged between 50% and 100%.
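The quantization and decision steps described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two-dimensional features, the three-entry codebook, the ratio values, and the function names `quantize` and `best_match` are all hypothetical, and the factoring of sequences into a ruling is not reproduced here.

```python
from math import dist

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword (Euclidean distance)."""
    return [min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
            for frame in frames]

def best_match(ratios):
    """Return the word whose template produced the smallest closeness ratio."""
    return min(ratios, key=ratios.get)

# Hypothetical 2-D features; order-12 LPC frames would have 12 coefficients each.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.8, 0.1), (1.1, 0.9)]
symbols = quantize(frames, codebook)  # sequence of codeword indices

# Made-up ratio table for illustration only; not results from the dissertation.
ratios = {"dark": 0.42, "wash": 0.17, "water": 0.63}
word = best_match(ratios)
```

Each input signal becomes a short sequence of symbols, and the decision reduces to picking the minimum entry of the ratio table.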
2

Novel Pitch Detection Algorithm With Application to Speech Coding

Kura, Vijay 19 December 2003
This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
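For orientation, the autocorrelation component alone can be sketched as below. This is not MAWT: the LPC residual, the multi-feature extraction, and the wavelet refinement are all omitted, and the function name `autocorr_pitch`, the 50 to 400 Hz search range, and the synthetic test tone are assumptions made for the example.

```python
import math

def autocorr_pitch(signal, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz as fs divided by the lag that maximizes the autocorrelation."""
    lo = int(fs / fmax)                      # shortest candidate period, in samples
    hi = min(int(fs / fmin), len(signal) - 1)  # longest candidate period, in samples
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

fs = 8000
# Synthetic 100 Hz tone standing in for a voiced speech frame.
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(800)]
pitch = autocorr_pitch(tone, fs)
```

On real speech the raw autocorrelation is prone to octave errors, which is one reason methods such as the one in this thesis add further features and refinement steps.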
3

Formant narrowing using linear predictive coding to improve phonetic perception in children

Tarr, Eric William 10 January 2011
No description available.
4

Anomaly Detection in Diagnostics Data with Natural Fluctuations / Anomalidetektering i diagnostikdata med naturliga variationer

Sundberg, Jesper January 2015
This thesis studies anomaly detection, a currently very active subfield of machine learning. The work was carried out at Procera Networks, a company that supplies several broadband companies with IT solutions and would like to detect errors in these systems automatically. The thesis investigates and devises methods and algorithms for detecting interesting events in diagnostics data. Events of interest include short-term deviations (a deviating point), long-term deviations (a distinct trend), and other unexpected deviations. Three models are analyzed, namely Linear Predictive Coding, Sparse Linear Prediction, and Wavelet Transformation. The final outcome is determined by the gap to certain thresholds. These thresholds are customized to give the best possible result for the model under investigation.
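A minimal sketch of threshold-based detection on a linear prediction residual is given below. It is an assumption-laden illustration, not the thesis's method: the predictor order of 2, the threshold of 1.0, the synthetic 24-sample periodic series, and the helper names `lpc` and `residuals` are all chosen for the example. The coefficients are fitted on an anomaly-free training window using the Levinson-Durbin recursion.

```python
import math

def lpc(x, order):
    """Levinson-Durbin recursion; returns coefficients c with x[n] ~ sum(c[k] * x[n-1-k])."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:]

def residuals(x, coeffs):
    """Prediction error for every sample that has a full history of len(coeffs) samples."""
    p = len(coeffs)
    return {n: x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))}

# Synthetic series with a natural 24-sample fluctuation and one injected spike.
series = [math.sin(2 * math.pi * n / 24) for n in range(240)]
series[100] += 3.0

coeffs = lpc(series[:96], order=2)  # fit on a window assumed to be anomaly-free
threshold = 1.0                     # hypothetical threshold, tuned per model in practice
flagged = [n for n, e in residuals(series, coeffs).items() if abs(e) > threshold]
```

Note that a single spike also disturbs the predictions for the samples immediately after it, because the spike sits in their prediction history, so a detector of this kind may flag a short run of points rather than one.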
5

A Feature Design of Multi-Language Identification System

Lin, Jun-Ching 17 July 2003
This thesis builds a multi-language identification system covering 10 languages: Mandarin, Japanese, Korean, Tamil, Vietnamese, English, French, German, Spanish, and Farsi. The system uses cepstrum coefficients, delta cepstrum coefficients, and linear predictive coding coefficients to extract the language features, and combines a Gaussian mixture model with an N-gram model to perform the language classification. The feasibility of the system is demonstrated in this thesis.
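As a rough illustration of what delta cepstrum coefficients are, the sketch below uses one common regression-style definition over a window of plus or minus `width` frames, with edge frames repeated. The abstract does not say which definition the thesis uses, so the formula, the function name `delta`, and the toy one-coefficient frames are assumptions.

```python
def delta(frames, width=2):
    """Regression-based delta features; frames is a list of equal-length coefficient lists."""
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            num = sum(n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
                      for n in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out

# Toy input: one cepstral coefficient per frame, rising by 1 each frame.
ramp = [[float(t)] for t in range(6)]
deltas = delta(ramp)
```

For a coefficient that rises by one unit per frame, the interior delta values come out as one unit per frame, which matches the intuition that delta features approximate the time derivative of the cepstrum.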
6

Dialects, Sex-specificity, and Individual Recognition in the Vocal Repertoire of the Puerto Rican Parrot (Amazona vittata)

Roberts, Briony Z. Jr. 23 December 1997
The following study is part of a larger study examining techniques that might be of use in the release program of the Puerto Rican Parrot (Amazona vittata), including marking, capturing, and radio-tracking. The portion of the study reported here documents the vocal behavior of A. vittata during the reproductive season and examines the possibility of using vocalizations to identify individuals, determine the sex of individuals, and determine the location of an individual's breeding territory. Objectives of this study included: 1) cataloguing and categorizing the vocal repertoire of A. vittata, 2) determining whether the vocal repertoire was sex-specific and region-specific, and 3) determining whether an individual's vocal repertoire could be used to identify it. The vocal repertoire was characterized using a hierarchical method, and 147 calls were described. The repertoire was found to contain a high percentage (76%) of graded calls. Evolutionary strategies that may explain the complexity of such a repertoire are discussed. The vocal repertoire was found to be both sex- and region-specific. Characteristics analyzed included time and frequency parameters of sonagrams. Three methods were used to determine the feasibility of vocal recognition of individuals: bird-call pairing, sonagraphic analysis, and linear predictive coding. Sonagraphic analyses in combination with linear predictive coding techniques show the most promise as tools in voice recognition of the parrot; however, further research will be necessary to determine how reliable voice recognition may be as a method for identifying individuals in the field. / Master of Science
7

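Κατασκευή μικροϋπολογιστικού συστήματος επεξεργασίας σημάτων ομιλίας για την εκτίμηση των μηχανισμών διαμόρφωσης του ήχου στη φωνητική κοιλότητα / Construction of a microcomputer system for processing speech signals to estimate the mechanisms of sound formation in the vocal cavity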

Αγγελόπουλος, Ιωάννης 30 April 2014
In this thesis, an application was developed that estimates the first three formant frequencies (resonances of the vocal tract) during the voicing of vowels. These three frequencies provide enough information to determine which vowel is voiced. The voice is emulated by an input signal that has peaks in the expected frequency regions. The formant frequencies are estimated using the short-time Fourier analysis method. The application was developed in the Keil μVision environment, in the C programming language, for the STM32F103RB microcontroller by ST Microelectronics.
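The peak-picking idea can be sketched as follows, in Python rather than the C used in the thesis. This is an assumption-based illustration: the frame length of 256, the 8 kHz sampling rate, the Hann window, the three test tones, and the function name `top_peaks` are chosen for the example and do not come from the thesis.

```python
import cmath
import math

def top_peaks(frame, fs, count=3):
    """Return the frequencies (Hz) of the strongest local maxima in one windowed frame's spectrum."""
    n = len(frame)
    # Periodic Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    # Direct DFT magnitudes for bins 0..n/2 (an FFT would be used in practice).
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]
    peaks = [k for k in range(1, n // 2) if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    strongest = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * fs / n for k in strongest)

fs, n = 8000, 256
# Emulated vowel: three tones placed where resonances are expected (hypothetical values).
frame = [math.sin(2 * math.pi * 500 * t / fs)
         + 0.6 * math.sin(2 * math.pi * 1500 * t / fs)
         + 0.3 * math.sin(2 * math.pi * 2500 * t / fs) for t in range(n)]
formants = top_peaks(frame, fs)
```

The frequency resolution of this sketch is fs / n, here 31.25 Hz, so the choice of frame length trades time resolution against how finely the resonance frequencies can be located.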
8

VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ / DEVELOPMENT OF ALGORITHMS FOR GUNSHOT DETECTION

Hrabina, Martin January 2019
This thesis deals with gunshot recognition and the problems associated with it. First, the topic is introduced and divided into smaller steps. Next, an overview is given of audio databases, significant publications, events, and the current state of the art, together with an overview of possible applications of gunshot detection. The second part compares features using various metrics and also compares their recognition performance. This is followed by a comparison of recognition algorithms, and new features usable for recognition are introduced. The work culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarizes the results achieved and outlines further work.
9

Voice Activity Detection in the Tiger Platform

Thorell, Hampus January 2006
Sectra Communications AB has developed a terminal for encrypted communication called the Tiger platform. During voice communication, delays have sometimes been experienced, resulting in conversational complications. A solution to this problem, as proposed by Sectra, would be to introduce voice activity detection, meaning a separation of the speech parts and non-speech parts of the input signal, to the Tiger platform. By transferring only the speech parts to the receiver, the bandwidth needed should be dramatically decreased. A lower bandwidth requirement implies that the delays should gradually disappear. The problem is then to come up with a method that manages to distinguish the speech parts from the rest of the input signal. Fortunately, a lot of theoretical work has been done on the subject, and numerous voice activity detection methods exist today. In this thesis, the theory of voice activity detection has been studied. A review of the voice activity detectors that exist on the market today, followed by an evaluation of some of them, was performed in order to select a suitable candidate for the Tiger platform. This evaluation became the foundation for the selection of a voice activity detector for implementation. Finally, the chosen voice activity detector, including a comfort noise generator, was implemented on the platform. This implementation was based on the special requirements of the platform. Tests of the implementation in office environments show that possible delays are steadily reduced during periods of speech inactivity, while the active speech quality is preserved.
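The basic idea of separating speech frames from non-speech frames can be sketched with a simple energy threshold. This is not the detector selected in the thesis, which the abstract does not name; the frame length of 160 samples, the threshold of 0.01, the synthetic signal, and the function names are assumptions made for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Flag a frame as active when its energy exceeds a fixed threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

fs = 8000
silence = [0.0] * 320
# A 200 Hz tone stands in for a burst of speech.
burst = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(320)]
flags = detect_speech(silence + burst + [0.0] * 160)

# Only the active frames would need to be transmitted.
sent_fraction = sum(flags) / len(flags)
```

A fixed energy threshold breaks down in noisy environments, which is why practical detectors adapt the threshold or use additional features, and why a comfort noise generator is typically paired with the detector to fill the gaps at the receiver.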
