
Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie

Larsson Alm, Kevin January 2019 (has links)
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect of such systems. This thesis studies automatic methods for assessing the speech quality of voice calls in Briteback's UC application, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired respectively with the extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features. The method based on HFCC feature extraction generally performs better than the other two, but all three methods show comparatively low performance relative to the literature. This most likely stems from implementation errors, illustrating the gap between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, could make the field more popular and increase its use in the real world.
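The GMM-as-regressor idea above can be illustrated in miniature: each mixture component pairs a region of feature space with a mean quality score, and a prediction is the posterior-weighted average of those score means. The sketch below uses a single scalar feature and hand-picked component parameters, all invented for illustration; a real system would train the mixture on HFCC, GFCC or MMFCC vectors.

```python
import math

# Hypothetical components: (weight, feature_mean, feature_std, quality_mean).
# The numbers are made up for this example, not trained values.
COMPONENTS = [
    (0.5, -2.0, 1.0, 1.5),   # low-quality region of feature space
    (0.3,  0.0, 1.0, 3.0),   # mid-quality region
    (0.2,  2.0, 1.0, 4.5),   # high-quality region
]

def gaussian(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predict_quality(feature):
    """Posterior-weighted average of the components' quality means."""
    likelihoods = [w * gaussian(feature, mu, sd) for w, mu, sd, _ in COMPONENTS]
    total = sum(likelihoods)
    return sum(l * q for l, (_, _, _, q) in zip(likelihoods, COMPONENTS)) / total

# A feature near the high-quality component yields a score pulled toward 4.5:
print(round(predict_quality(2.0), 2))  # → 4.24
```

The same posterior-weighting generalizes directly to vector-valued cepstral features with full-covariance components.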

Predicting the intramuscular fat content in porcine M. longissimus via ultrasound spectral analysis with consideration of structural and compositional traits / Schätzung des intramuskulären Fettgehaltes im M. longissimus des Schweines mittels Ultraschallspektralanalyse unter besonderer Berücksichtigung struktureller und kompositioneller Merkmale

Koch, Tim 17 February 2011 (has links)
No description available.

A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model

Bekli, Zeid, Ouda, William January 2018 (has links)
Voice recognition has become a more focused and researched field in the last century, and new techniques to identify speech have been introduced. A part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component is the front-end, or feature extraction, where techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract the speaker-specific features of a speech signal; MFCC is mostly used because it is based on the known variations of the human ear's critical frequency bandwidth. The second component is the back-end, which handles the speaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of the specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are also used for robustness against noise and distortion. In this paper, we build a speaker verification system, experiment with varying the amount of training data for the true speaker model, and evaluate the system performance. We further investigate the area of security in a speaker verification system by comparing the two methods (GMM and GMM-UBM) to determine which is more secure depending on the amount of training data available. This research therefore contributes to understanding how much data is really necessary for a secure system where the False Positive rate is as close to zero as possible, how the amount of training data affects the False Negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that an increase in speaker-specific training data increases the performance of the system.
However, too much training data has been proven to be unnecessary, because the performance of the system eventually reaches its highest point, in this case at around 48 minutes of data; the results also show that the GMM-UBM models containing 48 to 60 minutes of data outperformed the GMM models.
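Of the normalization steps named above, Cepstral Mean Subtraction is simple enough to sketch directly: a stationary channel adds a roughly constant offset to every cepstral vector, so subtracting the per-coefficient mean over the utterance removes it. A minimal pure-Python version, with frame values made up for the example:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from each frame.

    `frames` is a list of equal-length cepstral coefficient vectors;
    a constant (channel-induced) offset on every frame cancels out.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# The same frames seen through a channel that shifts every coefficient by 0.7
shifted = [[c + 0.7 for c in frame] for frame in clean]
a = cepstral_mean_subtraction(shifted)
b = cepstral_mean_subtraction(clean)
# Up to floating-point error, the channel offset has been removed:
print(all(abs(x - y) < 1e-9 for fa, fb in zip(a, b) for x, y in zip(fa, fb)))  # → True
```

Feature warping goes further by mapping each coefficient's short-term distribution onto a target distribution, but the mean-removal step above is the core of CMS.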

Objective assessment of disordered connected speech / Evaluation objective des troubles de la voix dans la parole connectée

Alpan, Ali 07 February 2012 (has links)
Within the context of the assessment of laryngeal function, acoustic analysis has an important place because the speech signal may be recorded non-invasively and it forms the base on which the perceptual assessment of voice is founded. Given the limitations of perceptual ratings, researchers have investigated vocal cues of disordered voices that are clinically relevant, summarize properties of speech signals, and report on a speaker's phonation in general and voice in particular. Ideally, the acoustic descriptors should also be correlates of auditory-perceptual ratings of voice. Generally speaking, the goal of acoustic analysis is to document quantitatively the degree of severity of a voice disorder and to monitor the evolution of the voice of dysphonic speakers.

The first part of this thesis is devoted to the analysis of disordered connected speech. The aim is to investigate vocal cues that are clinically relevant and correlated with auditory-perceptual ratings. Two approaches are investigated. The first is a variogram-based method in the temporal domain. The second approach is in the cepstral domain; in particular, the first rahmonic amplitude is used as an acoustic cue to describe voice quality. A multi-dimensional approach combining temporal and spectral aspects is also investigated, the goal being to check whether acoustic cues in the two domains report complementary information when predicting perceptual scores.

Both methods are first tested on a corpus of synthetic sound stimuli obtained by means of a synthesizer of disordered voices. The purpose is to learn about the link between the signal properties (fixed by the synthesis parameters) and the acoustic cues. In this study, we had the opportunity to use two large natural speech corpora, one of which has been perceptually rated.

The final part of the text is devoted to the automatic classification of voice with regard to perceived voice quality. Many studies have proposed a binary (normal/pathological) classification of voice samples. An automatic categorization according to perceived degrees of hoarseness appears, however, to be more attractive to both clinicians and technologists, and more likely to be clinically relevant. Indeed, one way to reduce the inter-rater variability of an auditory-perceptual evaluation is to ask several experts to participate and then to average the perceptual scores. However, auditory-perceptual evaluation of a corpus by several judges is a very laborious, time-consuming and costly task, so making this perceptual evaluation task automatic is desirable. The aim of this study is to exploit the support vector machine classifier, which has become a popular classification tool over the last years, to carry out categorization of voices according to perceived degrees of hoarseness. / Doctorat en Sciences de l'ingénieur
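The first-rahmonic cue mentioned above comes from the real cepstrum: the inverse transform of the log-magnitude spectrum, in which a periodic (voiced) signal produces a peak at a quefrency equal to its fundamental period. The toy sketch below uses a naive O(n²) DFT and a synthetic two-harmonic frame; the window length, period and search range are invented for illustration, and the thesis works with real dysphonic speech rather than clean sinusoids.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(n^2) but dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Real part of the inverse DFT of the log-magnitude spectrum."""
    n = len(x)
    log_mag = [math.log(abs(s) + 1e-12) for s in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

# A synthetic voiced frame: fundamental period of 16 samples plus one harmonic
n, period = 128, 16
frame = [math.sin(2 * math.pi * t / period) + 0.5 * math.sin(4 * math.pi * t / period)
         for t in range(n)]
ceps = real_cepstrum(frame)
# The first rahmonic is the cepstral peak within a plausible pitch-period range
rahmonic_quefrency = max(range(2, n // 4), key=lambda q: ceps[q])
print(rahmonic_quefrency)  # → 16
```

The amplitude of that peak, `ceps[rahmonic_quefrency]`, is the kind of scalar descriptor the first-rahmonic cue reduces to; in disordered voices the peak weakens as periodicity degrades.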

Channel Modeling Applied to Robust Automatic Speech Recognition

Sklar, Alexander Gabriel 01 January 2007 (has links)
In automatic speech recognition systems (ASRs), training is a critical phase for the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, in either the magnitude or the phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects, which produce packet losses, delays and repeated packets. Finally, one cannot really assert what path a signal will take, so having error or distortion along the way is almost a certainty. The channel introduces an acoustic mismatch between the speaker's signal and the trained data in the ASR, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channels, i.e. to compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.
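The premise above, characterizing channel effects rather than compensating for them, suggests simulating those effects on clean training audio. The sketch below models only the two effects that are easiest to reproduce: linear (FIR) distortion and random whole-packet loss. The impulse response, packet size and loss rate are illustrative placeholders, not values from the thesis.

```python
import random

def apply_channel(samples, impulse_response, loss_rate=0.1, packet=160, seed=0):
    """Simulate a transmission channel on a list of PCM samples:
    linear FIR distortion followed by random loss of whole
    `packet`-sample blocks (160 samples ≈ 20 ms at 8 kHz).
    """
    # Linear distortion: convolve with the channel's impulse response
    filtered = [
        sum(impulse_response[j] * samples[i - j]
            for j in range(len(impulse_response)) if i - j >= 0)
        for i in range(len(samples))
    ]
    # Network effect: zero out randomly chosen packets
    rng = random.Random(seed)
    out = filtered[:]
    for start in range(0, len(out), packet):
        if rng.random() < loss_rate:
            out[start:start + packet] = [0.0] * len(out[start:start + packet])
    return out

# Identity channel, no loss: the signal passes through unchanged
pcm = [0.1 * t for t in range(480)]
print(apply_channel(pcm, [1.0], loss_rate=0.0) == pcm)  # → True
```

Passing clean training utterances through a bank of such simulated channels gives the inexpensive, repeatable channel-matched training data the abstract describes; real media would add non-linear and time-varying effects on top.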
