  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Sparsity Motivated Auditory Wavelet Representation and Blind Deconvolution

Adiga, Aniruddha January 2017 (has links) (PDF)
In many scenarios, events such as singularities and transients that carry important information about a signal undergo spreading during acquisition or transmission and it is important to localize the events. For example, edges in an image, point sources in a microscopy or astronomical image are blurred by the point-spread function (PSF) of the acquisition system, while in a speech signal, the epochs corresponding to glottal closure instants are shaped by the vocal tract response. Such events can be extracted with the help of techniques that promote sparsity, which enables separation of the smooth components from the transient ones. In this thesis, we consider development of such sparsity promoting techniques. The contributions of the thesis are three-fold: (i) an auditory-motivated continuous wavelet design and representation, which helps identify singularities; (ii) a sparsity-driven deconvolution technique; and (iii) a sparsity-driven deconvolution technique for reconstruction of finite-rate-of-innovation (FRI) signals. We use the speech signal for illustrating the performance of the techniques in the first two parts and super-resolution microscopy (2-D) for the third part. In the first part, we develop a continuous wavelet transform (CWT) starting from an auditory motivation. Wavelet analysis provides good time and frequency localization, which has made it a popular tool for time-frequency analysis of signals. The CWT is a multiresolution analysis tool that involves decomposition of a signal using a constant-Q wavelet filterbank, akin to the time-frequency analysis performed by the basilar membrane in the peripheral human auditory system. This connection motivated us to develop wavelets that possess auditory localization capabilities. Gammatone functions are extensively used in the modeling of the basilar membrane, but the non-zero average of the functions poses a hurdle.
We construct bona fide wavelets from the Gammatone function called Gammatone wavelets and analyze their properties such as admissibility, time-bandwidth product, vanishing moments, etc. Of particular interest is the vanishing moments property, which enables the wavelet to suppress smooth regions in a signal leading to sparsification. We show how this property of the Gammatone wavelets coupled with multiresolution analysis could be employed for singularity and transient detection. Using these wavelets, we also construct equivalent filterbank models and obtain cepstral feature vectors out of such a representation. We show that the Gammatone wavelet cepstral coefficients (GWCC) are effective for robust speech recognition compared with mel-frequency cepstral coefficients (MFCC). In the second part, we consider the problem of sparse blind deconvolution (SBD) starting from a signal obtained as the convolution of an unknown PSF and a sparse excitation. The BD problem is ill-posed and the goal is to employ sparsity to come up with an accurate solution. We formulate the SBD problem within a Bayesian framework. The estimation of filter and excitation involves optimization of a cost function that consists of an ℓ2 data-fidelity term and an ℓp-norm (p ∈ [0, 1]) regularizer, as the sparsity promoting prior. Since the ℓp-norm is not differentiable at the origin, we consider a smoothed version of the ℓp-norm as a proxy in the optimization. Apart from the regularizer being non-convex, the data term is also non-convex in the filter and excitation as they are both unknown. We optimize the non-convex cost using an alternating minimization strategy, and develop an alternating ℓp-ℓ2 projections algorithm (ALPA).
We demonstrate convergence of the iterative algorithm and analyze in detail the role of the pseudo-inverse solution as an initialization for the ALPA and provide probabilistic bounds on its accuracy considering the presence of noise and the condition number of the linear system of equations. We also consider the case of bounded noise and derive tight tail bounds using the Hoeffding inequality. As an application, we consider the problem of blind deconvolution of speech signals. In the linear model for speech production, voiced speech is assumed to be the result of a quasi-periodic impulse train exciting a vocal-tract filter. The locations of the impulses or epochs indicate the glottal closure instants and the spacing between them the pitch. Hence, the excitation in the case of voiced speech is sparse and its deconvolution from the vocal-tract filter is posed as an SBD problem. We employ ALPA for SBD and show that the excitation obtained is sparser than the excitations obtained using sparse linear prediction, the smoothed ℓ1/ℓ2 sparse blind deconvolution algorithm, and majorization-minimization-based sparse deconvolution techniques. We also consider the problem of epoch estimation and show that epochs estimated by ALPA in both clean and noisy conditions are closer to the instants indicated by the electroglottograph when compared with the estimates provided by the zero-frequency filtering technique, which is the state-of-the-art epoch estimation technique. In the third part, we consider the problem of deconvolution of a specific class of continuous-time signals called finite-rate-of-innovation (FRI) signals, which are not bandlimited, but specified by a finite number of parameters over an observation interval. The signal is assumed to be a linear combination of delayed versions of a prototypical pulse. The reconstruction problem is posed as a 2-D SBD problem. The kernel is assumed to have a known form but with unknown parameters.
Given the sampled version of the FRI signal, the delays quantized to the nearest point on the sampling grid are first estimated using a proximal-operator-based alternating ℓp-ℓ2 algorithm (ALPAprox), and then super-resolved to obtain off-grid (OG) estimates using gradient-descent optimization. The overall technique is termed OG-ALPAprox. We show application of OG-ALPAprox to a particular modality of super-resolution microscopy (SRM), called stochastic optical reconstruction microscopy (STORM). The resolution of the traditional optical microscope is limited by diffraction and is termed Abbe's limit. The goal of SRM is to engineer the optical imaging system to resolve structures in specimens, such as proteins, whose dimensions are smaller than the diffraction limit. The specimen to be imaged is tagged or labeled with light-emitting or fluorescent chemical compounds called fluorophores. These compounds specifically bind to proteins and exhibit fluorescence upon excitation. The fluorophores are assumed to be point sources and the light emitted by them undergoes spreading due to diffraction. STORM employs a sequential approach, wherein in each step only a few fluorophores are randomly excited and the image is captured by a sensor array. The obtained image is diffraction-limited; however, the separation between the fluorophores allows for localizing the point sources with high precision. The localization is performed using Gaussian peak-fitting. This process of random excitation coupled with localization is performed sequentially and subsequently consolidated to obtain a high-resolution image. We pose the localization as an SBD problem and employ OG-ALPAprox to estimate the locations. We also report comparisons with the de facto standard Gaussian peak-fitting algorithm and show that the statistical performance is superior. Experimental results on real data show that the reconstruction quality is on par with the Gaussian peak-fitting.
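The abstract above mentions replacing the ℓp-norm with a smoothed proxy because the ℓp-norm is not differentiable at the origin. The abstract does not say which smoothing is used, so the following is only a minimal sketch of one common choice, sum_i (x_i^2 + eps)^(p/2). The function names, the value of eps, and the test vectors are assumptions made for illustration.

```python
def smoothed_lp(x, p=0.5, eps=1e-8):
    """Smoothed l_p penalty: sum_i (x_i^2 + eps)^(p/2), for p in (0, 1]."""
    return sum((v * v + eps) ** (p / 2) for v in x)


def smoothed_lp_grad(x, p=0.5, eps=1e-8):
    """Gradient of the smoothed penalty; finite at x_i = 0 because eps > 0."""
    return [p * v * (v * v + eps) ** (p / 2 - 1) for v in x]


# Two vectors with the same l_2 energy: one sparse, one spread out.
sparse = [1.0, 0.0, 0.0, 0.0]
dense = [0.5, 0.5, 0.5, 0.5]

# The penalty is lower for the sparse vector, which is why it promotes sparsity.
print(smoothed_lp(sparse), smoothed_lp(dense))
```

With eps > 0 the gradient exists everywhere, so gradient-based or alternating schemes can be applied; as eps shrinks, the proxy approaches the unsmoothed penalty.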
52

Porovnání metod pro identifikaci poruch valivých ložisek / Comparison of methods for identification of rolling bearing failures

Kokeš, Miroslav January 2021 (has links)
The aim of this master's thesis is to compare selected methods and parameters for rolling bearing diagnostics. The selected statistical parameters are kurtosis, crest factor, and the parameter K(t). The other selected methods are envelope analysis, cepstral analysis, and the ACEP method. These methods are implemented in LabVIEW and compared on noise resistance, computation speed, and overall ability to identify rolling bearing faults.
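Two of the statistical parameters named above, kurtosis and crest factor, have standard textbook definitions, sketched below in Python (the thesis itself uses LabVIEW). The test signals are synthetic assumptions, not data from the thesis, and the parameter K(t) is left out because the abstract does not define it.

```python
import math


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


def crest_factor(x):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


# Hypothetical "healthy" vibration: ten full periods of a pure sine.
healthy = [math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]

# Hypothetical "faulty" vibration: the same sine with periodic impacts added.
faulty = [v + (5.0 if n % 100 == 0 else 0.0) for n, v in enumerate(healthy)]

print(kurtosis(healthy), crest_factor(healthy))
print(kurtosis(faulty), crest_factor(faulty))
```

Both indicators rise when short impacts are superimposed on a smooth signal, which is the reason they are used as simple fault indicators.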
53

Online detekce jednoduchých příkazů v audiosignálu / Online detection of simple voice commands in audiosignal

Zezula, Miroslav January 2011 (has links)
This thesis describes the development of a voice module that recognizes simple speech commands by comparing the input sound with recorded templates. The first part of the thesis describes the algorithm used and verifies its functionality. The algorithm is based on Mel-frequency cepstral coefficients and dynamic time warping. The hardware of the voice module is then designed around the 56F805 signal controller from Freescale. The signal from the microphone is conditioned by operational amplifiers and a digital filter. The third part deals with the development of software for the controller and describes the fixed-point implementation of the algorithm, respecting the limited capabilities of the controller. A final test demonstrates the usability of the voice module in a low-noise environment.
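The template-matching step described above can be sketched with the classic dynamic time warping recurrence. This is a floating-point Python illustration, not the fixed-point controller implementation from the thesis. The one-dimensional "feature frames" stand in for MFCC vectors, and the template labels and function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature frames."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical templates; real frames would be MFCC vectors.
templates = {
    "yes": [[0.0], [1.0], [2.0], [1.0], [0.0]],
    "no": [[2.0], [2.0], [0.0], [0.0], [2.0]],
}

# The same shape as "yes", spoken at half the speed.
slow_yes = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [1.0], [0.0], [0.0]]

print(recognize(slow_yes, templates))
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template can still align with zero cost.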
54

Automatická klasifikace digitálních modulací / Automatic Classification of Digital Modulations

Kubánková, Anna January 2008 (has links)
This dissertation deals with a new method for digital modulation recognition. The introduction summarizes the history and present state of the topic. Existing methods are described together with their characteristic properties, and recognition by means of artificial neural networks is presented in more detail. After the objective of the dissertation is set, the digital modulations chosen for recognition are described theoretically: FSK, MSK, BPSK, QPSK, and QAM-16. These modulations are the ones most commonly used in modern communication systems. The proposed method is based on the analysis of magnitude and phase spectrograms of the modulated signals. Histograms of the spectrograms are used to examine their properties. They provide information on the number of carrier frequencies in the signal, which is used for FSK and MSK recognition, and on the number of phase states, by which BPSK, QPSK, and QAM-16 are classified. The spectrograms in which the characteristic attributes of the modulations are visible are obtained with the segment length equal to the symbol length. It was found that the modulation can be recognized correctly when the symbol length is known and the signal-to-noise ratio is at least 0 dB. It is therefore necessary to detect the symbol length prior to the spectrogram calculation. Four methods were designed for this purpose: autocorrelation function, cepstrum analysis, wavelet transform, and LPC coefficients. These methods were implemented as algorithms and analyzed with signals disturbed by white Gaussian noise and phase noise, and with signals passed through a multipath fading channel. Detection by means of cepstrum analysis proved the most suitable and reliable. Finally, the new method for digital modulation recognition was verified with signals passed through a channel with properties close to a real one.
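Cepstrum analysis is named above as the most reliable way to find the symbol length. The abstract gives no details of that detector, so the following is only a toy illustration of the underlying idea: the real cepstrum (inverse DFT of the log magnitude spectrum) shows a peak at the lag of a repeated structure. The test signal, a decaying pulse plus a delayed copy, and all function names are assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (sufficient for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum."""
    n = len(x)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def dominant_quefrency(x, min_q):
    """Lag (in samples) of the largest cepstral value in [min_q, N/2)."""
    c = real_cepstrum(x)
    return max(range(min_q, len(x) // 2), key=lambda q: c[q])


# Toy signal: a decaying pulse plus a copy delayed (circularly) by 20 samples.
N, DELAY = 128, 20
pulse = [0.5 ** t for t in range(N)]
signal = [pulse[t] + 0.6 * pulse[(t - DELAY) % N] for t in range(N)]

print(dominant_quefrency(signal, min_q=5))
```

The lower bound min_q skips the low-quefrency region, where the spectral envelope of the pulse itself dominates.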
55

Automatic Speech Recognition Model for Swedish using Kaldi

Wang, Yihan January 2020 (has links)
With the arrival of the intelligent era, speech recognition has become a hot topic. Although many automatic speech recognition (ASR) tools are on the market, a considerable number of them do not support Swedish because of its relatively small number of speakers. In this project, a Swedish ASR model based on Hidden Markov Models and Gaussian Mixture Models is built using Kaldi, with the aim of helping ICA Banken classify after-sales voice calls. A variety of model configurations have been explored, differing in their phoneme combination methods and in their eigenvalue extraction and processing methods. Word Error Rate and Real Time Factor are selected as evaluation criteria to compare the recognition accuracy and speed of the models. For large vocabulary continuous speech recognition, triphone models perform much better than monophone models. Adding feature transformations further improves both accuracy and speed. The combination of linear discriminant analysis, maximum likelihood linear transform, and speaker adaptive training achieves the best performance in this implementation. Among the feature extraction methods, mel-frequency cepstral coefficients are more conducive to higher accuracy, while perceptual linear prediction tends to improve overall speed. / Det existerar flera lösningar för automatisk transkribering på marknaden, men en stor del av dem stödjer inte svenska på grund utav det relativt få antalet talare. I det här projektet så skapades automatisk transkribering för svenska med Hidden Markov models och Gaussian mixture models genom att använda Kaldi. Detta för att kunna möjliggöra för ICA Banken att klassificera samtal till sin kundtjänst. En mängd av modellvariationer med olika fonemkombinationsmetoder, egenvärdesberäkning och databearbetningsmetoder har utforskats. Word error rate och real time factor är valda som utvärderingskriterier för att jämföra precisionen och hastigheten mellan modellerna. När det kommer till kontinuerlig transkribering för ett stort ordförråd så resulterar triphone i mycket bättre prestanda än monophone. Med hjälp utav transformationer så förbättras både precisionen och hastigheten. Kombinationen av linear discriminant analysis, maximum likelihood linear transformering och speaker adaptive träning resulterar i den bästa prestandan i denna implementation. För olika egenskapsextraktioner så bidrar mel-frequency cepstral koefficienten till en bättre precision medan perceptual linear predictive tenderar att öka hastigheten.
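The two evaluation criteria named above have simple standard definitions, sketched here in Python. This is not Kaldi's own scoring code; the function names and the example sentences are assumptions for illustration. Word Error Rate is the word-level edit distance divided by the reference length, and Real Time Factor is processing time divided by audio duration.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean decoding runs faster than real time."""
    return processing_seconds / audio_seconds


# One substitution ("vill" -> "ville") and one deletion ("kaffe"): 2 / 4.
print(word_error_rate("jag vill ha kaffe", "jag ville ha"))  # 0.5
print(real_time_factor(3.0, 12.0))  # 0.25
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.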
56

Biometric Multi-modal User Authentication System based on Ensemble Classifier

Assaad, Firas Souhail January 2014 (has links)
No description available.
57

Automatické rozpoznávání logopedických vad v řečovém projevu / Automatic Recognition of Logopaedic Defect in Speech Utterances

Dušil, Lubomír January 2009 (has links)
The thesis deals with the analysis and automatic detection of logopaedic defects in speech utterances. Its objective is to facilitate and accelerate the work of logopaedists and to increase the percentage of logopaedic defects detected in children at the youngest possible age, when treatment is most successful. It presents methods of working with speech, a classification of the defects within the individual stages of child development, and words suitable for identifying speech defects and for their subsequent remedy. It then analyzes methods for calculating the coefficients that best reflect human speech, as well as classifiers that use these coefficients to determine whether a speech defect is present. The coefficients and classifiers are tested, and their best combination is sought in order to achieve the highest possible success rate in the automatic detection of speech defects. All programming and testing was carried out in MATLAB.
58

Channel Modeling Applied to Robust Automatic Speech Recognition

Sklar, Alexander Gabriel 01 January 2007 (has links)
In automatic speech recognition systems (ASRs), training is a critical phase to the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, either in the magnitude or phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects, which produce packet losses, delays, and repeated packets. Finally, one cannot really assert what path a signal will take, and so having error or distortion in between is almost a certainty. The channel introduces an acoustical mismatch between the speaker's signal and the trained data in the ASR, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channels, i.e. compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.
59

Automatic Speech Quality Assessment in Unified Communication : A Case Study / Automatisk utvärdering av samtalskvalitet inom integrerad kommunikation : en fallstudie

Larsson Alm, Kevin January 2019 (has links)
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect for such systems. In this thesis, automatic methods for assessing the speech quality of voice calls in Briteback's UC application are studied, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC), and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively. The method based on HFCC feature extraction shows better performance in general than the two other methods, but all methods perform poorly compared with results in the literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, can make the field more popular and increase its use in the real world.
60

Automatická klasifikace výslovnosti hlásky R / Automatic classification of pronunciation of the letter „R“

Hrušovský, Enrik January 2018 (has links)
This diploma thesis deals with the automatic classification of the pronunciation of the sound R. Its purpose is to create a program that detects defective pronunciation of the sound R in children. The thesis covers speech production, speech therapy, and dyslalia, followed by speech signal processing and analysis methods. The last part presents the design of software for automatic detection of the pronunciation of the sound R. Features are extracted using the MFCC algorithm and then classified by a neural network as correct or incorrect pronunciation, and the classification success rate is evaluated.
