Spelling suggestions: "subject:"epeech signals"" "subject:"cpeech signals""
1 |
Adaptive conversational speech communication - analysis and application to integrated services networksRaviraj, C. R. January 1990 (has links)
No description available.
|
2 |
Modelling and extraction of fundamental frequency in speech signalsPawi, Alipah January 2014 (has links)
One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed ii where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and the magnitude difference function, as the similarity criteria were explored and compared. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelop and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method.
|
3 |
C++ Builder mokymo taikomieji aspektai / C++ Builder teaching aspectsLakutijevskis, Miroslavas 20 June 2005 (has links)
First of all the Internet pages presenting C++ Builder teaching material, are studied in the Master’s thesis. Applied themes are stressed the most of which is lacking in those pages. The aim of the work is – to describe the uses of C++ Builder tools that is - components of various format files (*.bmp, *.wmf, *.emf, *.ico, etc.) used to show images, graphic objects in the form of the program, which can show drawings, schemes, drafts; reading and drawing of random sequences, geometrical figures, function, drawing of analytically given functions and data presented by number sequences, the speech signals and other the sound files. Some examples of reading listening and viewing of the different sound files are also presented. The other important for applications subjects, as databases and programming of connections with databases, problem solutions in the form of algorithms, are also described The optimum algorithms “Division in half” and “Golden section” algorithms are realized in C++ Builder.
|
4 |
Sparsity Motivated Auditory Wavelet Representation and Blind DeconvolutionAdiga, Aniruddha January 2017 (has links) (PDF)
In many scenarios, events such as singularities and transients that carry important information about a signal undergo spreading during acquisition or transmission and it is important to localize the events. For example, edges in an image, point sources in a microscopy or astronomical image are blurred by the point-spread function (PSF) of the acquisition system, while in a speech signal, the epochs corresponding to glottal closure instants are shaped by the vocal tract response. Such events can be extracted with the help of techniques that promote sparsity, which enables separation of the smooth components from the transient ones. In this thesis, we consider development of such sparsity promoting techniques. The contributions of the thesis are three-fold: (i) an auditory-motivated continuous wavelet design and representation, which helps identify singularities; (ii) a sparsity-driven deconvolution technique; and (iii) a sparsity-driven deconvolution technique for reconstruction of nite-rate-of-innovation (FRI) signals. We use the speech signal for illustrating the performance of the techniques in the first two parts and super-resolution microscopy (2-D) for the third part.
In the rst part, we develop a continuous wavelet transform (CWT) starting from an auditory motivation. Wavelet analysis provides good time and frequency localization, which has made it a popular tool for time-frequency analysis of signals. The CWT is a multiresolution analysis tool that involves decomposition of a signal using a constant-Q wavelet filterbank, akin to the time-frequency analysis performed
by basilar membrane in the peripheral human auditory system. This connection motivated us to develop wavelets that possess auditory localization capabilities. Gammatone functions are extensively used in the modeling of the basilar membrane, but the non-zero average of the functions poses a hurdle. We construct bona de wavelets from the Gammatone function called Gammatone wavelets and analyze their properties such as admissibility, time-bandwidth product, vanishing moments, etc..
Of particular interest is the vanishing moments property, which enables the wavelet to suppress smooth regions in a signal leading to sparsi cation. We show how this property of the Gammatone wavelets coupled with multiresolution analysis could be employed for singularity and transient detection. Using these wavelets, we also construct equivalent lterbank models and obtain cepstral feature vectors out of such a representation. We show that the Gammatone wavelet cepstral coefficients (GWCC) are effective for robust speech recognition compared with mel-frequency cepstral coefficients (MFCC).
In the second part, we consider the problem of sparse blind deconvolution (SBD) starting from a signal obtained as the convolution of an unknown PSF and a sparse excitation. The BD problem is ill-posed and the goal is to employ sparsity to come up with an accurate solution. We formulate the SBD problem within a Bayesian framework. The estimation of lter and excitation involves optimization of a cost function that consists of an `2 data- fidelity term and an `p-norm (p 2 [0; 1]) regularizer, as the sparsity promoting prior. Since the `p-norm is not differentiable at the origin, we consider a smoothed version of the `p-norm as a proxy in the optimization. Apart from the regularizer being non-convex, the data term is also non-convex in the filter and excitation as they are both unknown. We optimize the non-convex cost using an alternating minimization strategy, and develop an alternating `p `2 projections algorithm (ALPA). We demonstrate convergence of the iterative algorithm and analyze in detail the role of the pseudo-inverse solution as an initialization for the ALPA and provide probabilistic bounds on its accuracy considering the presence of noise and the condition number of the linear system of equations. We also consider the case of bounded noise and derive tight tail bounds using the Hoe ding inequality.
As an application, we consider the problem of blind deconvolution of speech signals. In the linear model for speech production, voiced speech is assumed to be the result of a quasi-periodic impulse train exciting a vocal-tract lter. The locations of the impulses or epochs indicate the glottal closure instants and the spacing between them the pitch. Hence, the excitation in the case of voiced speech is sparse and its deconvolution from the vocal-tract filter is posed as a SBD problem. We employ ALPA for SBD and show that excitation obtained is sparser than the excitations obtained using sparse linear prediction, smoothed `1=`2 sparse blind deconvolution algorithm, and majorization-minimization-based sparse deconvolution techniques. We also consider the problem of epoch estimation and show that epochs estimated by ALPA in both clean and noisy conditions are closer to the instants indicated by the electroglottograph when with to the estimates provided by the zero-frequency ltering technique, which is the state-of-the-art epoch estimation technique.
In the third part, we consider the problem of deconvolution of a specific class of continuous-time signals called nite-rate-of-innovation (FRI) signals, which are not bandlimited, but specified by a nite number of parameters over an observation interval. The signal is assumed to be a linear combination of delayed versions of a prototypical pulse. The reconstruction problem is posed as a 2-D SBD problem. The kernel is assumed to have a known form but with unknown parameters. Given the sampled version of the FRI signal, the delays quantized to the nearest point on the sampling grid are rst estimated using proximal-operator-based alternating `p `2 algorithm (ALPAprox), and then super-resolved to obtain o -grid (O. G.) estimates using gradient-descent optimization. The overall technique is termed OG-ALPAprox.
We show application of OG-ALPAprox to a particular modality of super-resolution microscopy (SRM), called stochastic optical reconstruction microscopy (STORM).
The resolution of the traditional optical microscope is limited by di raction and is termed as Abbe's limit. The goal of SRM is to engineer the optical imaging system to resolve structures in specimens, such as proteins, whose dimensions are smaller than the di raction limit. The specimen to be imaged is tagged or labeled with light-emitting or uorescent chemical compounds called uorophores. These compounds speci cally bind to proteins and exhibit uorescence upon excitation. The uorophores are assumed to be point sources and the light emitted by them undergo spreading due to di raction. STORM employs a sequential approach, wherein each step only a few uorophores are randomly excited and the image is captured by a sensor array. The obtained image is di raction-limited, however, the separation between the uorophores allows for localizing the point sources with high precision. The localization is performed using Gaussian peak- tting. This process of random excitation coupled with localization is performed sequentially and subsequently consolidated to obtain a high-resolution image. We pose the localization as a SBD problem and employ OG-ALPAprox to estimate the locations. We also report comparisons with the de facto standard Gaussian peak- tting algorithm and show that the statistical performance is superior. Experimental results on real data show that the reconstruction quality is on par with the Gaussian peak- tting.
|
5 |
Timbre Perception of Time-Varying SignalsArthi, S January 2014 (has links) (PDF)
Every auditory event provides an information-rich signal to the brain. The signal constitutes perceptual attributes of pitch, loudness, timbre, and also, conceptual attributes like location, emotions, meaning, etc. In the present work we examine the timbre perception of time-varying signals in particular. While stationary signal timbre, by-itself is complex perceptually, the time-varying signal timbre introduces an evolving pattern, adding to its multi-dimensionality.
To characterize timbre, we conduct psycho-acoustic perception tests with normal-hearing human subjects. We focus on time-varying synthetic speech signals(can be extended to music) because listeners are perceptually consistent with speech. Also, we can parametrically control the timbre and pitch glides using linear time-varying models. In order to quantify the timbre change in time-varying signals, we define the JND(Just noticeable difference) of timbre using diphthongs, synthesized using time-varying formant frequency model. The diphthong JND is defined as a two dimensional contour on the plane of percentage change of formant frequencies of terminal vowels. Thus, we simplify the perceptual probing to a lower dimensional space, i.e, 2-D even for a diphthong, which is multi-parametric. We also study the impact of pitch glide on the timbre JND of the diphthong. It is observed that timbre JND is influenced by the occurrence of pitch glide.
Focusing on the magnitude of perceptual timbre change, we design a MUSHRA-like listening test using the vowel continuum in the formant-frequency space. We provide explicit anchors for reference: 0% and 100%, thus quantifying the perceptual timbre change on a 1-D scale. We also propose an objective measure of timbre change and observe that there is good correlation between the objective measure and subjective human responses of percentage timbre change.
Using the above experimental methodology, we studied the influence of pitch shift on timbre perception and observed that the perceptual timbre change increases with change in pitch. We used vowels and diphthongs with 5 different types of pitch glides-(i) Constant pitch,(ii) 3-semitone linearly-up,(iii) 3 semitone linearly-down, (iv)V–like pitch glide and (v) hat-like pitch glide. The present study shows that timbre change can be measured on a 1-D scale if the perturbation is along one-dimension. We observe that for bright vowels(/a/and/i/), linearly decreasing pitch glide(dull pitch glide)causes more timbre change than linearly increasing pitch glide(bright pitch glide).For dull vowels(/u/),it is vice-versa. To summarize, in congruent pitch glides cause more perceptual timbre change than congruent pitch glides.(Congruent pitch glide implies bright pitch glide in bright vowel or dull pitch glide in dull vowel and in congruent pitch glide implies bright pitch glide in dull vowel or dull pitch glide in bright vowel.) Experiments with quadratic pitch glides show that the decay portion of pitch glide affects timbre perception more than the attack portion in short duration signals with less or no sustained part.
In case of time-varying timbre, bright diphthongs show patterns similar to bright vowels. Also, for bright diphthongs(/ai/), perceived timbre change is most with decreasing pitch glide(dull pitch glide). We also observed that listeners perceive more timbre change in constant pitch than in pitch glides, congruent with the timbre or pitch glides with quadratic changes.
The main conclusion of this study is that pitch and timbre do interact and in congruent pitch glides cause more timbre change than congruent pitch glides. In the case of quadratic pitch glides, listener perception of vowels is influenced by the decay than the attack in pitch glide in short duration signals. In the case of time-varying timbre also, in congruent pitch glides cause the most timbre change, followed by constant pitch glide. For congruent pitch glides and quadratic pitch glides in time-varying timbre, the listeners perceive lesser timbre change than otherwise.
|
6 |
Nonstationary Techniques For Signal Enhancement With Applications To Speech, ECG, And Nonuniformly-Sampled SignalsSreenivasa Murthy, A January 2012 (has links) (PDF)
For time-varying signals such as speech and audio, short-time analysis becomes necessary to compute specific signal attributes and to keep track of their evolution. The standard technique is the short-time Fourier transform (STFT), using which one decomposes a signal in terms of windowed Fourier bases. An advancement over STFT is the wavelet analysis in which a function is represented in terms of shifted and dilated versions of a localized function called the wavelet. A specific modeling approach particularly in the context of speech is based on short-time linear prediction or short-time Wiener filtering of noisy speech. In most nonstationary signal processing formalisms, the key idea is to analyze the properties of the signal locally, either by first truncating the signal and then performing a basis expansion (as in the case of STFT), or by choosing compactly-supported basis functions (as in the case of wavelets). We retain the same motivation as these approaches, but use polynomials to model the signal on a short-time basis (“short-time polynomial representation”). To emphasize the local nature of the modeling aspect, we refer to it as “local polynomial modeling (LPM).”
We pursue two main threads of research in this thesis: (i) Short-time approaches for speech enhancement; and (ii) LPM for enhancing smooth signals, with applications to ECG, noisy nonuniformly-sampled signals, and voiced/unvoiced segmentation in noisy speech.
Improved iterative Wiener filtering for speech enhancement
A constrained iterative Wiener filter solution for speech enhancement was proposed by Hansen and Clements. Sreenivas and Kirnapure improved the performance of the technique by imposing codebook-based constraints in the process of parameter estimation. The key advantage is that the optimal parameter search space is confined to the codebook. The Nonstationary signal enhancement solutions assume stationary noise. However, in practical applications, noise is not stationary and hence updating the noise statistics becomes necessary. We present a new approach to perform reliable noise estimation based on spectral subtraction. We first estimate the signal spectrum and perform signal subtraction to estimate the noise power spectral density. We further smooth the estimated noise spectrum to ensure reliability. The key contributions are: (i) Adaptation of the technique for non-stationary noises; (ii) A new initialization procedure for faster convergence and higher accuracy; (iii) Experimental determination of the optimal LP-parameter space; and (iv) Objective criteria and speech recognition tests for performance comparison.
Optimal local polynomial modeling and applications
We next address the problem of fitting a piecewise-polynomial model to a smooth signal corrupted by additive noise. Since the signal is smooth, it can be represented using low-order polynomial functions provided that they are locally adapted to the signal. We choose the mean-square error as the criterion of optimality. Since the model is local, it preserves the temporal structure of the signal and can also handle nonstationary noise. We show that there is a trade-off between the adaptability of the model to local signal variations and robustness to noise (bias-variance trade-off), which we solve using a stochastic optimization technique known as the intersection of confidence intervals (ICI) technique. The key trade-off parameter is the duration of the window over which the optimum LPM is computed.
Within the LPM framework, we address three problems: (i) Signal reconstruction from noisy uniform samples; (ii) Signal reconstruction from noisy nonuniform samples; and (iii) Classification of speech signals into voiced and unvoiced segments.
The generic signal model is
x(tn)=s(tn)+d(tn),0 ≤ n ≤ N - 1.
In problems (i) and (iii) above, tn=nT(uniform sampling); in (ii) the samples are taken at nonuniform instants. The signal s(t)is assumed to be smooth; i.e., it should admit a local polynomial representation. The problem in (i) and (ii) is to estimate s(t)from x(tn); i.e., we are interested in optimal signal reconstruction on a continuous domain starting from uniform or nonuniform samples.
We show that, in both cases, the bias and variance take the general form:
The mean square error (MSE) is given by
where L is the length of the window over which the polynomial fitting is performed, f is a function of s(t), which typically comprises the higher-order derivatives of s(t), the order itself dependent on the order of the polynomial, and g is a function of the noise variance. It is clear that the bias and variance have complementary characteristics with respect to L. Directly optimizing for the MSE would give a value of L, which involves the functions f and g. The function g may be estimated, but f is not known since s(t)is unknown. Hence, it is not practical to compute the minimum MSE (MMSE) solution. Therefore, we obtain an approximate result by solving the bias-variance trade-off in a probabilistic sense using the ICI technique. We also propose a new approach to optimally select the ICI technique parameters, based on a new cost function that is the sum of the probability of false alarm and the area covered over the confidence interval. In addition, we address issues related to optimal model-order selection, search space for window lengths, accuracy of noise estimation, etc.
The next issue addressed is that of voiced/unvoiced segmentation of speech signal. Speech segments show different spectral and temporal characteristics based on whether the segment is voiced or unvoiced. Most speech processing techniques process the two segments differently. The challenge lies in making detection techniques offer robust performance in the presence of noise. We propose a new technique for voiced/unvoiced clas-sification by taking into account the fact that voiced segments have a certain degree of regularity, and that the unvoiced segments do not possess any smoothness. In order to capture the regularity in voiced regions, we employ the LPM. The key idea is that regions where the LPM is inaccurate are more likely to be unvoiced than voiced. Within this frame-work, we formulate a hypothesis testing problem based on the accuracy of the LPM fit and devise a test statistic for performing V/UV classification. Since the technique is based on LPM, it is capable of adapting to nonstationary noises. We present Monte Carlo results to demonstrate the accuracy of the proposed technique.
|
7 |
Characterization of the Voice Source by the DCT for Speaker InformationAbhiram, B January 2014 (has links) (PDF)
Extracting speaker-specific information from speech is of great interest to both researchers and developers alike, since speaker recognition technology finds application in a wide range of areas, primary among them being forensics and biometric security systems.
Several models and techniques have been employed to extract speaker information from the speech signal. Speech production is generally modeled as an excitation source followed by a filter. Physiologically, the source corresponds to the vocal fold vibrations and the filter corresponds to the spectrum-shaping vocal tract. Vocal tract-based features like the melfrequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients have been shown to contain speaker information. However, high speed videos of the larynx show that the vocal folds of different individuals vibrate differently. Voice source (VS)-based features have also been shown to perform well in speaker recognition tasks, thereby revealing that the VS does contain speaker information. Moreover, a combination of the vocal tract and VS-based features has been shown to give an improved performance, showing that the latter contains supplementary speaker information.
In this study, the focus is on extracting speaker information from the VS. The existing techniques for the same are reviewed, and it is observed that the features which are obtained by fitting a time-domain model on the VS perform poorly than those obtained by simple transformations of the VS. Here, an attempt is made to propose an alternate way of characterizing the VS to extract speaker information, and to study the merits and shortcomings of the proposed speaker-specific features.
The VS cannot be measured directly. Thus, to characterize the VS, we first need an estimate of the VS, and the integrated linear prediction residual (ILPR) extracted from the speech signal is used as the VS estimate in this study. The voice source linear prediction model, which was proposed in an earlier study to obtain the ILPR, is used in this work.
It is hypothesized here that a speaker’s voice may be characterized by the relative proportions of the harmonics present in the VS. The pitch synchronous discrete cosine transform (DCT) is shown to capture these, and the gross shape of the ILPR in a few coefficients. The ILPR and hence its DCT coefficients are visually observed to distinguish between speakers. However, it is also observed that they do have intra-speaker variability, and thus it is hypothesized that the distribution of the DCT coefficients may capture speaker information, and this distribution is modeled by a Gaussian mixture model (GMM).
The DCT coefficients of the ILPR (termed the DCTILPR) are directly used as a feature vector in speaker identification (SID) tasks. Issues related to the GMM, like the type of covariance matrix, are studied, and it is found that diagonal covariance matrices perform better than full covariance matrices. Thus, mixtures of Gaussians having diagonal covariances are used as speaker models, and by conducting SID experiments on three standard databases, it is found that the proposed DCTILPR features fare comparably with the existing VS-based features. It is also found that the gross shape of the VS contains most of the speaker information, and the very fine structure of the VS does not help in distinguishing speakers, and instead leads to more confusion between speakers. The major drawbacks of the DCTILPR are the session and handset variability, but they are also present in existing state-of-the-art speaker-specific VS-based features and the MFCCs, and hence seem to be common problems. There are techniques to compensate these variabilities, which need to be used when the systems using these features are deployed in an actual application.
The DCTILPR is found to improve the SID accuracy of a system trained with MFCC features by 12%, indicating that the DCTILPR features capture speaker information which is missed by the MFCCs. It is also found that a combination of MFCC and DCTILPR features on a speaker verification task gives significant performance improvement in the case of short test utterances. Thus, on the whole, this study proposes an alternate way of extracting speaker information from the VS, and adds to the evidence for speaker information present in the VS.
|
8 |
Análise dinâmica não linear de sinais de voz para detecção de patologias laríngeas. / Dynamic nonlinear analysis of voice signals for the detection of laryngeal pathologies.COSTA, Washington César de Almeida. 13 August 2018 (has links)
Submitted by Johnny Rodrigues (johnnyrodrigues@ufcg.edu.br) on 2018-08-13T16:22:35Z
No. of bitstreams: 1
WASHINGTON CÉSAR DE ALMEIDA COSTA - TESE PPGEE 2012..pdf: 6463355 bytes, checksum: 40d8703ef8a6dd3ef05acde3025cf628 (MD5) / Made available in DSpace on 2018-08-13T16:22:35Z (GMT). No. of bitstreams: 1
WASHINGTON CÉSAR DE ALMEIDA COSTA - TESE PPGEE 2012..pdf: 6463355 bytes, checksum: 40d8703ef8a6dd3ef05acde3025cf628 (MD5)
Previous issue date: 2012-11-09 / Patologias na laringe podem afetar a qualidade vocal, prejudicando a comunicação humana. As técnicas objetivas tradicionais para o diagnóstico dessas patologias fazem uso de exames considerados invasivos, causando certo desconforto ao paciente. Análise acústica, utilizando técnicas de processamento digital de sinais de voz, pode ser utilizada para o desenvolvimento de ferramentas não invasivas de auxílio ao diagnóstico de patologias laríngeas. A precisão do diagnóstico, contudo, depende da escolha das características e parâmetros da fala que melhor representem a desordem vocal provocada por uma determinada patologia. Este trabalho trata da caracterização e da classificação de sinais de vozes saudáveis e vozes afetadas por diferentes patologias laríngeas (edema, paralisia e nódulos nas pregas vocais), por meio da análise dinâmica não linear (e teoria do caos), como também por meio da análise de quantificação de recorrência. No processo de caracterização é investigado, por meio de testes estatísticos,
o potencial de cada característica em discriminar os tipos de sinais de voz considerados. Para a classificação é empregada a técnica de análise discriminante com as funções linear ou quadrática, com validação cruzada, sendo considerado um intervalo de confiança de 95% para as médias das taxas de acuraria do classificador. A partir da combinação de características dos conjuntos das medidas de análise não linear (MNL) e das medidas de quantificação de recorrência (MQR), as médias da taxa de acurácia obtidas variaram nos intervalos de confiança: [95,44%; 100%) para a classificação entre vozes saudáveis e patológicas; [94,75%; 100%] entre vozes saudáveis e afetadas por edema, e entre saudáveis e nódulos. Para a classificação entre saudável e paralisia, obteve-se uma acurácia de 100% . Também são avaliados os efeitos do uso de vetores híbridos formados por características MNL, MQR e coeficientes extraídos da
análise preditiva linear (LPC). Neste caso. as taxas de acurácia variaram nos intervalos de confiança: [95,02%; 97,62%] na discriminação entre vozes afetadas por paralisia e edema; [98,29%; 99,93%] para paralisia versus nódulos e [97,98%; 99,84%] para edema versus nódulos. Os resultados encontrados indicam que o método utilizado é promissor, podendo ser empregado no desenvolvimento de uma ferramenta computacional para apoio ao diagnóstico de patologias laríngeas. / Laryngeal pathologies may affect the voice quality, harniing human communication.
The traditional objective techniques for diagnosing these pathologies make use of exams, considered invasive, causing discomfort to the patient. Acoustic analysis, using digital speech signal processing techniques. can be used for the development of non-invasive tools in order to aid laryngeal diseases diagnosis. The accuracy of diagnosis, however. depends on the choice of parameters and the speech characteristics diat better represent the voice disorder caused by a given pathology. This work deals with the characterization and classification of healthy voice signals and voices affecied by different laryngeal diseases (edema, paralysis and vocal fold nodules), by means of nonlinear dynamic analysis (and chãos theory) as well as recurrence quantification analysis. In the characterization process, the potential of each feature is investigated to discriminate the types of voice signals considered, by means of statistical tests. For the classification,
the technique of discriminam analysis is employed with linear or quadratic functions,
with cross-validation. A 95% confidence levei was considered for the average of accuracy rates of the classifier performance. From the feature combination of the set of nonlinear analysis measures (MNL) and the quantification recurrence measures (MQR). the average of accuracy rates varied in the following confidence intervals: [95.44%; 100%] for healthy and pathologícal classification: [94.75%; 100%] between healdiy and edema voices, and also between healthy and nodules. The accuracy rate was 100% between healthy and paralysis. We also evaluated the effects of using hybrid vectors formed by MNL, MQR and linear predictive coding (LPC) coefficients. In this case, the accuracy rates ranged in the confidence intervals: [95.02%; 97.62%] in the paralysis versus edema voices discrimination; [98.29%; 99.93%] for paralysis versus nodules and [97.98%; 99.84%] for edema versus nodules. Obtained results indicate that the used method is promising and it can even be used to develop a computational tool to support diagnosis of laryngeal diseases.
|
Page generated in 0.0644 seconds