1 |
Process and Analysis of Voice Signal by MATLAB
Wu, Nan; Wang, Bofei. January 2014
Delivering messages by voice is the most important, effective and common way for people to exchange information. Language is a specifically human feature, and the voice is the tool people most commonly use to pass information to one another. The voice also has a large information capacity, so modern voice processing methods are worth studying so that people can easily transmit, store, access and apply voice signals. In this thesis, we designed a collection system that records voice and applies different filters to remove noise. After filtering, the voice quality is higher for uses such as mobile communication, radio and TV. We use Microsoft recorder to collect a voice sample, and then analyze its time-domain waveform, its frequency spectrum and the characteristics of the voice signal. We use MATLAB's functions to remove noise that has been added to the voice, and further use the bilinear transformation method to design a filter based on a Butterworth prototype and a window function, and then filter the noisy voice signal. After that we compare the original and noisy voice in the time domain and the frequency domain, play back the noisy and de-noised voice, and compare FIR and IIR filters in this signal processing application, especially in terms of their de-noising characteristics and applications. From this comparison, we can determine which filter performs best.
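The two filter families this abstract compares can be sketched in a few lines. The thesis itself works in MATLAB; the Python below is only an illustrative stand-in using the standard library, and the function names, the second-order Butterworth section, and the Hamming window are assumptions chosen for the example, not details taken from the thesis.

```python
import cmath
import math


def butterworth_lowpass_biquad(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass (IIR) via the bilinear transform.

    Returns (b, a) with a[0] == 1 for the difference equation
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    """
    # Pre-warp the cutoff so the -3 dB point lands exactly on cutoff_hz.
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = [b0, 2.0 * b0, b0]
    a = [1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm]
    return b, a


def hamming_lowpass_fir(cutoff_hz, sample_rate_hz, num_taps):
    """Windowed-sinc low-pass FIR filter using a Hamming window."""
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff, cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unity gain at DC


def apply_filter(b, a, signal):
    """Direct-form filtering; assumes a[0] == 1. Use a=[1.0] for an FIR filter."""
    out = []
    for n in range(len(signal)):
        acc = sum(b[k] * signal[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(acc)
    return out


def gain_at(b, a, freq_hz, sample_rate_hz):
    """Magnitude of the frequency response at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return abs(num / den)
```

Calling `gain_at` on both designs at the same frequencies gives a quick numerical version of the FIR versus IIR comparison: the IIR section needs only five coefficients, while the FIR design trades more taps for linear phase.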
|
2 |
Aukštesnių eilių statistika grįsto balso detektavimo algoritmo sudarymas ir tyrimas / Design and analysis of voice activity detector based on higher order statistics
Duchovskis, Donatas. 29 May 2006
This report covers a robust voice activity detection (VAD) algorithm presented in [1]. The algorithm uses higher order statistics (HOS) metrics of the speech signal in the linear prediction coding (LPC) residual domain to classify the frames of a signal as noise or speech. The chapters of this report present the voice activity detection problem and an analysis of environment issues for VAD, an in-depth analysis of HOS-based and standard algorithms, and a real-time HOS-based voice activity detector model. New improvements to the proposed algorithm (instantaneous SNR estimation, decision smoothing, adaptive thresholds, artificial neural network) are introduced, and performance results of the improved algorithm are compared with those of standard VAD algorithms.
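The core feature described here, higher order statistics of the LPC residual, can be illustrated with a short sketch. This is not the algorithm of [1]: the autocorrelation method with Levinson-Durbin recursion, the LPC order, and all function names are assumptions made for the example. It rests on the fact that the prediction residual of Gaussian noise stays close to Gaussian (skewness and excess kurtosis near zero), whereas a pulse-like excitation gives a strongly non-Gaussian residual.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] == 1) from autocorrelation r."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a


def lpc_residual(x, order):
    """Prediction error e[n] = x[n] + sum_k a[k] * x[n-k]."""
    a = levinson_durbin(autocorrelation(x, order), order)
    return [sum(a[k] * x[n - k] for k in range(order + 1))
            for n in range(order, len(x))]


def skewness_and_excess_kurtosis(e):
    """Third and fourth standardized moments (kurtosis minus 3)."""
    n = len(e)
    mean = sum(e) / n
    m2 = sum((v - mean) ** 2 for v in e) / n
    m3 = sum((v - mean) ** 3 for v in e) / n
    m4 = sum((v - mean) ** 4 for v in e) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def hos_features(frame, order=10):
    """HOS of the LPC residual of one frame; (0, 0) for an all-zero frame."""
    if not any(frame):
        return 0.0, 0.0
    return skewness_and_excess_kurtosis(lpc_residual(frame, order))
```

A detector would then compare these features with thresholds per frame; the thresholds, smoothing and SNR estimation mentioned in the abstract are outside this sketch.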
|
3 |
Speaker identification based on an integrated system combining cepstral feature extraction and vector quantization
Sanchez, Jose Boris; Meyer-Baese, Anke. January 2005
Thesis (M.S.)--Florida State University, 2005. / Advisor: Dr. Anke Meyer-Baese, Florida State University, College of Engineering, Dept. of Electrical Engineering. Title and description from dissertation home page (viewed June 15, 2005). Document formatted into pages; contains vii, 30 pages. Includes bibliographical references.
|
4 |
Biomechanically informed nonlinear speech signal processing
Little, M. A. January 2007
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies leads to the conclusion that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals.
Using a simple pattern classifier these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.
|
5 |
Estimação do sinal glotal para padrões acústicos de doenças da laringe / Estimation of the glottal signal for acoustic patterns of laryngeal diseases
Guerra, Aparecida de Cássia. 03 May 2005
Much research in digital signal processing (DSP) has attempted to evaluate the speech signal in order to diagnose diseases of the larynx. Acoustic measures have been proposed to assess the glottal tract indirectly from a voice signal recorded with a conventional microphone. To that end, the parametric Liljencrants-Fant (LF) model was developed to represent the glottal signal under normal and pathological conditions. Its parameters have an advantage over acoustic measures because they reflect real physiological characteristics of the vocal folds, so they can be used to identify laryngeal diseases. Besides estimating the LF parameters in the time domain (T parameters), the glottal derivative waveform can also be quantified through the R parameters (Rd, Ra, Rk and Rg), the quotient parameters Q (SQ, OQ, CQ, AQ and NAQ), the parameters B1 and B2, which are the band extensions of the LF derivative pulse, and the ece parameter, which relates the parameters β and Ta. Although B1, B2 and ece have been proposed in the literature, no results for them are reported there. Our results show that the B parameters are not reliable for discriminating between voices, whereas the ece parameter appears to be a viable option for discriminating among normal voices, nodules and Reinke's edema. The aim of this work is to draw attention to the glottal signal, estimating it automatically by applying DSP techniques to the speech signal, in order to extract parameters that identify normal and pathological conditions of the larynx. Finally, two parameters, TRp and TRs, are proposed to separate first-order effects from higher-order effects in the return phase of the glottal pulse, with the aim of estimating the actual nonlinearity of the glottal subsystem and reflecting the physiological conditions of vocal fold movement. At a 95% confidence level, the first-order parameter (TRp) is effective in discriminating Reinke's edema but ineffective in detecting nodules. The higher-order parameter (TRs) is an excellent detector of pathological voices (nodules and Reinke's edema), but it cannot discriminate between the two pathologies.
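To make the relation between the time-domain (T) parameters and the R parameters concrete, here is a small sketch. The formulas are the definitions commonly quoted in the LF-model literature (following Fant) and are stated here as an assumption; the thesis may use different conventions, in particular for the open quotient. The function and argument names are hypothetical.

```python
def lf_shape_parameters(t0, tp, te, ta):
    """Derive R parameters from LF timing parameters (all in seconds).

    t0: fundamental period, tp: instant of peak glottal flow,
    te: instant of maximum excitation, ta: effective return-phase duration.
    Definitions assumed from the general LF-model literature.
    """
    ra = ta / t0
    rg = t0 / (2.0 * tp)
    rk = (te - tp) / tp
    rd = (1.0 / 0.11) * (0.5 + 1.2 * rk) * (rk / (4.0 * rg) + ra)
    # One common open-quotient convention; it equals te / t0.
    oq = (1.0 + rk) / (2.0 * rg)
    return {"Ra": ra, "Rg": rg, "Rk": rk, "Rd": rd, "OQ": oq}
```

For example, a 10 ms period with tp = 4 ms, te = 6 ms and ta = 0.3 ms gives Rg = 1.25, Rk = 0.5 and Ra = 0.03 under these definitions.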
|
6 |
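Análise acústica para classificação de patologias da voz empregando análise de Componentes Principais, Redes Neurais Artificiais e Máquina de vetores de Suporte / Acoustic analysis for classification of voice pathologies using Principal Component Analysis, Artificial Neural Networks and Support Vector Machine
ESPINOLA, Sérgio de Brito. 19 September 2017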
Submitted by Johnny Rodrigues (johnnyrodrigues@ufcg.edu.br) on 2017-09-19T15:36:01Z
Previous issue date: 2014-03-12 / It is estimated that one third of the human workforce depends on the voice to do its job. Clinical assessment of vocal quality most often relies either on a specialist listening to the voice (subjective) or on inspection of the vocal folds through sophisticated examinations (objective, but invasive and expensive). Acoustic analysis of the voice seeks robust measures that describe phenomena associated with speech production or intrinsic characteristics of the speaker, such as fundamental frequency and timbre. This study characterizes a digital voice processing model intended to support diagnosis, in the context of building automated systems for identifying speech pathologies. To evaluate the proposed technique, a commercial voice database (the KAY database) was used; specialists arranged its recordings into six pathology groups, to which a group of “Normal” voices was added. In total, 182 voices were selected, each with an indexed catalog of about 33 descriptors calculated from a sustained utterance of the vowel \a\. By selecting combinations of these descriptors, such as perturbations in frequency (jitter) and in amplitude (shimmer), the study found statistical evidence that it is possible: a) to separate normal voices from pathological ones, as expected; b) to separate specific pathologies (Paralysis, Reinke's Edema, Nodules) with 100% accuracy for the great majority of these combinations and about 92% for Nodules versus Reinke's Edema; and c) to discriminate them with classifiers (artificial neural networks and a support vector machine) and to reduce the dimensionality and complexity (amount of data) by applying principal component analysis (PCA) to these descriptors for separation among pathologies. In addition, d) statistical tests on the local groups confirmed the abnormality thresholds reported in the literature. Using the smaller set of descriptors obtained after PCA (compression) proved equally effective, with the same accuracy rates. These supervised learning methods indicate that classification predictions (presence of a disorder) can be generated for new data.
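Two of the descriptors named above, jitter and shimmer, are simple to compute once the pitch periods and peak amplitudes of consecutive glottal cycles are known. The sketch below uses the widely used "local" definition (mean absolute difference between consecutive cycles divided by the mean value); whether the database's catalog uses exactly this variant is an assumption, and the function names are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(values[i] - values[i - 1]) for i in range(1, len(values))]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle perturbation of the pitch period (same unit for all)."""
    return relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle perturbation of the peak amplitude."""
    return relative_perturbation(amplitudes)
```

A perfectly periodic sustained vowel gives zero for both measures; values are often reported as percentages, which is just the result multiplied by 100.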
|
7 |
Modelo de produção da voz baseado na biofísica da fonação / Voice production model based on the biophysics of phonation
ROCHA, Raissa Bezerra. 24 August 2018
Submitted by Maria Medeiros (maria.dilva1@ufcg.edu.br) on 2018-08-24T15:00:51Z
Previous issue date: 2017-03-20 / CNPq / The search for new models that represent the biophysics of voice phonation is important for applications that involve voice signal processing, because such models are a tool for learning the characteristics of speakers. This doctoral thesis presents a new approach to the source-filter theory of voice production, more precisely for voiced sounds, which models the voice using three independent subsystems: the excitation source, the vocal tract, and the radiation from the lips and nostrils. In this model the voice is generated by linear, time-invariant filters, and the physics of phonation is taken into account through the cyclostationary character of the voice signal, which arises from the vibrational behavior of the vocal cords. The model suggests that the oscillation frequency of the vocal cords is a function of their mass and length, and that its value is changed mainly by the longitudinal tension applied to them. In the proposed voice generation model, the vibratory movement of the vocal cords is modeled by a cyclostationary impulse train generator, controlled by a tension signal obtained from the waveform of the voice signal. A full mathematical analysis of the new glottal excitation model is carried out, giving a mathematical expression for the power spectral density of the signal that excites the glottis, as well as for the voice signal, whose parameters can be adjusted to emulate pathologies in the glottis. The glottal pulse used is also analyzed in the frequency domain. To assess the performance of the proposed model, tests with speech utterances were carried out, and the results indicate that the proposed model fits voice generation well.
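The three-subsystem structure described above (excitation source, vocal tract, lip radiation) can be sketched as a chain of simple linear filters. This is a generic source-filter illustration, not the thesis's cyclostationary model: the per-sample fundamental-frequency contour stands in for the tension control signal, and the two-pole resonators, the radiation coefficient and all names are assumptions made for the example.

```python
import math


def impulse_train(f0_contour_hz, sample_rate_hz):
    """Emit a unit impulse each time the accumulated phase completes a cycle."""
    phase = 1.0  # start with an impulse at n = 0
    out = []
    for f0 in f0_contour_hz:
        if phase >= 1.0:
            out.append(1.0)
            phase -= 1.0
        else:
            out.append(0.0)
        phase += f0 / sample_rate_hz
    return out


def resonator(signal, formant_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole all-pole section modelling one vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    out = []
    for n, x in enumerate(signal):
        y1 = out[n - 1] if n >= 1 else 0.0
        y2 = out[n - 2] if n >= 2 else 0.0
        out.append(x - a1 * y1 - a2 * y2)
    return out


def lip_radiation(signal, coeff=0.97):
    """First-difference approximation of radiation at the lips."""
    return [x - coeff * (signal[n - 1] if n else 0.0)
            for n, x in enumerate(signal)]


def synthesize_voiced(f0_contour_hz, sample_rate_hz, formants):
    """Excitation -> cascade of formant resonators -> lip radiation."""
    s = impulse_train(f0_contour_hz, sample_rate_hz)
    for formant_hz, bandwidth_hz in formants:
        s = resonator(s, formant_hz, bandwidth_hz, sample_rate_hz)
    return lip_radiation(s)
```

Varying the values in `f0_contour_hz` from sample to sample changes the spacing of the impulses, which is the role the abstract assigns to the tension signal.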
|
9 |
Využití tenchnologie GRID při zpracování medicínské informace / Utilization of GRID technology in processing of medical information
Kulhánek, Tomáš. January 2015
This thesis focuses on selected areas of biomedical research in order to benefit from current computational infrastructures established in the scientific community at the European and global level. The theory of computation, parallelism and distributed computing, with a focus on grid computing and cloud computing, is briefly introduced. Exchange of medical images was studied, and a grid-based PACS system was seamlessly integrated with the current distributed system in order to share DICOM medical images. Voice science was studied, and access to a real-time voice analysis application via remote desktop technology was introduced, using a customized protocol to transfer sound recordings. This makes it possible for voice specialists to access a current legacy application remotely. The systems biology approach within the domain of human physiology and pathophysiology was studied. The modeling methodology of human physiology was improved in order to build complex models based on acausal and object-oriented modeling techniques. Methods for conducting a parameter study (especially parameter estimation and parameter sweep) were introduced using grid computing and cloud computing technology. The identification of parameters gains substantial speedup by utilizing cloud computing deployment when performed on medium complex models of...
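A parameter sweep of the kind mentioned here evaluates a model independently for many candidate parameter values, which is why it distributes well. The sketch below is a toy illustration under stated assumptions: a local thread pool stands in for grid or cloud workers, and the exponential-decay model, the squared-error criterion and all names are hypothetical, not taken from the thesis.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def simulate(rate, times):
    """Toy model: exponential decay with one unknown rate parameter."""
    return [math.exp(-rate * t) for t in times]


def squared_error(rate, times, observed):
    """Misfit between the model output and the observed data."""
    return sum((m - o) ** 2 for m, o in zip(simulate(rate, times), observed))


def parameter_sweep(candidates, times, observed, max_workers=4):
    """Evaluate every candidate independently and return the best one.

    Each evaluation is independent of the others, so the pool could be
    replaced by remote workers without changing the logic.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        errors = list(pool.map(
            lambda rate: squared_error(rate, times, observed), candidates))
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors
```

Because the candidates do not depend on one another, the wall-clock time of the sweep scales with the number of workers until the per-task overhead dominates.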
|