1 |
Text-Independent Speaker Recognition Using Source Based Features
Wildermoth, Brett Richard, January 2001
A speech signal is primarily meant to carry a linguistic message, but it also carries speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify or verify) a person. This thesis deals with the speaker identification task: finding the identity of a person from his or her speech, among a group of persons already enrolled during the training phase. Listeners use many audible cues to identify speakers. These range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Modern speaker identification systems generally model the vocal tract characteristics with cepstral coefficients. Although these coefficients represent vocal tract information well, they can be supplemented with pitch and voicing information. Pitch provides very useful information for identifying speakers, yet current speaker recognition systems rarely use it because it cannot be reliably extracted and is not always present in the speech signal. This thesis attempts to use pitch and voicing information for speaker identification.
Using a text-independent speaker identification system, the thesis shows that cepstral coefficients perform reasonably well, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner gives identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic: only about half of the frames are voiced, so pitch cannot be estimated for the other half (the unvoiced frames), and the problem is how to account for pitch information in those frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some voiced frames as unvoiced, and they make pitch estimation errors such as doubling or halving of the pitch value, depending on the method.
To use pitch information for speaker recognition, these problems must be overcome. We need a method that does not use the pitch value directly as a feature and that works reliably for voiced and unvoiced frames alike. We propose a method that uses the autocorrelation function of a given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving errors. Using the MACV features along with the cepstral features improves speaker identification performance by 45%.
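The abstract does not spell out how the MACV features are computed, so the following is only a minimal sketch of one plausible reading: normalize the frame's autocorrelation, restrict it to the lags of plausible pitch periods, split that lag range into equal bands, and keep the peak of each band. The function name `macv_features`, the number of bands, and the 50 to 500 Hz pitch range are all assumptions made for illustration, not values taken from the thesis.

```python
import numpy as np

def macv_features(frame, sample_rate, n_bands=5, f0_min=50.0, f0_max=500.0):
    """Sketch of maximum autocorrelation value (MACV) features for one frame.

    Assumed recipe (not confirmed by the abstract): peak of the normalized
    autocorrelation in each of n_bands equal slices of the pitch-lag range.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation at non-negative lags 0 .. len(frame) - 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0.0:
        # Silent frame: no energy, so every feature is zero.
        return np.zeros(n_bands)
    acf = acf / acf[0]  # normalize so that lag 0 equals 1
    # Lags that correspond to the assumed pitch range.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    bands = np.array_split(acf[lag_min:lag_max + 1], n_bands)
    return np.array([band.max() for band in bands])

# Usage: a 30 ms synthetic "voiced" frame (100 Hz tone sampled at 8 kHz).
sample_rate = 8000
t = np.arange(240) / sample_rate
features = macv_features(np.sin(2 * np.pi * 100 * t), sample_rate)
```

Because the features are peak values rather than a single pitch estimate, a frame yields a well-defined vector whether or not it is voiced, and a peak at twice or half the true period simply lands in a different band instead of corrupting the feature.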
|
2 |
Verificação de locutores independente de texto: uma análise de robustez a ruído (Text-independent speaker verification: an analysis of noise robustness)
PINHEIRO, Hector Natan Batista, 25 February 2015
The personal identification of individuals is a task executed millions of times every day
by organizations from diverse fields. Questions such as "Who is this individual?" or "Is
this person who he or she claims to be?" are constantly asked by organizations in financial
services, health care, e-commerce, telecommunication systems and governments. Biometric
identification is the process of identifying people using their physiological or behavioral
characteristics. These characteristics are generally known as biometrics and examples
of these include face, fingerprint, iris, handwriting and speech. Speaker recognition is
a biometric modality that identifies a person using only the speaker-specific
information in his or her speech. This work focuses on the development of text-independent
speaker verification systems. In these systems, speech from an individual is used to verify the
claimed identity of that individual. Furthermore, the verification must occur independently
of the pronounced word or phrase. The main challenge in the development of speaker
recognition systems comes from the mismatches which may occur in the acquisition of
the speech signals. The techniques proposed to mitigate the mismatch effects are referred
to as compensation methods. They may operate in three domains: in the feature extraction
process, in the estimation of the speaker models and in the computation of the decision
score. Besides presenting a broad review of the main techniques used in the development
of text-independent speaker verification systems, this work describes
the main feature-, model- and score-based compensation methods. In the experiments,
this work presents comprehensive comparisons between the conventional techniques and
the alternative compensation methods. Furthermore, two compensation methods are
proposed: one operates in the model domain and the other in the score domain. The
proposed score-domain compensation method is based on the Normal cumulative distribution
function and, in some contexts, outperformed the main score-domain
compensation techniques. On the other hand, the model-domain compensation technique
proposed in this work is based on a method presented in the literature which combines
two concepts: multi-condition training and Missing Data Theory. The formulation
proposed by that method's authors is based on Posterior Union Models and is not completely
appropriate for the text-independent speaker verification task. This work proposes a more
appropriate formulation for this context which combines the concepts used by the authors
with a type of modeling using Universal Background Models (UBMs). The proposed
method outperformed the usual GMM-UBM modeling technique, based on Gaussian
Mixture Models (GMMs).
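The abstract names GMM-UBM as the baseline but gives no implementation details. As a rough illustration of how such a system commonly scores a verification trial, the sketch below computes the average per-frame log-likelihood ratio between a claimed speaker's GMM and the UBM, using diagonal covariances. The function names, the `(weights, means, variances)` tuple layout, and the toy single-component models are hypothetical choices for this example, not details from the dissertation.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components, stabilized by the per-frame peak.
    peak = log_weighted.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))

def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio: claimed-speaker model versus UBM."""
    ratio = diag_gmm_loglik(frames, *speaker_gmm) - diag_gmm_loglik(frames, *ubm)
    return float(np.mean(ratio))

# Usage with toy one-component models (hypothetical parameters).
ubm = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
speaker = (np.array([1.0]), np.full((1, 2), 2.0), np.ones((1, 2)))
trial_frames = np.full((10, 2), 2.0)
score = llr_score(trial_frames, speaker, ubm)
```

A claim would then be accepted when the score exceeds a threshold chosen on development data. Score-domain compensation methods, such as the ones compared in this work, operate on this raw score before the threshold is applied.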
|