1 |
A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition
Wang, Long-Cheng, 07 September 2006
A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs). Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved on a single-session 3000-speaker corpus, whereas only a 67% correct rate can be obtained on a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by variability in the recording environment, the speakers' mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and a large-scale multi-session speaker database such as this one plays an indispensable role in that task.
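The identification approach evaluated in this thesis (one GMM per speaker over MFCC features, with a test utterance assigned to the best-scoring model) can be sketched as below. This is a minimal sketch, not the thesis's implementation: it assumes diagonal-covariance models whose parameters are already trained (EM training and MFCC extraction are omitted), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One diagonal-covariance Gaussian component: (mixture weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + mahal)


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def avg_log_likelihood(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood of a feature sequence under one GMM."""
    total = 0.0
    for x in frames:
        total += log_sum_exp([math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm])
    return total / len(frames)


def identify(frames: Sequence[Sequence[float]], models: Dict[str, GMM]) -> str:
    """Closed-set identification: return the speaker whose GMM gives the highest score."""
    return max(models, key=lambda speaker: avg_log_likelihood(frames, models[speaker]))
```

For example, with two hypothetical speaker models over toy two-dimensional vectors standing in for MFCC frames, `identify(frames, models)` returns the label of the model whose components lie closest to the frames. Averaging per frame keeps scores comparable across utterances of different lengths.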
|
2 |
Optimizing text-independent speaker recognition using an LSTM neural network
Larsson, Joel, January 2014
In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audiobooks. Audio signals are processed via spectral analysis into Mel-frequency cepstral coefficients (MFCCs), which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory (LSTM) algorithm is examined for the first time in this area, with interesting results. Experiments were conducted to find the optimal network model for the problem. They show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has difficulty recognizing speakers from different recordings, which is probably due to the noise sensitivity of the speech processing algorithm used.
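An LSTM network processes the MFCC sequence one frame at a time through gated cells. The sketch below shows the forward pass of a standard LSTM cell with a forget gate, run over a sequence of feature frames, in plain Python. It is illustrative only: the abstract does not specify the thesis's network topology, so the 13-dimensional input (a common MFCC size), the hidden size and the random initialization are assumptions, and training (backpropagation through time) and the output layer that would map the final state to one of the ten speakers are omitted. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
# Parameters for one gate: input weights W, recurrent weights U, bias b.
GateParams = Tuple[Matrix, Matrix, Vector]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preactivation(p: GateParams, x: Sequence[float], h: Sequence[float]) -> Vector:
    """Compute W x + U h + b for one gate."""
    W, U, b = p
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h)) + bj
        for w_row, u_row, bj in zip(W, U, b)
    ]


def lstm_step(params: Dict[str, GateParams], x: Sequence[float],
              h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """One time step of an LSTM cell; returns the new hidden and cell states."""
    i = [sigmoid(z) for z in preactivation(params["i"], x, h)]    # input gate
    f = [sigmoid(z) for z in preactivation(params["f"], x, h)]    # forget gate
    o = [sigmoid(z) for z in preactivation(params["o"], x, h)]    # output gate
    g = [math.tanh(z) for z in preactivation(params["g"], x, h)]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(params: Dict[str, GateParams], frames: Sequence[Sequence[float]],
           hidden_size: int) -> Vector:
    """Run the cell over all frames from a zero state and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h


def random_params(input_size: int, hidden_size: int, seed: int = 0) -> Dict[str, GateParams]:
    """Hypothetical small random initialization (untrained weights)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in "ifog"}
```

Because each hidden value is an output-gate value in (0, 1) times a tanh, the final state stays in (-1, 1); a trained classifier would feed it to a softmax layer over the speakers.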
|
3 |
Text-Independent Speaker Verification: An Analysis of Noise Robustness (Verificação de locutores independente de texto: uma análise de robustez a ruído)
PINHEIRO, Hector Natan Batista, 25 February 2015
The personal identification of individuals is a task executed millions of times every day
by organizations from diverse fields. Questions such as "Who is this individual?" or "Is this person who he or she claims to be?" are constantly asked by organizations in financial services, health care, e-commerce, telecommunications and government. Biometric identification is the process of identifying people by their physiological or behavioral characteristics. These characteristics are generally known as biometrics; examples include face, fingerprint, iris, handwriting and speech. Speaker recognition is a biometric modality that performs personal identification using speaker-specific information present in speech. This work focuses on the development of text-independent speaker verification systems. In these systems, speech from an individual is used to verify that individual's claimed identity, and the verification must work regardless of the words or phrases spoken. The main challenge in developing speaker recognition systems comes from the mismatches that may occur when the speech signals are acquired. The techniques proposed to mitigate these mismatches are referred to as compensation methods, and they may operate in three domains: feature extraction, speaker-model estimation and decision-score computation. Besides presenting a broad review of the main techniques used in text-independent speaker verification systems, this work describes the main feature-, model- and score-based compensation methods. In the experiments, this work presents comprehensive comparisons between the conventional techniques and alternative compensation methods. Furthermore, two compensation methods are proposed: one operating in the model domain and the other in the score domain. The proposed score-domain method is based on the normal cumulative distribution function and, in some contexts, outperformed the main score-domain compensation techniques. The proposed model-domain method builds on a technique from the literature that combines two concepts: multi-condition training and Missing Data Theory. The original authors' formulation is based on Posterior Union Models and is not entirely appropriate for text-independent speaker verification. This work proposes a formulation better suited to this context, combining the concepts used by those authors with modeling based on Universal Background Models (UBMs). The proposed method outperformed the standard GMM-UBM technique, which is based on Gaussian Mixture Models (GMMs).
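The GMM-UBM baseline and the idea of score-domain compensation can be illustrated with a short sketch. It does not reproduce either method proposed in this dissertation, whose formulations the abstract does not give. Instead it shows a conventional pipeline: mean-only MAP adaptation of a UBM to a speaker's enrollment frames, the average per-frame log-likelihood-ratio score, and Z-norm, a standard score-domain technique that standardizes a score using impostor trials scored against the same model. The final mapping of the normalized score through the standard normal CDF is included only as an illustrative assumption about how a CDF can enter score compensation. The relevance factor of 16 is a commonly used value, assumed here; UBM training by EM and feature extraction are omitted, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# One diagonal-covariance component: (mixture weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]


def component_logs(x: Sequence[float], gmm: List[Component]) -> List[float]:
    """log w_k + log N(x; mu_k, diag(var_k)) for every component k."""
    out = []
    for w, mean, var in gmm:
        log_det = sum(math.log(v) for v in var)
        mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
        out.append(math.log(w) - 0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + mahal))
    return out


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def map_adapt_means(ubm: List[Component], frames: Sequence[Sequence[float]],
                    relevance: float = 16.0) -> List[Component]:
    """Mean-only MAP adaptation of the UBM to a speaker's enrollment frames."""
    dim = len(ubm[0][1])
    counts = [0.0] * len(ubm)                 # soft frame counts n_k
    firsts = [[0.0] * dim for _ in ubm]       # posterior-weighted sums of frames
    for x in frames:
        logs = component_logs(x, ubm)
        total = log_sum_exp(logs)
        for k, lk in enumerate(logs):
            gamma = math.exp(lk - total)      # posterior of component k given x
            counts[k] += gamma
            for d in range(dim):
                firsts[k][d] += gamma * x[d]
    adapted = []
    for k, (w, mean, var) in enumerate(ubm):
        alpha = counts[k] / (counts[k] + relevance)   # data-dependent adaptation weight
        data_mean = [s / counts[k] for s in firsts[k]] if counts[k] > 0 else mean
        new_mean = [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, mean)]
        adapted.append((w, new_mean, list(var)))      # weights and variances kept from the UBM
    return adapted


def llr_score(frames: Sequence[Sequence[float]], speaker_gmm: List[Component],
              ubm: List[Component]) -> float:
    """Verification score: average per-frame log p(x|speaker) - log p(x|UBM)."""
    diffs = [log_sum_exp(component_logs(x, speaker_gmm)) - log_sum_exp(component_logs(x, ubm))
             for x in frames]
    return sum(diffs) / len(diffs)


def znorm(score: float, impostor_scores: Sequence[float]) -> float:
    """Z-norm: standardize a score with the mean and std of impostor scores for this model."""
    return (score - statistics.mean(impostor_scores)) / statistics.stdev(impostor_scores)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via math.erf (illustrative mapping of a normalized score to [0, 1])."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A verification decision would then compare the normalized score (or its CDF value) with a threshold tuned on development data; Z-norm needs a set of impostor utterances scored against each target model.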
|