1 |
Soft AI methods and visual speech recognition. Saeed, Mehreen January 1999
No description available.
|
2 |
Aspects of facial biometrics for verification of personal identity. Ramos Sanchez, M. Ulises January 2000
No description available.
|
3 |
A novel lip geometry approach for audio-visual speech recognition. Ibrahim, Zamri January 2014
By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of the relationships between visual and speech information, focusing on lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate the visual and speech modalities. This thesis makes several contributions. First, it presents a new method to extract lip geometry features using a combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique has been developed that can adapt to dynamic differences in the way words are uttered by speakers, determining the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual or speech modality is chosen by measuring the quality of the audio through kurtosis and skewness analysis, driven by white noise confusion.
Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of signal-to-noise ratio conditions generated using the NOISEX-92 dataset.
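The decision fusion idea described in this abstract, choosing between the audio and visual modality based on kurtosis and skewness of the audio signal, can be sketched roughly as below. The thresholds and the exact decision rule are illustrative assumptions, not the parameters used in the thesis.

```python
import math

def skewness(x):
    """Population skewness (third standardized moment); assumes non-constant x."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return sum(((v - mean) / sd) ** 3 for v in x) / n

def kurtosis(x):
    """Excess kurtosis (fourth standardized moment minus 3); 0 for a Gaussian."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return sum(((v - mean) / sd) ** 4 for v in x) / n - 3.0

def choose_modality(audio_frame, kurt_threshold=1.0, skew_threshold=0.5):
    """Pick 'audio' when the frame's sample distribution departs clearly from
    Gaussian (speech-like); fall back to 'visual' when it looks noise-dominated
    (near-Gaussian). Thresholds here are arbitrary illustrative values."""
    if abs(kurtosis(audio_frame)) > kurt_threshold or \
       abs(skewness(audio_frame)) > skew_threshold:
        return "audio"
    return "visual"
```

The rationale is that heavily noise-corrupted audio tends toward Gaussian statistics (near-zero excess kurtosis and skewness), so near-Gaussian frames are routed to the visual classifier instead.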
|
4 |
Rastreamento labial: aplicação em leitura labial. / Lip tracking: an application in lip reading. Negreiros, Tupã 07 November 2012
New human-computer interfaces have been researched with the aim of making them more natural and flexible. Lip tracking, the focus of this work, is part of this context: it can be used for detecting emotions as well as for aiding speech recognition. It can thus serve as an initial module for lip reading in the creation of interfaces aimed at hearing-impaired users. The algorithms available in the literature were analysed and compared, showing the pros and cons of each method. A technique based on the Active Appearance Model (AAM) was ultimately chosen for development. The AAM generates a model from a set of training images, which can then be used to track the lips in new images. The proposed technique uses genetic algorithms to fit the model, and therefore differs from the fitting procedure originally proposed for the AAM. The convergence of the proposed technique was extensively analysed under varying parameters, examining the residual error of the cost function and its relation to convergence time and position error.
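The genetic-algorithm model fitting this abstract describes can be sketched as a generic minimisation loop over a cost function standing in for the AAM appearance residual. Population size, elitism fraction, mutation rate and operators below are illustrative assumptions, not the thesis's actual configuration.

```python
import random

def fit_model_ga(cost, n_params, pop_size=30, generations=40,
                 bounds=(-1.0, 1.0), seed=0):
    """Minimise `cost` over n_params real parameters with a simple GA:
    elitism, one-point crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 4]          # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = a[:cut] + b[cut:]         # one-point crossover
            if rng.random() < 0.3:            # occasional Gaussian mutation
                i = rng.randrange(n_params)
                child[i] += rng.gauss(0.0, 0.1)
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)
```

For AAM fitting, `cost` would measure the pixel residual between the model instance synthesised from the parameter vector and the target image; here any smooth function works.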
|
5 |
Perceptual Evaluation of Video-Realistic Speech. Geiger, Gadi, Ezzat, Tony, Poggio, Tomaso 28 February 2003
With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image-sequences of the same utterances ("Turing tests"), and b) gauging visual speech recognition by comparing lip-reading performance on real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image-sequences, recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at the levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing the percept of a talking head, but that additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks can be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the two. The implicit task (b) consists of comparing visual recognition of speech from real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method than explicit perceptual discrimination for distinguishing synthetic from real image-sequences.
|
7 |
Intensive Auditory Comprehension Treatment for People with Severe Aphasia: Outcomes and Use of Self-Directed Strategies. Knollman-Porter, Kelly 05 October 2012
No description available.
|
8 |
A Study of Accumulation Times in Translation from Event Streams to Video for the Purpose of Lip Reading / En studie av ackumuleringstid i översättning från eventstreams till video för användning inom läppläsning. Munther, Didrik, Puustinen, David January 2022
Visually extracting textual content from lips is a pattern-matching problem, which has led to the frequent use of machine learning approaches for the classification task. Previous research has consisted mostly of audiovisual (multimodal) approaches using conventional cameras. This study isolates the visual medium and uses event-based cameras instead. Classifying visual features is computationally expensive, so minimising excessive data can be important for performance, which motivates the use of event cameras. Event cameras are inspired by biological vision and capture only changes in the scene, while offering high temporal resolution (corresponding to the frame rate of conventional cameras). The study investigates the importance of temporal resolution for the task of lip reading by modifying the ∆time used for accumulating events. No correlation between ∆time and accuracy could be observed within the collected data set, and the paper cannot draw any conclusions regarding the suitability of the chosen approach for this particular application. There are multiple other variables that could affect the results, which makes it hard to dismiss the technology's potential within the domain.
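The accumulation step the study varies, collecting events over a window of ∆time and rendering them as a frame, can be sketched as below. The (t, x, y, polarity) tuple format and the signed per-pixel accumulation are assumptions for illustration; the thesis's actual reconstruction pipeline is not specified here.

```python
def events_to_frames(events, delta_t, width, height):
    """Accumulate time-sorted (t, x, y, polarity) events into frames,
    one frame per delta_t window. ON events (polarity truthy) add +1
    to a pixel, OFF events add -1."""
    if not events:
        return []
    frames = []
    t0 = events[0][0]                                  # start of current window
    frame = [[0] * width for _ in range(height)]
    for t, x, y, pol in events:
        while t >= t0 + delta_t:                       # close finished windows
            frames.append(frame)
            frame = [[0] * width for _ in range(height)]
            t0 += delta_t
        frame[y][x] += 1 if pol else -1
    frames.append(frame)                               # flush last window
    return frames
```

Shrinking `delta_t` raises the effective frame rate (more, sparser frames); enlarging it trades temporal resolution for denser frames, which is exactly the axis the study explores.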
|
9 |
Estudo dimensional de características aplicadas à leitura labial automática. Madureira, Fillipe Levi Guedes 31 August 2018
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This work is a study of the relationship between the intrinsic dimension of feature vectors and the classification of video signals for the purpose of lip reading. In pattern recognition tasks, the extraction of relevant features is crucial for good classifier performance. The starting point of this work was the reproduction of the work of J.R. Movellan [1], which classifies lip gestures with HMMs using only the video signal from the Tulips1 database. The database consists of videos of volunteers' mouths while they utter the first four numerals in English. The original work uses feature vectors of high dimensionality relative to the size of the database. Consequently, fitting the HMM classifiers became problematic and the maximum accuracy reached was only 66.67%. Alternative feature extraction strategies and classification schemes were proposed in order to analyse the influence of the intrinsic dimension on classifier performance. The best solution, in terms of results, achieved an accuracy of approximately 83%. / São Cristóvão, SE
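The abstract turns on the intrinsic dimension of the feature vectors. One standard way to estimate it from data alone is the TWO-NN estimator (Facco et al.), sketched below with a brute-force O(n²) neighbour search; this is a common estimator chosen for illustration, not necessarily the one used in the thesis.

```python
import math

def two_nn_intrinsic_dimension(points):
    """TWO-NN intrinsic dimension estimate: for each point take the ratio
    mu = r2/r1 of the distances to its two nearest neighbours; the maximum
    likelihood estimate of the dimension is n / sum(log mu)."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    log_ratios = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = ds[0], ds[1]
        if r1 > 0:                      # skip duplicate points
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)
```

When the estimate comes out far below the ambient feature dimension, as the abstract's results suggest for these lip features, dimensionality reduction before HMM training is well motivated.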
|
10 |
Severe, Chronic Auditory Comprehension Deficits: An Intensive Treatment and Cueing Protocol. Groh, Ellen Louise 08 May 2012
No description available.
|