1

Estimação da profundidade por meio da fusão de dados de energia visual de múltiplas câmeras [Depth estimation through the fusion of visual energy data from multiple cameras]

Oliveira, Felipe Gomes de (25 July 2011)
This research presents a visual data fusion approach to recover dense depth maps from sequences of images. Conventional methods for estimating depth maps have drawbacks with respect to changes in environment illumination and camera positioning. We propose a global-optimization data fusion strategy to improve the measurements from stereo and focus depth maps. Different from typical stereo and focus fusion techniques, we use a single pair of stereo cameras to acquire series of images of scenes without occlusion or illumination constraints. We then use energy functional fusion to associate geometric coherence across multiple frames. To evaluate the results, we defined a metric based on similarity measurements between traditional stereo and the proposed approach. The experiments are performed on images of real scenes, and the estimated depth maps were superior to those obtained with traditional stereo methods, demonstrating the good performance and robustness of our approach.

Resumo (translated from Portuguese): This work proposes a visual data fusion approach to estimate the three-dimensional structure of a scene from image sequences obtained with two or more cameras. Conventional methods for estimating depth maps have drawbacks related to changes in environment illumination and camera positioning. For this reason, a data fusion strategy based on energy minimization was proposed to improve the measurements provided by the disparity between image pixels and by focus variation. The proposed approach uses a distributed network of visual sensors, employing a pair of stereo cameras without occlusion or illumination constraints during image capture. The energy function was used to integrate multiple frames and infer the geometric coherence of the scene. To evaluate the results, metrics from the literature were applied, using similarity measurements between traditional stereo techniques and the proposed strategy. The experiments were conducted on images of real scenes, and the estimated depth information was qualitatively superior to the results obtained by traditional methods, demonstrating the quality of the results achieved by the proposed technique.

Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
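To make the energy-functional idea concrete, the sketch below fuses a stereo depth map and a focus depth map by minimizing a quadratic energy with per-pixel data terms and a smoothness term. It is a minimal illustration of the general technique, not the implementation described in the dissertation: the form of the energy, the confidence weights, the gradient-descent solver, and the synthetic test data are all assumptions made for the example.

```python
import numpy as np

def fuse_depth_maps(d_stereo, d_focus, w_stereo, w_focus,
                    lam=0.1, n_iters=200, step=0.2):
    """Fuse two depth estimates by minimizing a quadratic energy:

        E(d) = sum_p [ w_s(p) * (d(p) - d_stereo(p))**2
                     + w_f(p) * (d(p) - d_focus(p))**2 ]
               + lam * sum_p |grad d(p)|**2

    The data terms pull the fused map toward each measurement according to
    per-pixel confidence weights; the smoothness term enforces geometric
    coherence between neighbouring pixels. Minimised here by plain
    gradient descent.
    """
    # Confidence-weighted average as the initial estimate.
    d = (w_stereo * d_stereo + w_focus * d_focus) / (w_stereo + w_focus + 1e-9)
    for _ in range(n_iters):
        # Gradient of the data terms.
        grad = 2.0 * w_stereo * (d - d_stereo) + 2.0 * w_focus * (d - d_focus)
        # Gradient of the smoothness term: -2 * lam * Laplacian(d)
        # (periodic boundaries via np.roll keep the sketch short).
        lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
               np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4.0 * d)
        grad -= 2.0 * lam * lap
        d -= step * grad
    return d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "scene": a depth ramp, observed through two noisy sensors.
    truth = np.tile(np.linspace(1.0, 5.0, 64), (64, 1))
    d_stereo = truth + 0.30 * rng.standard_normal(truth.shape)  # noisier stereo map
    d_focus = truth + 0.15 * rng.standard_normal(truth.shape)   # depth-from-focus map
    w_s = np.full(truth.shape, 0.5)   # assumed per-pixel confidences
    w_f = np.full(truth.shape, 1.0)
    fused = fuse_depth_maps(d_stereo, d_focus, w_s, w_f)
    print("stereo RMSE:", np.sqrt(np.mean((d_stereo - truth) ** 2)))
    print("fused  RMSE:", np.sqrt(np.mean((fused - truth) ** 2)))
```

The quadratic form keeps the solver trivial; practical formulations typically use robust data terms and edge-aware smoothness, but the overall structure of the energy is the same.
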
2

Synchronous HMMs for audio-visual speech processing

Dean, David Brendan (January 2008)
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs).

The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region, greatly improving visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone.

This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques for normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
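As a concrete picture of the synchronous-HMM idea, the sketch below scores each frame by combining weighted acoustic and visual Gaussian log-likelihoods for a single shared state, then Viterbi-decodes one joint state sequence for both streams. It is a generic, minimal illustration rather than the thesis's system: the single-Gaussian emissions, the 0.7/0.3 stream weights, and the toy two-state model in the usage section are assumptions made for the example.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def emission_logprob(o_audio, o_video, state, w_audio=0.7, w_video=0.3):
    """Synchronous multi-stream emission score: both streams share one state,
    and their log-likelihoods are combined with exponent (stream) weights."""
    return (w_audio * log_gauss(o_audio, state["mu_a"], state["var_a"]) +
            w_video * log_gauss(o_video, state["mu_v"], state["var_v"]))

def viterbi(obs_audio, obs_video, states, log_trans, log_init,
            w_audio=0.7, w_video=0.3):
    """Viterbi decoding of a synchronous HMM: a single state sequence
    jointly explains the acoustic and visual observation streams."""
    T, N = len(obs_audio), len(states)
    delta = np.full((T, N), -np.inf)    # best log score ending in state j at time t
    back = np.zeros((T, N), dtype=int)  # backpointers
    for j in range(N):
        delta[0, j] = log_init[j] + emission_logprob(
            obs_audio[0], obs_video[0], states[j], w_audio, w_video)
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_trans[:, j]
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + emission_logprob(
                obs_audio[t], obs_video[t], states[j], w_audio, w_video)
    # Trace back the best joint state sequence.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy two-state model: 3-dim "acoustic" and 2-dim "visual" features.
    states = [
        {"mu_a": np.zeros(3), "var_a": np.ones(3),
         "mu_v": np.zeros(2), "var_v": np.ones(2)},
        {"mu_a": np.full(3, 3.0), "var_a": np.ones(3),
         "mu_v": np.full(2, 3.0), "var_v": np.ones(2)},
    ]
    log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
    log_init = np.log(np.array([0.5, 0.5]))
    # Synthetic frames: first half generated by state 0, second half by state 1.
    obs_a = np.vstack([rng.normal(0, 1, (5, 3)), rng.normal(3, 1, (5, 3))])
    obs_v = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(3, 1, (5, 2))])
    print(viterbi(obs_a, obs_v, states, log_trans, log_init))
```

The stream weights play the same role as the acoustic/visual reliability weighting discussed in the abstract: lowering the audio weight shifts trust toward the visual stream when the acoustic signal is noisy.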
