Global ETD Search

21	[en] REAL-TIME LABEL VISUALIZATION IN MASSIVE MODELS OBJECTS / [pt] VISUALIZAÇÃO DE RÓTULOS EM OBJETOS DE MODELOS MASSIVOS EM TEMPO REAL RENATO DERIS PRADO 11 October 2013 [en] Virtual labels are used in computer graphics applications to represent textual information arranged on geometric surfaces. Such information consists of names, numbering, or other relevant data that must be noticed quickly when a user examines the objects in the scene. This work focuses on so-called massive models, such as CAD (Computer Aided Design) models of oil refineries, which have a large number of geometric primitives whose rendering has a high computational cost. In large engineering projects, it is desirable to visualize immediately the information specific to each object or part of the model; displaying that information with conventional texturing techniques can exceed the available computational resources. In this dissertation we develop a way to display, in real time, virtual labels carrying distinct information on the surfaces of objects in massive models. The technique is implemented entirely on the GPU, shows no significant loss of performance, and has a low memory cost. Objects of CAD models are the main focus of the work, although the solution can be used with other types of objects provided that their texture coordinates are adjusted correctly. [pt] ROTULOS VIRTUAIS [en] VIRTUAL LABELS [pt] ROTULACAO DE SUPERFICIES [en] SURFACE LABELING [pt] VISUALIZACAO EM TEMPO REAL [en] REAL TIME VISUALIZATION [pt] PROGRAMACAO EM GPU [en] GPU PROGRAMMING
22	Real-time shadow detection and removal in aerial motion imagery application Silva, Guilherme Fróes 14 August 2017 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES Shadow Detection Shadow Removal GPU Programming Wide-Area Motion Imagery System Engineering
23	Multi-scale Methods for Omnidirectional Stereo with Application to Real-time Virtual Walkthroughs Brunton, Alan P 28 November 2012 This thesis addresses a number of problems in computer vision, image processing, and geometry processing, and presents novel solutions to these problems. The overarching theme of the techniques presented here is a multi-scale approach, leveraging mathematical tools to represent images and surfaces at different scales, and methods that can be adapted from one type of domain (e.g., the plane) to another (e.g., the sphere). The main problem addressed in this thesis is known as stereo reconstruction: reconstructing the geometry of a scene or object from two or more images of that scene. We develop novel algorithms to do this, which work for both planar and spherical images. By developing a novel way to formulate the notion of disparity for spherical images, we are able to effectively adapt our algorithms from planar to spherical images. Our stereo reconstruction algorithm is based on a novel application of distance transforms to multi-scale matching. We use matching information aggregated over multiple scales, and enforce consistency between these scales using distance transforms. We then show how multiple spherical disparity maps can be efficiently and robustly fused using visibility and other geometric constraints. We then show how the reconstructed point clouds can be used to synthesize, in real time, a realistic sequence of novel views, that is, images from points of view not captured in the input images. Along the way to this result, we address some related problems. For example, multi-scale features can be detected in spherical images by convolving those images with a filterbank, generating an overcomplete spherical wavelet representation of the image from which the multi-scale features can be extracted. Convolution of spherical images is much more efficient in the spherical harmonic domain than in the spatial domain. 
Thus, we develop a GPU implementation for fast spherical harmonic transforms and frequency domain convolutions of spherical images. This tool can also be used to detect multi-scale features on geometric surfaces. When we have a point cloud of a surface of a particular class of object, whether generated by stereo reconstruction or by some other modality, we can use statistics and machine learning to more robustly estimate the surface. If we have at our disposal a database of surfaces of a particular type of object, such as the human face, we can compute statistics over this database to constrain the possible shape a new surface of this type can take. We show how a statistical spherical wavelet shape prior can be used to efficiently and robustly reconstruct a face shape from noisy point cloud data, including stereo data. multi-scale wavelets stereo reconstruction omnidirectional vision real-time novel view synthesis real-time virtual walkthroughs spherical parameterizations spherical harmonics GPU programming
24	Contributions to parallel stochastic simulation : application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations / Contributions à la simulation stochastique parallèle : architectures logicielles pour la distribution de flux pseudo-aléatoires dans les simulations Monte Carlo sur CPU/GPU Passerat-Palmbach, Jonathan 11 October 2013 The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the implementation of the simulation model have to be ported to the new parallel platform, but all the supporting tools must be reimplemented as well. In the particular case of stochastic simulations, one of the major elements of the implementation is the source of pseudorandom numbers. Employing pseudorandom numbers in parallel applications is not a straightforward task, and it has to be done with caution in order not to introduce biases in the results of the simulation. This problem, called pseudorandom stream distribution, has been studied ever since parallel architectures became available. While the literature is full of solutions for handling pseudorandom stream distribution on CPU-based parallel platforms, the young GPU programming community cannot yet draw on the same experience. In this thesis, we study how to correctly distribute pseudorandom streams on GPU. From the existing solutions, we identified a need for good software engineering solutions, coupled with sound theoretical choices in the implementation. We propose a set of guidelines to follow when a PRNG has to be ported to GPU, and put this advice into practice in a software library called ShoveRand. This library is used in a stochastic Polymer Folding model that we have implemented in C++/CUDA. Pseudorandom stream distribution on manycore architectures is also one of our concerns. It resulted in a contribution named TaskLocalRandom, which targets parallel Java applications using pseudorandom numbers and task frameworks. Finally, we share a reflection on methods for choosing the right parallel platform for a given application. To this end, we propose to automatically build prototypes of the parallel application running on a wide set of architectures. This approach relies on existing software engineering tools from the Java and Scala community, most of them generating OpenCL source code from a high-level abstraction layer. Pseudorandom Number Generation (PRNG) High Performance Computing (HPC) Software Engineering Stochastic Simulation Graphics Processing Units (GPUs) GPU Programming Automatic Parallelization
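As an illustration of what distributing one pseudorandom stream across parallel workers can look like, the following minimal sketch applies the classic leapfrog scheme to a small linear congruential generator. It is not taken from ShoveRand or TaskLocalRandom; the function names are hypothetical, and the generator (the well-known Numerical Recipes constants) is chosen only because its jump-ahead is easy to verify, not for statistical quality.

```python
# Leapfrog distribution of a single LCG stream among several workers.
# Assumption: a toy 32-bit LCG; names below are illustrative only.
M = 2**32
A = 1664525
C = 1013904223


def lcg_sequence(seed, count):
    """Return the first `count` values of the sequential base stream."""
    x = seed
    out = []
    for _ in range(count):
        x = (A * x + C) % M
        out.append(x)
    return out


def leapfrog_stream(seed, stream_id, num_streams, count):
    """Return `count` values for one worker: base elements
    stream_id, stream_id + num_streams, stream_id + 2 * num_streams, ..."""
    # Compose the base step num_streams times into x -> a_n * x + c_n (mod M).
    a_n, c_n = 1, 0
    for _ in range(num_streams):
        a_n, c_n = (A * a_n) % M, (A * c_n + C) % M
    # Advance to this worker's first element of the base stream.
    x = seed
    for _ in range(stream_id + 1):
        x = (A * x + C) % M
    out = []
    for _ in range(count):
        out.append(x)
        x = (a_n * x + c_n) % M  # jump ahead by num_streams base steps
    return out


if __name__ == "__main__":
    base = lcg_sequence(42, 12)
    streams = [leapfrog_stream(42, k, 3, 4) for k in range(3)]
    # Interleaving the three worker streams reproduces the base stream,
    # so no value is used twice and none is skipped.
    interleaved = [streams[i % 3][i // 3] for i in range(12)]
    print(interleaved == base)
```

Because the workers partition one sequence, no two of them ever consume the same value, which is the kind of overlap that can bias a parallel simulation.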
25	[en] INTERACTIVE IMAGE-BASED RENDERING FOR VIRTUAL VIEW SYNTHESIS FROM DEPTH IMAGES / [pt] RENDERIZAÇÃO INTERATIVA BASEADA EM IMAGENS PARA SÍNTESE DE VISTAS VIRTUAIS A PARTIR DE IMAGENS COM PROFUNDIDADE CESAR MORAIS PALOMO 19 September 2017 [en] Image-based modeling and rendering has been a very active research topic in recent decades, receiving considerable attention as a powerful alternative to traditional geometry-based techniques for image synthesis. In this area, computer vision algorithms are used to process and interpret real-world photos or videos in order to build a model of a scene, while computer graphics techniques use this model to create photorealistic images based on the captured photographs or videos. The purpose of this work is to investigate rendering techniques capable of delivering visually accurate virtual views of a scene in real time. To ensure interactive performance, in addition to applying optimizations to existing rendering methods, we make intensive use of the GPU to process the geometry and the images that produce the final views. Even though this work focuses mainly on the rendering task, without reconstructing the depth map from the photos, it implicitly overcomes common errors in depth estimation, yielding virtual views with an acceptable level of realism. Tests with publicly available datasets are also presented to validate our framework and to illustrate some limitations of the general image-based rendering (IBR) approach. [pt] PROGRAMACAO EM PLACAS GRAFICAS [en] GPU PROGRAMMING [pt] COMPOSICAO [en] COMPOSITION [pt] MAPA DE PROFUNDIDADE [en] DEPTH MAP [pt] RENDERIZACAO BASEADA EM IMAGENS [en] IMAGE-BASED RENDERING
26	[en] INTERACTIVE VOLUME VISUALIZATION OF UNSTRUCTURED MESHES USING PROGRAMMABLE GRAPHICS CARDS / [pt] VISUALIZAÇÃO VOLUMÉTRICA INTERATIVA DE MALHAS NÃO-ESTRUTURADAS UTILIZANDO PLACAS GRÁFICAS PROGRAMÁVEIS RODRIGO DE SOUZA LIMA ESPINHA 15 June 2005 [en] Volume visualization is an important technique for exploring complex three-dimensional data sets, such as the results of numerical analyses using the finite element method. The efficient application of this technique to unstructured meshes has been an important area of research in the past few years. There are two basic methods for visualizing volumetric data: surface extraction and direct volume rendering. In the first, iso-surfaces of a scalar field are explicitly extracted. In the second, which is the one used in this work, scalar data are classified by a transfer function, which maps the scalar values to color and opacity, before being visualized. With the evolution of personal computer graphics cards (GPUs), new techniques for interactive volume visualization of unstructured meshes have been developed. The new algorithms take advantage of the acceleration and programmability of modern graphics cards, whose processing power increases at a faster rate than that of conventional processors (CPUs). This work evaluates and compares two GPU-based algorithms for volume visualization of unstructured meshes: view-independent cell projection (VICP) and ray-tracing. In addition, two adaptations of the studied algorithms are proposed. For the cell projection algorithm, we propose a GPU data structure that eliminates the high cost of CPU-to-GPU data transfer. For the ray-tracing algorithm, we propose integrating the transfer function on the GPU, which improves the quality of the generated image and allows the transfer function to be changed interactively. [pt] FUNCOES DE TRANSFERENCIA [en] TRANSFER FUNCTIONS [pt] VISUALIZACAO VOLUMETRICA [en] VOLUME RENDERING [pt] VISUALIZACAO INTERATIVA [en] INTERACTIVE VISUALIZATION [pt] PROGRAMACAO EM PLACAS GRAFICAS [en] GPU PROGRAMMING [pt] MALHAS NAO ESTRUTURADAS [en] UNSTRUCTURED MESHES
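The classification step described in entry 26, where a transfer function maps scalar values to color and opacity, can be sketched in a few lines. This is a CPU-side illustration under stated assumptions, not the GPU algorithms evaluated in the dissertation: the control points, the piecewise-linear interpolation, and the function names are all hypothetical, and the samples along a ray are simply composited front to back.

```python
from bisect import bisect_right

# Hypothetical control points: (scalar value, (r, g, b, alpha)),
# sorted by scalar value. Real applications let the user edit these.
CONTROL_POINTS = [
    (0.0, (0.0, 0.0, 1.0, 0.0)),
    (0.5, (0.0, 1.0, 0.0, 0.2)),
    (1.0, (1.0, 0.0, 0.0, 0.8)),
]


def transfer_function(s, points=CONTROL_POINTS):
    """Map a scalar value to (r, g, b, alpha) by piecewise-linear interpolation."""
    xs = [x for x, _ in points]
    if s <= xs[0]:
        return points[0][1]
    if s >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, s)
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    t = (s - x0) / (x1 - x0)
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


def composite_ray(samples):
    """Front-to-back alpha compositing of scalar samples taken along one ray."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for s in samples:
        r, g, b, a = transfer_function(s)
        weight = (1.0 - alpha) * a  # contribution not yet hidden by earlier samples
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        alpha += weight
        if alpha >= 0.99:  # early ray termination once nearly opaque
            break
    return tuple(color), alpha


if __name__ == "__main__":
    print(transfer_function(0.25))
    print(composite_ray([0.1, 0.4, 0.9, 1.0]))
```

Changing the control points changes the image without touching the mesh data, which is why keeping this lookup cheap matters for interactive editing of the transfer function.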
27	Multi-scale Methods for Omnidirectional Stereo with Application to Real-time Virtual Walkthroughs Brunton, Alan P January 2012 This thesis addresses a number of problems in computer vision, image processing, and geometry processing, and presents novel solutions to these problems. The overarching theme of the techniques presented here is a multi-scale approach, leveraging mathematical tools to represent images and surfaces at different scales, and methods that can be adapted from one type of domain (e.g., the plane) to another (e.g., the sphere). The main problem addressed in this thesis is known as stereo reconstruction: reconstructing the geometry of a scene or object from two or more images of that scene. We develop novel algorithms to do this, which work for both planar and spherical images. By developing a novel way to formulate the notion of disparity for spherical images, we are able to effectively adapt our algorithms from planar to spherical images. Our stereo reconstruction algorithm is based on a novel application of distance transforms to multi-scale matching. We use matching information aggregated over multiple scales, and enforce consistency between these scales using distance transforms. We then show how multiple spherical disparity maps can be efficiently and robustly fused using visibility and other geometric constraints. We then show how the reconstructed point clouds can be used to synthesize, in real time, a realistic sequence of novel views, that is, images from points of view not captured in the input images. Along the way to this result, we address some related problems. For example, multi-scale features can be detected in spherical images by convolving those images with a filterbank, generating an overcomplete spherical wavelet representation of the image from which the multi-scale features can be extracted. Convolution of spherical images is much more efficient in the spherical harmonic domain than in the spatial domain. 
Thus, we develop a GPU implementation for fast spherical harmonic transforms and frequency domain convolutions of spherical images. This tool can also be used to detect multi-scale features on geometric surfaces. When we have a point cloud of a surface of a particular class of object, whether generated by stereo reconstruction or by some other modality, we can use statistics and machine learning to more robustly estimate the surface. If we have at our disposal a database of surfaces of a particular type of object, such as the human face, we can compute statistics over this database to constrain the possible shape a new surface of this type can take. We show how a statistical spherical wavelet shape prior can be used to efficiently and robustly reconstruct a face shape from noisy point cloud data, including stereo data. multi-scale wavelets stereo reconstruction omnidirectional vision real-time novel view synthesis real-time virtual walkthroughs spherical parameterizations spherical harmonics GPU programming
28	Hyperspectral Image Analysis Algorithm for Characterizing Human Tissue Wondim, Yonas Kassaw January 2011 In the field of biomedical optics, measurement of tissue optical properties, such as the absorption, scattering, and reduced scattering coefficients, has gained importance for therapeutic and diagnostic applications. Accuracy in determining the optical properties is vital for quantifying chromophores in tissue. Different techniques are used to quantify tissue chromophores. Reflectance spectroscopy is one of the most common methods for rapidly and accurately characterizing the blood amount and oxygen saturation in the microcirculation. With a hyperspectral imaging (HSI) device it is possible to capture images with spectral information that depends on both tissue absorption and scattering. To analyze these data, software that accounts for both absorption and scattering events needs to be developed. In this thesis work an HSI algorithm capable of assessing tissue oxygenation while accounting for both tissue absorption and scattering is developed. The complete imaging system comprises a light source, a liquid crystal tunable filter (LCTF), a camera lens, a CCD camera, control units and power supply for the light source and filter, and a computer. This work also presents a Graphics Processing Unit (GPU) implementation of the developed HSI algorithm, which is computationally demanding. The GPU implementation is found to outperform the Matlab "lsqnonneg" function by a factor of about 5 to 7. Finally, the HSI system and the developed algorithm are evaluated in two experiments. In the first experiment the concentration of chromophores is assessed while occluding the fingertip. In the second experiment the skin is provoked by UV light and erythema development is checked by analyzing the oxyhemoglobin image at different points in time. In this experiment the change in melanin concentration is also checked at different points in time after exposure. The results match the theory for the time-dependent change of oxyhemoglobin and deoxyhemoglobin. However, the melanin results do not correspond to the theoretically expected result. Hyper spectral image analysis GPU programming Bio optics CUDA programming Göran Salerud Marcus Larsson Yonas Kassaw Yonas K. Wondim Non negative Least square analysis tissue oxygenation melanin concentration Ethiopia Debub University Linköping University Ethiopians in Sweden
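Entry 28 compares its GPU code against Matlab's "lsqnonneg", which solves a non-negative least-squares problem. The sketch below shows that problem class in plain Python, assuming (this is not stated in the abstract) that a measured spectrum is fitted as a non-negative combination of known basis spectra. It uses projected gradient descent, a simpler technique swapped in for illustration; it is neither the thesis algorithm nor the method used by lsqnonneg, and the function name and matrix values are hypothetical.

```python
def nnls_projected_gradient(A, b, iterations=2000):
    """Minimise ||A x - b||^2 subject to x >= 0 by projected gradient descent.

    A is a list of rows (each a list of floats); b is a list of floats.
    """
    rows, cols = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so 1 / that value is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    step = 1.0 / lipschitz
    x = [0.0] * cols
    for _ in range(iterations):
        residual = [
            sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)
        ]
        gradient = [
            sum(A[i][j] * residual[i] for i in range(rows)) for j in range(cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(cols)]
    return x


if __name__ == "__main__":
    # Hypothetical 3-band example with two basis spectra (the columns of A).
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, -1.0, 0.0]
    # Unconstrained least squares would give a negative second coefficient;
    # the constraint clamps it to zero.
    print(nnls_projected_gradient(A, b))
```

In a hyperspectral image one such small problem is solved per pixel, and the pixels are independent of each other, which suggests why the workload suits a GPU.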
29	Audiovisual voice activity detection and localization of simultaneous speech sources / Detecção de atividade de voz e localização de fontes sonoras simultâneas utilizando informações audiovisuais Minotto, Vicente Peruffo January 2013 Given the trend toward interfaces between humans and machines that allow increasingly simple means of interaction, it is only natural that research effort is put into techniques that seek to simulate the most conventional means of communication humans use: speech. In the human auditory system, voice is automatically processed by the brain in an effortless and effective way, commonly aided by visual cues such as mouth movement and the location of the speakers. This processing done by the brain includes two important components that speech-based communication requires: Voice Activity Detection (VAD) and Sound Source Localization (SSL). Consequently, VAD and SSL also serve as mandatory preprocessing tools for high-end Human Computer Interface (HCI) applications, as in the case of automatic speech recognition and speaker identification. However, VAD and SSL are still challenging problems when dealing with realistic acoustic scenarios, particularly in the presence of noise, reverberation and multiple simultaneous speakers. In this work we propose approaches for tackling these problems using audiovisual information, for both the single-source and the competing-sources scenarios, exploiting distinct ways of fusing the audio and video modalities. Our work also employs a microphone array for the audio processing, which allows the spatial information of the acoustic signals to be exploited through the state-of-the-art Steered Response Power (SRP) method. As an additional consequence, a very fast GPU version of the SRP is developed, so that real-time processing is achieved. Our experiments show an average accuracy of 95% when performing VAD of up to three simultaneous speakers and an average error of 10 cm when locating such speakers. Pattern recognition Speech recognition Computational voice Real time Voice activity detection Sound source localization Multiple speakers Competing sources Multimodal fusion Microphone array Hidden Markov model Support vector machine GPU programming
