Global ETD Search

681	Shape knowledge for segmentation and tracking Prisacariu, Victor Adrian January 2012 The aim of this thesis is to provide methods for 2D segmentation and 2D/3D tracking that are both fast and robust to imperfect image information, as caused for example by occlusions, motion blur and cluttered background. We do this by combining high level shape information with simultaneous segmentation and tracking. We base our work on the assumption that the space of possible 2D object shapes can be either generated by projecting down known rigid 3D shapes or learned from 2D shape examples. We maximise the discrimination between statistical foreground and background appearance models with respect to the parameters governing the shape generative process (the 6 degree-of-freedom 3D pose of the 3D shape or the parameters of the learned space). The foreground region is delineated by the zero level set of a signed distance function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities. We obtain the differentials of this energy with respect to the parameters governing shape and conduct searches for the correct shape using standard non-linear minimisation techniques. This methodology first leads to a novel rigid 3D object tracker. For a known 3D shape, our optimisation here aims to find the 3D pose that leads to the 2D projection that best segments a given image. We extend our approach to track multiple objects from multiple views and propose novel enhancements at the pixel level based on temporal consistency. Finally, owing to the per-pixel nature of much of the algorithm, we support our theoretical approach with a real-time GPU-based implementation. 
We next use our rigid 3D tracker in two applications: (i) a driver assistance system, where the tracker is augmented with 2D traffic sign detections, which, unlike previous work, allows for the relevance of the traffic signs to the driver to be gauged and (ii) a robust, real time 3D hand tracker that uses data from an off-the-shelf accelerometer and articulated pose classification results from a multiclass SVM classifier. Finally, we explore deformable 2D/3D object tracking. Unlike previous works, we use a non-linear and probabilistic dimensionality reduction, called Gaussian Process Latent Variable Models, to learn spaces of shape. Segmentation becomes a minimisation of an image-driven energy function in the learned space. We can represent both 2D and 3D shapes which we compress with Fourier-based transforms, to keep inference tractable. We extend this method by learning joint shape-parameter spaces, which, novel to the literature, enable simultaneous segmentation and generic parameter recovery. These can describe anything from 3D articulated pose to eye gaze. We also propose two novel extensions to standard GP-LVM: a method to explore the multimodality in the joint space efficiently, by learning a mapping from the latent space to a space that encodes the similarity between shapes and a method for obtaining faster convergence and greater accuracy by use of a hierarchy of latent embeddings. 621.3994
682	Numerical Simulation of Bloch Equations for Dynamic Magnetic Resonance Imaging Hazra, Arijit 07 October 2016 No description available. 510 Magnetic resonance imaging Bloch equation modeling Flowing spins Radial FLASH Operator splitting Finite volume methods GPU computing Mathematics (PPN61756535X)
683	Akcelerace adversariálních algoritmů s využití grafického procesoru / GPU Accelerated Adversarial Search Brehovský, Martin January 2011 General-purpose graphics processing units have proven useful for accelerating computationally intensive algorithms. Their capacity for massively parallel computation significantly improves the performance of many algorithms. This thesis focuses on using graphics processors (GPUs) to accelerate algorithms based on adversarial search. We investigate whether adversarial algorithms are suitable for the single instruction, multiple data (SIMD) type of parallelism that the GPU provides. To this end, parallel GPU-accelerated versions of selected algorithms were implemented and compared with the same algorithms running on the CPU. The results show a significant speed improvement and prove the applicability of GPU technology in the domain of adversarial search algorithms.
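As a point of reference for the record above, the following is a minimal sequential sketch of adversarial search: minimax with alpha-beta pruning over a game tree given as nested lists. It is not the thesis's GPU implementation; the function name `alphabeta` and the nested-list tree encoding are assumptions chosen for illustration.

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Return the minimax value of a game tree using alpha-beta pruning.

    Hypothetical encoding: a leaf is a number (its payoff for the
    maximizing player) and an inner node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # leaf: payoff is known directly
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best


# Usage: three moves for the maximizer, each answered by the minimizer.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3
```

Pruning makes the work per branch depend on results from earlier branches, which is one reason mapping this kind of search onto SIMD hardware is not straightforward.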
684	Interactive Preview Renderer for Complex Camera Models Zámečník, Bohumír January 2012 Department: Department of Software and Computer Science Education Supervisor: Dr. Alexander Wilkie Supervisor's e-mail address: alexander@wilkie.at Abstract: An interactive renderer was implemented that allows users to preview the effects of imaging with lenses, such as depth of field, bokeh (defocus highlights) and tilt-shift lens configurations. It is based on a state-of-the-art method which combines the power of GPU rasterization and ray tracing. Many models and interactive visualizations were created. A non-interactive simulation of a complex geometrical lens model was also built; it is able to reproduce optical aberrations. A prototype implementation of recent fast spreading filters is available as well. A thorough summary of the principles of optical image formation, lens models and depth of field rendering methods used in computer graphics is given, along with a comparison of the approaches and new insights. New possibilities for representing the behavior of complex lenses are suggested, which could be employed to accelerate the rendering. Keywords: image synthesis, camera models, depth of field, GPU, image-based ray tracing
685	Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications Wu, Jiande 18 December 2014 Particle filtering has been a very popular method for solving nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found many applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementing high-dimensional PFs in real time for large-scale problems is a very challenging computational task. Parallel & distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms are designed and implemented on different parallel hardware architectures: computer clusters, Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). An improved PF implementation for computer clusters is proposed, the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and significantly outperforms existing algorithms. A novel GPU PF implementation is also designed, which is highly efficient on GPU architectures. The proposed implementations are applied and tested on target tracking problems such as space object tracking, ground multitarget tracking using an image sensor, and UAV-multisensor tracking. A comprehensive performance evaluation and comparison of the algorithms is performed for both tracking and computational capabilities. The simulation results demonstrate that the proposed implementations greatly help to overcome the computational issues of particle filtering for realistic practical problems. Electrical and Computer Engineering
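To make the computational structure behind the record above concrete, here is a minimal sketch of a generic bootstrap particle filter for a scalar state. It is not the dissertation's PTA or GPU algorithm; the function names, the random-walk motion model, the Gaussian measurement model and all noise parameters are assumptions chosen for illustration. The predict and weight steps act on each particle independently, which is the part that lends itself to parallel hardware, whereas resampling needs information from all particles.

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples in proportion to normalized weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step          # single random offset in [0, 1/n)
    out, i, cumulative = [], 0, weights[0]
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out


def particle_filter_step(particles, measurement, rng,
                         process_std=0.1, meas_std=0.5):
    """One predict / weight / resample cycle (assumed models, see lead-in)."""
    # Predict: random-walk motion model, independent per particle.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement, independent per particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: the only step that couples all particles.
    return systematic_resample(predicted, weights, rng), estimate


def track(measurements, n_particles=500, seed=0):
    """Run the filter over a measurement sequence and return the estimates."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, est = particle_filter_step(particles, z, rng)
        estimates.append(est)
    return estimates


# Usage: a stationary target at position 5.0 observed 20 times.
print(track([5.0] * 20)[-1])
```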
686	SOLVING LARGE SYSTEMS OF LINEAR EQUATIONS ON MULTI-GPU CLUSTERS USING THE CONJUGATE GRADIENT METHOD IN OPENCL™ / RESOLUÇÃO DE SISTEMAS DE EQUAÇÕES LINEARES DE GRANDE PORTE EM CLUSTERS MULTI-GPU UTILIZANDO O MÉTODO DO GRADIENTE CONJUGADO EM OPENCL™ ANDRE LUIS CAVALCANTI BUENO 27 September 2013 Modeling problems in the engineering fields tends to produce large systems of sparse linear equations. Given their importance, extensive research has been done to devise methods for solving these systems. This thesis explores the computational potential of multiple GPUs, through the use of OpenCL technology, for solving large systems of sparse linear equations. In the proposed methodology, the conjugate gradient method is subdivided into kernels, which are delegated to multiple GPUs. To achieve good performance, it was necessary to understand how the GPU architecture relates to OpenCL. GPGPU HIGH PERFORMANCE COMPUTING CONJUGATE GRADIENT METHOD MULTI-GPU OPENCL
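The idea of subdividing the conjugate gradient method into kernels can be sketched on the CPU as follows. This is a plain Python illustration, not the thesis's OpenCL code: the helper names `matvec`, `dot` and `axpy`, the dense list-of-lists matrix, and the tolerance are assumptions. Each helper stands in for one kind of kernel that a GPU implementation would launch per iteration.

```python
def matvec(a, x):
    """Matrix-vector product; one output row per (hypothetical) work item."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    """Inner product; on a GPU this would be a parallel reduction."""
    return sum(xi * yi for xi, yi in zip(x, y))


def axpy(alpha, x, y):
    """Element-wise alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - a x for the initial guess x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break               # residual norm is small enough
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = axpy(alpha, p, x)   # x <- x + alpha p
        r = axpy(-alpha, ap, r) # r <- r - alpha a p
        rs_new = dot(r, r)
        p = axpy(rs_new / rs_old, p, r)  # p <- r + beta p
        rs_old = rs_new
    return x


# Usage: a small symmetric positive definite system.
a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(a, b))
```

Every iteration is a fixed sequence of one matrix-vector product, two inner products and three vector updates, which is why the method splits naturally into a handful of reusable kernels.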
687	SIMD Optimizations of Software Rendering in 2D Video Games / SIMD optimeringar i mjukvarurendering av 2D spel Mendel, Oskar, Bergström, Jesper January 2019 Optimizing rendering is one of the greatest challenges faced by game developers. Most game engines use hardware rendering, which relies on technology built specifically for rendering. Before such hardware existed, game developers had to rely on the CPU to render their games. This is known as software rendering. Software rendering is not commonly used nowadays, but it still appears in cases such as a fallback for when the end user's machine does not support the hardware-based renderer of the application. Since the CPU, unlike the GPU, is not purpose-built for rendering, the developer has to perform optimizations to make the renderer more efficient in terms of speed. In this thesis, we present an approach based on a subset of parallel programming called Single Instruction, Multiple Data (SIMD). This technique operates on vector registers, which means operations can be performed on multiple pieces of data at once. It is applied to an existing game engine in order to optimize its rendering. The results show a speed-up of 90.5% and a framerate increase from 30 frames per second to 133 frames per second within the rendering routine. SIMD AVX SSE CPU GPU Parallel programming Optimization Game development Game engine x86 Haswell Rendering Computer Sciences
688	Optimization Methods for Direct Volume Rendering on the Client Side Web Nilsson, Tobias January 2019 Volume visualization has been made available on the web using the Direct Volume Rendering (DVR) technique, powered by the WebGL 1 API. While the technique produces visually pleasing output, the performance of the prototypes that implement it leaves much to be desired. 2017 saw the release of the next version of WebGL, WebGL 2.0, and the introduction of WebAssembly. These APIs and formats are promising tools for building a DVR application that can do high-performance rendering at interactive frame rates. This thesis investigates, implements and evaluates a prototype application that utilizes the optimization methods of Adaptive Texture Maps, Octree Empty Space Skipping and Distance Transform Empty Space Skipping. The Distance Transform is further evaluated with a CPU-bound and a GPU-bound algorithm implementation. The techniques are assessed on readily available off-the-shelf devices and hardware. The performance of the prototype application run on these devices is quantified by measuring the computation times of costly operations and by measuring frames per second. It is concluded that the methods have different properties on different hardware. While higher FPS is achieved on all devices by utilizing some combination of the optimization methods, the distance transform is the most consistent. Embedded devices and their quirks are also discussed; on these devices, memory constraints and the resolution of the data are of greater importance than on non-embedded devices. This leads to some suggested actions that could potentially enable high-performance rendering of higher-resolution data on these devices as well. Volume Visualization Ray Casting Optimization GPU Web Other electrical engineering and electronics
689	Performance prediction of application executed on GPUs using a simple analytical model and machine learning techniques / Predição de desempenho de aplicações executadas em GPUs usando um modelo analítico simples e técnicas de aprendizado de máquina González, Marcos Tulio Amarís 25 June 2018 The parallel and distributed platforms of High Performance Computing available today have become more and more heterogeneous (CPUs, GPUs, FPGAs, etc). Graphics Processing Units (GPUs) are specialized co-processors that accelerate and improve the performance of parallel vector operations. GPUs have a high degree of parallelism and can execute thousands or millions of threads concurrently and hide the latency of the scheduler. GPUs have a deep memory hierarchy with different types of memory as well as different configurations of these memories. Performance prediction of applications executed on these devices is a great challenge and is essential for the efficient use of resources in machines with these co-processors. There are different approaches to these predictions, such as analytical modeling and machine learning techniques. In this thesis, we present an analysis and characterization of the performance of applications executed on GPUs. We propose a simple and intuitive BSP-based model for predicting the execution times of CUDA applications on different GPUs. The model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. We also compare three different Machine Learning (ML) approaches, Linear Regression, Support Vector Machines and Random Forests, with the BSP-based analytical model. This comparison is made in two contexts: first, with the same input data or features for the ML techniques as for the analytical model, and second, with features obtained through a feature extraction process based on correlation analysis and hierarchical clustering. 
We show that GPU applications that scale regularly can be predicted with simple analytical models and an adjusting parameter. This parameter can be used to predict these applications on other GPUs. We also demonstrate that ML approaches provide reasonable predictions for different cases, and that ML techniques require no detailed knowledge of application code, hardware characteristics or explicit modeling. Consequently, whenever a large data set with information about similar applications is available or can be created, ML techniques can be useful for deploying automated on-line performance prediction for scheduling applications on heterogeneous architectures with GPUs. BSP model CUDA GPU architectures Machine learning Performance prediction Graphics processing units
690	Application-Directed DVFS using Multiple Clock Domains on Graphics Hardware Li, Juan 14 January 2009 As handheld devices have become increasingly popular, powerful programmable graphics hardware for mobile and handheld devices has been deployed. While many resources on mobile devices are limited, the predominant problem for mobile devices is their limited battery power. Several techniques have been proposed to increase the energy efficiency of mobile applications and improve battery life. In this thesis, we propose a new dynamic voltage and frequency scaling (DVFS) scheme for Graphics Processing Units (GPUs). In most cases, cues within the graphics application can be used to predict which portions of a GPU will be used or unused when the application is run. We partition the GPU into six clock domains that can be clocked at different rates. Specifically, each domain has its own voltage and frequency setting based on its predicted workload, to save energy without reducing applications' frame rates. In addition, we propose a signature-based algorithm that predicts the workload a given application offers to our six clock domains in order to decide the voltage and frequency settings. We conduct experiments and compare the results of our new signature-based workload prediction algorithm with those of traditional interval-based workload prediction algorithms. Our results show that our signature-based prediction can save 30-50% energy without affecting application frame rates. Energy Graphics Process Unit (GPU) Multiple Clock Domain (MCD) Pocket computers Computer graphics
