151 |
Nasazení paralelních architektur v podrobnostním vyhledávání / Employing Parallel Architectures in Similarity SearchKruliš, Martin January 2013 (has links)
This work examines the possibilities of employing highly parallel architectures in database systems based on the similarity search paradigm. The main objective of our research is to utilize the computational power of current GPU devices for similarity search in image databases. Despite rapid progress in the past few years, similarity search problems remain very expensive from a computational point of view, which limits the scope of their applicability. GPU devices have tremendous computational power at their disposal; however, the usability of this power for particular problems is often complicated by the specific properties of this architecture. Therefore, existing algorithms and data structures require extensive modifications if they are to be adapted for GPUs. We have addressed all aspects of this domain, such as efficient utilization of GPU hardware for generic computations, parallelization of the similarity search process, and acceleration of image indexing techniques. In most cases, employing GPU devices brought a speedup of two orders of magnitude with respect to single-core CPUs and approximately one order of magnitude with respect to multiprocessor NUMA servers. This thesis summarizes our experience and discoveries from several years of research,...
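The simplest form of the similarity search described above is a sequential scan: compute the distance from the query to every database object, then keep the k smallest. A minimal CPU sketch follows, assuming plain feature vectors compared with the Euclidean (L2) distance; the thesis's actual similarity measures and indexing structures are not reproduced here. The per-object distance loop is the part that maps naturally onto GPU threads, since each distance is independent of the others.

```python
import heapq
import math


def knn_brute_force(database, query, k):
    """Return indices of the k database vectors closest to `query` (L2 distance)."""
    # Each distance is independent: on a GPU, one thread (or warp) would
    # typically evaluate one database object.
    distances = [math.dist(query, obj) for obj in database]
    # Top-k selection; a GPU version would use a parallel selection or sort.
    return heapq.nsmallest(k, range(len(database)), key=distances.__getitem__)


db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(knn_brute_force(db, (0.0, 0.1), 2))  # [0, 3]
```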
|
152 |
GPGPU based implementation of BLIINDS-II NR-IQAJanuary 2016 (has links)
abstract: Technological advances in the past few decades have made it possible to create and consume digital visual content at an explosive rate. Consequently, there is a need for efficient quality monitoring systems to ensure minimal degradation of images and videos during processing operations such as compression, transmission, and storage. Objective Image Quality Assessment (IQA) algorithms have been developed that predict quality scores which match human subjective quality assessment well. However, much research remains to be done before IQA algorithms can be deployed in real-world systems. Long runtimes per image frame are a major hurdle. Graphics Processing Units (GPUs), equipped with a massive number of computational cores, provide an opportunity to accelerate IQA algorithms by performing computations in parallel. Indeed, General Purpose Graphics Processing Unit (GPGPU) techniques have been applied to a few Full Reference IQA algorithms. We present a GPGPU implementation of Blind Image Integrity Notator using DCT Statistics (BLIINDS-II), which falls under the No Reference IQA algorithm paradigm. We achieved a speedup of over 30x over the previous CPU version of this algorithm. We test our implementation using various distorted images from the CSIQ database and present the observed performance trends. We achieve a very consistent performance of around 9 milliseconds per distorted image, which makes it possible to process over 100 images per second (100 fps). / Dissertation/Thesis / Masters Thesis Computer Science 2016
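BLIINDS-II derives its quality features from statistics of local DCT coefficients, and because each image block is transformed independently, the blockwise DCT is a natural unit of GPU parallelism. The sketch below computes an orthonormal 2-D DCT-II of one square block and, as an illustrative statistic, the ratio of the standard deviation to the mean of the AC coefficient magnitudes. The block size and the choice of this single statistic are assumptions for illustration; the full BLIINDS-II feature set and its model fitting are not reproduced.

```python
import math
import statistics


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def freq_variation(block):
    """std(|X|) / mean(|X|) over the AC (non-DC) DCT coefficients of one block."""
    coeffs = dct2(block)
    ac = [abs(c) for u, row in enumerate(coeffs)
          for v, c in enumerate(row) if (u, v) != (0, 0)]
    mean = statistics.fmean(ac)
    # A flat block has (numerically) zero AC energy; report 0 instead of dividing.
    return statistics.pstdev(ac) / mean if mean > 1e-12 else 0.0
```

On a GPU, one thread block per image block (or one thread per coefficient) would evaluate these loops in parallel.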
|
153 |
Intelligent Scheduling and Memory Management Techniques for Modern GPU ArchitecturesJanuary 2017 (has links)
abstract: Owing to their massively multithreaded execution, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPU). However, using GPUs to accelerate computation does not always yield good performance improvements. This is mainly due to three inefficiencies in modern GPU and system architectures.
First, parallel threads do not all carry a uniform amount of work, so the GPU's computational capacity is not fully utilized, leading to a performance problem called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp's execution by allocating larger execution time slices and additional cache resources to the critical warp. The evaluation results show that with CAWA, GPUs can achieve an average speedup of 1.23x.
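The core of a criticality-aware scheduler is a per-warp estimate of how far behind each warp is, plus a policy that favors the slowest one. The sketch below is hypothetical: it assumes a criticality score of estimated remaining instructions times observed cycles per instruction, plus accumulated stall cycles, and simply picks the warp with the highest score. It illustrates the idea and is not CAWA's actual predictor or hardware design.

```python
from dataclasses import dataclass


@dataclass
class WarpState:
    warp_id: int
    remaining_instructions: int  # hypothetical estimate of work left
    avg_cpi: float               # observed cycles per instruction so far
    stall_cycles: int            # cycles lost to stalls so far


def criticality(w: WarpState) -> float:
    # Hypothetical metric: predicted remaining time plus time already lost.
    return w.remaining_instructions * w.avg_cpi + w.stall_cycles


def pick_critical_warp(warps):
    """Return the warp the scheduler should prioritize (largest criticality)."""
    return max(warps, key=criticality)


warps = [WarpState(0, 100, 1.0, 0), WarpState(1, 80, 2.0, 10), WarpState(2, 120, 1.0, 5)]
print(pick_critical_warp(warps).warp_id)  # 1 (score 170 vs. 100 and 125)
```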
Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is common in GPU cache memories, particularly in the L1 data caches. To alleviate cache contention and thrashing, I develop an instruction-aware Control Loop Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns the cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. The evaluation results show that Ctrl-C can effectively improve cache utilization in GPUs and achieve an average speedup of 1.42x for cache-sensitive GPGPU workloads.
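A feedback-controlled bypass policy can be illustrated with a single proportional controller: when the observed L1 hit rate falls below a target, the bypass ratio rises, and when it exceeds the target, the ratio falls. The sketch below is a simplified, hypothetical illustration. The target, the gain, and the single global loop are assumptions; Ctrl-C as described is instruction-aware, which would imply separate loops per memory instruction.

```python
import random


class BypassController:
    """Hypothetical proportional feedback loop that tunes an L1 bypass ratio."""

    def __init__(self, target_hit_rate=0.5, gain=0.5, seed=0):
        self.target = target_hit_rate
        self.gain = gain
        self.bypass_ratio = 0.0  # fraction of memory requests that skip L1
        self.rng = random.Random(seed)

    def update(self, observed_hit_rate):
        # Hit rate below target suggests thrashing, so bypass more; above, bypass less.
        error = self.target - observed_hit_rate
        self.bypass_ratio = min(1.0, max(0.0, self.bypass_ratio + self.gain * error))
        return self.bypass_ratio

    def should_bypass(self):
        # Bypass each request with probability equal to the current ratio.
        return self.rng.random() < self.bypass_ratio
```

In use, `update` would be called once per sampling interval with the measured hit rate, and `should_bypass` once per memory request.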
Finally, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) in a heterogeneous system can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts application execution time and schedules OpenCL workloads to run on different devices based on the optimization goal. The evaluation results show that HeteroPDP can improve system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and gain an additional 50% speedup compared with always offloading the OpenCL workload to GPUs.
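Prediction-driven scheduling reduces, at its simplest, to estimating each device's execution time under current contention and choosing the cheapest one. The sketch below assumes hypothetical inputs: a standalone runtime estimate per device and a multiplicative slowdown factor for co-location. It optimizes only execution time; HeteroPDP's actual predictor and its fairness objective are not modeled.

```python
def choose_device(predicted_time, predicted_slowdown):
    """Pick the device with the lowest predicted co-located execution time.

    predicted_time: dict device -> standalone runtime estimate (seconds)
    predicted_slowdown: dict device -> multiplicative slowdown (>= 1.0)
    """
    return min(predicted_time, key=lambda d: predicted_time[d] * predicted_slowdown[d])


# The GPU is faster standalone, but heavy memory contention makes the CPU cheaper.
print(choose_device({"cpu": 2.0, "gpu": 1.0}, {"cpu": 1.1, "gpu": 3.0}))  # cpu
```

This is why "always offload to the GPU" can lose: the decision depends on predicted contention, not on the standalone speed alone.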
In summary, this dissertation aims to provide insights for the future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs. / Dissertation/Thesis / Doctoral Dissertation Computer Engineering 2017
|
154 |
Analysis of Hardware Usage Of Shuffle Instruction Based Performance Optimization in the Blinds-II Image Quality Assessment AlgorithmJanuary 2017 (has links)
abstract: With the advent of GPGPU, many applications are being accelerated using the CUDA programming paradigm. Speedups of around 10x to 100x can be achieved simply by porting an application to the GPU and running its parallel portion on the GPU's multi-core SIMT (Single Instruction, Multiple Thread) architecture. For optimal performance, however, all GPU resources must be used efficiently and the latencies in the application must be minimized. To this end, it is essential to monitor the hardware usage of the algorithm and thereby diagnose the compute and memory bottlenecks in the implementation. In this thesis, we analyze the mapping of a CUDA implementation of the BLIINDS-II algorithm onto the underlying GPU hardware and develop a Kepler-architecture-specific solution that uses the shuffle instruction, via the CUB library, to tackle the two major bottlenecks in the algorithm. Experiments were conducted to demonstrate the advantage of using the shuffle instruction in the algorithm over using only shared memory as a buffer to global memory. With the new implementation of the BLIINDS-II algorithm using the CUB library, a speedup of around 13.7% was achieved. / Dissertation/Thesis / Masters Thesis Engineering 2017
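The advantage of the shuffle instruction is that lanes of a warp exchange register values directly, without staging data through shared memory. A warp-wide sum then takes log2(32) = 5 shuffle steps. The Python sketch below emulates that pattern on a list of 32 lane values. In CUDA, the step corresponds to `__shfl_down_sync` (CUDA 9 and later; Kepler-era code used `__shfl_down`), and CUB's `cub::WarpReduce` wraps this kind of reduction. The emulation is illustrative only and is not the thesis's kernel.

```python
WARP_SIZE = 32


def shfl_down(values, delta):
    """Emulate a shuffle-down: lane i reads lane i + delta (keeps its own value if out of range)."""
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree reduction in log2(WARP_SIZE) shuffle steps; lane 0 ends with the total."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # Every lane adds the value received from the lane `offset` positions above it.
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


print(warp_reduce_sum(list(range(32))))  # 496
```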
|
155 |
Massively parallel nearest neighbors searches in dynamic point clouds on GPUJosé Silva Leite, Pedro 31 January 2010 (has links)
Previous issue date: 2010 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / This dissertation introduces a grid-based data structure implemented on the GPU. It was developed for nearest neighbor searches in dynamic point clouds in a massively parallel manner. The implementation achieves real-time performance and runs on the GPU, including both the grid construction and the nearest neighbor searches (exact and approximate). In this way, memory transfers between host and device are minimized, improving overall performance. The proposed algorithm can be used in different applications with static or dynamic scenes. In addition, the data structure supports three-dimensional point clouds and, given its dynamic nature, the user can change its parameters at runtime. The same applies to the number of neighbors searched. A CPU reference implementation was developed, and performance comparisons justify the use of GPUs as massively parallel processors. In addition, the performance of the proposed data structure is compared with CPU and GPU implementations from previous work. Finally, a point-based rendering application was developed to verify the potential of the data structure.
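A uniform grid makes neighbor queries local: points are bucketed by integer cell coordinates, and a query only inspects nearby cells. The sketch below assumes a fixed-radius k-nearest-neighbor query with the search radius no larger than the cell size. Under that assumption, every point within the radius lies in the 3x3x3 block of cells around the query, so the result is exact. The dictionary-based grid is a CPU stand-in; a GPU build would typically sort points by cell index instead. The function names are hypothetical and not taken from the dissertation.

```python
import math
from collections import defaultdict


def build_grid(points, cell_size):
    """Map each integer cell coordinate to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(int(math.floor(c / cell_size)) for c in p)].append(i)
    return grid


def knn_in_radius(points, grid, cell_size, query, k, radius):
    """k nearest neighbors of `query` within `radius` (requires radius <= cell_size)."""
    assert radius <= cell_size
    cq = [int(math.floor(c / cell_size)) for c in query]
    candidates = []
    # With radius <= cell_size, all points within `radius` lie in the 27 surrounding cells.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for i in grid.get((cq[0] + dx, cq[1] + dy, cq[2] + dz), ()):
                    d = math.dist(points[i], query)
                    if d <= radius:
                        candidates.append((d, i))
    candidates.sort()
    return [i for _, i in candidates[:k]]
```

Each query is independent, so a GPU implementation can assign one thread per query point.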
|
156 |
A pipeline for real-time photorealistic rendering with ray tracing for augmented realityLemos de Almeida Melo, Diego 31 January 2012 (has links)
Previous issue date: 2012 / Augmented Reality is a research field concerned with techniques for integrating virtual information with the real world. Some Augmented Reality applications require photorealism, where virtual elements are inserted into the real scene so coherently that the user cannot distinguish the virtual from the real.
Several techniques exist for synthesizing 3D scenes, among them ray tracing. Ray tracing is an algorithm based on basic concepts of optical physics whose main characteristic is high visual quality at a high computational cost, which used to restrict its use to offline applications. However, with the growth in the computational power of GPUs, this algorithm has become viable for real-time applications, mainly because it can be massively parallelized.
Taking this into account, this dissertation proposes a pipeline for real-time photorealistic rendering using ray tracing in Augmented Reality applications. The ray tracer used was the Real Time Ray Tracer, or RT2, by Santos et al., which served as the basis for building a pipeline that supports shading, synthesis of various types of materials, occlusion, reflection, refraction, and some camera effects. To obtain a system that runs at interactive rates, the entire rendering pipeline was implemented on the GPU using NVIDIA's CUDA language. Another important contribution of this work is the integration of this pipeline with Microsoft's Kinect device, which makes it possible to capture real scene information in real time, thereby eliminating the need to know the objects in the real scene in advance.
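Ray tracing parallelizes massively because each pixel's primary ray is traced independently. The sketch below shows that structure with a pinhole camera at the origin looking down the negative z axis and a single sphere, solving the ray-sphere quadratic per pixel. The camera model and scene are assumptions for illustration; RT2 and the dissertation's shading, material, and Kinect stages are not reproduced.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Smallest positive t with |origin + t*direction - center| = radius, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2 * a), (-b + sq) / (2 * a)):
        if t > 1e-6:  # ignore hits behind (or at) the ray origin
            return t
    return None


def render_mask(width, height, center, radius):
    """Primary-ray hit mask; every pixel is independent (one GPU thread per pixel)."""
    mask = []
    for j in range(height):
        row = []
        for i in range(width):
            # Map the pixel center to the image plane at z = -1, spanning [-1, 1].
            x = (i + 0.5) / width * 2 - 1
            y = 1 - (j + 0.5) / height * 2
            row.append(ray_sphere((0, 0, 0), (x, y, -1), center, radius) is not None)
        mask.append(row)
    return mask
```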
|
157 |
A Pipeline for Real-Time Photorealistic Rendering with Ray Tracing for Augmented RealityMelo, Diego Lemos de Almeida 09 March 2012 (has links)
|
158 |
Using Multicore Programming on the GPU to Improve Creation of Potential FieldsElmir, Hassan January 2013 (has links)
In the last decade, video games have made great improvements in terms of artificial intelligence and visuals. Researchers have also made advances in the artificial intelligence field, and some of the latest research papers have explored potential fields. This report covers the background of potential fields and examines some improvements that can be made to increase the performance of the algorithm. The basic idea is to increase performance by building a GPGPU (general-purpose graphics processing unit) solution for the creation of potential fields. Several GPGPU implementations are presented, with the focus on optimizing memory access patterns to increase performance. The results of this thesis show that an optimized GPGPU implementation can give up to an 18.5x speedup over a CPU implementation.
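Creating a potential field means evaluating, for every grid cell, the summed contribution of all sources (attractors and repellers). Since each cell is computed independently, the outer loops map directly onto one GPU thread per cell. The sketch below assumes a hypothetical linear falloff that reaches zero at a per-source `reach`; the thesis's actual potential functions are not specified here. Storing the field row-major means adjacent threads write adjacent cells, which is the kind of memory access pattern a GPU version would try to keep coalesced.

```python
import math


def potential_field(width, height, sources):
    """Potential at every grid cell as the sum of contributions from all sources.

    sources: list of (x, y, strength, reach); the contribution falls off linearly
    with distance and is zero beyond `reach` (a hypothetical falloff).
    """
    field = [[0.0] * width for _ in range(height)]
    for y in range(height):          # on a GPU: one thread per (x, y) cell
        for x in range(width):
            total = 0.0
            for sx, sy, strength, reach in sources:
                d = math.hypot(x - sx, y - sy)
                if d < reach:
                    total += strength * (1.0 - d / reach)
            field[y][x] = total
    return field
```

A negative `strength` models a repelling source, so attractors and repellers combine by simple addition.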
|
159 |
Particle Systems Using 3D Vector Fields with OpenGL Compute Shaders / Partikelsystem genom 3D Vektorfält med OpenGL Compute ShadersAnderdahl, Johan, Darner, Alice January 2014 (has links)
Context. Particle systems and particle effects are used to simulate a realistic and appealing atmosphere in many virtual environments. However, they consume a significant amount of computational resources. As the demand for more advanced graphics increases with each generation, particle systems likewise need to become increasingly detailed. Objectives. This thesis proposes a texture-based 3D vector field particle system, computed on the Graphics Processing Unit, and compares it to an equation-based particle system. Methods. Several tests were conducted comparing different situations and parameters for the methods. All of the tests measured the computational time needed to execute the different methods. Results. We show that the texture-based method was effective in the very specific situations where it was expected to outperform the equation-based method. Otherwise, the equation-based particle system is still the most efficient. Conclusions. Generally, the equation-based method is preferred, except in very specific cases. The texture-based method is most efficient for static particle systems and when a huge number of forces is applied to a particle system. Texture-based vector fields are hardly useful otherwise.
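In a texture-based approach, forces are precomputed into a 3D grid (texture) and each particle samples it with trilinear interpolation, instead of evaluating force equations per particle. The sketch below emulates that lookup on the CPU, with clamp-to-edge addressing and positions in voxel units, followed by an explicit Euler update with unit mass. The storage layout `field[z][y][x]` and the integrator are assumptions; the thesis's OpenGL compute shaders are not reproduced, though a compute shader would typically run one invocation per particle.

```python
def sample_field(field, pos):
    """Trilinearly interpolate a 3-D vector field stored as field[z][y][x] -> (vx, vy, vz).

    Positions are in voxel units and clamped to the grid (clamp-to-edge).
    """
    nz, ny, nx = len(field), len(field[0]), len(field[0][0])
    x, y, z = (min(max(c, 0.0), n - 1.0) for c, n in zip(pos, (nx, ny, nz)))
    x0, y0, z0 = int(x), int(y), int(z)  # floor, since coordinates are >= 0
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0
    result = [0.0, 0.0, 0.0]
    # Blend the 8 surrounding voxels; the weights sum to 1.
    for zi, wz in ((z0, 1 - fz), (z1, fz)):
        for yi, wy in ((y0, 1 - fy), (y1, fy)):
            for xi, wx in ((x0, 1 - fx), (x1, fx)):
                w = wx * wy * wz
                for k in range(3):
                    result[k] += w * field[zi][yi][xi][k]
    return tuple(result)


def step_particles(particles, field, dt):
    """Explicit Euler step treating the sampled vector as acceleration (unit mass)."""
    out = []
    for pos, vel in particles:
        acc = sample_field(field, pos)
        vel = tuple(v + dt * a for v, a in zip(vel, acc))
        pos = tuple(p + dt * v for p, v in zip(pos, vel))
        out.append((pos, vel))
    return out
```

An equation-based system would replace `sample_field` with a direct evaluation of each force formula, which is cheaper when there are few forces and avoids the memory traffic of the texture lookup.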
|
160 |
Utilizing state-of-art NeuroES and GPGPU to optimize Mario AILövgren, Hans January 2014 (has links)
Context. Reinforcement Learning (RL) is time-consuming and requires a lot of computational power. There are mainly two approaches to improving RL efficiency: the theoretical approach (mathematics and algorithms) and the practical implementation approach. In this study, the two approaches are combined in an attempt to reduce time consumption. Objectives. We investigate whether modern hardware and software, GPGPU, combined with state-of-the-art Evolution Strategies, CMA-Neuro-ES, can increase the efficiency of solving RL problems. Methods. To do this, both an implementation-based and an experimental research method are used. The implementation-based research mainly involves developing and setting up an experimental framework in which efficiency is measured through benchmarking. The GPGPU/ES solution is then developed within this framework. Using this framework, experiments are conducted on a conventional sequential solution as well as on our own parallel GPGPU solution. Results. The results indicate that utilizing GPGPU and state-of-the-art ES to solve RL problems can be more efficient in terms of time consumption than a conventional, sequential CPU approach. Conclusions. We conclude that our proposed solution requires additional work and research, but that it already shows promise in this initial study. As the study focuses primarily on generating benchmark performance data from the experiments, it lacks data on RL efficiency and thus on the motivation for using our approach. However, we do conclude that the suggested GPGPU approach allows less time-consuming RL problem solving.
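Evolution strategies parallelize well because the fitness of each offspring in a generation is evaluated independently; in a neuroevolution setting, each evaluation would run a controller (for example, a neural network playing a Mario AI episode) with one candidate weight vector. The sketch below uses a simple elitist (1+λ)-ES with a fixed step size, not CMA-ES, which additionally adapts a full covariance matrix and step size. The sphere function stands in for a game-based fitness as a hypothetical placeholder.

```python
import random


def one_plus_lambda_es(fitness, x0, sigma=0.5, lam=8, generations=50, seed=0):
    """Minimize `fitness` with a simple elitist (1+lambda) evolution strategy.

    The lam offspring of each generation are evaluated independently, which is
    the step a GPGPU implementation would run in parallel.
    """
    rng = random.Random(seed)
    parent, parent_fit = list(x0), fitness(x0)
    for _ in range(generations):
        # Gaussian mutation of the current parent.
        offspring = [[p + sigma * rng.gauss(0.0, 1.0) for p in parent] for _ in range(lam)]
        scores = [fitness(o) for o in offspring]  # parallelizable evaluations
        best = min(range(lam), key=scores.__getitem__)
        if scores[best] <= parent_fit:  # elitism: never accept a worse parent
            parent, parent_fit = offspring[best], scores[best]
    return parent, parent_fit


def sphere(v):
    # Hypothetical stand-in fitness; a Mario AI setup would score a game episode instead.
    return sum(c * c for c in v)


best, best_fit = one_plus_lambda_es(sphere, [3.0, -2.0])
```

Because selection is elitist, the returned fitness is never worse than that of the starting point.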
|