Global ETD Search

181	Αξιοποίηση υπολογιστικών πόρων Σίψας, Κωνσταντίνος 13 December 2010 (has links) Στα πλαίσια αυτής της εργασίας θα εξετάσουμε την δυνατότητα αξιοποίησης της μονάδας επεξεργασίας γραφικών (GPU) για την εκτέλεση ενός αλγορίθμου πολλαπλασιασμού πίνακα-διανύσματος και τριών αλγορίθμων ταξινόμησης και το κατά πόσο είναι δυνατό να επιταχυνθεί η εκτέλεση του κώδικα αυτού. Η αρχιτεκτονική που μελετήθηκε και αναλύεται στην εργασία ονομάζεται Tesla και αναπτύχθηκε από την εταιρία Nvidia, το μοντέλο και το περιβάλλον ανάπτυξης ονομάζονται Cuda (Compute Unified Device Architecture). / In context of this diploma thesis the capability of exploiting the graphics processing unit (GPU) to execute and accelerate an algorithm for matrix vector multiplication and three sorting algorithms was examined. The architecture which was examined and described in this diploma thesis is Tesla and it was created by Nvidia. The CUDA (Compute Unified Device Architecture) programming environment was used to implement the algorithms. 005.741 Parallel sorting Tesla
182	Αρχιτεκτονική προσομοίωση σε επεξεργαστικές μονάδες υψηλού βαθμού παραλληλίας Στρίκος, Νικόλαος 11 January 2011 (has links) Η πρόσφατη εξάπλωση που είδε το μοντέλο της παράλληλης επεξεργασίας στους μικροεπεξεργαστές γενικής χρήσης με την εισαγωγή περισσότερων από έναν πυρήνες εντός του ολοκληρωμένου κυκλώματος έφερε νέες απαιτήσεις στις μεθόδους προσομοίωσης που παραδοσιακά χρησιμοποιήθηκαν για την εξερεύνηση νέων αρχιτεκτονικών. Στην εργασία αυτή προτείνεται ένα πλαίσιο και ένα προγραμματιστικό μοντέλο που κάνει χρήση της αρχιτεκτονικής υψηλού βαθμού παραλληλίας CUDA για να επιτύχει επιτάχυνση στην αρχιτεκτονική προσομοίωση πρωτοκόλλων συνοχής κρυφής μνήμης. / The recent adoption of the parallel computing model in general-use microprocessors with the inclusion of more than one cores in the IC has raised new demands for the simulation methodologies that have been traditionally used. In this work, a framework and a programming model are proposed that make use of the highly parallel CUDA platform to accelerate architectural simulation of cache coherency protocols. Μικροεπεξεργαστές Κρυφή μνήμη 005.275 GPU CUDA Cache coherency protocols Parallel simulation
183	Παραλληλοποίηση αλγορίθμου Aho-Corasick με τεχνολογία CUDA Δημόπουλος, Παναγιώτης 24 October 2012 (has links) Στην παρούσα διπλωματική εκπονείται μία μελέτη για την απόδοση των αλγορίθμων αναζήτησης μοτίβων όταν αυτοί τροποποιηθούν κατάλληλα ώστε να εκμεταλλεύονται την αρχιτεκτονική του Υλικού των καρτών γραφικών. Για τον σκοπό αυτό στην παρούσα διπλωματική παρουσιάζεται στην αρχή το πρόβλημα της αναζήτησης ώστε να γίνει κατανοητό γιατί είναι επιτακτική η ανάγκη βελτιστοποίησης της απόδοσης των υπαρχόντων αλγορίθμων. Επίσης παρουσιάζονται οι κυριότεροι αλγόριθμοι αναζήτησης μοτίβων που χρησιμοποιούνται σήμερα και εξηγείται γιατί επιλέγεται ένας από αυτούς τους αλγόριθμους που στην συνέχεια θα τροποποιηθεί ώστε να εκμεταλλεύεται την ιδιαίτερη αρχιτεκτονική μιας κάρτας γραφικών. Έπειτα εξάγονται συμπεράσματα για την απόδοση που μας προσφέρει αυτή η νέα υλοποίηση του αλγορίθμου σε λογισμικό σε σχέση με την απλή υλοποίηση του αλγορίθμου και για διαφορετικά μεγέθη εισόδων / Conversion of Aho-Corasick algorithm in order to execute in an Nvidia graphic card using CUDA technology. Comparison of speed between the parallel and the classic version of the algorithm. Κάρτες γραφικών Μοτίβο 004.35 Graphic cards Parallel computing CUDA Aho-Corasick
184	Correspondence-based pairwise depth estimation with parallel acceleration Bartosch, Nadine January 2018 (has links) This report covers the implementation and evaluation of a stereo vision corre- spondence-based depth estimation algorithm on a GPU. The results and feed- back are used for a Multi-view camera system in combination with Jetson TK1 devices for parallelized image processing and the aim of this system is to esti- mate the depth of the scenery in front of it. The performance of the algorithm plays the key role. Alongside the implementation, the objective of this study is to investigate the advantages of parallel acceleration inter alia the differences to the execution on a CPU which are significant for all the function, the imposed overheads particular for a GPU application like memory transfer from the CPU to the GPU and vice versa as well as the challenges for real-time and concurrent execution. The study has been conducted with the aid of CUDA on three NVIDIA GPUs with different characteristics and with the aid of knowledge gained through extensive literature study about different depth estimation algo- rithms but also stereo vision and correspondence as well as CUDA in general. Using the full set of components of the algorithm and expecting (near) real-time execution is utopic in this setup and implementation, the slowing factors are in- ter alia the semi-global matching. Investigating alternatives shows that results for disparity maps of a certain accuracy are also achieved by local methods like the Hamming Distance alone and by a filter that refines the results. Further- more, it is demonstrated that the kernel launch configuration and the usage of GPU memory types like shared memory is crucial for GPU implementations and has an impact on the performance of the algorithm. Just concurrency proves to be a more complicated task, especially in the desired way of realization. For the future work and refinement of the algorithm it is therefore recommended to invest more time into further optimization possibilities in regards of shared memory and into integrating the algorithm into the actual pipeline. Depth estimation disparity stereo vision stereo correspondence NVIDIA GPU CUDA parallelization Computer Systems Datorsystem
185	Implementação e análise de algoritmos para estimação de movimento em processadores paralelos tipo GPU (Graphics Processing Units) / Implementation and analysis of algorithms for motion estimation onto parallels processors type GPU Monteiro, Eduarda Rodrigues January 2012 (has links) A demanda por aplicações que processam vídeos digitais têm obtido atenção na indústria e na academia. Considerando a manipulação de um elevado volume de dados em vídeos de alta resolução, a compressão de vídeo é uma ferramenta fundamental para reduzir a quantidade de informações de modo a manter a qualidade viabilizando a respectiva transmissão e armazenamento. Diferentes padrões de codificação de vídeo foram desenvolvidos para impulsionar o desenvolvimento de técnicas avançadas para este fim, como por exemplo, o padrão H.264/AVC. Este padrão é considerado o estado-da-arte, pois proporciona maior eficiência em codificação em relação a padrões existentes (MPEG-4). Entre todas as ferramentas inovadoras apresentadas pelas mais recentes normas de codificação, a Estimação de Movimento (ME) é a técnica que provê a maior parcela dos ganhos. A ME busca obter a relação de similaridade entre quadros vizinhos de uma cena, porém estes ganhos são obtidos ao custo de um elevado custo computacional representando a maior parte da complexidade total dos codificadores atuais. O objetivo do trabalho é acelerar o processo de ME, principalmente quando vídeos de alta resolução são codificados. Esta aceleração concentra-se no uso de uma plataforma massivamente paralela, denominada GPU (Graphics Processing Unit). Os algoritmos da ME apresentam um elevado potencial de paralelização e são adequados para implementação em arquiteturas paralelas. Assim, diferentes algoritmos têm sido propostos a fim de diminuir o custo computacional deste módulo. Este trabalho apresenta a implementação e a exploração do paralelismo de dois algoritmos da ME em GPU, focados na codificação de vídeo de alta definição e no processamento em tempo real. O algoritmo Full Search (FS) é conhecido como algoritmo ótimo, pois encontra os melhores resultados a partir de uma busca exaustiva entre os quadros. O algoritmo rápido Diamond Search (DS) reduz significativamente a complexidade da ME mantendo a qualidade de vídeo próxima ao desempenho apresentado pelo FS. A partir da exploração máxima do paralelismo dos algoritmos FS e DS e do processamento paralelo disponível nas GPUs, este trabalho apresenta um método para mapear estes algoritmos em GPU, considerando a arquitetura CUDA (Compute Unified Device Architecture). Para avaliação de desempenho, as soluções CUDA são comparadas com as respectivas versões multi-core (utilizando biblioteca OpenMP) e distribuídas (utilizando MPI como infraestrutura de suporte). Todas as versões foram avaliadas em diferentes resoluções e os resultados foram comparados com algoritmos da literatura. As implementações propostas em GPU apresentam aumentos significativos, em termos de desempenho, em relação ao software de referência do codificador H.264/AVC e, além disso, apresentam ganhos expressivos em relação às respectivas versões multi-core, distribuída e trabalhos GPGPU propostos na literatura. / The demand for applications processing digital videos has become the focus of attention in industry and academy. Considering the manipulation of the high volume of data contained in high resolution digital videos, video compression is a fundamental tool for reduction in the amount of information in order to maintain the quality and, thus enabling its respective transfer and storage. As to obtain the development of advanced video coding techniques, different standards of video encoding were developed, for example, the H.264/AVC. This standard is considered the state-of-art for proving high coding efficiency compared to previous standards (MPEG-4). Among all innovative tools featured by the latest video coding standards, the Motion Estimation is the technique that provides the most important coding gains. ME searches obtain the similarity relation between neighboring frames of the one scene. However, these gains were obtained by the elevated computational cost, representing the greater part of the total complexity of the current encoders. The goal of this project is to accelerate the Motion Estimation process, mainly when high resolution digital videos were encoded. This acceleration focuses on the use of a massively parallel platform called GPU (Graphics Processing Unit). The Motion Estimation block matching algorithms present a high potential for parallelization and are suitable for implementation in parallel architectures. Therefore, different algorithms have been proposed to decrease the computational complexity of this module. This work presents the implementation and parallelism exploitation of two motion estimation algorithms in GPU focused in encoding high definition video and the real time processing. Full Search algorithm (FS) is known as optimal since it finds the best match by exhaustively searching between frames. The fast Diamond Search algorithm reduces significantly the ME complexity while keeping the video quality near FS performance. By exploring the maximum inherent parallelism of FS and DS and the available parallel processing capability of GPUs, this work presents an efficient method to map out these algorithms onto GPU considering the CUDA architecture (Compute Unified Device Architecture). For performance evaluation, the CUDA solutions are compared with respective multi-core (using OpenMP library) and distributed (using MPI as supporting infrastructure) versions. All versions were evaluated in different video resolutions and the results were compared with algorithms found in the literature. The proposed implementations onto GPU present significant increase, in terms of performance, in relation with the H.264/AVC encoder reference software and, moreover, present expressive gains in relation with multi-core, distributed versions and GPGPU alternatives proposed in literature. Compressao : Video Algoritmos Microeletrônica Motion estimation Full search Diamond search GPU CUDA
186	Performance Metrics Analysis of GamingAnywhere with GPU accelerated NVIDIA CUDA Sreenibha Reddy, Byreddy January 2018 (has links) The modern world has opened the gates to a lot of advancements in cloud computing, particularly in the field of Cloud Gaming. The most recent development made in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and GPU is what is the main object of our concentration in this thesis paper. The Graphical Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and enhancement of GamingAnywhere. In this paper, the virtualization of the GPU has been concentrated on and is suggested that the acceleration of this unit using NVIDIA CUDA, is the key for better performance while using GamingAnywhere. After vast research, the technique employed for NVIDIA CUDA has been chosen as gVirtuS. There is an experimental study conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios given by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different resolutions of the game are considered in our empirical research and our results show that the frame rate and bitrate have increased with different resolutions, and the usage of NVIDIA CUDA enhanced GPU. Cloud Computing Cloud Gaming GPU Acceleration NVIDIA CUDA GamingAnywhere. Telecommunications Telekommunikation
187	Performance Metrics Analysis of GamingAnywhere with GPU accelerated Nvidia CUDA Sreenibha Reddy, Byreddy January 2018 (has links) The modern world has opened the gates to a lot of advancements in cloud computing, particularly in the field of Cloud Gaming. The most recent development made in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and GPU is what is the main object of our concentration in this thesis paper. The Graphical Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and enhancement of GamingAnywhere. In this paper, the virtualization of the GPU has been concentrated on and is suggested that the acceleration of this unit using NVIDIA CUDA, is the key for better performance while using GamingAnywhere. After vast research, the technique employed for NVIDIA CUDA has been chosen as gVirtuS. There is an experimental study conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios given by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different resolutions of the game are considered in our empirical research and our results show that the frame rate and bitrate have increased with different resolutions, and the usage of NVIDIA CUDA enhanced GPU. Cloud Computing Cloud Gaming GPU Acceleration NVIDIA CUDA GamingAnywhere. Telecommunications Telekommunikation
188	Performance Metrics Analysis of GamingAnywhere with GPU acceletayed NVIDIA CUDA using gVirtuS Zaahid, Mohammed January 2018 (has links) The modern world has opened the gates to a lot of advancements in cloud computing, particularly in the field of Cloud Gaming. The most recent development made in this area is the open-source cloud gaming system called GamingAnywhere. The relationship between the CPU and GPU is what is the main object of our concentration in this thesis paper. The Graphical Processing Unit (GPU) performance plays a vital role in analyzing the playing experience and enhancement of GamingAnywhere. In this paper, the virtualization of the GPU has been concentrated on and is suggested that the acceleration of this unit using NVIDIA CUDA, is the key for better performance while using GamingAnywhere. After vast research, the technique employed for NVIDIA CUDA has been chosen as gVirtuS. There is an experimental study conducted to evaluate the feasibility and performance of GPU solutions by VMware in cloud gaming scenarios given by GamingAnywhere. Performance is measured in terms of bitrate, packet loss, jitter and frame rate. Different resolutions of the game are considered in our empirical research and our results show that the frame rate and bitrate have increased with different resolutions, and the usage of NVIDIA CUDA enhanced GPU. Cloud Computing Cloud Gaming GPU Acceleration NVIDIA CUDA GamingAnywhere Telecommunications Telekommunikation
189	Determinação de autovalores e autovetores de matrizes tridiagonais simétricas usando CUDA Rocha, Lindomar José 04 August 2015 (has links) Dissertação (mestrado)–Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015. / Submitted by Fernanda Percia França (fernandafranca@bce.unb.br) on 2015-12-15T17:59:17Z No. of bitstreams: 1 2015_LindomarJoséRocha.pdf: 1300687 bytes, checksum: f028dc5aba5d9f92f1b2ee949e3e3a3d (MD5) / Approved for entry into archive by Raquel Viana(raquelviana@bce.unb.br) on 2016-02-29T22:14:44Z (GMT) No. of bitstreams: 1 2015_LindomarJoséRocha.pdf: 1300687 bytes, checksum: f028dc5aba5d9f92f1b2ee949e3e3a3d (MD5) / Made available in DSpace on 2016-02-29T22:14:44Z (GMT). No. of bitstreams: 1 2015_LindomarJoséRocha.pdf: 1300687 bytes, checksum: f028dc5aba5d9f92f1b2ee949e3e3a3d (MD5) / Diversos ramos do conhecimento humano fazem uso de autovalores e autovetores, dentre eles têm-se Física, Engenharia, Economia, etc. A determinação desses autovalores e autovetores pode ser feita utilizando diversas rotinas computacionais, porém umas mais rápidas que outras nesse senário de ganho de velocidade aparece a opção de se usar a computação paralela de forma mais especifica a CUDA da Nvidia é uma opção que oferece um ganho de velocidade significativo, nesse modelo as rotinas são executadas na GPU onde se tem diversos núcleos de processamento. Dada a tamanha importância dos autovalores e autovetores o objetivo desse trabalho é determinar rotinas que possam efetuar o cálculos dos mesmos com matrizes tridiagonais simétricas reais de maneira mais rápida e segura, através de computação paralela com uso da CUDA. Objetivo esse alcançado através da combinação de alguns métodos numéricos para a obtenção dos autovalores e um alteração no método da iteração inversa utilizado na determinação dos autovetores. Temos feito uso de rotinas LAPACK para comparar com as nossas rotinas desenvolvidas em CUDA. De acordo com os resultados, a rotina desenvolvida em CUDA tem a vantagem clara de velocidade quer na precisão simples ou dupla, quando comparado com o estado da arte das rotinas de CPU a partir da biblioteca LAPACK. ______________________________________________________________________________________________ ABSTRACT / Severa branches of human knowledge make use of eigenvalues and eigenvectors, among them we have physics, engineering, economics, etc. The determination of these eigenvalues and eigenvectors can be using various computational routines, som faster than others in this speed increase scenario appears the option to use the parallel computing more specifically the Nvidia’s CUDA is an option that provides a gain of significant speed, this model the routines are performed on the GPU which has several processing cores. Given the great importance of the eigenvalues and eigenvectors the objective of this study is to determine routines that can perform the same calculations with real symmetric tridiagonal matrices more quickly and safely, through parallel computing with use of CUDA. Objective that achieved by some combination of numerical methods to obtain the eigenvalues and a change in the method of inverse iteration used to determine of the eigenvectors, which was used LAPACK routines to compare with routine developed in CUDA. According to the results of the routine developed in CUDA has marked superiority with single or double precision, in the question speed regarding the routines of LAPACK. Matriz simétrica Autovalores Matriz tridiagonal Programação paralela (Computação) Iteração inversa
190	GPGPU based implementation of BLIINDS-II NR-IQA January 2016 (has links) abstract: The technological advances in the past few decades have made possible creation and consumption of digital visual content at an explosive rate. Consequently, there is a need for efficient quality monitoring systems to ensure minimal degradation of images and videos during various processing operations like compression, transmission, storage etc. Objective Image Quality Assessment (IQA) algorithms have been developed that predict quality scores which match well with human subjective quality assessment. However, a lot of research still remains to be done before IQA algorithms can be deployed in real world systems. Long runtimes for one frame of image is a major hurdle. Graphics Processing Units (GPUs), equipped with massive number of computational cores, provide an opportunity to accelerate IQA algorithms by performing computations in parallel. Indeed, General Purpose Graphics Processing Units (GPGPU) techniques have been applied to a few Full Reference IQA algorithms which fall under the. We present a GPGPU implementation of Blind Image Integrity Notator using DCT Statistics (BLIINDS-II), which falls under the No Reference IQA algorithm paradigm. We have been able to achieve a speedup of over 30x over the previous CPU version of this algorithm. We test our implementation using various distorted images from the CSIQ database and present the performance trends observed. We achieve a very consistent performance of around 9 milliseconds per distorted image, which made possible the execution of over 100 images per second (100 fps). / Dissertation/Thesis / Masters Thesis Computer Science 2016 Computer science Computer engineering BLIINDS-II CUDA GPGPU Image Quality Assessment

Search results