Global ETD Search

341	Comparación del uso de GPGPU y cluster de multicore en problemas con alta demanda computacional / Comparison of the Use of GPGPU and Multicore Clusters in Problems with High Computational Demand Montes de Oca, Erica January 2012 (has links) The objective of this undergraduate thesis is to research and study the GPU and multicore-cluster shared-memory platforms for solving problems with high computational demand. Solutions to the chosen problem are presented in sequential, shared-memory parallel, message-passing parallel, hybrid parallel and GPU parallel versions in order to compare their performance. The quality of the solutions is analyzed in terms of execution time and speedup, and an analysis of energy consumption is introduced. parallel programming multicore multicore cluster GPU GPGPU CUDA high computational demand problems N-Body Clustering Information Systems Computer Science
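The N-Body problem listed in the keywords is a typical benchmark for this kind of comparison. In the direct method, each body's acceleration is a sum of pairwise gravitational contributions: O(N²) work per step, but with an independent sum for every body. A minimal sequential sketch in Python follows. It assumes Newtonian gravity in 3D and a hypothetical `softening` parameter (not taken from the thesis) that keeps close pairs from blowing up. The outer loop over `i` is the one that shared-memory, message-passing or CUDA versions would typically distribute.

```python
import math


def accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-sum gravitational accelerations for N bodies in 3D.

    positions: list of (x, y, z) tuples; masses: list of floats.
    `softening` is an illustrative parameter that avoids division by zero.
    """
    n = len(positions)
    eps2 = softening * softening
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):  # independent per body: the loop to parallelize
        xi, yi, zi = positions[i]
        for j in range(n):  # all-pairs interaction, O(N^2) in total
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = G * masses[j] / (r2 * math.sqrt(r2))
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc


def step(positions, velocities, masses, dt):
    """Advance one time step with semi-implicit (symplectic) Euler."""
    acc = accelerations(positions, masses)
    new_vel = [tuple(v[k] + a[k] * dt for k in range(3))
               for v, a in zip(velocities, acc)]
    new_pos = [tuple(p[k] + v[k] * dt for k in range(3))
               for p, v in zip(positions, new_vel)]
    return new_pos, new_vel
```

For example, take two unit masses at x = -1 and x = 1 with `softening=0.0`. Each receives an acceleration of magnitude 0.25 toward the other (G·m / r² with r = 2).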
342	Arquitetura de computação paralela para resolução de problemas de dinâmica dos fluidos e interação fluido-estrutura / Parallel computing architecture for solving fluid dynamics and fluid-structure interaction problems Luiz Felipe Marchetti do Couto 27 June 2016 (has links) One of the great challenges in engineering today is to enable computational solutions that reduce processing time and provide more accurate results. Proposals with a wide range of approaches frequently emerge that explore new ways of solving such problems or try to improve existing solutions. One of the areas dedicated to proposing such improvements is parallel and high-performance computing (HPC). Techniques that optimize processing time, more efficient algorithms and faster computers open new horizons, making it possible to perform tasks that were previously infeasible or would take too long to complete. This project proposes the computational implementation of a parallel computing architecture that solves Fluid Dynamics and Fluid-Structure Interaction problems more efficiently than the sequential architecture and that can also be extended to other problems related to the Finite Element Method (e.g. structural problems). The objective of this work is to develop an efficient computational algorithm in the scientific programming languages C++ and CUDA (proprietary to NVIDIA®), based on previous work carried out at the Computational Mechanics Laboratory (LMC, Laboratório de Mecânica Computacional) of the Polytechnic School of the University of São Paulo, and then, with the developed architecture, to run and investigate Fluid Dynamics and Fluid-Structure Interaction problems (applying the Finite Element Method with Immersed Boundaries and solving the system of linear equations directly with PARDISO) on the LMC computers. A sensitivity analysis is performed for each problem to find the best combination of the number of elements in the finite element mesh and the speedup, followed by a comparative performance analysis between the parallel and the sequential architecture. With a single GPU, the time to assemble the global matrices and the total simulation time were reduced considerably; compared with the sequential software, a speedup of 10 times was obtained using the Finite Element with Immersed Boundary Method and a direct solver (PARDISO). Computer graphics Fluid dynamics Fluid-structure interaction Finite element method Multiprogramming and multiprocessing CUDA High performance computing
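The abstract attributes much of the gain to faster assembly of the global matrices. The sketch below illustrates element-by-element assembly with 1D linear bar elements. These are a deliberately simple stand-in for the immersed-boundary fluid elements of the thesis. The mesh format `(node_a, node_b, length)` and the function names are hypothetical. The comment marks the step a GPU version with one thread per element must handle with care, because several elements write to the same global entry (for example, with atomic additions or element coloring).

```python
def element_stiffness(EA, length):
    """2x2 stiffness matrix of a linear 1D bar element."""
    k = EA / length
    return [[k, -k], [-k, k]]


def assemble_global(n_nodes, elements, EA=1.0):
    """Assemble the global stiffness matrix from element contributions.

    elements: list of (node_a, node_b, length) tuples (hypothetical mesh format).
    Each element adds its local matrix into the rows/columns of its two nodes.
    """
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b, length in elements:  # a GPU version: one thread per element
        ke = element_stiffness(EA, length)
        dofs = (a, b)
        for i_local, i_global in enumerate(dofs):
            for j_local, j_global in enumerate(dofs):
                # Nodes shared by several elements receive several additions;
                # in parallel this needs atomic adds or a coloring of elements.
                K[i_global][j_global] += ke[i_local][j_local]
    return K
```

Two unit-length elements that share node 1 give the familiar tridiagonal pattern [[1, -1, 0], [-1, 2, -1], [0, -1, 1]] (times EA/L). The 2 on the diagonal is exactly the entry both elements write to.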
343	Real-time Object Recognition on a GPU Pettersson, Johan January 2007 (has links) Shape-Based Matching (SBM) is a known method for 2D object recognition that is rather robust against illumination variations, noise, clutter and partial occlusion. The objects to be recognized can be translated, rotated and scaled. The translation of an object is determined by evaluating a similarity measure for all possible positions (similar to cross-correlation). The similarity measure is based on dot products between normalized gradient directions at edges. Rotation and scale are determined by evaluating all possible combinations, which spans a huge search space. A resolution pyramid provides a heuristic for the search, which makes real-time performance possible. In SBM, a model consisting of normalized edge gradient directions is constructed for every possible combination of rotation and scale. We have avoided this by using (bilinear) interpolation in the search gradient map, which greatly reduces the amount of storage required. SBM is highly parallelizable by nature, and with our suggested improvements it becomes well suited for running on a GPU. This has been implemented and tested, and the results clearly outperform those of our reference CPU implementation (by factors in the hundreds). The implementation is also very scalable and benefits from future devices without additional effort. Extensive evaluation material and tools for evaluating object recognition algorithms have been developed, and the implementation is evaluated and compared with two commercial 2D object recognition solutions. The results show that the method is very powerful when dealing with the distortions listed above and competes well with its competitors. object recognition pattern matching GPU CUDA transformation rotation scale noise illumination occlusion clutter evaluation
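The similarity measure described above is the mean, over the model's edge points, of the dot product between the model's normalized gradient direction and the search image's normalized gradient direction at the corresponding position. A translation-only sketch follows. Rotation, scale, the resolution pyramid and the bilinear interpolation of the thesis are omitted, and the data layout and function names are illustrative assumptions. Normalizing both vectors is what makes the score insensitive to gradient magnitude, and therefore to contrast changes. Each candidate position is scored independently, which is the parallelism a GPU version can exploit.

```python
import math


def normalize(v):
    """Unit-length gradient direction; a zero gradient contributes nothing."""
    gx, gy = v
    n = math.hypot(gx, gy)
    return (gx / n, gy / n) if n > 0 else (0.0, 0.0)


def similarity(model, grad_map, row, col):
    """Mean dot product of normalized gradient directions at (row, col).

    model: list of (dr, dc, (gx, gy)) edge points relative to the model origin.
    grad_map: 2D list of (gx, gy) gradients of the search image.
    Returns a score in [-1, 1]; 1 means every direction matches exactly.
    """
    total = 0.0
    for dr, dc, d in model:
        r, c = row + dr, col + dc
        if 0 <= r < len(grad_map) and 0 <= c < len(grad_map[0]):
            m = normalize(d)
            e = normalize(grad_map[r][c])
            total += m[0] * e[0] + m[1] * e[1]
    return total / len(model)


def best_translation(model, grad_map):
    """Exhaustive search over all positions (one GPU thread per position)."""
    best_score, best_pos = -2.0, None
    for r in range(len(grad_map)):
        for c in range(len(grad_map[0])):
            s = similarity(model, grad_map, r, c)
            if s > best_score:
                best_score, best_pos = s, (r, c)
    return best_score, best_pos
```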
344	Simulace lomové zkoušky ve stavebnictví / Simulation of Fracture Tests in Civil Engineering Bordovský, Gabriel January 2017 (has links) In this thesis, a program that simulates fracture tests in civil engineering has been optimized. The simulation is used to validate the fracture characteristics of blocks of construction material used in the reconstruction of historic buildings. The thesis illustrates how to use the processor's potential effectively without loss of output quality. The individual parts of the simulation are analyzed, and possible optimizations, such as vectorization and parallel processing, are proposed for the critical sections. The techniques used in this thesis can be applied to similar computing problems and help shorten the required runtime. The prototype of the simulation needed 7.7 hours to complete the simulation. The optimized version completes the same simulation in 2.1 hours on one core or in 21 minutes on eight cores. The parallel optimized version is 21 times faster than the prototype.
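The run times quoted above can be related through the usual definitions of speedup $S$ and parallel efficiency $E_p$ on $p$ cores. The values below are computed only from the rounded times in the abstract (7.7 h, 2.1 h = 126 min, 21 min).

```latex
\begin{aligned}
S &= \frac{T_{\text{before}}}{T_{\text{after}}}, &\qquad E_p &= \frac{S_p}{p},\\[4pt]
S_{\text{opt}} &= \frac{7.7\ \text{h}}{2.1\ \text{h}} \approx 3.7, &\qquad
S_8 &= \frac{126\ \text{min}}{21\ \text{min}} = 6.0, \quad E_8 = \frac{6.0}{8} = 0.75.
\end{aligned}
```

On these figures, the serial optimizations alone account for roughly a 3.7 times gain. The eight-core run then reaches about 75% parallel efficiency relative to the optimized single-core run. The overall factor reported in the abstract combines both effects.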
345	Rekurentní neuronové sítě pro klasifikaci textů / Recurrent Neural Network for Text Classification Myška, Vojtěch January 2018 (has links) This thesis deals with the design of neural networks for classifying positive and negative texts. Development took place in the Python programming language. The deep neural network models were designed using the Keras high-level API and the TensorFlow numerical computation library. The computations were performed on a GPU with support for the CUDA architecture. The final outcome of the thesis is a language-independent neural network model that classifies texts at the character level and reaches up to 93.64% accuracy. Training and testing data were provided by multilingual and Yelp databases. The simulations were performed on 1,200,000 English texts and 12,000 Czech, German and Spanish texts.
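Working at the character level means the network consumes sequences of character ids instead of words. No language-specific tokenizer or word vocabulary is needed, which is consistent with the model being described as language-independent. The following is a minimal sketch of such an encoding. It assumes integer ids with reserved padding and unknown-character ids, and the constants and function names are hypothetical, since the thesis's exact preprocessing is not described here. In Keras, fixed-length sequences like these would typically feed an embedding layer followed by recurrent layers.

```python
PAD_ID = 0      # hypothetical id used to pad short texts to a fixed length
UNKNOWN_ID = 1  # hypothetical id for characters not seen when building the vocabulary


def build_vocab(texts):
    """Assign an integer id (starting at 2) to every character in the texts."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Truncate or pad a text to exactly max_len character ids."""
    ids = [vocab.get(ch, UNKNOWN_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

Texts in Czech, German or Spanish pass through the same code. Only the set of characters in the vocabulary changes.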
346	Akcelerace ultrazvukových simulací pomocí multi-GPU systémů / Acceleration of Ultrasound Simulations on Multi-GPU Systems Stodůlka, Martin January 2021 (has links) The main focus of this project is the use of multi-GPU systems and CUDA unified memory. Its goal is to accelerate the computation of 2D and 3D FFTs, which form the main part of simulations in the k-Wave library. k-Wave is a C++/MATLAB library used to simulate the propagation of ultrasonic waves in 1D, 2D or 3D space. Accelerating these functions is necessary because the simulations are computationally intensive.
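The 2D and 3D FFTs mentioned above are separable. A 2D transform is a batch of 1D FFTs along the rows followed by a batch along the columns, and 3D adds a third axis. Multi-GPU FFTs commonly exploit this separability: each GPU transforms its own slab of rows, and the transpose between the two passes is where data has to move between devices. The Python sketch below shows only the row-column decomposition on a single device. It is not k-Wave's or cuFFT's implementation, and the function names are illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(grid):
    """2D FFT by the row-column method.

    Pass 1: independent 1D FFTs over rows (e.g. one slab of rows per GPU).
    Pass 2: independent 1D FFTs over columns, after a transpose, which in a
    multi-GPU setting is where data is exchanged between devices.
    """
    rows = [fft(r) for r in grid]
    cols = [fft(list(c)) for c in zip(*rows)]  # transform along columns
    return [list(r) for r in zip(*cols)]       # transpose back to row order
```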
347	GPU-akcelerovaná syntéza pravděpodobnostních programů / GPU-Accelerated Synthesis of Probabilistic Programs Marcin, Vladimír January 2021 (has links) This thesis addresses the problem of automated synthesis of probabilistic programs: given a finite family of candidate programs, we want to efficiently identify a program that satisfies a given specification. Even the simplest synthesis problems are NP-hard in practice. Progress in this area comes from the tool Paynt, which solves this problem with a new integrated method for the synthesis of probabilistic programs. Although this approach copes efficiently with the exponential growth of the families of candidate solutions, a problem remains that is caused by the exponential growth of the individual members of these families. To address this problem as well, we implemented GPU-oriented algorithms for verifying candidate programs (models) that parallelize the task at the level of the states of the probabilistic models. Under certain conditions, the overall speedup achieved by this approach came close to the theoretical limit of the possible speedup of the synthesis process.
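State-level parallelization can be pictured with a core computation of probabilistic model checking: computing reachability probabilities in a discrete-time Markov chain by value iteration. In the Jacobi-style update below, each state's new value depends only on the previous iteration's vector, so every state can be updated independently, for example with one GPU thread per state. This is a minimal sketch under that assumption, not Paynt's code. The input format (`transitions[s]` as a list of `(next_state, probability)` pairs) and the function name are hypothetical.

```python
def reachability_probabilities(transitions, targets, tol=1e-12, max_iter=100000):
    """Probability of eventually reaching a target state in a DTMC.

    transitions: transitions[s] is a list of (next_state, probability) pairs.
    targets: set of target states.
    Starting from 0, value iteration converges to the least fixed point,
    which is the reachability probability.
    """
    n = len(transitions)
    x = [1.0 if s in targets else 0.0 for s in range(n)]
    for _ in range(max_iter):
        # Every entry of `new` reads only the old vector `x`, so this
        # comprehension is the part that parallelizes per state.
        new = [1.0 if s in targets else sum(p * x[t] for t, p in transitions[s])
               for s in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x
```

Consider a four-state random walk in which states 0 and 3 are absorbing and states 1 and 2 move left or right with probability 0.5. The probabilities of eventually reaching state 3 from states 1 and 2 are 1/3 and 2/3.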
348	Detekce objektů na GPU / Object Detection on GPU Macenauer, Pavel January 2015 (has links) This thesis addresses object detection on graphics processing units. As part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing real-time object detection in video as well as bulk processing. Its main contribution is a study of the possibilities that NVIDIA CUDA technology and current graphics processing units offer for accelerating object detection. Parallel algorithms for object detection are also discussed and proposed.
349	Derichův detektor hran / Deriche Edge Detector Němec, Zbyšek January 2012 (has links) This thesis presents the Deriche edge detector as an interesting alternative to the commonly used edge detectors. The design of the Deriche edge detector is presented, along with its strengths and weaknesses. Performance issues of the Deriche edge detector are described in comparison with the Canny edge detector, together with recommendations for using the Deriche detector. Finally, the edge detection quality of the Deriche edge detector is compared with that of the Canny edge detector using a robust subjective evaluation method.
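Deriche's detector follows the same stages as Canny (smoothing and differentiation, non-maximum suppression, hysteresis thresholding). Its filters, however, are derived for infinite impulse responses and implemented recursively, so the cost per pixel does not grow with the amount of smoothing. The sketch below shows only the 1D recursive smoothing filter, using the commonly published coefficient formulas with parameter `alpha`. In 2D it would be applied along rows and then columns, and the derivative filter uses the same two-pass structure with different coefficients. The function name and the border handling (zero initial conditions) are illustrative assumptions.

```python
import math


def deriche_smooth_1d(x, alpha):
    """Deriche's recursive smoothing filter applied to a 1D signal.

    A causal (left-to-right) and an anti-causal (right-to-left) second-order
    recursion are summed. The work per sample is constant for any alpha;
    a smaller alpha gives a wider impulse response, i.e. stronger smoothing.
    Zero initial conditions are assumed at both borders.
    """
    ea, e2a = math.exp(-alpha), math.exp(-2 * alpha)
    k = (1 - ea) ** 2 / (1 + 2 * alpha * ea - e2a)  # normalizes DC gain to 1
    a1, a2 = k, k * ea * (alpha - 1)
    a3, a4 = k * ea * (alpha + 1), -k * e2a
    b1, b2 = 2 * ea, -e2a
    n = len(x)

    y1 = [0.0] * n
    for i in range(n):  # causal pass
        xm1 = x[i - 1] if i >= 1 else 0.0
        ym1 = y1[i - 1] if i >= 1 else 0.0
        ym2 = y1[i - 2] if i >= 2 else 0.0
        y1[i] = a1 * x[i] + a2 * xm1 + b1 * ym1 + b2 * ym2

    y2 = [0.0] * n
    for i in range(n - 1, -1, -1):  # anti-causal pass
        xp1 = x[i + 1] if i + 1 < n else 0.0
        xp2 = x[i + 2] if i + 2 < n else 0.0
        yp1 = y2[i + 1] if i + 1 < n else 0.0
        yp2 = y2[i + 2] if i + 2 < n else 0.0
        y2[i] = a3 * xp1 + a4 * xp2 + b1 * yp1 + b2 * yp2

    return [u + v for u, v in zip(y1, y2)]
```

Applied to a unit impulse, the output is the symmetric kernel k(α|n| + 1)e^(-α|n|). A constant signal passes through unchanged away from the borders.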
350	Detekce deformovatelného pole markerů / Detection of Deformable Marker Field Schery, Miroslav January 2013 (has links) This thesis focuses on the study of augmented reality and the creation of a detection algorithm for a uniform marker field. The marker field is modified to tolerate a high degree of deformation. Existing marker types are studied. An important part of the thesis is a description of the uniform marker field technique, from which the modified problem formulation is derived. It also describes the CUDA architecture, on which the first part of the detection algorithm is implemented. Deformation tolerance, detection rate and speed tests are performed on the resulting detector algorithm.
