Global ETD Search

11	Implementace neúplného inverzního rozkladu na grafických kartách / Implementing incomplete inverse decomposition on graphical processing units Dědeček, Jan January 2013 The goal of this thesis was to evaluate the possibility of solving systems of linear algebraic equations on graphics processing units (GPUs). While such solvers for dense systems are more or less a standard part of production libraries, the thesis concentrates on low-level parallelization for sparse systems, which still presents a challenge. In particular, it considers a specific algorithm for approximate inverse decomposition of symmetric positive definite systems, combined with the conjugate gradient method. An important part of the work is an innovative parallel implementation. The experimental results for systems of various sizes and sparsity structures indicate that the approach is promising and should be developed further. In summary, efficient preconditioning of sparse systems by approximate inverses on GPUs seems worth considering.
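As an illustration of the technique named in this abstract (not code from the thesis), the following minimal Python sketch runs conjugate gradients preconditioned by an approximate inverse M of A. Applying such a preconditioner is only a matrix-vector product, with no triangular solves, which is generally what makes it attractive for parallel hardware. The function names, the small test matrix, and the use of the inverse diagonal as a stand-in for M are assumptions made for illustration; the thesis constructs a more elaborate sparse approximate inverse.

```python
def matvec(A, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, M, tol=1e-10, max_iter=100):
    """Conjugate gradients for SPD A, preconditioned by M ~ inverse(A)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    z = matvec(M, r)       # preconditioning step: a single mat-vec
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = matvec(M, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical SPD test system (strictly diagonally dominant).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

# Stand-in approximate inverse: the inverse of diag(A). This is an assumption,
# chosen only because it is the simplest explicit approximate inverse.
M = [[1.0 / A[i][i] if i == j else 0.0 for j in range(3)] for i in range(3)]

x = pcg(A, b, M)
residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
```

On a GPU, the mat-vec and dot-product helpers would be replaced by sparse parallel kernels; the iteration itself would keep the same shape.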
12	Parallélisation de simulations physiques utilisant un modèle de Boltzmann multi-phases et multi-composants en vue d'un épandage de GNL sur sol / Parallelisation of physical simulations using a multiphase, multicomponent Boltzmann method with a view to the spreading of LNG on the ground Duchateau, Julien 09 December 2015 This thesis aims to define and develop computational solutions that make physical simulations feasible on very large simulation domains, such as an industrial site like the Dunkerque LNG terminal. The flow model is based on the lattice Boltzmann method (LBM) and can handle many simulation cases. Several computing architectures are studied: a multicore central processing unit (CPU) and several graphics processing units (GPUs). An efficient parallelization of the simulation model is obtained on several GPUs computing in parallel. A progressive meshing algorithm is also defined so that the simulation domain is meshed automatically as the fluids propagate, which dynamically adapts the amount of memory required to the needs and progress of the simulation. Its integration on a multi-GPU architecture is studied. Finally, an "out-of-core" method is introduced to handle cases that require more memory than all the GPUs provide. GPU memory is generally much smaller than the CPU's RAM, so an efficient exchange system between the GPUs and the RAM is essential. Parallelism Graphics Processing Unit Simulation Boltzmann method
13	Využití grafických procesorů v úlohách celočíselného programování / Solving vehicle routing problems and algorithm implementation on GPU Hájek, Jan January 2010 Vehicle routing problems, a very wide-ranging family of problems from graph theory, are handled daily by transport companies, airlines, high-tech companies planning the drilling of printed circuit boards, and companies in other industries. Previous research has produced many analyses and proposed many solutions, which this paper outlines. They give better or worse results in longer or shorter computing times. Although the performance of processors and new technologies keeps increasing, some algorithms still cannot compute a result in a reasonable time. This paper therefore asks whether a suitable algorithm can be found that could be applied to different, faster processing unit architectures and so achieve a multiple-fold speedup. The analysis was carried out through computer experiments with a newly designed and implemented branch and bound algorithm with matrix rate reduction.
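The abstract does not say how its matrix reduction works. Assuming it refers to the classic cost-matrix reduction used in branch and bound for routing problems, the bounding step can be sketched in Python as below. The function name and the three-city cost matrix are hypothetical and are not taken from the thesis.

```python
INF = float("inf")


def reduce_matrix(cost):
    """Subtract each row minimum, then each column minimum.

    Returns the reduced matrix and the total amount subtracted, which is a
    lower bound on the cost of any tour through the matrix.
    """
    reduced = [row[:] for row in cost]
    bound = 0.0
    for row in reduced:                      # row reduction
        m = min(row)
        if m not in (0, INF):
            bound += m
            for j in range(len(row)):
                row[j] -= m
    for j in range(len(reduced)):            # column reduction
        m = min(row[j] for row in reduced)
        if m not in (0, INF):
            bound += m
            for row in reduced:
                row[j] -= m
    return reduced, bound


# Hypothetical asymmetric cost matrix; INF forbids staying in the same city.
cost = [[INF, 10, 15],
        [5, INF, 9],
        [6, 13, INF]]

reduced, lower_bound = reduce_matrix(cost)

# The two possible tours, for comparison with the bound.
tour_a = cost[0][1] + cost[1][2] + cost[2][0]
tour_b = cost[0][2] + cost[2][1] + cost[1][0]
```

Each row and each column is reduced independently of the others, so the step is a natural candidate for parallel execution, which fits the question the abstract raises.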
14	SSAGA: Streaming Multiprocessors (SMs) Sculpted for Asymmetric General Purpose Graphics Processing Unit (GPGPU) Applications Saha, Shamik 01 May 2016 The evolution of graphics processing units (GPUs) over the last decade has reinforced general-purpose computing while sustaining steady performance growth in graphics-intensive applications. However, the immense performance improvement is generally associated with a steep rise in GPU power consumption. Consequently, GPUs are already close to the abominable power wall. With the massive popularity of mobile devices running general-purpose GPU (GPGPU) applications, it is of utmost importance to ensure high energy efficiency while meeting strict performance requirements. In this work, we demonstrate that customizing a streaming multiprocessor (SM) of a GPU for a lower frequency is significantly more energy efficient than employing dynamic voltage and frequency scaling (DVFS) on an SM designed for high-frequency operation. Using a system-level computer-aided design (CAD) technique, we propose SSAGA (Streaming Multiprocessors Sculpted for Asymmetric GPGPU Applications), an energy-efficient GPU design paradigm. SSAGA creates architecturally identical SM cores customized for different voltage-frequency domains. graphics processing unit computer aided design custom design energy efficiency VLSI Electrical and Computer Engineering
15	Cellular matrix for parallel k-means and local search to Euclidean grid matching / Matrice cellulaire pour des algorithmes parallèles de k-means et de recherche locale appliqués à des problèmes euclidiens d’appariement de graphes Wang, Hongjian 03 December 2015 In this thesis, we propose a parallel computing model, called cellular matrix, to address the difficulties of parallel computation applied to Euclidean graph matching problems. These NP-hard optimization problems involve data distributed in the plane and elastic structures, represented by graphs, that must match the data. They include problems known under various names, such as geometric k-means, elastic net, topographic mapping, and elastic image matching. The Euclidean traveling salesman problem (TSP), the median cycle problem, and image matching problems are examples that can be modeled in this way. The contribution is divided into three parts. In the first part, we present the cellular matrix model, which partitions the data and defines the level of granularity of the parallel computation. We present a generic parallel computation loop that models the projection between graphs and their matching. In the second part, we apply the parallel computing model to k-means algorithms with topology in the plane. The proposed algorithms are applied to the TSP, structured mesh generation, and image segmentation following the concept of superpixels. The approach is called superpixel adaptive segmentation map (SPASM). In the third part, we propose a parallel local search algorithm, called distributed local search (DLS). The solution results from many local operations performed on the structures and the data distributed in the plane, including local evaluation, neighborhood search, and structured moves. The algorithm is applied to Euclidean graph matching problems, including stereo matching and optical flow.
Cellular matrix Graph matching K-means Local search Parallel algorithms Graphics processing unit (GPU) 620
16	Computational Medical Image Analysis : With a Focus on Real-Time fMRI and Non-Parametric Statistics Eklund, Anders January 2012 Functional magnetic resonance imaging (fMRI) is a prime example of multi-disciplinary research. Without the beautiful physics of MRI, there would not be any images to look at in the first place. To obtain images of good quality, it is necessary to fully understand the concepts of the frequency domain. The analysis of fMRI data requires understanding of signal processing, statistics and knowledge about the anatomy and function of the human brain. The resulting brain activity maps are used by physicians, neurologists, psychologists and behaviourists, in order to plan surgery and to increase their understanding of how the brain works. This thesis presents methods for real-time fMRI and non-parametric fMRI analysis. Real-time fMRI places high demands on the signal processing, as all the calculations have to be made in real-time in complex situations. Real-time fMRI can, for example, be used for interactive brain mapping. Another possibility is to change the stimulus that is given to the subject, in real-time, such that the brain and the computer can work together to solve a given task, yielding a brain computer interface (BCI). Non-parametric fMRI analysis, for example, concerns the problem of calculating significance thresholds and p-values for test statistics without a parametric null distribution. Two BCIs are presented in this thesis. In the first BCI, the subject was able to balance a virtual inverted pendulum by thinking of activating the left or right hand or resting. In the second BCI, the subject in the MR scanner was able to communicate with a person outside the MR scanner, through a virtual keyboard. A graphics processing unit (GPU) implementation of a random permutation test for single subject fMRI analysis is also presented. The random permutation test is used to calculate significance thresholds and p-values for fMRI analysis by canonical correlation analysis (CCA), and to investigate the correctness of standard parametric approaches. The random permutation test was verified by using 10 000 noise datasets and 1484 resting state fMRI datasets. The random permutation test is also used for a non-local CCA approach to fMRI analysis. functional magnetic resonance imaging brain computer interfaces canonical correlation analysis random permutation test graphics processing unit
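The idea behind a random permutation test can be shown with a much simpler statistic than the CCA-based one used in the thesis. The Python sketch below, whose function name, data, and difference-of-means statistic are all assumptions chosen for illustration, estimates a p-value by shuffling group labels and counting how often the shuffled statistic is at least as extreme as the observed one, so no parametric null distribution is needed.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)

    def mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                  # random relabelling of the data
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Count the observed labelling itself so the estimate is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: two clearly separated groups, then two identical groups.
p_far = permutation_p_value([5.1, 5.3, 4.9, 5.2, 5.0, 5.4],
                            [1.0, 1.2, 0.9, 1.1, 1.3, 0.8])
p_same = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

Every permutation is evaluated independently of the others, which is the property that makes the test expensive on a CPU and well suited to a GPU.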
17	Design of a Multi-Core Multi-thread Floating-Point Processor and Its Application in Computer Graphics Yeh, Chia-Yu 06 September 2011 Graphics processing unit (GPU) designs usually adopt various computer architecture techniques to boost computation speed, including single-instruction multiple-data (SIMD), very-long-instruction-word (VLIW), multi-threading, and/or multi-core. In OpenGL ES 2.0, the user-programmable vertex shader (VS) hardware unit can be designed around a vector SIMD computation unit so that it efficiently computes the matrix-vector multiplication, one of the key operations in vertex transformation. Recently, high-performance GPUs, such as the Tesla series from NVIDIA, have been designed with many-core architectures in which each core is responsible for scalar operations. The intention is to allow efficient execution of general-purpose computations in addition to specialized graphics computations. In this thesis, we design a scalar-based multi-threaded GPU that is composed of four scalar processors and one special-function unit and can execute multi-threaded instructions. We use the example of vertex transformation to demonstrate the execution efficiency of the scalar-based multi-threaded GPU. We also compare it with the vector-based SIMD GPU. multi-threading graphics processing unit (GPU) vertex shader SIMD matrix-vector multiplication OpenGL ES 2.0
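The matrix-vector multiplication mentioned in this abstract can be written out as follows. This is a minimal Python sketch with a hypothetical function name and a hypothetical translation matrix, not the thesis design. Each output component is an independent four-term dot product, so the work can be done either by one four-wide vector unit or by separate scalar threads.

```python
def transform_vertex(matrix, vertex):
    """Multiply a 4x4 matrix by a homogeneous vertex (x, y, z, w)."""
    # One dot product per output component; the four are independent.
    return [sum(m * v for m, v in zip(row, vertex)) for row in matrix]


# Hypothetical transform: translate by (10, 20, 30).
translate = [[1, 0, 0, 10],
             [0, 1, 0, 20],
             [0, 0, 1, 30],
             [0, 0, 0, 1]]

moved = transform_vertex(translate, [1, 2, 3, 1])
```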
18	A High Performance Register Allocator for Vector Architectures with a Unified Register-Set Su, Yu-Dan 29 June 2012 This thesis describes a compiler optimization targeted at machines with unified, vector-based register sets. The optimization combines register allocation and instruction scheduling. It examines places where the code performs computations on scalar variables, with the goal of identifying instances where the same operation is performed. For example, a program might calculate "base+offset" and then calculate "i+j". Even though these computations are unrelated, they use the same operator; if "base" and "i" are packed into one vector register, while "offset" and "j" are packed into another, then the two computations can be performed simultaneously through the vectors' parallel addition operation. This would reduce the execution time of the compiled code. Although other researchers have considered similar packing methods, their work has been limited by the hardware that they were studying. Such hardware usually imposed high costs for moving data between scalar and vector register banks. The present thesis, however, considers a novel hardware architecture that imposes no such costs. As a consequence, we are able to obtain significant speedups. The architecture that we consider is a Graphics Processing Unit (GPU) for embedded systems that is under development at this university. This GPU has a single register set for integers, floats, and vectors. instruction scheduling register allocator compiler optimization unified register set vector architecture novel Graphics Processing Unit
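The packing example in this abstract can be mimicked in a few lines of Python. This is only a model of the idea: `vector_add` is a hypothetical stand-in for a lane-wise vector addition instruction, and the values are made up.

```python
def vector_add(u, v):
    """Stand-in for one lane-wise vector addition instruction."""
    return tuple(a + b for a, b in zip(u, v))


base, offset = 1000, 24
i, j = 3, 4

# Scalar code: two unrelated additions, two instructions.
addr_scalar = base + offset
k_scalar = i + j

# Packed code: "base" and "i" share one register, "offset" and "j" another,
# so a single vector addition produces both results.
left = (base, i)
right = (offset, j)
addr_packed, k_packed = vector_add(left, right)
```

The saving only materialises when packing and unpacking are cheap, which is the property of the unified register set that the abstract emphasises.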
19	The Comparison of Using MATLAB, C++ and Parallel Computing for Proton Echo Planar Spectroscopic Imaging Reconstruction Tai, Chia-Hsing 10 July 2012 Proton echo planar spectroscopic imaging (PEPSI) is a novel and rapid technique for magnetic resonance spectroscopic imaging (MRSI). To analyze the metabolites in PEPSI data using LCModel, an automatic reconstruction system is necessary. Recently, many studies have used graphics processing units (GPUs) to accelerate image reconstruction, and because Compute Unified Device Architecture (CUDA) is based on the C language, programmers can write parallel programs easily. PEPSI data acquisition includes scans without and with water suppression. Each scan contains odd and even echoes, and these two data sets are reconstructed separately. The image reconstruction consists of a k-space filter, a time-domain filter, a three-dimensional fast Fourier transform (FFT), phase correction, and the combination of odd and even data. We use MATLAB, C++ and parallel computing to implement PEPSI reconstruction; the parallel implementation uses CUDA, developed by NVIDIA. In our study, the averaged spectroscopic images without water suppression produced by the three implementations are almost the same. At our data scale, the parallel implementation executes faster than MATLAB and C++, especially in the FFT step. We therefore simulated and compared the performance of one- to three-dimensional FFTs. Our results, based on FFT performance and the execution time of single-coil PEPSI reconstruction, show that the acceleration achieved by the GPU depends on the number of data points. As demonstrated in our study, parallel computing contributes to computational acceleration when the number of data points is larger than 65536. Parallel Computing Reconstruction Graphic Processing Unit Proton Echo Planar Spectroscopic Imaging Magnetic Resonance Spectroscopic Imaging
20	Σχεδίαση & υλοποίηση ενός μικροϋπολογιστικού συστήματος βασισμένου σε μια επαυξημένη σχετικά απλή CPU Γαλετάκης, Εμμανουήλ 26 July 2012 This thesis was carried out within the Interdepartmental Postgraduate Specialization Programme in "Electronics and Information Processing" at the Department of Physics of the University of Patras. Its objective is the design and development of a basic FPGA-based microcomputer system in VHDL. The system is based on an enhanced version of Carpinelli's Relatively Simple CPU and provides parallel input and output ports and interrupts for connecting a range of peripheral devices and subcircuits. The first chapter presents the full design of such a system and examines the structure of the individual components that compose it. The second chapter presents the description of the microcomputer system in VHDL and its full simulation using ALTERA's Quartus v7.2 software suite. The last chapter presents the implementation of the system on ALTERA's DE2 development board. 004.16 Central Processing Unit (CPU) Field-programmable gate array (FPGA) VHDL DE2
