1 |
The Comparison of Using MATLAB, C++ and Parallel Computing for Proton Echo Planar Spectroscopic Imaging Reconstruction
Tai, Chia-Hsing, 10 July 2012
Proton echo planar spectroscopic imaging (PEPSI) is a novel and rapid technique for magnetic resonance spectroscopic imaging (MRSI). To analyze the metabolites in PEPSI data using LCModel, an automatic reconstruction system is necessary. Recently, many studies have used graphics processing units (GPUs) to accelerate image reconstruction, and because Compute Unified Device Architecture (CUDA) is based on the C language, programmers can write parallel programs easily.
PEPSI data acquisition includes non-water-suppressed and water-suppressed scans. Each scan contains odd and even echoes, and these two data sets are reconstructed separately. The image reconstruction consists of a k-space filter, a time-domain filter, a three-dimensional fast Fourier transform (FFT), phase correction, and combination of the odd and even data. We use MATLAB, C++ and parallel computing to implement PEPSI reconstruction; the parallel implementation uses CUDA, developed by NVIDIA.
In our study, the averaged non-water-suppressed spectroscopic images produced by the three implementations are almost the same. At our data scale, the parallel implementation executes faster than MATLAB and C++, especially in the FFT step. Therefore, we simulated and compared the performance of one- to three-dimensional FFTs.
Our results show that, judging from FFT performance and the execution time of single-coil PEPSI reconstruction, the acceleration provided by the GPU depends on the number of data points. When the number of data points exceeds 65536, as demonstrated in our study, parallel computing provides computational acceleration.
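The three-dimensional FFT named in the abstract is separable: it can be computed as batches of independent one-dimensional FFTs along each axis, which is one reason the step suits parallel hardware. The following is a minimal pure-Python sketch of that structure, not the thesis code (which used MATLAB, C++ and CUDA). The function names `fft1d` and `fft3d` and the nested-list layout `vol[z][y][x]` are assumptions made for illustration.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(vol):
    """3D FFT of vol[z][y][x] as 1D FFTs along x, then y, then z."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    # Pass 1: every row along x is an independent 1D transform.
    a = [[fft1d(row) for row in plane] for plane in vol]
    # Pass 2: every column along y is an independent 1D transform.
    for z in range(nz):
        for x in range(nx):
            col = fft1d([a[z][y][x] for y in range(ny)])
            for y in range(ny):
                a[z][y][x] = col[y]
    # Pass 3: every line along z is an independent 1D transform.
    for y in range(ny):
        for x in range(nx):
            col = fft1d([a[z][y][x] for z in range(nz)])
            for z in range(nz):
                a[z][y][x] = col[z]
    return a
```

Within each pass the one-dimensional transforms do not depend on each other, so a GPU library can run them concurrently; whether that pays off depends on the data size, as the abstract reports.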
|
2 |
GPGPU design space exploration using neural networks
Jooya, Ali, 28 September 2018
General Purpose computing on Graphic Processing Unit (GPGPU) gained attention in 2006 with NVIDIA's first Tesla Graphic Processing Unit (GPU), which could perform high performance computing. Ever since, researchers have been working on software and hardware techniques to improve the efficiency of running general purpose applications on GPUs. The efficiency can be evaluated using metrics such as energy consumption and throughput and is defined based on the requirements of the system. I define it as obtaining high throughput while consuming minimum energy.
GPUs are equipped with a large number of processing units, a high memory bandwidth, and different types of on-chip memory and caches. To run efficiently, an application should maximize the utilization of GPU resources. Therefore, a good correspondence between the computing and memory resources of the GPU and those required by the application is critical. Since an application's requirements are fixed, the GPU's configuration should be tuned to these requirements. Having models to study and predict the power consumption and throughput of running a GPGPU application on a given GPU configuration can help achieve high efficiency.
The main purpose of this dissertation is to find a GPU configuration that best matches the requirements of a given application. I propose three models that predict a GPU configuration that runs an application with maximum throughput while consuming minimum energy.
The first model is a fast, low-cost and effective approach to optimize resource allocation in future GPUs. The model finds the optimal GPU configuration for different available chip real-estate budgets.
The second model considers the power consumption and throughput of a GPGPU application as functions of the GPU configuration parameters. The proposed model accurately predicts the power consumption and throughput of the modeled GPGPU application. I then propose to accelerate the process of building the model using optimization techniques and quantum annealing. I use the proposed model to explore the GPU configuration space of different applications. I apply a multiobjective optimization technique to find the configurations that offer minimum power consumption and maximum throughput.
Finally, using clustering and classification techniques, I develop models to relate the power consumption and throughput of GPGPU applications to the code attributes. Both models could accurately predict the optimum configuration for any given GPGPU application.
To build these models I have used different machine learning techniques and optimization methods such as the Pareto front and the Knapsack optimization problem. I validated the results produced by the models against simulation results and showed that the models make accurate predictions.
These models could be used by GPGPU programmers to identify the architectural parameters that most affect an application's power consumption and throughput. This information could be translated into software optimization opportunities. Also, these models can be implemented as part of a compiler to help it make the best optimization decisions. Moreover, GPU manufacturers could gain insight into the architectural parameters that would benefit GPGPU applications the most in terms of power and performance, and hence invest in these. / Graduate
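The multiobjective step mentioned above (minimum power, maximum throughput) can be illustrated with a small Pareto-front filter. This is only a sketch of the general technique, not the dissertation's model: the function name `pareto_front`, the dictionary keys, and the four configurations with their numbers are all made up for illustration.

```python
def pareto_front(configs):
    """Return the configs not dominated by any other config.

    A config is dominated when another one uses no more power AND gives
    no less throughput, and is strictly better in at least one of the two.
    """
    front = []
    for c in configs:
        dominated = any(
            o["power"] <= c["power"]
            and o["throughput"] >= c["throughput"]
            and (o["power"] < c["power"] or o["throughput"] > c["throughput"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical measurements; the values are invented for this example.
configs = [
    {"name": "A", "power": 100.0, "throughput": 50.0},
    {"name": "B", "power": 120.0, "throughput": 80.0},
    {"name": "C", "power": 130.0, "throughput": 70.0},  # worse than B on both
    {"name": "D", "power": 90.0, "throughput": 30.0},
]
front_names = [c["name"] for c in pareto_front(configs)]
print(front_names)  # ['A', 'B', 'D']
```

Every configuration left on the front is a defensible trade-off; choosing among them requires a further criterion, such as a power or chip-area budget.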
|
3 |
Computação paralela em GPU para resolução de sistemas de equações algébricas resultantes da aplicação do método de elementos finitos em eletromagnetismo. / Parallel computing on GPU for solving systems of algebraic equations resulting from application of finite element method in electromagnetism.
Camargos, Ana Flávia Peixoto de, 04 August 2014
This work presents the use of parallel processing techniques on Graphics Processing Units (GPUs) for the solution of algebraic equations arising from Finite Element modeling of electromagnetic phenomena, in both the steady-state and time-harmonic regimes. The parallel programming techniques used were OpenMP, CUDA and GPUDirect, the last of these for multi-GPU platforms. The iterative methods discussed include those of the Krylov subspace: Conjugate Gradients, Bi-conjugate Gradients, Conjugate Residual, Bi-conjugate Gradients Stabilized, Conjugate Gradients for Normal Equations (CGNE and CGNR) and Conjugate Gradients Squared. All implementations made use of the CUSP, CUSPARSE and CUBLAS libraries. For the static problems, the following preconditioners were adopted, all with parallelized implementations executed on the GPU: incomplete LU and Cholesky decompositions, Algebraic Multigrid, Diagonal and Approximate Inverse. For the time-harmonic problems, only the first two preconditioners were used, in their sequential versions running on the CPU, which yielded a hybrid CPU-GPU implementation. The computational tools developed were tested in the simulation of electrical grounding systems. In the harmonic regime, in which the phenomenon is governed by the driven, lossy wave equation, the formulation adopted was the two-potential, ungauged edge A-V formulation. In all cases, the GPU-based tools showed considerable speedups, indicating that this is a promising technology for the simulation of large-scale Electrical Engineering problems, with an excellent cost-benefit ratio.
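As an illustration of one combination named in the abstract, Conjugate Gradients with the Diagonal (Jacobi) preconditioner, here is a minimal CPU-only sketch in pure Python. It is not the thesis implementation, which ran on the GPU with the CUSP, CUSPARSE and CUBLAS libraries; the function name `pcg_jacobi`, the dense list-of-lists matrix and the tolerance defaults are assumptions for illustration, and the matrix is assumed symmetric positive definite.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (dense list of lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                              # residual b - A x for x = 0
    z = [r[i] / A[i][i] for i in range(n)]   # apply the diagonal preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:           # converged: residual norm is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

The dominant costs per iteration are one matrix-vector product and a few dot products and vector updates, which are the operations GPU sparse and dense linear algebra libraries are designed to accelerate.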
|
5 |
Applying Contact Angle to a Two-dimensional Smoothed Particle Hydrodynamics (SPH) model on a Graphics Processing Unit (GPU) Platform
Farrokhpanah, Amirsaman, 22 November 2012
A parallel, GPU-compatible, Lagrangian mesh-free particle solver for multiphase fluid flow based on the SPH scheme is developed and used to capture the interface evolution during droplet impact. Surface tension is modeled using the multiphase scheme of Hu et al. (2006). To simulate the wetting phenomena precisely, a method based on the work of Šikalo et al. (2005) is used jointly with the model proposed by Afkhami et al. (2009) to ensure accurate dynamic contact angle calculations. Accurate predictions were obtained for the droplet contact angle during spreading.
A two-dimensional analytical model is developed as an extension of the work of Chandra et al. (1991). Results obtained from the solver agree well with these analytical results.
The effects of memory management techniques and of a variety of task-assignment algorithms on the GPU are studied. GPU speedups of up to 120 times over a single-processor CPU were obtained.
|
7 |
Development of Parallel Architectures for Radar/Video Signal Processing Applications
Jarrah, Amin, January 2014
No description available.
|
8 |
應用機器學習預測利差交易的收益 / Application of machine learning to predicting the returns of carry trade
吳佳真, Unknown Date
This research proposes an artificial neural network (ANN) mechanism for timely and effective prediction of the returns of carry trade. To achieve timeliness, the ANN mechanism is implemented with TensorFlow and a graphics processing unit (GPU). Furthermore, the ANN mechanism needs to cope with time series data that may exhibit concept drift and outliers. An experiment is designed to verify the timeliness and effectiveness of the proposed mechanism.
During the experiment, we find that the parameters set in the algorithm affect the performance of the neural network, and this research discusses the results obtained under different parameters. The experimental results indicate that the proposed ANN mechanism can predict the movement of the returns of carry trade well. We hope this research contributes to both the machine learning and finance fields.
|
9 |
Modélisation ultra-rapide des transferts de chaleur par rayonnement et par conduction et exemple d'application / Fast Modeling of Radiation and Conduction Heat Transfer and application example
Ghannam, Boutros, 19 October 2012
The release of CUDA by NVIDIA in 2007 tremendously increased GPU programmability, allowing scientific and engineering applications to take advantage of the high compute capability of GPUs. In this work, we present ultra-fast solutions for radiation and conduction heat transfer on the GPU. First, the Multiple Absorption Coefficient Zonal Method (MACZM) for computing direct radiative exchange factors in 3D semi-transparent media is reviewed and validated. Then, an efficient implementation of MACZM is presented, based on discrete geometry algorithms and an optimized CUDA parallelization on the GPU. The CUDA implementation achieves a speed-up of 300 to 600 times. The Non-recursive Plating Algorithm (NRPA), a non-recursive version of the plating algorithm (PA) for computing total exchange factors, is then formulated. Owing to low-complexity matrix multiplication algorithms, the NRPA has lower complexity than the PA, and it runs up to 750 times faster on the GPU than the PA does on the CPU. In addition, an efficient GPU implementation of the Locally One Dimensional (LOD) finite difference split method for solving heat diffusion is presented, based on an optimized alternation between parallelization schemes and equation solvers, achieving accelerations of 75 to 250 times. Finally, all the methods are applied together to solve 3D heat transfer in a steel slab reheating furnace. A multi-grid approach is applied for MACZM and a zone-by-zone computation for the NRPA. As a result, high precision and very fast computation times are achieved, making the methods of high interest for building precise and efficient control units.
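The Locally One Dimensional idea mentioned above replaces one multi-dimensional implicit solve with a sequence of one-dimensional tridiagonal solves, one direction at a time. The sketch below shows this for a 2D grid with a backward-Euler sweep per direction and zero Dirichlet boundaries. It is an assumed, simplified variant for illustration only, not the thesis scheme; the names `thomas`, `implicit_sweep` and `lod_step` and the parameter `r` (diffusivity times time step divided by grid spacing squared) are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                 # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_sweep(line, r):
    """Backward-Euler 1D diffusion of interior values, zero boundary values."""
    n = len(line)
    return thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, line)


def lod_step(u, r):
    """One LOD time step on the 2D grid u[y][x]: x-sweeps, then y-sweeps."""
    ny, nx = len(u), len(u[0])
    u = [implicit_sweep(row, r) for row in u]              # each row independent
    cols = [implicit_sweep([u[y][x] for y in range(ny)], r)
            for x in range(nx)]                            # each column independent
    return [[cols[x][y] for x in range(nx)] for y in range(ny)]
```

All rows in the first sweep, and all columns in the second, are independent of one another, which is what a GPU implementation can exploit by assigning lines to threads or thread blocks.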
|
10 |
Detekce pohyblivého objektu ve videu na CUDA / Moving Object Detection in Video Using CUDA
Čermák, Michal, January 2011
This thesis deals with a model-based approach to 3D tracking from monocular video. The pose of the 3D model is estimated dynamically by minimizing an objective function with a particle filter. The objective function is based on the similarity between the rendered scene and the real video.
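The idea of minimizing an objective function with a particle filter can be sketched in one dimension: particles are candidate poses, a low objective value gives a high weight, and the particle set is repeatedly resampled and perturbed. This is a toy illustration under stated assumptions, not the thesis method: the real pose is multi-dimensional and the real objective compares a rendered scene with video frames, whereas here the pose is a single number and the objective is a made-up quadratic. The function name `particle_filter_minimize` and all parameter values are hypothetical.

```python
import math
import random


def particle_filter_minimize(objective, lo, hi, n_particles=500, n_iters=10,
                             jitter=0.1, temperature=0.25, seed=0):
    """Estimate the minimizer of `objective` on [lo, hi] with a particle set."""
    rng = random.Random(seed)
    particles = [rng.uniform(lo, hi) for _ in range(n_particles)]
    estimate = None
    for _ in range(n_iters):
        # Weighting: a lower objective value yields a larger weight.
        weights = [math.exp(-objective(p) / temperature) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The estimate is the weighted mean of the current particles.
        estimate = sum(w * p for w, p in zip(weights, particles))
        # Resample in proportion to weight, then add small random motion.
        resampled = rng.choices(particles, weights=weights, k=n_particles)
        particles = [p + rng.gauss(0.0, jitter) for p in resampled]
    return estimate


# Stand-in objective with its minimum at pose = 2.0 (invented for the example).
estimate = particle_filter_minimize(lambda pose: (pose - 2.0) ** 2, -5.0, 5.0)
```

Evaluating the objective for each particle is independent of the other particles, which is the part of such a tracker that maps naturally onto CUDA threads.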
|