421 |
Inference of gene regulatory networks using an exhaustive search algorithm on GPU clusters. Borelli, Fabrizio Ferreira, January 2013 (has links)
Advisor: Luiz Carlos da Silva Rozante / Master's dissertation - Universidade Federal do ABC. Graduate Program in Computer Science, 2013
|
422 |
Parallelization of physical simulations using a multiphase and multicomponent lattice Boltzmann model for LNG spreading on the ground. Duchateau, Julien, 09 December 2015 (has links)
This thesis aims to define and develop software solutions that enable physical simulations over very large simulation domains, such as an industrial site like the Dunkerque LNG terminal. The flow model is based on the lattice Boltzmann method (LBM) and can handle many simulation cases. Several computing architectures are studied in this work, including the use of a multicore central processing unit (CPU) and of graphics processing units (GPUs) to parallelize the computations. Solutions are put in place to obtain an efficient parallelization of the simulation model across several GPUs computing in parallel. A progressive meshing approach is also introduced to manage dynamically the amount of memory required, meshing the simulation domain according to the needs of the simulation as the fluids propagate. Its integration on an architecture composed of several GPUs is also addressed. Finally, an "out-of-core" method is introduced to handle cases where GPU memory is insufficient for the simulation. Indeed, GPUs generally have significantly less memory than the CPU's RAM, so setting up an efficient exchange system between the GPUs and host RAM is essential.
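A minimal sketch (not code from the thesis) of the out-of-core idea described above: the full domain lives in host RAM, and only one tile at a time is staged into a small buffer standing in for GPU memory, processed, then written back. All names and sizes here are illustrative, and in a real multi-GPU LBM code the copies would be host-to-device transfers with halo exchange between neighboring tiles.

```python
import numpy as np

def step(tile):
    # Stand-in for one lattice Boltzmann update on the staged tile;
    # here just a diffusion-like smoothing to keep the sketch runnable.
    return 0.25 * (np.roll(tile, 1, 0) + np.roll(tile, -1, 0)
                   + np.roll(tile, 1, 1) + np.roll(tile, -1, 1))

host_domain = np.random.rand(4096, 4096)   # full domain, lives in CPU RAM
tile_size = 1024                           # what fits in (simulated) GPU memory

for i in range(0, host_domain.shape[0], tile_size):
    for j in range(0, host_domain.shape[1], tile_size):
        tile = host_domain[i:i+tile_size, j:j+tile_size].copy()  # host -> device
        tile = step(tile)                                        # compute on device
        host_domain[i:i+tile_size, j:j+tile_size] = tile         # device -> host
```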
|
423 |
Parallel computing platform « Design for Demise ». Plazolles, Bastien, 10 January 2017 (has links)
The risk posed by space debris is now considered critical by governments and international space agencies. Over the last decade, space agencies have developed tools to simulate the atmospheric re-entry of satellites and orbital stations in order to assess casualty risk and possible damage on the ground. Nevertheless, current tools provide deterministic results, even though the underlying models use parameter values that are poorly known, so the results obtained are strongly dependent on the assumptions made. One solution to obtain relevant and exploitable results is to take the uncertainties on these modeling parameters into account and perform Monte Carlo analyses. Such a study is very time consuming, however, because of the large parameter space to explore (requiring hundreds of thousands of numerical simulations). As part of this thesis we propose a new satellite atmospheric re-entry simulation tool that natively accounts for uncertainties on the modeling parameters in order to perform statistical analyses. To keep computing time under control, the tool takes advantage of the Taguchi method to reduce the number of parameters to study, and of computing accelerators such as Graphics Processing Units (GPUs) and the Intel Xeon Phi.
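A minimal, hypothetical sketch of the Monte Carlo idea: sample the poorly known parameters from assumed distributions, evaluate a (here trivial) model for each draw, and look at the spread of the output. The toy quantity, distributions, and values below are illustrative placeholders, not those used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed uncertainty distributions for two poorly known parameters.
drag_coeff = rng.normal(2.0, 0.3, n)       # dimensionless
mass = rng.uniform(800.0, 1200.0, n)       # kg

# Toy output standing in for a full re-entry simulation: the ballistic
# coefficient, which drives where/whether a fragment survives re-entry.
area = 1.5                                 # m^2, held fixed in this sketch
beta = mass / (drag_coeff * area)          # kg/m^2

print(f"ballistic coefficient: mean={beta.mean():.1f}, "
      f"95% interval=({np.percentile(beta, 2.5):.1f}, "
      f"{np.percentile(beta, 97.5):.1f})")
```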
|
424 |
Advanced optimization and sampling techniques for biomolecules using a polarizable force field. Litman, Jacob Mordechai, 01 May 2019
Biophysical simulation can be an excellent complement to experimental techniques, but practical constraints on simulation remain unresolved. While computers have continued to improve, the scale of the systems we wish to study has continued to increase. This has driven the use of approximate energy functions (force fields), compensating for relatively short simulations via careful structure preparation and accelerated sampling techniques. To address structure preparation, we developed the many-body dead end elimination (MB-DEE) optimizer. We first validated the MB-DEE algorithm on a set of PCNA crystal structures, then accelerated it on GPUs to optimize 472 homology models of proteins implicated in inherited deafness. Advanced physics has clearly been demonstrated to help optimize structures, and with GPU acceleration this becomes practical for large numbers of structures. We also present the novel "simultaneous bookending" algorithm, a new approach to indirect free energy (IFE) methods. IFE methods first perform simulations under a cheaper "reference" potential, then correct the thermodynamics to a more sophisticated "target" potential, combining the speed of the reference potential with the accuracy of the target potential. Simultaneous bookending is shown to be a valid IFE approach, and methods to realize speedups over the direct path are discussed. Finally, we are developing the Monte Carlo Orthogonal Space Random Walk (MC-OSRW) algorithm for high-performance alchemical free energy simulations, bypassing some of the difficulty in OSRW methods. This work helps prevent inaccuracies caused by simpler electrostatic models by making advanced polarizable force fields more accessible for routine simulation.
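The reference-to-target correction at the heart of IFE methods can be written with the standard free energy perturbation (Zwanzig) identity, dA = -kT ln < exp(-(U_target - U_ref)/kT) >_ref, where the average runs over configurations sampled under the reference potential. Below is a minimal numpy sketch of that correction on a toy 1-D system; it illustrates the generic IFE idea only, not the thesis's simultaneous bookending algorithm, and all potentials and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
kT = 0.6                     # thermal energy, arbitrary units
k_spring = 4.0

# Exact Boltzmann sampling of the cheap "reference" potential U_ref = 0.5*k*x^2
x = rng.normal(0.0, np.sqrt(kT / k_spring), 1_000_000)

# More expensive "target" potential: reference plus a quartic term (illustrative)
dU = 0.1 * x**4              # U_target - U_ref evaluated on reference samples

# Zwanzig / free energy perturbation estimate of the correction
dA = -kT * np.log(np.mean(np.exp(-dU / kT)))
print(f"free energy correction, reference -> target: {dA:.4f}")
```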
|
425 |
Methods for improving performance of particle tracking and image registration in computational lung modeling using multi-core CPUs and GPUs. Ellingwood, Nathan David, 01 December 2014
Graphics Processing Units (GPUs) have grown in popularity beyond the original video game enthusiast audience. They have been embraced by the high-performance computing community due to their high computational throughput, low cost, low energy demands, wide availability, and ability to dramatically improve application performance. In addition, as hybrid computing continues into mainstream applications, the use of GPUs will continue to grow. However, due to architectural differences between the CPU and GPU, adapting CPU-based scientific computing applications to fully exploit the potential speedup that GPUs offer is a non-trivial task. Algorithms must be designed with the architecture's benefits and limitations in mind in order to unlock the full performance gains afforded by the GPU. In this work, we develop fast GPU methods to improve the performance of two important components of computational lung modeling: image registration and particle tracking. We first propose a novel method for multi-level mass-preserving deformable image registration. The strength of this method is its flexibility in the choice of similarity criterion, making it possible to implement both simple and complex similarity measures on the GPU with excellent performance. The method is tested using three similarity criteria for registering two CT lung datasets: the commonly used sum of squared intensity differences (SSD), the sum of squared tissue value differences (SSTVD), and a symmetric version of SSTVD currently being developed by our research group. The GPU method is validated against a previously validated single-threaded CPU counterpart using six healthy human subjects, demonstrating strong agreement of results. Separately, three GPU methods were developed for tracking particle trajectories and deposition efficiencies in the human airway tree, including a multiple-GPU method. Though parallelization was straightforward, the complex geometry of the lungs and the use of an unstructured mesh posed challenges that the GPU methods address. The results of the GPU methods were tested for various numbers of particles and compared to a previously validated single-threaded CPU version, demonstrating dramatic speedup over both the single-threaded and 12-threaded CPU versions.
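As an illustration of the simplest of the three criteria named above, the sum of squared intensity differences between a fixed image and a deformed moving image is a purely voxelwise reduction, which is what makes it map so naturally onto a GPU. A minimal numpy sketch with placeholder arrays, not the thesis code:

```python
import numpy as np

def ssd(fixed, moving):
    """Sum of squared intensity differences between two images/volumes.

    Each voxel's contribution is independent, so on a GPU this becomes
    one elementwise kernel followed by a parallel reduction.
    """
    diff = fixed.astype(np.float64) - moving.astype(np.float64)
    return np.sum(diff * diff)

fixed = np.random.rand(64, 64, 64)    # stand-ins for CT lung volumes
moving = np.random.rand(64, 64, 64)
print(ssd(fixed, moving))
```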
|
426 |
IMPROVING PERFORMANCE AND ENERGY EFFICIENCY FOR THE INTEGRATED CPU-GPU HETEROGENEOUS SYSTEMS. Wen, Hao, 01 January 2018
Current heterogeneous CPU-GPU architectures integrate general-purpose CPUs and highly thread-parallel GPUs (Graphics Processing Units) on the same die. This dissertation focuses on improving the energy efficiency and performance of such heterogeneous CPU-GPU systems.
Leakage energy has become an increasingly large fraction of total energy consumption, making its reduction important for overall energy efficiency. Caches occupy a large on-chip area, which makes them good targets for leakage energy reduction. For the CPU cache, we study how to reduce leakage energy efficiently in a hybrid SPM (Scratch-Pad Memory) and cache architecture. For the GPU cache, the access pattern differs from that of the CPU: it usually exhibits little locality and a high miss rate. In addition, the GPU can hide memory latency more effectively thanks to multithreading. For these reasons, we find it possible to place the cache lines of the GPU data caches into low-power mode more aggressively than traditional leakage management does for CPU caches, which reduces more leakage energy without significant performance degradation.
Contention in resources shared between the CPU and GPU, such as the last-level cache (LLC), the interconnection network, and DRAM, may degrade both CPU and GPU performance. We propose a simple yet effective probability-based method to control the LLC replacement policy, reducing the CPU's inter-core conflict misses caused by the GPU without significantly impacting GPU performance. In addition, we develop two strategies that combine the probability-based method for the LLC with an existing technique, virtual channel partitioning (VCP), for the interconnection network to further improve CPU performance.
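One plausible reading of such a probability-based replacement control (my sketch and my assumption, not the dissertation's exact mechanism): on a GPU fill, insert the line near the LRU position with probability p, so GPU lines, which typically have little reuse, are evicted before they can displace CPU lines.

```python
import random
from collections import deque

def insert_line(cache_set: deque, tag, is_gpu: bool, p: float = 0.8):
    """Insert into one set ordered MRU (left) -> LRU (right).

    Hypothetical policy sketch: GPU fills go to the LRU end with
    probability p, limiting how much GPU traffic displaces CPU lines.
    """
    if len(cache_set) == cache_set.maxlen:
        cache_set.pop()                      # evict the current LRU line
    if is_gpu and random.random() < p:
        cache_set.append(tag)                # near-LRU insertion
    else:
        cache_set.appendleft(tag)            # normal MRU insertion

ways = deque(maxlen=16)                      # one 16-way LLC set
for t in range(100):
    insert_line(ways, ("gpu", t), is_gpu=True)
    insert_line(ways, ("cpu", t), is_gpu=False)
print(list(ways)[:4])                        # MRU side is dominated by CPU lines
```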
Breadth-first search (BFS), a basis for graph search and a core building block of many higher-level graph analysis applications, is a typical example of parallel computation that is inefficient on GPU architectures. In a graph, a small portion of the nodes may have a large number of neighbors, which leads to irregular tasks on GPUs. These irregularities limit the parallelism of BFS executing on GPUs. Unlike previous works that focus on fine-grained task management to address the irregularity, we propose Virtual-BFS (VBFS) to virtually change the graph itself. By adding virtual vertices, the high-degree nodes in the graph are divided into groups that have an equal number of neighbors, which increases the parallelism so that more GPU threads can work concurrently. This approach preserves correctness and can significantly improve both performance and energy efficiency on GPUs.
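A sketch of the virtual-vertex transformation as described above: every vertex whose degree exceeds a cap is split into virtual copies with bounded, roughly equal-sized neighbor groups, so each GPU thread gets a bounded amount of work. How virtual copies share their parent's frontier status is part of the VBFS machinery and is not reproduced here; the splitting itself looks roughly like this (names and cap value are illustrative).

```python
def split_high_degree(adj, cap=32):
    """Split adjacency lists longer than `cap` across virtual vertices.

    Returns a new adjacency dict plus a map from each (virtual) vertex
    back to the original vertex it stands for. Illustrative sketch only.
    """
    new_adj, parent = {}, {}
    for v, nbrs in adj.items():
        if len(nbrs) <= cap:
            new_adj[v] = list(nbrs)
            parent[v] = v
        else:
            for i in range(0, len(nbrs), cap):
                virt = (v, i // cap)          # i-th virtual copy of v
                new_adj[virt] = nbrs[i:i + cap]
                parent[virt] = v
    return new_adj, parent

adj = {0: list(range(100)), 1: [0, 2], 2: [1]}
new_adj, parent = split_high_degree(adj, cap=32)
print(len(new_adj), parent[(0, 2)])           # vertex 0 became 4 virtual vertices
```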
|
427 |
Structural and functional assessments of COPD populations via image registration and unsupervised machine learning. Haghighi, Babak, 01 August 2018
There is notable heterogeneity in the clinical presentation of patients with chronic obstructive pulmonary disease (COPD). Classification of COPD is usually based on the severity of airflow limitation (pre- and post-bronchodilator FEV1), which may not sensitively differentiate subpopulations with distinct phenotypes. Recent advances in quantitative medical imaging and data analysis allow quantitative computed tomography (QCT) imaging-based metrics to be derived. These metrics can be used to link structural and functional alterations of the human lung at multiple scales. We acquired QCT images of 800 former and current smokers from the Subpopulations and Intermediate Outcomes in COPD Study (SPIROMICS). A GPU-based symmetric non-rigid image registration method was applied to expiration and inspiration scans to derive QCT-based imaging metrics at multiple scales. With these imaging-based variables, we employed an unsupervised machine learning method (K-means clustering) to identify imaging-based clusters. Four clusters were identified for both current and former smokers, with meaningful associations with clinical and biomarker measures. The results demonstrate that QCT imaging-based variables in patients with COPD can yield statistically stable and clinically meaningful clusters. This sub-grouping can help better categorize disease phenotypes, ultimately leading to the development of more efficient therapies.
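A minimal sketch of the clustering step described above, using scikit-learn's K-means on a stand-in feature matrix (rows = subjects, columns = QCT-derived metrics), with four clusters as in the study. The data, metric count, and preprocessing here are placeholders, not the SPIROMICS pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(800, 12))   # 800 subjects x 12 QCT metrics (placeholder data)

# Standardize so no single imaging metric dominates the Euclidean distance.
X_std = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=4, n_init=25, random_state=0).fit(X_std)
labels = km.labels_              # cluster membership per subject
print(np.bincount(labels))       # cluster sizes
```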
|
428 |
Cellular matrix for parallel k-means and local search to Euclidean grid matching. Wang, Hongjian, 03 December 2015
In this thesis, we propose a parallel computing model, called cellular matrix, to address the issues that arise when parallel computation is applied to Euclidean graph matching problems. These NP-hard optimization problems involve data distributed in the plane and elastic structures, represented by graphs, that must match the data. They include problems known under various names, such as geometric k-means, elastic net, topographic mapping, and elastic image matching. The Euclidean traveling salesman problem (TSP), the median cycle problem, and the image matching problem are further examples that can be modeled as graph matching. The contribution is divided into three parts. In the first part, we present the cellular matrix model, which partitions the data and defines the level of granularity of the parallel computation. We present a generic parallel computation loop that models the projection between graphs and their matching. In the second part, we apply the parallel computing model to k-means algorithms in the plane extended with topology. The proposed algorithms are applied to the TSP, to structured mesh generation, and to image segmentation following the superpixel concept; the approach is called superpixel adaptive segmentation map (SPASM). In the third part, we propose a parallel local search algorithm, called distributed local search (DLS). The solution results from many local operations, including local evaluation, neighborhood search, and structured moves, performed on the data distributed in the plane. The algorithm is applied to Euclidean graph matching problems including stereo matching and optical flow.
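A sketch of the partitioning idea behind the cellular matrix as I read it: the plane is cut into a regular grid of cells, each data point is assigned to the cell that owns it, and each cell becomes the unit of work handled by one parallel thread. The grid resolution and data below are illustrative, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
points = rng.random((10_000, 2))            # data distributed in the unit square
grid = 16                                   # cellular matrix of 16 x 16 cells

# Map each point to its owning cell; one cell = one parallel work unit.
cell_ij = np.minimum((points * grid).astype(int), grid - 1)
cell_id = cell_ij[:, 0] * grid + cell_ij[:, 1]

# Sorting by cell id groups each cell's points contiguously (the bucketing
# step); a GPU thread would then process its own cell, e.g. one local
# k-means or local search step.
order = np.argsort(cell_id)
counts = np.bincount(cell_id, minlength=grid * grid)
print(counts.min(), counts.max())           # load balance across cells
```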
|
429 |
Some applications of graphics processor programming to neural simulation and computer vision. Chariot, Alexandre, 16 December 2008 (PDF)
Largely driven by the video game industry, the research and development of hardware tools for computer-generated imagery, such as graphics cards (or GPUs, Graphics Processing Units), have boomed in recent years. The increase in power and
|
430 |
GPU algorithms for visualization and computation on unstructured meshes. Buatois, Luc, 16 May 2008 (PDF)
The most recent algorithms in geometry processing and in numerical simulation of the CFD (Computational Fluid Dynamics) type now use new kinds of grids composed of arbitrary polyhedra, in other words strongly unstructured grids. In CFD simulations, these grids can support scalar or vector fields representing physical quantities (for example: density, porosity, permeability). This thesis concerns the definition of new visualization and computation tools on such grids. For visualization, this raises both the problem of storage and that of adapting algorithms to varying geometry and topology. For computation, it raises the problem of solving large sparse unstructured linear systems. To tackle these problems, the steady increase over recent years in the parallel computing power of graphics processors provides us with new tools. However, using these GPUs requires defining new algorithms adapted to their specific parallel programming models. Our contributions are the following: (1) a generic visualization method that exploits the computing power of GPUs to extract isosurfaces from large, strongly unstructured grids; (2) a cell classification method that accelerates isosurface extraction by pre-selecting only the intersected cells; (3) a temporal isosurface interpolation algorithm, which allows the evolution of isosurfaces to be visualized continuously over time; (4) a massively parallel algorithm for solving large sparse unstructured linear systems on the GPU, whose originality lies in its adaptation to matrices of arbitrary pattern, making it applicable to any sparse system, including those arising from strongly unstructured meshes.
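The core kernel behind any solver for the arbitrary-pattern sparse systems of contribution (4) is the sparse matrix-vector product over a compressed sparse row (CSR) layout, which stores only the nonzeros and therefore handles any sparsity pattern. A minimal numpy sketch of that kernel follows; the thesis implements this on the GPU, whereas this is plain Python for clarity.

```python
import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix; works for any sparsity pattern.

    On a GPU, one thread (or one warp) typically handles one row.
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):                       # parallel over rows on a GPU
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(csr_matvec(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))
# -> [5. 3. 7.]
```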
|