Spelling suggestions: "subject:"high performance computing."" "subject:"igh performance computing.""
431 |
ROSENET: a remote server-based network emulation systemGu, Yan 08 January 2008 (has links)
Network emulation has been widely used to aid in the development and evaluation of real-time applications. Many of today s applications and protocols need to be tested and evaluated in large scale network environments such as the Internet, which requires emulation tools that meet the requirements of scale, accuracy, timeliness. Due to physical resource constraints in network emulators, existing emulation tools fail to meet these requirements as they are either limited to small and static networks, use simplified network models, or fail to deliver timely emulation results. If more physical resources are devoted to network emulation by utilizing high performance computing facilities, the accuracy and scalability of network emulation can be greatly improved. However, for many users, high performance computing facilities may not be readily available in a local laboratory environment, and co-locating application code with a remote high performance computing facility may be cumbersome and inconvenient.
This thesis proposes a network emulation approach called ROSENET (RemOte SErver-based Network EmulaTion) that utilizes a distributed server-based architecture in which local low-fidelity emulators provide real-time QoS predictions to distributed applications, coupled with a remote large scale high-fidelity simulator that continuously updates and calibrates the local low-fidelity emulators. A library-based modeling approach based on online simulation data collection is proposed and a system identification modeling technique is presented. Experimental results examining emulation end-to-end delay and loss show that ROSENET provides a promising approach to network emulation supporting accuracy and scale while meeting real-time constraints. Challenges faced in applying ROSENET to real world applications are addressed through two case studies including applying synthetic workload on DARPA s NMS network topology for large scale network simulation and a contemporary real-time distributed VoIP application Skype.
|
432 |
Efficient and Reliable Simulation of Quantum Molecular DynamicsKormann, Katharina January 2012 (has links)
The time-dependent Schrödinger equation (TDSE) models the quantum nature of molecular processes. Numerical simulations based on the TDSE help in understanding and predicting the outcome of chemical reactions. This thesis is dedicated to the derivation and analysis of efficient and reliable simulation tools for the TDSE, with a particular focus on models for the interaction of molecules with time-dependent electromagnetic fields. Various time propagators are compared for this setting and an efficient fourth-order commutator-free Magnus-Lanczos propagator is derived. For the Lanczos method, several communication-reducing variants are studied for an implementation on clusters of multi-core processors. Global error estimation for the Magnus propagator is devised using a posteriori error estimation theory. In doing so, the self-adjointness of the linear Schrödinger equation is exploited to avoid solving an adjoint equation. Efficiency and effectiveness of the estimate are demonstrated for both bounded and unbounded states. The temporal approximation is combined with adaptive spectral elements in space. Lagrange elements based on Gauss-Lobatto nodes are employed to avoid nondiagonal mass matrices and ill-conditioning at high order. A matrix-free implementation for the evaluation of the spectral element operators is presented. The framework uses hybrid parallelism and enables significant computational speed-up as well as the solution of larger problems compared to traditional implementations relying on sparse matrices. As an alternative to grid-based methods, radial basis functions in a Galerkin setting are proposed and analyzed. It is found that considerably higher accuracy can be obtained with the same number of basis functions compared to the Fourier method. Another direction of research presented in this thesis is a new algorithm for quantum optimal control: The field is optimized in the frequency domain where the dimensionality of the optimization problem can drastically be reduced. In this way, it becomes feasible to use a quasi-Newton method to solve the problem. / eSSENCE
|
433 |
Two Optimization Problems in Genetics : Multi-dimensional QTL Analysis and Haplotype InferenceNettelblad, Carl January 2012 (has links)
The existence of new technologies, implemented in efficient platforms and workflows has made massive genotyping available to all fields of biology and medicine. Genetic analyses are no longer dominated by experimental work in laboratories, but rather the interpretation of the resulting data. When billions of data points representing thousands of individuals are available, efficient computational tools are required. The focus of this thesis is on developing models, methods and implementations for such tools. The first theme of the thesis is multi-dimensional scans for quantitative trait loci (QTL) in experimental crosses. By mating individuals from different lines, it is possible to gather data that can be used to pinpoint the genetic variation that influences specific traits to specific genome loci. However, it is natural to expect multiple genes influencing a single trait to interact. The thesis discusses model structure and model selection, giving new insight regarding under what conditions orthogonal models can be devised. The thesis also presents a new optimization method for efficiently and accurately locating QTL, and performing the permuted data searches needed for significance testing. This method has been implemented in a software package that can seamlessly perform the searches on grid computing infrastructures. The other theme in the thesis is the development of adapted optimization schemes for using hidden Markov models in tracing allele inheritance pathways, and specifically inferring haplotypes. The advances presented form the basis for more accurate and non-biased line origin probabilities in experimental crosses, especially multi-generational ones. We show that the new tools are able to reconstruct haplotypes and even genotypes in founder individuals and offspring alike, based on only unordered offspring genotypes. The tools can also handle larger populations than competing methods, resolving inheritance pathways and phase in much larger and more complex populations. Finally, the methods presented are also applicable to datasets where individual relationships are not known, which is frequently the case in human genetics studies. One immediate application for this would be improved accuracy for imputation of SNP markers within genome-wide association studies (GWAS). / eSSENCE
|
434 |
Desenvolvimento de um simulador para espectrometria por fluorescência de raios X usando computação distribuída / Development of a X-ray fluorescence spectrometry simulator using distributed computingMarcio Henrique dos Santos 30 March 2012 (has links)
Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro / A Física das Radiações é um ramo da Física que está presente em diversas áreas de estudo e se relaciona ao conceito de espectrometria. Dentre as inúmeras técnicas espectrométricas
existentes, destaca-se a espectrometria por fluorescência de raios X. Esta também possui uma gama de variações da qual pode-se dar ênfase a um determinado subconjunto de técnicas. A produção de fluorescência de raios X permite (em certos casos) a análise das propriedades físico-químicas de uma amostra específica, possibilitando a determinação de sua constituiçõa química e abrindo um leque de aplicações. Porém, o estudo experimental pode exigir uma grande carga de trabalho, tanto em termos do aparato físico quanto em relação conhecimento técnico. Assim, a técnica de simulação entra em cena como um caminho viável, entre a teoria e a experimentação. Através do método de Monte Carlo, que se utiliza da manipulação de números aleatórios, a simulação se mostra como uma espécie de alternativa ao trabalho experimental.Ela desenvolve este papel por meio de um processo de modelagem, dentro de um ambiente seguro e livre de riscos. E ainda pode contar com a computação de alto desempenho, de forma a otimizar todo o trabalho por meio da arquitetura distribuída. O objetivo central deste trabalho é a elaboração de um simulador computacional para análise e estudo de sistemas de fluorescência de raios X desenvolvido numa plataforma de computação distribuída de forma nativa com o intuito de gerar dados otimizados. Como resultados deste trabalho, mostra-se a viabilidade da construção do simulador através da linguagem CHARM++, uma linguagem baseada em C++ que incorpora rotinas para processamento distribuído, o valor da metodologia para a modelagem de sistemas e a aplicação desta na construção de um simulador para espectrometria por fluorescência de raios X. O simulador foi construído com a capacidade de reproduzir uma fonte de radiação eletromagnética, amostras complexas e um conjunto de detectores. A modelagem dos detectores incorpora a capacidade de geração de imagens baseadas nas contagens registradas. Para validação do simulador, comparou-se os resultados espectrométricos com os resultados gerados por outro simulador já validado: o MCNP. / Radiation Physics is a branch of Physics that is present in various studying areas and relates to the concept of spectrometry. Among the numerous existing spectrometry techniques, there is the X-ray fluorescence spectrometry. It also has a range of variations which can emphasize a particular subset of techniques. The production of X-ray fluorescence enables (in some cases) the analysis of physical and chemical properties of a given sample, allowing the determination of its chemical constitution and also a range of applications. However, the experimental analysis may require a large workload, both in terms of physical apparatus and in relation to technical knowledge. Thus, the simulation comes into play as a viable path between theory and experiment. Through the Monte Carlo method, which uses the manipulation of random numbers, the simulation is a kind of alternative to the experimental analysis. It develops this role
by a modeling process, within a secure environment and risk free. And it can count on high performance computing in order to optimize all the work through the distributed architecture.
The aim of this paper is the development of a computational simulator for analysis and studying of X-ray fluorescence systems developed on a communication platform distributed natively, in order to generate optimal data. As results, has been proved the viability of the simulator implementation
through the CHARM++ language, a language based on C++ which incorporate procedures to distributed processing, the value of the methodology to system modelling e its application to build a simulator for X-ray fluorescence spectrometry. The simulator was built with the ability to reproduce a eletromagnetic radiation source, complex samples and a set of
detectors. The modelling of the detectors embody the ability to yield images based on recorded counts. To validate the simulator, the results were compared with the results provided by other known simulator: MCNP.
|
435 |
Ordonnancement de E/S transversal : des applications à des dispositifs / Transversal I/O Scheduling : from Applications to Devices / Escalonamento de E/S Transversal para Sistemas de Arquivos Paralelos : das Aplicações aos DispositivosZanon Boito, Francieli 30 March 2015 (has links)
Ordonnancement d’E/S Transversal pour les Systèmes de Fichiers Parallèles : desApplications aux DispositifsCette thèse porte sur l’utilisation de l’ordonnancement d’Entrées/Sorties (E/S) pour atténuer leseffets d’interférence et améliorer la performance d’E/S des systèmes de fichiers parallèles. Ilest commun pour les plates-formes de calcul haute performance (HPC) de fournir une infrastructurede stockage partagée pour les applications qui y sont hébergées. Dans cette situation,où plusieurs applications accèdent simultanément au système de fichiers parallèle partagé, leursaccès vont souffrir de l’interférence, ce qui compromet l’efficacité des stratégies d’optimisationd’E/S.Nous avons évalué la performance de cinq algorithmes d’ordonnancement dans les serveurs dedonnées d’un système de fichiers parallèle. Ces tests ont été exécutés sur différentes platesformeset sous différents modèles d’accès. Les résultats indiquent que la performance des ordonnanceursest affectée par les modèles d’accès des applications, car il est important pouraméliorer la performance obtenue grâce à un algorithme d’ordonnancement de surpasser sessurcoûts. En même temps, les résultats des ordonnanceurs sont affectés par les caractéristiquesdu système d’E/S sous-jacent - en particulier par des dispositifs de stockage. Différents dispositifsprésentent des niveaux de sensibilité à la séquentialité et la taille des accès distincts, ce quipeut influencer sur le niveau d’amélioration de obtenue grâce à l’ordonnancement d’E/S.Pour ces raisons, l’objectif principal de cette thèse est de proposer un modèle d’ordonnancementd’E/S avec une double adaptabilité : aux applications et aux dispositifs. Nous avons extraitdes informations sur les modèles d’accès des applications en utilisant des fichiers de trace,obtenus à partir de leurs exécutions précédentes. Ensuite, nous avons utilisé de l’apprentissageautomatique pour construire un classificateur capable d’identifier la spatialité et la taille desaccès à partir du flux de demandes antérieures. En outre, nous avons proposé une approche pourobtenir efficacement le ratio de débit séquentiel et aléatoire pour les dispositifs de stockage enexécutant des benchmarks pour un sous-ensemble des paramètres et en estimant les restantsavec des régressions linéaires.Nous avons utilisé les informations sur les caractéristiques des applications et des dispositifsde stockage pour décider automatiquement l’algorithme d’ordonnancement le plus appropriéen utilisant des arbres de décision. Notre approche améliore les performances jusqu’à 75% parrapport à une approche qui utilise le même algorithme d’ordonnancement dans toutes les situations,sans capacité d’adaptation. De plus, notre approche améliore la performance dans 64%de scénarios en plus, et diminue les performances dans 89% moins de situations. Nos résultatsmontrent que les deux aspects - des applications et des dispositifs - sont essentiels pour faire desbons choix d’ordonnancement. En outre, malgré le fait qu’il n’y a pas d’algorithme d’ordonnancementqui fournit des gains de performance pour toutes les situations, nous montrons queavec la double adaptabilité il est possible d’appliquer des techniques d’ordonnancement d’E/Spour améliorer la performance, tout en évitant les situations où cela conduirait à une diminutionde performance. / This thesis focuses on I/O scheduling as a tool to improve I/O performance on parallel file systemsby alleviating interference effects. It is usual for High Performance Computing (HPC)systems to provide a shared storage infrastructure for applications. In this situation, when multipleapplications are concurrently accessing the shared parallel file system, their accesses willaffect each other, compromising I/O optimization techniques’ efficacy.We have conducted an extensive performance evaluation of five scheduling algorithms at aparallel file system’s data servers. Experiments were executed on different platforms and underdifferent access patterns. Results indicate that schedulers’ results are affected by applications’access patterns, since it is important for the performance improvement obtained througha scheduling algorithm to surpass its overhead. At the same time, schedulers’ results are affectedby the underlying I/O system characteristics - especially by storage devices. Differentdevices present different levels of sensitivity to accesses’ sequentiality and size, impacting onhow much performance is improved through I/O scheduling.For these reasons, this thesis main objective is to provide I/O scheduling with double adaptivity:to applications and devices. We obtain information about applications’ access patternsthrough trace files, obtained from previous executions. We have applied machine learning tobuild a classifier capable of identifying access patterns’ spatiality and requests size aspects fromstreams of previous requests. Furthermore, we proposed an approach to efficiently obtain thesequential to random throughput ratio metric for storage devices by running benchmarks for asubset of the parameters and estimating the remaining through linear regressions.We use this information on applications’ and storage devices’ characteristics to decide the bestfit in scheduling algorithm though a decision tree. Our approach improves performance byup to 75% over an approach that uses the same scheduling algorithm to all situations, withoutadaptability. Moreover, our approach improves performance for up to 64% more situations, anddecreases performance for up to 89% less situations. Our results evidence that both aspects- applications and storage devices - are essential for making good scheduling choices. Moreover,despite the fact that there is no scheduling algorithm able to provide performance gainsfor all situations, we show that through double adaptivity it is possible to apply I/O schedulingtechniques to improve performance, avoiding situations where it would lead to performanceimpairment. / Esta tese se concentra no escalonamento de operações de entrada e saída (E/S) como uma soluçãopara melhorar o desempenho de sistemas de arquivos paralelos, aleviando os efeitos dainterferência. É usual que sistemas de computação de alto desempenho (HPC) ofereçam umainfraestrutura compartilhada de armazenamento para as aplicações. Nessa situação, em quemúltiplas aplicações acessam o sistema de arquivos compartilhado de forma concorrente, osacessos das aplicações causarão interferência uns nos outros, comprometendo a eficácia de técnicaspara otimização de E/S.Uma avaliação extensiva de desempenho foi conduzida, abordando cinco algoritmos de escalonamentotrabalhando nos servidores de dados de um sistema de arquivos paralelo. Foramexecutados experimentos em diferentes plataformas e sob diferentes padrões de acesso. Osresultados indicam que os resultados obtidos pelos escalonadores são afetados pelo padrão deacesso das aplicações, já que é importante que o ganho de desempenho provido por um algoritmode escalonamento ultrapasse o seu sobrecusto. Ao mesmo tempo, os resultados doescalonamento são afetados pelas características do subsistema local de E/S - especialmentepelos dispositivos de armazenamento. Dispositivos diferentes apresentam variados níveis desensibilidade à sequencialidade dos acessos e ao seu tamanho, afetando o quanto técnicas deescalonamento de E/S são capazes de aumentar o desempenho.Por esses motivos, o principal objetivo desta tese é prover escalonamento de E/S com duplaadaptabilidade: às aplicações e aos dispositivos. Informações sobre o padrão de acesso dasaplicações são obtidas através de arquivos de rastro, vindos de execuções anteriores. Aprendizadode máquina foi aplicado para construir um classificador capaz de identificar os aspectosespacialidade e tamanho de requisição dos padrões de acesso através de fluxos de requisiçõesanteriores. Além disso, foi proposta uma técnica para obter eficientemente a razão entre acessossequenciais e aleatórios para dispositivos de armazenamento, executando testes para apenas umsubconjunto dos parâmetros e estimando os demais através de regressões lineares.Essas informações sobre características de aplicações e dispositivos de armazenamento são usadaspara decidir a melhor escolha em algoritmo de escalonamento através de uma árvore dedecisão. A abordagem proposta aumenta o desempenho em até 75% sobre uma abordagem queusa o mesmo algoritmo para todas as situações, sem adaptabilidade. Além disso, essa técnicamelhora o desempenho para até 64% mais situações, e causa perdas de desempenho em até 89%menos situações. Os resultados obtidos evidenciam que ambos aspectos - aplicações e dispositivosde armazenamento - são essenciais para boas decisões de escalonamento. Adicionalmente,apesar do fato de não haver algoritmo de escalonamento capaz de prover ganhos de desempenhopara todas as situações, esse trabalho mostra que através da dupla adaptabilidade é possívelaplicar técnicas de escalonamento de E/S para melhorar o desempenho, evitando situações emque essas técnicas prejudicariam o desempenho.
|
436 |
Programmation des architectures hétérogènes à l'aide de tâches divisibles ou modulables / Programmation of heterogeneous architectures using moldable tasksCojean, Terry 26 March 2018 (has links)
Les ordinateurs équipés d'accélérateurs sont omniprésents parmi les machines de calcul haute performance. Cette évolution a entraîné des efforts de recherche pour concevoir des outils permettant de programmer facilement des applications capables d'utiliser toutes les unités de calcul de ces machines. Le support d'exécution StarPU développé dans l'équipe STORM de INRIA Bordeaux, a été conçu pour servir de cible à des compilateurs de langages parallèles et des bibliothèques spécialisées (algèbre linéaire, développements de Fourier, etc.). Pour proposer la portabilité des codes et des performances aux applications, StarPU ordonnance des graphes dynamiques de tâches de manière efficace sur l’ensemble des ressources hétérogènes de la machine. L’un des aspects les plus difficiles, lors du découpage d’une application en graphe de tâches, est de choisir la granularité de ce découpage, qui va typiquement de pair avec la taille des blocs utilisés pour partitionner les données du problème. Les granularités trop petites ne permettent pas d’exploiter efficacement les accélérateurs de type GPU, qui ont besoin de peu de tâches possédant un parallélisme interne de données massif pour « tourner à plein régime ». À l’inverse, les processeurs traditionnels exhibent souvent des performances optimales à des granularités beaucoup plus fines. Le choix du grain d’un tâche dépend non seulement du type de l'unité de calcul sur lequel elle s’exécutera, mais il a en outre une influence sur la quantité de parallélisme disponible dans le système : trop de petites tâches risque d’inonder le système en introduisant un surcoût inutile, alors que peu de grosses tâches risque d’aboutir à un déficit de parallélisme. Actuellement, la plupart des approches pour solutionner ce problème dépendent de l'utilisation d'une granularité des tâches intermédiaire qui ne permet pas un usage optimal des ressources aussi bien du processeur que des accélérateurs. L'objectif de cette thèse est d'appréhender ce problème de granularité en agrégeant des ressources afin de ne plus considérer de nombreuses ressources séparées mais quelques grosses ressources collaborant à l'exécution de la même tâche. Un modèle théorique existe depuis plusieurs dizaines d'années pour représenter ce procédé : les tâches parallèles. Le travail de cette thèse consiste alors en l'utilisation pratique de ce modèle via l'implantation de mécanismes de gestion de tâches parallèles dans StarPU et l'implantation ainsi que l'évaluation d'ordonnanceurs de tâches parallèles de la littérature. La validation du modèle se fait dans le cadre de l'amélioration de la programmation et de l'optimisation de l'exécution d'applications numériques au dessus de machines de calcul modernes. / Hybrid computing platforms equipped with accelerators are now commonplace in high performance computing platforms. Due to this evolution, researchers concentrated their efforts on conceiving tools aiming to ease the programmation of applications able to use all computing units of such machines. The StarPU runtime system developed in the STORM team at INRIA Bordeaux was conceived to be a target for parallel language compilers and specialized libraries (linear algebra, Fourier transforms,...). To provide the portability of codes and performances to applications, StarPU schedules dynamic task graphs efficiently on all heterogeneous computing units of the machine. One of the most difficult aspects when expressing an application into a graph of task is to choose the granularity of the tasks, which typically goes hand in hand with the size of blocs used to partition the problem's data. Small granularity do not allow to efficiently use accelerators such as GPUs which require a small amount of task with massive inner data-parallelism in order to obtain peak performance. Inversely, processors typically exhibit optimal performances with a big amount of tasks possessing smaller granularities. The choice of the task granularity not only depends on the type of computing units on which it will be executed, but in addition it will influence the quantity of parallelism available in the system: too many small tasks may flood the runtime system by introducing overhead, whereas too many small tasks may create a parallelism deficiency. Currently, most approaches rely on finding a compromise granularity of tasks which does not make optimal use of both CPU and accelerator resources. The objective of this thesis is to solve this granularity problem by aggregating resources in order to view them not as many small resources but fewer larger ones collaborating to the execution of the same task. One theoretical machine and scheduling model allowing to represent this process exists since several decades: the parallel tasks. The main contributions of this thesis are to make practical use of this model by implementing a parallel task mechanism inside StarPU and to implement and study parallel task schedulers of the literature. The validation of the model is made by improving the programmation and optimizing the execution of numerical applications on top of modern computing machines.
|
437 |
Simulation 3D d'une décharge couronne pointe-plan, dans l'air : calcul haute performance, algorithmes de résolution de l'équation de Poisson et analyses physiques / 3D simulation of a pine to plane corona discharge in dry air : High performance computing, Poisson equation solvers and PhysicsPlewa, Joseph-Marie 13 October 2017 (has links)
Cette thèse porte sur la simulation tridimensionnelle (3D) des décharges couronnes à l'aide du calcul haute performance. Lorsqu'on applique une impulsion de haute tension entre une pointe et un plan, les lignes de champ électrique fortement resserrées autour de la pointe induisent la propagation simultanée de plusieurs streamers et la formation d'une décharge couronne de structure arborescente. Dans ces conditions, seule une simulation électro-hydrodynamique 3D est apte à reproduire cette structure et fournir les ordres de grandeur de l'énergie déposée et de la concentration des espèces créées durant la phase de décharge. Cependant, cette simulation 3D est très consommatrice en temps et mémoire de calcul et n'est désormais accessible que grâce à l'accroissement permanent de la puissance des ordinateurs dédié au calcul haute performance. Dans le cadre d'une simulation électro-hydrodynamique 3D, une attention particulière doit être prise concernant l'efficacité des solveurs à résoudre les équations elliptiques 3D car leur contribution en termes de temps de calcul peut dépasser 80% du temps global de la simulation. Ainsi, une partie de manuscrit est consacrée aux tests de performances de méthodes de résolution d'équations elliptiques directes ou itératives telle que SOR R&B, BiCGSTAB et MUMPS, en utilisant le calcul massivement parallèle et les librairies MPI. Les calculs sont réalisés sur le supercalculateur EOS du réseau CALMIP, avec un nombre de cœurs de calcul allant jusqu'à 1800, et un nombre de mailles atteignant 8003 (soit plus 1/2 Milliard de mailles). Les tests de performances sont réalisés en statique sur le calcul du potentiel géométrique et en dynamique en propageant une densité de charge d'espace analytique caractéristique des streamers. Pour réaliser une simulation complète 3D de la décharge il faut également intégrer au programme un algorithme capable de résoudre les équations de transport de particule chargée à fort gradients de densité caractéristiques aux streamers. Dans ce manuscrit, l'algorithme MUSCL est testé dans différentes conditions de propagation d'un cube de densité (à vitesse homogène ou non homogène spatialement) afin d'optimiser le transport des densités d'espèces chargées impliquées. Le code 3D, conçu pour résoudre le modèle électro- hydrodynamique complet de la décharge (couplant les équations de transport, de Poisson et de cinétique réactionnelle) est ensuite validé par la confrontation des résultats 3D et 2D dans une condition de simulation présentant une symétrie de révolution autour de l'axe de propagation d'un streamer. Enfin, les premiers résultats des simulations 3D de la phase décharge avec la propagation d'un ou plusieurs streamers asymétriques sont présentés et analysés. Ces simulations permettent de suivre la structure arborescente de la décharge lorsqu'on applique une tension pulsée entre une pointe et un plan. L'initiation de la structure arborescente est étudiée en fonction de la position de spots plasmas et de leur influence sur l'amorçage des streamers. / This work is devoted to the three dimensional (3D) simulation of streamer corona discharges in air at atmospheric pressure using high-performance parallel computing. When a pulsed high-voltage is applied between a tip and a plane in air, the strong electric field lines constricted around the tip induce the simultaneous propagation of several streamers leading to a corona discharge with a tree structure. Only a true 3D electro-hydrodynamics simulation is able to reproduce this branching and to provide the orders of magnitude of the local deposited energy and the concentration of the species created during the discharge phase. However, such a 3D simulation which requires large computational memory and huge time calculation is nowadays accessible only when performed with massively parallel computation. In the field of 3D electro-hydrodynamics simulations, a special attention must be paid to the efficiency of solvers in solving 3D elliptic equations because their contribution can exceed 80% of the global computation time. Therefore, a specific chapter is devoted to test the performance of iterative and direct methods (such as SOR R&B, BiCGSTAB and MUMPS) in solving elliptic equations, using the massively parallel computation and the MPI library. The calculations are performed on the supercomputer EOS of the CALMIP network, with a number of computing cores and meshes increasing up to respectively 1800 and 8003 (i.e. more than 1/2 Billion meshes). The performances are compared for the calculation of the geometric potential and in a dynamic simulation conditions consisting in the propagation of an analytical space charge density characteristic of the streamers. To perform a complete 3D simulation of the streamer discharge, must also involve a robust algorithm able to solve the coupled conservation equations of the charged particle density with very sharp gradients characteristic of the streamers. In this manuscript, the MUSCL algorithm is tested under different propagation conditions of a cubic density (with uniform or non-uniform velocity field). The 3D code, designed to solve the complete electro-hydrodynamics model of the discharge (coupling the conservation equations, the Poisson equation and the chemical kinetics) is validated by comparing the 3D and 2D results in a simulation conditions presenting a rotational symmetry around the propagation axis of a mono-filamentary streamer. Finally, the first results of the 3D simulations of the discharge phase with the propagation of one or several asymmetric streamers are presented and analyzed. These simulations allow to follow the tree structure of a corona discharge when a pulsed voltage is applied between a tip and a plane. The ignition of the tree structure is studied as a function of the initial position of the plasma spots.
|
438 |
Contributions to parallel stochastic simulation : application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations / Contributions à la simulation stochastique parallèle : architectures logicielles pour la distribution de flux pseudo-aléatoires dans les simulations Monte Carlo sur CPU/GPUPasserat-Palmbach, Jonathan 11 October 2013 (has links)
Résumé non disponible / The race to computing power increases every day in the simulation community. A few years ago, scientists have started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only the simulation model implementation has to be ported to the new parallel platform, but all the tools must be reimplemented as well. In the particular case of stochastic simulations, one of the major element of the implementation is the pseudorandom numbers source. Employing pseudorandom numbers in parallel applications is not a straightforward task, and it has to be done with caution in order not to introduce biases in the results of the simulation. This problematic has been studied since parallel architectures are available and is called pseudorandom stream distribution. While the literature is full of solutions to handle pseudorandom stream distribution on CPU-based parallel platforms, the young GPU programming community cannot display the same experience yet.In this thesis, we study how to correctly distribute pseudorandom streams on GPU. From the existing solutions, we identified a need for good software engineering solutions, coupled to sound theoretical choices in the implementation. We propose a set of guidelines to follow when a PRNG has to be ported to GPU, and put these advice into practice in a software library called ShoveRand. This library is used in a stochastic Polymer Folding model that we have implemented in C++/CUDA. Pseudorandom streams distribution on manycore architectures is also one of our concerns. It resulted in a contribution named TaskLocalRandom, which targets parallel Java applications using pseudorandom numbers and task frameworks.Eventually, we share a reflection on the methods to choose the right parallel platform for a given application. In this way, we propose to automatically build prototypes of the parallel application running on a wide set of architectures. This approach relies on existing software engineering tools from the Java and Scala community, most of them generating OpenCL source code from a high-level abstraction layer.
|
439 |
PaVo un tri parallèle adaptatif / PaVo. An Adaptative Parallel Sorting Algorithm.Durand, Marie 25 October 2013 (has links)
Les joueurs exigeants acquièrent dès que possible une carte graphique capable de satisfaire leur soif d'immersion dans des jeux dont la précision, le réalisme et l'interactivité redoublent d'intensité au fil du temps. Depuis l'avènement des cartes graphiques dédiées au calcul généraliste, ils n'en sont plus les seuls clients. Dans un premier temps, nous analysons l'apport de ces architectures parallèles spécifiques pour des simulations physiques à grande échelle. Cette étude nous permet de mettre en avant un goulot d'étranglement en particulier limitant la performance des simulations. Partons d'un cas typique : les fissures d'une structure complexe de type barrage en béton armé peuvent être modélisées par un ensemble de particules. La cohésion de la matière ainsi simulée est assurée par les interactions entre elles. Chaque particule est représentée en mémoire par un ensemble de paramètres physiques à consulter systématiquement pour tout calcul de forces entre deux particules. Ainsi, pour que les calculs soient rapides, les données de particules proches dans l'espace doivent être proches en mémoire. Dans le cas contraire, le nombre de défauts de cache augmente et la limite de bande passante de la mémoire peut être atteinte, particulièrement en parallèle, bornant les performances. L'enjeu est de maintenir l'organisation des données en mémoire tout au long de la simulation malgré les mouvements des particules. Les algorithmes de tri standard ne sont pas adaptés car ils trient systématiquement tous les éléments. De plus, ils travaillent sur des structures denses ce qui implique de nombreux déplacements de données en mémoire. Nous proposons PaVo, un algorithme de tri dit adaptatif, c'est-à-dire qu'il sait tirer parti de l'ordre pré-existant dans une séquence. De plus, PaVo maintient des trous dans la structure, répartis de manière à réduire le nombre de déplacements mémoires nécessaires. Nous présentons une généreuse étude expérimentale et comparons les résultats obtenus à plusieurs tris renommés. La diminution des accès à la mémoire a encore plus d'importance pour des simulations à grande échelles sur des architectures parallèles. Nous détaillons une version parallèle de PaVo et évaluons son intérêt. Pour tenir compte de l'irrégularité des applications, la charge de travail est équilibrée dynamiquement par vol de travail. Nous proposons de distribuer automatiquement les données en mémoire de manière à profiter des architectures hiérarchiques. Les tâches sont pré-assignées aux cœurs pour utiliser cette distribution et nous adaptons le moteur de vol pour favoriser des vols de tâches concernant des données proches en mémoire. / Gamers are used to throw onto the latest graphics cards to play immersive games which precision, realism and interactivity keep increasing over time. With general-propose processing on graphics processing units, scientists now participate in graphics card use too. First, we examine these architectures interest for large-scale physics simulations. Drawing on this experience, we highlight in particular a bottleneck in simulations performance. Let us consider a typical situation: cracks in complex reinforced concrete structures such as dams are modelised by many particles. Interactions between particles simulate the matter cohesion. In computer memory, each particle is represented by a set of physical parameters used for every force calculations between two particles. Then, to speed up computations, data from particles close in space should be close in memory. Otherwise, the number of cache misses raises up and memory bandwidth may be reached, specially in parallel environments, limiting global performance. The challenge is to maintain data organization during the simulations despite particle movements. Classical sorting algorithms do not suit such situations because they consistently sort all the elements. Besides, they work upon dense structures leading to a lot of memory transfers. We propose PaVo, an adaptive sort which means it benefits from sequence presortedness. Moreover, to reduce the number of necessary memory transfers, PaVo spreads some gaps inside the data structure. We present a large experimental study and confront results to reputed sort algorithms. Reducing memory requests is again more important for large scale simulations with parallel architectures. We detail a parallel version of PaVo and evaluate its interest. To deal with application irregularities, we do load balancing with work-stealing. We take advantage of hierarchical architectures by automatically distributing data in memory. Thus, tasks are pre-assigned to cores with respect to this organization and we adapt the scheduler to favor steals of tasks working on data close in memory.
|
440 |
Protection obligatoire répartie : usage pour le calcul intensif et les postes de travail / Distributed mandatory protectionGros, Damien 30 June 2014 (has links)
La thèse porte sur deux enjeux importants de sécurité. Le premier concerne l’amélioration de la sécurité des systèmes Linux présents dans le calcul intensif et le second la protection des postes de travail Windows. Elle propose une méthode commune pour l’observation des appels système et la répartition d’observateurs afin de renforcer la sécurité et mesurer les performances obtenues. Elle vise des observateurs du type moniteur de référence afin de garantir de la confidentialité et de l’intégrité. Une solution utilisant une méthode de calcul intensif est mise en oeuvre pour réduire les surcoûts de communication entre les deux moniteurs de référence SELinux et PIGA. L’évaluation des performances montre les surcoûts engendrés par les moniteurs répartis et analyse la faisabilité pour les différents noeuds d’environnements de calcul intensif. Concernant la sécurité des postes de travail, un moniteur de référence est proposé pour Windows. Il repose sur les meilleures protections obligatoires issues des systèmes Linux et simplifie l’administration. Nous présentons une utilisation de ce nouveau moniteur pour analyser le fonctionnement de logiciels malveillants. L’analyse permet une protection avancée qui contrôle l’ensemble du scénario d’attaque de façon optimiste. Ainsi, la sécurité est renforcée sans nuire aux activités légitimes. / This thesis deals with two major issues in the computer security field. The first is enhancing the security of Linux systems for scientific computation, the second is the protection of Windows workstations. In order to strengthen the security and measure the performances, we offer a common method for the distributed observation of system calls. It relies on reference monitors to ensure confidentiality and integrity. Our solution uses specific high performance computing technologies to lower the communication latencies between the SELinux and PIGA monitors. Benchmarks study the integration of these distributed monitors in the scientific computation. Regarding workstation security, we propose a new reference monitor implementing state of the art protection models from Linux and simplifying administration. We present how to use our monitor to analyze the behavior of malware. This analysis enables an advanced protection to prevent attack scenarii in an optimistic manner. Thus, security is enforced while allowing legitimate activities.
|
Page generated in 0.1491 seconds