391
Application of the compressible and low-Mach number approaches to large-eddy simulation of turbulent flows in aero-engines. Kraushaar, Matthias, 01 December 2011
Large-Eddy Simulation (LES) has become an increasingly demanded tool for improving the design of aero-engines. The main reason for this demand stems from the constraints imposed on the next generation of low-emission engines at the industrial development level, and from the ability of LES to provide information on the instantaneous turbulent flow field, which greatly improves the prediction of mixing and combustion and thereby the prediction of exhaust emissions. The work presented in this thesis addresses two recurring issues of LES. First, numerical schemes for LES require specific properties, namely low-diffusion schemes of high order of accuracy, so as not to interfere with the turbulence models. To meet this requirement in the context of fully unstructured solvers, a new family of high-order time-integration schemes is proposed, in which the diffusion implied by the numerical scheme becomes adjustable and built-in. Second, being fully unsteady by nature, LES is very demanding in terms of CPU time: even with today's supercomputers, complex problems require long simulation times. Due to the low flow velocities often encountered in industrial applications, the use of a low-Mach number solver is an attractive alternative and can lead to large reductions in CPU time compared to fully compressible solvers. However, the impact of the incompressibility assumption and of the different nature of the numerical algorithms is rarely quantified. To partly answer this question, detailed comparisons are proposed for an experimental swirled configuration representative of a real industrial burner, measured at CORIA, simulated by LES with a fully explicit compressible solver and with an incompressible solver developed at CORIA.
392
Efficient exploitation of NUMA cluster architectures using hybrid programming models. Clet-Ortega, Jérôme, 18 April 2012
Modern computing servers usually consist of clusters of machines with several multi-core CPUs featuring a highly hierarchical hardware design. Exploiting these servers efficiently is the major challenge for implementations of programming models such as MPI or OpenMP. A common practice is to combine the two models in order to benefit from the advantages of each; however, these models were not designed to work together, which leads to performance issues. The work in this thesis aims to assist developers of hybrid applications. It relies on an analysis of the hardware hierarchy of the computing system to size the execution resources (processes and threads). Rather than the classical hybrid approach of creating one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, that are better suited to modern computing systems.
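As a concrete illustration of the hybrid model discussed above, the sketch below shows a minimal MPI + OpenMP program in C++ (a generic example, not the tool developed in the thesis): each MPI process hosts a team of OpenMP threads, and the split between processes and threads per node is exactly the resource-sizing decision the thesis automates.

```cpp
// Minimal hybrid MPI + OpenMP sketch (illustrative only). Each MPI process
// hosts an OpenMP team; how many processes per node and threads per process
// is the tuning knob discussed in the abstract.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0;
    // Request FUNNELED: only the master thread of each process issues MPI calls.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // The number of threads per process would be chosen from the node topology,
    // e.g. one process per NUMA node and one thread per core of that node.
    #pragma omp parallel
    {
        #pragma omp critical
        std::printf("MPI rank %d/%d, thread %d/%d\n",
                    rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```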
393
Jahresbericht 2012 zur kooperativen DV-Versorgung (Annual Report 2012 on Cooperative IT Services). 06 October 2016 (PDF)
No description available.
394
Efficient execution of multi-agent systems on GPU. Laville, Guillaume, 27 June 2014
Recent years have seen the emergence of parallelism in most branches of computer science. At the hardware level, this is explained by the stagnation of the clock frequency of execution units; at the software level, by the increasing availability of parallel execution platforms. A form of parallelism is also present in multi-agent systems, which facilitate the description of complex systems as a collection of interacting entities. While the match between this execution parallelism and the conceptual parallelism of agents seems natural, parallelization remains difficult in practice because of the many adaptations required and the dependencies explicitly present in many multi-agent systems. In this thesis, we propose a solution to facilitate the implementation of such models on a parallel platform such as the GPU. Our library, MCMAS, addresses this problem through two programming interfaces: a low-level layer, MCM, providing direct access to OpenCL, and a set of higher-level plugins usable without any GPU-related knowledge. We then study the use of this library on three existing multi-agent models: a predator-prey model, the MIOR model and the Collembola model. To demonstrate the interest of the approach, we present a performance study of each model and an analysis of the factors contributing to efficient execution on GPUs. We finally conclude with an overview of the work and results presented in this thesis and suggest possible directions for improving our solution.
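To illustrate the kind of data-parallel agent update that maps well to GPUs, here is a minimal C++ sketch; it does not reproduce MCMAS's or MCM's actual API, and the movement and metabolism rules are placeholders. The point is the structure-of-arrays layout and the absence of cross-agent write dependencies, which allow one GPU work-item per agent.

```cpp
// Generic data-parallel agent update sketch (illustrative only; MCMAS's real
// plugin API is not reproduced here). Agents are stored in structure-of-arrays
// form so that, on a GPU, each work-item can update one agent independently.
#include <vector>
#include <cstddef>

struct Agents {                 // structure-of-arrays layout, GPU-friendly
    std::vector<float> x, y;    // positions
    std::vector<float> energy;  // per-agent state
};

// One simulation step: the per-agent body has no cross-agent write dependency,
// which is what makes an OpenCL port (one work-item per agent) straightforward.
void step(Agents& a, float dt) {
    const std::size_t n = a.x.size();
    for (std::size_t i = 0; i < n; ++i) {   // becomes the kernel's global-id loop
        a.x[i] += 0.1f * dt;                // placeholder movement rule
        a.y[i] += 0.1f * dt;
        a.energy[i] -= 0.01f * dt;          // placeholder metabolism
    }
}
```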
395
The Mystery of the Failing Jobs: Insights from Operational Data from Two University-Wide Computing Systems. Rakesh Kumar (7039253), 14 August 2019
Node downtime and failed jobs in a computing cluster translate into wasted resources and user dissatisfaction. Therefore, understanding why nodes and jobs fail in HPC clusters is essential. This paper provides analyses of node and job failures in two university-wide computing clusters at two Tier I US research universities. We analyzed approximately 3.0M job execution records from System A and 2.2M from System B, with data sources covering accounting logs, resource usage for all primary local and remote resources (memory, IO, network), and node failure data. We observe different kinds of correlations between failures and resource usage, and propose a job failure prediction model to trigger event-driven checkpointing and avoid wasted work. We also provide generalizable insights for cluster management to improve reliability, such as the observation that local contention dominates in some execution environments, while system-wide contention dominates in others.
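The sketch below illustrates the event-driven checkpointing idea in C++: a failure-risk score is computed from resource-usage signals and a checkpoint is triggered when it crosses a threshold. The features, weights and threshold are invented for illustration; the paper's actual prediction model is not reproduced.

```cpp
// Illustrative sketch of event-driven checkpointing driven by a failure-risk
// score computed from resource-usage signals. The features, weights and
// threshold below are placeholders, not the paper's actual model.
#include <cmath>

struct Usage {
    double mem_util;        // fraction of node memory in use, 0..1
    double io_wait;         // fraction of time spent waiting on IO, 0..1
    double net_contention;  // normalized network / remote-filesystem contention, 0..1
};

double failureRisk(const Usage& u) {
    // Logistic score over the usage features (weights are invented).
    const double z = -4.0 + 3.0 * u.mem_util + 2.0 * u.io_wait + 2.5 * u.net_contention;
    return 1.0 / (1.0 + std::exp(-z));
}

bool shouldCheckpoint(const Usage& u, double threshold = 0.7) {
    // Checkpoint only when the predicted risk is high, so that checkpoint
    // overhead is paid only when wasted work is likely.
    return failureRisk(u) > threshold;
}
```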
396
Development of an efficient level set framework for the full field modeling of recrystallization in 3D. Scholtes, Benjamin, 05 December 2016
The mechanical and functional properties of metallic materials are strongly related to their microstructures, which are themselves inherited from thermal and mechanical processing. Being able to accurately predict and simulate the microstructure and its heterogeneities after complex forming paths has recently become crucial for the metallurgy industry. It is also a real challenge from a numerical point of view, which highlights the importance of digital materials in new modeling techniques. In this work, we focus on a recent front-capturing full field model based on the level set (LS) method within a finite element (FE) framework to model recrystallization mechanisms. The strengths of this approach compared to the state of the art have motivated the development of a software package called DIGIMU® by the company TRANSVALOR with the support of major industrial companies. However, the main drawback of this approach, shared with other front-capturing full field approaches working on unstructured FE meshes, is its large computational cost, especially in 3D. The main purpose of this work was therefore to drastically reduce the computational cost of the considered LS-FE formulation in the context of unstructured FE meshes. New generic numerical developments have been proposed to improve the global efficiency of the model. The existing 2D LS formulation, already used to model grain growth, static recrystallization and the Smith-Zener pinning effect, has been extended and improved in order to model these mechanisms in 3D for large-scale polycrystals with reasonable computational costs.
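For reference, the generic level-set description of grain interfaces used in this family of models can be written as follows (standard form, notation assumed; not necessarily the exact equations of the thesis):

```latex
% Each grain i is represented by a signed-distance function whose zero
% iso-value is the grain boundary:
\phi_i(\mathbf{x},t) = \pm\, d\big(\mathbf{x}, \Gamma_i(t)\big),
\qquad \Gamma_i(t) = \{\mathbf{x} : \phi_i(\mathbf{x},t) = 0\}.
% The interface is moved by solving a transport equation on each \phi_i:
\frac{\partial \phi_i}{\partial t} + \mathbf{v}\cdot\nabla\phi_i = 0.
% For curvature-driven grain growth, \mathbf{v} = -M\gamma\,\kappa\,\mathbf{n};
% when \phi_i remains a distance function (|\nabla\phi_i| = 1), this reduces
% to a diffusion-like equation solved on the FE mesh:
\frac{\partial \phi_i}{\partial t} - M\gamma\, \Delta\phi_i = 0.
```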
397
Numerical optimization using stochastic evolutionary algorithms: application to seismic tomography inverse problems. Luu, Keurfon, 28 September 2018
Seismic traveltime tomography is an ill-posed optimization problem due to the non-linear relationship between traveltime and the velocity model. Besides, the solution is not unique, as many models are able to explain the observed data. The non-linearity and non-uniqueness issues are typically addressed by methods relying on Markov chain Monte Carlo that thoroughly sample the model parameter space. However, these approaches cannot fully exploit the computing resources provided by modern supercomputers. In this thesis, I propose to solve seismic traveltime tomography problems using evolutionary algorithms, which are population-based stochastic optimization methods inspired by the natural evolution of species. They operate on concurrent individuals within a population that represent independent models, and evolve through stochastic processes characterizing the different mechanisms involved in natural evolution. The models within a population can therefore be evaluated in parallel, which makes evolutionary algorithms particularly well adapted to the parallel architecture of supercomputers. More specifically, the work presented in this manuscript focuses on the three most popular evolutionary algorithms, namely Differential Evolution, Particle Swarm Optimization and the Covariance Matrix Adaptation Evolution Strategy. The feasibility of evolutionary algorithms for solving seismic tomography problems is assessed on two different data sets: a real data set acquired in the context of hydraulic fracturing, and a synthetic refraction data set generated using the Marmousi velocity model, which features a complex geological structure.
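As an illustration of one of the three algorithms named above, here is a minimal DE/rand/1/bin sketch in C++; the objective function is a placeholder standing in for the traveltime misfit, and each candidate vector would represent a velocity model. The independent evaluations inside the population loop are what parallelize naturally on a supercomputer.

```cpp
// Minimal DE/rand/1/bin sketch (illustrative). The objective below is a
// placeholder for the traveltime misfit; np must be at least 4.
#include <vector>
#include <random>

double misfit(const std::vector<double>& x) {          // placeholder objective
    double s = 0.0;
    for (double v : x) s += (v - 1.0) * (v - 1.0);
    return s;
}

std::vector<double> differentialEvolution(int dim, int np, int gens,
                                          double F = 0.7, double CR = 0.9) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0), init(0.0, 2.0);
    std::uniform_int_distribution<int> pick(0, np - 1), pickDim(0, dim - 1);

    std::vector<std::vector<double>> pop(np, std::vector<double>(dim));
    std::vector<double> cost(np);
    for (int i = 0; i < np; ++i) {
        for (double& v : pop[i]) v = init(rng);
        cost[i] = misfit(pop[i]);
    }

    for (int g = 0; g < gens; ++g) {
        for (int i = 0; i < np; ++i) {
            // Three mutually distinct donors, all different from i.
            int r1, r2, r3;
            do { r1 = pick(rng); } while (r1 == i);
            do { r2 = pick(rng); } while (r2 == i || r2 == r1);
            do { r3 = pick(rng); } while (r3 == i || r3 == r1 || r3 == r2);

            std::vector<double> trial = pop[i];
            int jrand = pickDim(rng);                    // forced crossover index
            for (int j = 0; j < dim; ++j)
                if (u01(rng) < CR || j == jrand)
                    trial[j] = pop[r1][j] + F * (pop[r2][j] - pop[r3][j]);

            double c = misfit(trial);                    // evaluations are independent
            if (c < cost[i]) { pop[i] = trial; cost[i] = c; }  // greedy selection
        }
    }
    int best = 0;
    for (int i = 1; i < np; ++i) if (cost[i] < cost[best]) best = i;
    return pop[best];
}
```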
398
Parallel computing architecture for solving fluid dynamics and fluid-structure interaction problems. Couto, Luiz Felipe Marchetti do, 27 June 2016
One of the great challenges of engineering today is to enable computational solutions that reduce processing time and provide more accurate answers. Proposals frequently emerge with the most diverse approaches, exploring new ways of solving such problems or improving existing solutions. One of the areas dedicated to such improvements is parallel and high-performance computing (HPC). Techniques that optimize processing time, more efficient algorithms and faster computers open new horizons, making it possible to perform tasks that were previously unfeasible or would take too long to complete. Among several areas of interest, we can point out Fluid Dynamics and Fluid-Structure Interaction. In this work, a parallel computing architecture is developed in order to solve Fluid Dynamics and Fluid-Structure Interaction problems more efficiently than a sequential architecture, in a way that can also be extended to other problems related to the Finite Element Method (e.g. structural problems). The objective is to develop an efficient computational algorithm in the scientific programming languages C++ and CUDA (from NVIDIA), based on previous work carried out at the Computational Mechanics Laboratory (LMC) of the Polytechnic School of the University of São Paulo, and then, with the developed architecture, to run and investigate Fluid Dynamics and Fluid-Structure Interaction problems (applying the Finite Element Method with Immersed Boundaries and the direct solution of the linear systems with PARDISO) with the aid of the LMC computers. A sensitivity analysis is carried out for each problem in order to find the best combination between the number of finite elements in the mesh and the speedup, followed by a performance comparison between the parallel and the sequential architectures. With a single GPU, a speedup of about 10 times over the sequential software was obtained, together with a considerable reduction in the time needed to assemble the global matrices and in the total simulation time.
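The sketch below illustrates, in C++, why the assembly of the global matrices parallelizes so well: each element computes its local matrix independently and writes into its own slots of a triplet list, so elements can be processed concurrently (one thread or block per element in a CUDA version). It is an illustration only, not the code developed in the thesis.

```cpp
// Element-parallel assembly sketch (illustrative). Each element scatters its
// local matrix into a disjoint range of a COO triplet list, so there are no
// write conflicts between elements.
#include <array>
#include <vector>
#include <cstddef>

struct Triplet { int row, col; double val; };

// Placeholder local stiffness for a 4-node element; real code would integrate
// shape-function gradients over the element.
std::array<std::array<double, 4>, 4> localStiffness(std::size_t /*elem*/) {
    std::array<std::array<double, 4>, 4> ke{};
    for (int a = 0; a < 4; ++a) ke[a][a] = 1.0;
    return ke;
}

std::vector<Triplet> assemble(const std::vector<std::array<int, 4>>& connectivity) {
    const std::size_t ne = connectivity.size();
    std::vector<Triplet> coo(ne * 16);          // 16 entries per 4-node element
    #pragma omp parallel for                    // no write conflict: disjoint slots
    for (std::ptrdiff_t e = 0; e < static_cast<std::ptrdiff_t>(ne); ++e) {
        const auto ke = localStiffness(static_cast<std::size_t>(e));
        const auto& nodes = connectivity[static_cast<std::size_t>(e)];
        const std::size_t base = static_cast<std::size_t>(e) * 16;
        for (int a = 0; a < 4; ++a)
            for (int b = 0; b < 4; ++b)
                coo[base + static_cast<std::size_t>(a * 4 + b)] =
                    {nodes[a], nodes[b], ke[a][b]};
    }
    return coo;   // duplicate (row, col) entries are summed later, e.g. by the solver
}
```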
399
ANNCOM: Artificial Neural Network Library for High Performance Computing Using Graphic Cards. Daniel Salles Chevitarese, 24 May 2019
Artificial Neural Networks have been used quite successfully in prediction, inference and pattern classification problems. For this reason, several libraries that facilitate the modeling and training of networks are already available, such as Matlab's NNtool or WEKA. Although these libraries are widely used, they have limitations in terms of portability, flexibility and performance. The last limitation is mainly due to training, which can take a very long time when there is a large amount of data with many attributes. This work proposes the development of an easy-to-use, flexible, multi-platform library (ANNCOM) that uses the CUDA (Compute Unified Device Architecture) architecture to reduce network training times. This architecture is a form of GPGPU (General-Purpose computing on Graphics Processing Units) and has been adopted as a parallel computing solution in the high-performance computing field, since the technology used in current processors is reaching its speed limit. In addition, a graphical tool was created to support the development of solutions that apply neural network techniques easily and clearly using the developed library. To evaluate the performance of ANNCOM, six training runs were carried out for the classification of low-voltage customers of an electric power distribution company. Network training using ANNCOM with CUDA technology achieved a performance nearly 30 times higher than ANNCOM backed by Intel's MKL (Math Kernel Library), which is also used by Matlab.
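As an illustration of the kind of kernel whose offloading produces such speedups, the C++ sketch below shows a single-layer forward pass as a dense matrix-vector product plus activation; it does not reproduce ANNCOM's actual API.

```cpp
// Illustrative single-layer forward pass (not ANNCOM's API). The dense
// matrix-vector product plus activation is the kind of operation that a CUDA
// or MKL backend accelerates during training.
#include <vector>
#include <cmath>
#include <cstddef>

std::vector<double> layerForward(const std::vector<double>& W,   // weights, row-major (out x in)
                                 const std::vector<double>& b,   // biases, size out
                                 const std::vector<double>& x) { // input, size in
    const std::size_t out = b.size();
    const std::size_t in = x.size();
    std::vector<double> y(out);
    for (std::size_t i = 0; i < out; ++i) {
        double acc = b[i];
        for (std::size_t j = 0; j < in; ++j)
            acc += W[i * in + j] * x[j];        // the GEMV a GPU/BLAS backend speeds up
        y[i] = 1.0 / (1.0 + std::exp(-acc));    // sigmoid activation
    }
    return y;
}
```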
400
Distributed algorithm for multiple resource allocation in a distributed environment. Ribacionka, Francisco, 07 June 2013
Consider a distributed system composed of a set of servers, clients and resources, characteristic of environments such as computational grids or clouds, which offer a large number of distributed resources such as CPUs or virtual machines that are used jointly by different types of applications; such a system needs a solution for allocating these resources. Support for the allocation of the resources provided by such environments must satisfy all resource requests from the applications, provide affirmative answers for efficient resource allocation, ensure fairness in the case of simultaneous requests from several clients, and answer requests within a finite time. Considering this context of large-scale distributed systems, this work proposes a distributed algorithm for resource allocation. The algorithm exploits fuzzy logic whenever a server is unable to satisfy a request made by a client, forwarding the request to a remote server. It uses the concept of logical clocks to guarantee fairness in serving the requests made to all servers that share resources. The algorithm follows a distributed model, where a copy of the algorithm runs on each server that shares resources with its clients, and all servers take part in the decisions regarding the allocation of these resources. The strategy developed aims to minimize the response time of resource allocation, acting as a load balancer in a client-server environment with a high rate of resource requests from clients. The efficiency of the algorithm was demonstrated through its implementation and comparison with traditional algorithms, showing that a single resource request can use resources belonging to different servers, with the guarantee that the request will be served within a finite time.
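The sketch below illustrates, in C++, how a logical (Lamport) clock provides a total order on requests so that simultaneous requests from several clients are served fairly; the fuzzy-logic forwarding rule of the thesis is not reproduced, and the class and field names are assumptions made for illustration.

```cpp
// Sketch of logical-clock-based fair ordering of resource requests.
// Requests are served in (timestamp, clientId) order, which is a total order.
#include <queue>
#include <vector>
#include <tuple>
#include <algorithm>
#include <cstdint>

struct Request {
    std::uint64_t timestamp;  // Lamport clock value attached by the sender
    int clientId;             // tie-breaker so the order is total
    int resources;            // number of resources requested
};

struct EarlierFirst {
    bool operator()(const Request& a, const Request& b) const {
        // priority_queue keeps the "largest" on top, so invert the comparison:
        // smaller (timestamp, clientId) pairs are served first.
        return std::tie(a.timestamp, a.clientId) > std::tie(b.timestamp, b.clientId);
    }
};

class Server {
public:
    // Receive rule: clock = max(local, received) + 1, then queue the request.
    void onReceive(Request r) {
        clock_ = std::max(clock_, r.timestamp) + 1;
        pending_.push(r);
    }
    // Send rule: increment the clock before stamping a new outgoing request.
    Request makeLocalRequest(int clientId, int resources) {
        return Request{++clock_, clientId, resources};
    }
    // Serve the earliest pending request if enough resources are available.
    bool serveNext(int& available) {
        if (pending_.empty() || pending_.top().resources > available) return false;
        available -= pending_.top().resources;
        pending_.pop();
        return true;
    }
private:
    std::uint64_t clock_ = 0;
    std::priority_queue<Request, std::vector<Request>, EarlierFirst> pending_;
};
```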