441

Exploiting multiple levels of parallelism and online refinement of unstructured meshes in atmospheric model application

Schepke, Claudio, January 2012
Weather forecasts for long periods of time have become increasingly important. Global concern with the consequences of climate change has stimulated research to determine its behaviour in the coming decades. At the same time, the steps needed to better model and simulate climate and weather are far from the desired accuracy. Increasing the refinement of the Earth's surface, and consequently the number of discrete points used to represent the atmosphere and the precision of the computed solutions, is a goal that conflicts with the performance of numerical applications. Applications that simulate long periods of time and involve a large number of operations have infeasible execution times on traditional computer architectures. To overcome this situation, a climate model can adopt different levels of refinement of the Earth's surface, using more discrete points only in regions where higher precision is required. This is the case of the Ocean-Land-Atmosphere Model, which allows the static refinement of a particular region at the start of the execution. A dynamic mesh refinement, however, would make it possible to better understand the specific climatic conditions of any region of interest on the Earth's surface without restarting the application. With the introduction of multi-core processors and the adoption of GPUs for general-purpose computing, several levels of parallelism exist: inside the processor, among processors, and among computers. To extract the maximum performance from current machines, all available levels of parallelism must be exploited when developing concurrent applications; however, no single parallel programming interface exploits all of these levels well at the same time. In this context, this thesis investigates how to exploit different levels of parallelism in atmospheric models by combining classical parallel programming interfaces, and how mesh refinement can be provided for these models at execution time. The results obtained from the implementations show that the execution time of an atmospheric simulation can be reduced by exploiting different levels of parallelism through the combined use of parallel programming interfaces. In addition, higher performance was obtained for climate applications that use online mesh refinement. As a consequence, a higher-resolution mesh can be adopted to represent the Earth's atmosphere and the numerical forecasts become more accurate.
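To make the idea of combining parallel programming interfaces concrete, the following is a minimal sketch in which MPI distributes bands of a toy atmospheric grid among processes while OpenMP parallelizes the per-point work inside each process. The grid size, the `advance_point` kernel and the band decomposition are illustrative assumptions, not the implementation described in the thesis.

```cpp
// Hedged sketch: hybrid MPI + OpenMP update of a toy atmospheric grid.
// Assumptions: a rectangular NX x NY grid, a placeholder per-point kernel,
// and a 1D band decomposition; none of this mirrors the thesis code.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

static double advance_point(double v) { return 0.99 * v + 0.01; } // placeholder physics

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int NX = 512, NY = 512;
    const int rows = NY / size;               // each rank owns a band of rows
    std::vector<double> band(static_cast<size_t>(rows) * NX, 1.0);

    for (int step = 0; step < 100; ++step) {
        // Intra-node parallelism: OpenMP threads sweep the local band.
        #pragma omp parallel for collapse(2)
        for (int j = 0; j < rows; ++j)
            for (int i = 0; i < NX; ++i)
                band[static_cast<size_t>(j) * NX + i] =
                    advance_point(band[static_cast<size_t>(j) * NX + i]);
        // Inter-node parallelism: a reduction stands in for halo exchanges.
        double local = band[0], global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
    if (rank == 0) std::printf("done on %d ranks\n", size);
    MPI_Finalize();
    return 0;
}
```

Building with an MPI compiler wrapper and OpenMP enabled (e.g. `mpicxx -fopenmp`) and launching several ranks exercises both parallelism levels at once.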
442

Computação paralela na análise de problemas de engenharia utilizando o Método dos Elementos Finitos / Parallel computing in the analysis of engineering problems using the Finite Element Method

Masuero, Joao Ricardo, January 2009
The main goal of this work is the analysis and development of parallel algorithms for the solution of Solid Mechanics, Fluid Mechanics and Fluid-Structure Interaction problems using the Finite Element Method, for distributed- and shared-memory configurations. Two processes for partitioning the data structure among processors and dividing the tasks were developed, based on the stripwise partitioning and recursive coordinate bisection methods, applied not to the mesh geometry but directly to the resulting system of equations, through nodal reordering for bandwidth minimization. To schedule the communication among processors, a simple and generic algorithm based on a circular and alternating ordering was developed, which organizes the exchanges efficiently even in scenarios where each processor must exchange data with all the others. The selected algorithms were all iterative, due to their suitability for distributed-memory parallelism. Parallel codes were developed for the Conjugate Gradient Method, used in Solid Mechanics problems; for the explicit one-step Taylor-Galerkin scheme with iterations, used in the simulation of compressible flows in the transonic and supersonic regimes; for the explicit two-step Taylor-Galerkin scheme, used in the simulation of incompressible flows in the subsonic regime; and for fluid-structure interaction with a partitioned coupling scheme, using the explicit two-step scheme for the fluid and the implicit Newmark method, in the context of the generalized-α stabilization method, for the structure. Numerous configurations were tested with three-dimensional problems using tetrahedral and hexahedral elements, on temporary and permanent, homogeneous and heterogeneous clusters, with different problem sizes, numbers of computers and network speeds.
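The circular, alternating communication ordering mentioned above can be illustrated with the classic round-robin ("circle method") schedule, in which every pair of ranks exchanges data exactly once per sweep and no rank is paired twice in the same round. This is a sketch assuming an even number of ranks; it shows the general idea rather than the exact ordering developed in the thesis.

```cpp
// Hedged sketch: a round-robin ("circle method") exchange schedule where
// every pair of ranks meets exactly once over n-1 rounds. The pairing rule
// is the textbook tournament schedule, not necessarily the thesis ordering.
#include <mpi.h>
#include <vector>
#include <cstdio>

static int partner_of(int rank, int round, int n) { // n assumed even
    const int m = n - 1;
    if (rank == n - 1) return round;
    if (rank == round) return n - 1;
    return ((2 * round - rank) % m + m) % m;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, n;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);

    std::vector<double> send(1024, static_cast<double>(rank)), recv(1024);
    for (int round = 0; round < n - 1; ++round) {
        int peer = partner_of(rank, round, n);
        // Both sides of a pair post the same exchange, so no deadlock occurs.
        MPI_Sendrecv(send.data(), 1024, MPI_DOUBLE, peer, 0,
                     recv.data(), 1024, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    std::printf("rank %d finished %d exchange rounds\n", rank, n - 1);
    MPI_Finalize();
    return 0;
}
```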
443

Integração de bibliotecas científicas de propósito especial em uma plataforma de componentes paralelos / Integration of special purpose scientific libraries on a platform of parallel components

Ferreira, Davi Morais, January 2010
Master's dissertation, Universidade Federal do Ceará, Centro de Ciências, Departamento de Computação, Fortaleza-CE, 2010, 145 pp. / The contribution of traditional scientific libraries is well established in the construction of high-performance applications. However, this kind of development artifact has limitations in terms of integration, productivity in large-scale applications, and flexibility in the face of changes in the problem context. On the other hand, component-based development, recently proposed as a viable alternative for the architecture of High-Performance Computing (HPC) applications, has provided means to overcome these challenges. Scientific libraries and component-oriented programming are therefore complementary techniques for improving the development process of modern HPC applications. Accordingly, this work proposes a systematic method for integrating scientific libraries into a platform of parallel components, HPE (Hash Programming Environment), in order to offer the complementary advantages of components and scientific libraries to developers of parallel programs that implement high-performance applications. The proposal goes beyond simply encapsulating a library in a component: it aims to bring to the use of scientific libraries the benefits of integration, of productivity in large-scale applications, and of flexibility for changes in the problem context. To illustrate and validate the method, linear system solver libraries were incorporated into HPE, with three significant representatives: PETSc, Hypre and SuperLU.
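The kind of encapsulation the method builds on can be sketched as a common solver contract behind which different library back-ends are slotted. The interface and the toy Jacobi back-end below are illustrative assumptions; HPE's actual component model and the PETSc/Hypre/SuperLU bindings are not reproduced here.

```cpp
// Hedged sketch: a minimal "solver component" interface of the sort a
// component platform could standardize, with a toy Jacobi back-end.
#include <cstdio>
#include <memory>
#include <vector>

struct LinearSolverComponent {                       // abstract component contract
    virtual ~LinearSolverComponent() = default;
    virtual std::vector<double> solve(const std::vector<std::vector<double>>& A,
                                      const std::vector<double>& b) = 0;
};

struct JacobiSolver : LinearSolverComponent {        // stand-in for a library back-end
    std::vector<double> solve(const std::vector<std::vector<double>>& A,
                              const std::vector<double>& b) override {
        const std::size_t n = b.size();
        std::vector<double> x(n, 0.0), y(n, 0.0);
        for (int it = 0; it < 500; ++it) {           // fixed-iteration Jacobi sweep
            for (std::size_t i = 0; i < n; ++i) {
                double s = b[i];
                for (std::size_t j = 0; j < n; ++j)
                    if (j != i) s -= A[i][j] * x[j];
                y[i] = s / A[i][i];
            }
            x.swap(y);
        }
        return x;
    }
};

int main() {
    // A PETSc-, Hypre- or SuperLU-backed component would implement the same contract.
    std::unique_ptr<LinearSolverComponent> solver = std::make_unique<JacobiSolver>();
    std::vector<std::vector<double>> A = {{4, 1}, {1, 3}};
    std::vector<double> b = {1, 2};
    std::vector<double> x = solver->solve(A, b);
    std::printf("x = (%.4f, %.4f)\n", x[0], x[1]);   // expected approx (0.0909, 0.6364)
    return 0;
}
```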
444

Um ambiente computacional de alto desempenho para cálculo de deslocamento usando correlação de imagens digitais / A high-performance computing environment for displacement calculation using digital image correlation

Várady Filho, Christiano Augusto Ferrario, 04 April 2016
This work presents a high-performance computing environment that uses digital image correlation techniques to determine physical quantities associated with engineering problems. A software architecture supports this environment, integrating several advanced technologies for the calculation of displacement and strain fields of structural elements from experimental tests. The methodology covers the study of concepts, formulations and techniques of image processing, digital image correlation, high-performance computing and software architecture, and gathers specific procedures for the evaluation of physical quantities in a single environment. Among the main procedures included in the proposed architecture are the digital image correlation approaches known as full-field (global) and subset (local), non-linear optimization methods and two-dimensional interpolation. In addition, high-performance computing strategies are incorporated into the environment to obtain speed-ups in the evaluation of displacement fields using digital image correlation. Comparisons with the Scale Invariant Feature Transform and the Q4-DIC method are also carried out. A computational prototype was developed to validate the proposed high-performance environment, allowing the calculation of physical quantities in structural elements through the correlation of digital images. Case studies submitted to the prototype verify the technologies built into it, including quantitative analysis of the results and measurement of computational time. / Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico.
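At the core of the subset (local) DIC approach mentioned above is a correlation criterion evaluated over small image subsets; the sketch below computes the zero-normalized cross-correlation (ZNCC) and recovers an integer-pixel shift on a synthetic image pair. The image layout, subset size and brute-force search are illustrative assumptions; the thesis pipeline adds non-linear optimization and sub-pixel interpolation, which are not reproduced here.

```cpp
// Hedged sketch: ZNCC matching of a reference subset against candidate
// locations in a deformed image, the criterion typically maximized by
// subset-based DIC.
#include <cmath>
#include <cstdio>
#include <vector>

using Image = std::vector<std::vector<double>>; // grayscale intensities

// ZNCC between an S x S subset of f centred at (fx, fy) and of g at (gx, gy).
double zncc(const Image& f, int fx, int fy, const Image& g, int gx, int gy, int S) {
    const int h = S / 2;
    double mf = 0.0, mg = 0.0;
    for (int j = -h; j <= h; ++j)
        for (int i = -h; i <= h; ++i) { mf += f[fy + j][fx + i]; mg += g[gy + j][gx + i]; }
    const double n = static_cast<double>(S) * S;
    mf /= n; mg /= n;
    double num = 0.0, df = 0.0, dg = 0.0;
    for (int j = -h; j <= h; ++j)
        for (int i = -h; i <= h; ++i) {
            const double a = f[fy + j][fx + i] - mf, b = g[gy + j][gx + i] - mg;
            num += a * b; df += a * a; dg += b * b;
        }
    return num / std::sqrt(df * dg + 1e-12);
}

int main() {
    // Tiny synthetic example: the deformed image is the reference shifted by (1, 0).
    const int W = 16, H = 16, S = 5;
    Image ref(H, std::vector<double>(W)), def(H, std::vector<double>(W));
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            ref[y][x] = std::sin(0.7 * x) * std::cos(0.5 * y);
            def[y][x] = std::sin(0.7 * (x - 1)) * std::cos(0.5 * y);
        }
    int best_dx = 0; double best = -2.0;
    for (int dx = -3; dx <= 3; ++dx) {            // integer-pixel search along x
        double c = zncc(ref, 7, 7, def, 7 + dx, 7, S);
        if (c > best) { best = c; best_dx = dx; }
    }
    std::printf("estimated displacement: dx = %d (ZNCC = %.3f)\n", best_dx, best);
    return 0;
}
```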
445

Co-scheduling for large-scale applications : memory and resilience / Ordonnancement concurrent d’applications à grande échelle : mémoire et résilience

Pottier, Loïc, 18 September 2018
This thesis explores co-scheduling problems in the context of large-scale applications, with two main focuses: the memory side, in particular the cache memory, and the resilience side. With the recent advent of many-core architectures such as chip multiprocessors (CMPs), the number of processing units keeps increasing, and the benefits of co-scheduling techniques have been demonstrated in this context. The main idea behind co-scheduling is to execute applications concurrently rather than in sequence, in order to improve the global throughput of the platform. But sharing resources often generates interference: with a growing number of processing units accessing the same last-level cache, interference among co-scheduled applications becomes critical. In addition, as the number of processors increases, so does the probability of a failure; resilience aspects must be taken into account, especially for co-scheduling, because failure-prone resources may be shared between applications. On the memory side, we focus on interference in the last-level cache and on cache partitioning as a way to reduce it. Through a theoretical model, simulations and experiments on a real platform, we demonstrate the usefulness of co-scheduling when our cache partitioning strategies are deployed; we also investigate the same problem on a real cache-partitioned chip multiprocessor, using the Cache Allocation Technology recently provided by Intel. Still on the memory side, we then study how to model and schedule task graphs on new many-core architectures, such as the Knights Landing architecture, which offer a new level in the memory hierarchy through an on-package high-bandwidth memory. Current approaches usually do not take this new memory level into account, yet new scheduling algorithms and data partitioning schemes are needed to take advantage of this deep memory hierarchy. On the resilience side, we explore the impact of failures on co-scheduling performance. Co-scheduling has been demonstrated in a fault-free context, but large-scale computer systems face frequent failures, and resilience techniques must be employed for large applications to execute efficiently; indeed, failures may create severe imbalance between applications and significantly degrade performance. We aim at minimizing the expected completion time of a set of co-scheduled applications in a failure-prone context by redistributing processors. We study the complexity of the problem with a theoretical model, design heuristics, and perform a comprehensive set of simulations with a fault simulator, which demonstrates the effectiveness of the proposed heuristics.
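One simple way to picture the processor-redistribution problem discussed above is a greedy rule that repeatedly gives the next processor to the co-scheduled application with the largest estimated remaining time. The linear-speedup model and the greedy rule below are illustrative assumptions, not the heuristics designed in the thesis.

```cpp
// Hedged sketch: greedy processor redistribution among co-scheduled
// applications under an assumed linear speedup model.
#include <cstdio>
#include <vector>

// Give processors one by one to the application with the largest remaining time.
std::vector<int> redistribute(const std::vector<double>& work, int P) {
    const std::size_t n = work.size();
    std::vector<int> alloc(n, 1);                 // every application keeps >= 1 processor
    for (int left = P - static_cast<int>(n); left > 0; --left) {
        std::size_t worst = 0;
        for (std::size_t i = 1; i < n; ++i)
            if (work[i] / alloc[i] > work[worst] / alloc[worst]) worst = i;
        ++alloc[worst];
    }
    return alloc;
}

int main() {
    // Example: after a failure, three applications have this much work left.
    std::vector<double> work = {120.0, 60.0, 20.0};
    std::vector<int> alloc = redistribute(work, 16);
    for (std::size_t i = 0; i < work.size(); ++i)
        std::printf("app %zu: %d processors, est. time %.1f\n",
                    i, alloc[i], work[i] / alloc[i]);
    return 0;
}
```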
446

Towards brain-scale modelling of the human cerebral blood flow : hybrid approach and high performance computing / Vers une modélisation de l’écoulement sanguin cérébral humain à l’échelle du cerveau : approche hybride et calcul haute performance

Peyrounette, Myriam, 25 October 2017
The brain microcirculation plays a key role in cerebral physiology and neuronal activation. In degenerative diseases such as Alzheimer's, severe deterioration of the microvascular networks (e.g. vascular occlusions) limits blood flow, and thus oxygen and nutrient supply, to the cortex, eventually resulting in neuronal death. In addition to functional neuroimaging, modelling is a valuable tool to investigate the impact of structural variations of the microvasculature on blood flow and mass transfer. In the brain microcirculation, the capillary bed contains the smallest vessels (1-10 μm in diameter) and presents a mesh-like structure embedded in the cerebral tissue; it is the main site of molecular exchange between blood and neurons. The capillary bed is fed and drained by larger arteriolar and venular tree-like vessels (10-100 μm in diameter). Over the last decades, standard network approaches have significantly advanced our understanding of blood flow, mass transport and regulation mechanisms in the human brain microcirculation. By averaging the flow equations over the vascular cross-sections, such approaches yield a one-dimensional model that involves far fewer variables than a full three-dimensional resolution of the flow. However, because of the high density of capillaries, these approaches are still computationally limited to relatively small volumes (<100 mm³). This constraint prevents applications at clinically relevant scales, since standard imaging techniques yield much larger volumes (~100 cm³) with a resolution of 1-10 mm³. To get around this computational cost, we present a hybrid approach for blood flow modelling in which the capillaries are replaced by a continuous medium. This substitution makes sense since the capillary bed is dense and space-filling above a cut-off length of ~50 μm. In this continuum, blood flow is characterized by effective properties (e.g. permeability) at the scale of a much larger representative volume. Furthermore, the domain is discretized on a coarse grid using the finite volume method, inducing an important computational gain. The arteriolar and venular trees cannot be homogenized because of their quasi-fractal structure, so the network approach is used to model blood flow in the larger vessels. The main difficulty of the hybrid approach is to develop a proper coupling model at the points where arteriolar or venular vessels are connected to the continuum. Indeed, large pressure gradients build up at the capillary scale in the vicinity of the coupling points and must be properly described at the continuum scale. Such multiscale coupling has never been discussed in the context of the brain microcirculation. Taking inspiration from the Peaceman "well model" developed for petroleum engineering, our coupling model relies on analytical solutions of the pressure field in the neighbourhood of the coupling points. The resulting equations yield a single linear system to solve for both the network part and the continuum (strong coupling). The accuracy of the hybrid model is evaluated by comparison with a classical network approach, both for very simple synthetic architectures involving no more than two couplings and for more complex ones, with anatomical arteriolar and venular trees displaying a large number of couplings. We show that the approach is very accurate, since the relative pressure errors are lower than 6%. This lays the groundwork for introducing additional levels of complexity in the future (e.g. non-uniform hematocrit). With a view to large-scale simulations and to the extension to mass transport, the hybrid approach has been implemented in a C++ code designed for high-performance computing, fully parallelized using the Message Passing Interface standard and specialized libraries (e.g. PETSc). Since this work is part of a larger project involving several collaborators, special care has been taken in developing efficient coding strategies.
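For reference, the classical two-dimensional Peaceman well model cited above as inspiration relates the pressure of the grid block containing a well to the wellbore pressure through an equivalent block radius; the thesis derives analogous analytical relations for point sources coupled to the homogenized capillary continuum, which are not reproduced here.

```latex
% Classical 2D Peaceman well model (quoted only as the cited inspiration):
%   p_0 : pressure of the grid block containing the well,  p_w : wellbore pressure,
%   q : volumetric flow rate,  \mu : viscosity,  k : permeability,
%   h : block thickness,  r_w : well radius,
%   r_0 : equivalent block radius for a grid spacing \Delta x.
\begin{equation}
  p_0 \;=\; p_w \;+\; \frac{q\,\mu}{2\pi k h}\,\ln\!\frac{r_0}{r_w},
  \qquad r_0 \,\approx\, 0.2\,\Delta x .
\end{equation}
```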
447

A study on block flexible iterative solvers with applications to Earth imaging problem in geophysics / Étude de méthodes itératives par bloc avec application à l’imagerie sismique en géophysique

Ferreira Lago, Rafael, 13 June 2013
This PhD thesis concerns the development of flexible Krylov subspace iterative solvers for the solution of large sparse linear systems of equations with multiple right-hand sides. The target application is the solution of the acoustic full waveform inversion problem in geophysics, associated with wave propagation through a heterogeneous model simulating the Earth's subsurface. When multiple wave sources are used, this problem gives rise to large sparse complex non-Hermitian and nonsymmetric linear systems with thousands of right-hand sides. Especially in the three-dimensional case and at high frequencies, this problem is known to be difficult. The purpose of this thesis is to develop a flexible block Krylov iterative method that extends and improves techniques already available in the literature to the multiple right-hand-side scenario. We exploit the relations between the right-hand sides to accelerate the convergence of the overall iterative method. We study both block deflation and single right-hand-side subspace recycling techniques, obtaining substantial gains in computational time compared to other strategies published in the literature, on realistic applications performed in a parallel environment.
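A central ingredient of block methods with deflation is detecting when the block of residuals becomes (nearly) rank-deficient so that redundant directions can be dropped. The sketch below shows that rank-revealing step with a truncated SVD, using Eigen for brevity; the threshold and the toy residual are illustrative assumptions, and the flexible block method of the thesis is not reproduced.

```cpp
// Hedged sketch: block-size reduction ("deflation") by dropping directions of
// the block residual whose singular values fall below a relative tolerance.
#include <Eigen/Dense>
#include <cstdio>

// Return an orthonormal basis of the "useful" part of the block residual R.
Eigen::MatrixXd deflate_block(const Eigen::MatrixXd& R, double tol) {
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(R, Eigen::ComputeThinU | Eigen::ComputeThinV);
    const auto& sv = svd.singularValues();        // sorted in decreasing order
    int kept = 0;
    for (int i = 0; i < sv.size(); ++i)
        if (sv(i) > tol * sv(0)) ++kept;          // relative threshold
    return svd.matrixU().leftCols(kept);          // deflated block of directions
}

int main() {
    // Toy block residual with 4 right-hand sides, one of them nearly dependent.
    Eigen::MatrixXd R(6, 4);
    R.setRandom();
    R.col(3) = R.col(0) + 1e-10 * R.col(1);       // (almost) linearly dependent column
    Eigen::MatrixXd U = deflate_block(R, 1e-8);
    std::printf("block size reduced from %d to %d\n",
                static_cast<int>(R.cols()), static_cast<int>(U.cols()));
    return 0;
}
```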
448

Efficient large electromagnetic simulation based on hybrid TLM and modal approach on grid computing and supercomputer / Parallélisation, déploiement et adaptation automatique de la simulation électromagnétique sur une grille de calcul

Alexandru, Mihai, 14 December 2012
In the context of Information and Communication Technology (ICT), a major challenge is to create ever smaller systems embedding ever more intelligence, in hardware and software, with increasingly complex communicating architectures. This requires robust design methodologies to reduce the development cycle and the prototyping phase, making the design and optimization of the physical communication layer paramount. The complexity of these systems makes them difficult to optimize, because of the explosion in the number of unknown parameters, and the methods and tools developed in past years will eventually be inadequate for the problems that lie ahead. Communicating objects will very often be integrated into cluttered environments containing all kinds of metallic and dielectric structures, larger or smaller than the wavelength, and the designer must anticipate the presence of such obstacles in the propagation channel to establish correct link budgets and an optimal design of the communicating object. For example, wave propagation in an airplane cabin, from sensors or an antenna towards the cockpit, is greatly affected by the presence of the metal structure of the seats inside the cabin, or even by the passengers; this perturbation must absolutely be taken into account to predict correctly the power balance between the antenna and a possible receiver. More generally, this work addresses theoretical and computational electromagnetics in order to propose software tools for the rigorous calculation of electromagnetic scattering inside very large structures, or of antennas radiating near oversized objects. This calculation involves the numerical solution of very large systems that are inaccessible to traditional resources; the solution is based on grid computing and supercomputers. The goal of this work is the electromagnetic modelling of oversized structures with several numerical methods, using new hardware and software resources to carry out high-performance computations. The numerical modelling is based on a hybrid approach that combines the Transmission-Line Matrix (TLM) method and the mode-matching approach: the former is applied to homogeneous volumes, while the latter is used to describe complex planar structures. In order to accelerate the simulation, a parallel implementation of the TLM algorithm in the distributed computing paradigm is proposed. The subdomain of the structure discretized with TLM is divided into several parts called tasks, each computed in parallel by different processors; the tasks communicate with each other during the simulation through a message-passing library. An extension of the modal approach to several different modes was developed to handle planar structures of increasing complexity. The results demonstrate the benefits of combining grid computing with the hybrid approach to solve electrically large structures, by matching the size of the problem to the number of computing resources used. The study highlights the role of the parallelization scheme, cluster versus grid, with respect to the size of the problem and its distribution. Moreover, a prediction model for the computing performance on the grid was developed, based on a hybrid approach that combines a history-based prediction with a prediction derived from the application profile; the predicted values are in good agreement with the measured values. The analysis of the simulation performance allowed practical rules to be extracted for estimating the resources required for a given problem. Using all these tools, the propagation of the electromagnetic field inside a complex oversized structure, such as an airplane cabin, was computed on the grid and also on a supercomputer, and the advantages and disadvantages of the two environments are discussed.
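The task decomposition described above hinges on a per-time-step exchange of boundary quantities between neighbouring TLM subdomains through a message-passing library. The sketch below shows that exchange pattern for a 1D decomposition; the "voltage" array and the trivial local update are placeholders for the real TLM scatter/connect kernels, which are not reproduced here.

```cpp
// Hedged sketch: boundary exchange between neighbouring TLM tasks in a 1D
// domain decomposition, one exchange per simulated time step.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000;                              // local TLM nodes per task
    std::vector<double> v(N + 2, 0.0);               // +2 ghost cells
    if (rank == 0) v[1] = 1.0;                       // an excitation on the first task

    const int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 200; ++step) {
        // Connect phase across task boundaries: exchange edge values with neighbours.
        MPI_Sendrecv(&v[1], 1, MPI_DOUBLE, left, 0,
                     &v[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&v[N], 1, MPI_DOUBLE, right, 1,
                     &v[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Placeholder local update standing in for the TLM scatter step.
        std::vector<double> next(v);
        for (int i = 1; i <= N; ++i) next[i] = 0.5 * (v[i - 1] + v[i + 1]);
        v.swap(next);
    }
    if (rank == 0) std::printf("finished 200 TLM-like steps on %d tasks\n", size);
    MPI_Finalize();
    return 0;
}
```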
449

A simulation workflow to evaluate the performance of dynamic load balancing with over decomposition for iterative parallel applications

Tesser, Rafael Keller, January 2018
This thesis presents a novel simulation workflow to evaluate, at low cost, the performance of dynamic load balancing based on over-decomposition applied to iterative parallel applications. Its goals are to perform such an evaluation with minimal application modification and at a low cost in terms of time and resource requirements. Many parallel applications suffer from dynamic (temporal) load imbalance that cannot be treated at the application level. It may be caused by intrinsic characteristics of the application or by external software and hardware factors. As demonstrated in this thesis, such dynamic imbalance can be found even in applications whose code does not hint at any dynamism. Therefore, we need to rely on runtime dynamic load balancing mechanisms, such as dynamic load balancing based on over-decomposition. The problem is that evaluating and tuning the performance of such a technique can be costly: it usually entails modifications to the application and a large number of executions to obtain statistically sound performance measurements with different combinations of load balancing parameters, and useful, accurate measurements often require large resource allocations on a production cluster. Our simulation workflow, dubbed Simulated Adaptive MPI (SAMPI), employs a combination of sequential emulation and trace-replay simulation to reduce the cost of such an evaluation. Both sequential emulation and trace replay require a single computer node, and the trace-replay simulation lasts only a small fraction of the real-life parallel execution time of the application. Besides the basic SAMPI simulation, we developed spatial aggregation and application-level rescaling techniques to speed up the emulation process. To demonstrate the real-life performance benefits of dynamic load balancing with over-decomposition, we evaluated the performance gains obtained by employing this technique on an iterative parallel geophysics application called Ondes3D, with dynamic load balancing support provided by Adaptive MPI (AMPI). This resulted in performance improvements of up to 36.58% on 288 cores of a cluster. This real-life evaluation also illustrates the difficulties found in the process, thus justifying the use of simulation. To implement the SAMPI workflow, we relied on SimGrid's Simulated MPI (SMPI) interface in both emulation and trace-replay modes. To validate our simulator, we compared simulated (SAMPI) and real-life (AMPI) executions of Ondes3D. The simulations presented a load-balance evolution very similar to the real executions and were also successful in choosing the best load balancing heuristic for each scenario. Besides the validation, we demonstrate the use of SAMPI for load balancing parameter exploration and for computational capacity planning. As for the performance of the simulation itself, we roughly estimate that the full workflow can simulate the execution of Ondes3D with 24 different load balancing parameter combinations in 5 hours for our heavier earthquake scenario and in 3 hours for the lighter one.
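The principle of over-decomposition is to split the work into many more chunks than processors so that a runtime can periodically remap them. The sketch below uses a greedy "heaviest chunk to least-loaded processor" rule in the spirit of common Charm++/AMPI load balancers; the synthetic chunk loads and the specific rule are illustrative assumptions, not SAMPI or AMPI internals.

```cpp
// Hedged sketch: greedy remapping of over-decomposed chunks onto processors.
#include <algorithm>
#include <cstdio>
#include <vector>

std::vector<int> greedy_map(const std::vector<double>& chunk_load, int procs) {
    std::vector<int> order(chunk_load.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return chunk_load[a] > chunk_load[b];        // heaviest chunks first
    });
    std::vector<double> proc_load(procs, 0.0);
    std::vector<int> map(chunk_load.size(), 0);
    for (int c : order) {
        int target = static_cast<int>(std::min_element(proc_load.begin(), proc_load.end())
                                      - proc_load.begin());
        map[c] = target;                             // place chunk on least-loaded processor
        proc_load[target] += chunk_load[c];
    }
    std::printf("max processor load: %.2f\n",
                *std::max_element(proc_load.begin(), proc_load.end()));
    return map;
}

int main() {
    // 32 chunks over-decomposed onto 4 processors; loads drift over time.
    std::vector<double> load(32);
    for (std::size_t i = 0; i < load.size(); ++i) load[i] = 1.0 + 0.1 * (i % 7);
    greedy_map(load, 4);
    return 0;
}
```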
450

Desenvolvimento de um simulador para espectrometria por fluorescência de raios X usando computação distribuída / Development of an X-ray fluorescence spectrometry simulator using distributed computing

Marcio Henrique dos Santos 30 March 2012 (has links)
Funding: Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro. / Radiation physics is a branch of physics present in several areas of study and is related to the concept of spectrometry. Among the many existing spectrometric techniques, X-ray fluorescence spectrometry stands out; it also has a range of variations, from which a particular subset of techniques can be emphasized. The production of X-ray fluorescence allows (in certain cases) the analysis of the physical and chemical properties of a given sample, making it possible to determine its chemical composition and opening up a range of applications. However, the experimental study can demand a heavy workload, both in terms of the physical apparatus and of the technical knowledge required. Simulation thus comes into play as a viable path between theory and experiment. Through the Monte Carlo method, which relies on the manipulation of random numbers, simulation acts as a kind of alternative to experimental work: it plays this role by means of a modelling process, within a safe and risk-free environment, and it can also draw on high-performance computing to optimize the work through a distributed architecture. The central objective of this work is the development of a computational simulator for the analysis and study of X-ray fluorescence systems, built natively on a distributed computing platform in order to generate optimized data. As results, this work shows the feasibility of building the simulator with the CHARM++ language, a C++-based language that incorporates routines for distributed processing, the value of the methodology for system modelling, and its application to the construction of a simulator for X-ray fluorescence spectrometry. The simulator was built with the ability to reproduce an electromagnetic radiation source, complex samples and a set of detectors; the detector model incorporates the capability of generating images based on the recorded counts. To validate the simulator, its spectrometric results were compared with those produced by another, already validated simulator: MCNP.
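The Monte Carlo core of such a simulator can be pictured as sampling photon free paths from an exponential attenuation law and deciding, with interaction probabilities and a fluorescence yield, whether a characteristic photon reaches the tally. All physical constants below are illustrative placeholders, and the CHARM++ distribution layer of the thesis is not reproduced here.

```cpp
// Hedged sketch: a serial Monte Carlo loop tallying fluorescence events.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> u(0.0, 1.0);

    const double mu = 5.0;                // total attenuation coefficient (1/cm), placeholder
    const double thickness = 0.2;         // sample thickness (cm), placeholder
    const double p_photo = 0.6;           // photoelectric fraction, placeholder
    const double fluo_yield = 0.3;        // fluorescence yield, placeholder
    const long n_photons = 1000000;

    long counts = 0;
    for (long i = 0; i < n_photons; ++i) {
        double path = -std::log(1.0 - u(rng)) / mu;   // free path sampled from exp(-mu x)
        if (path > thickness) continue;               // photon crosses the sample
        if (u(rng) < p_photo && u(rng) < fluo_yield)  // photoelectric event + fluorescence
            ++counts;                                 // a detector model would bin by energy
    }
    std::printf("fluorescence counts: %ld out of %ld incident photons\n",
                counts, n_photons);
    return 0;
}
```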
