• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 339
  • 189
  • 134
  • 56
  • 45
  • 44
  • 4
  • 4
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 924
  • 924
  • 924
  • 404
  • 395
  • 351
  • 351
  • 329
  • 325
  • 320
  • 319
  • 316
  • 314
  • 313
  • 313
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
441

MPI sobre MOM para suportar log de mensagens pessimista remoto / MPI over MOM to support remote pessimistic message logging

Machado, Caciano dos Santos January 2010 (has links)
O aumento crescente no número de processadores das arquiteturas paralelas que estão no topo dos rankings de desempenho, apesar de permitir uma maior capacidade de processamento, também traz consigo um aumento na taxa de falhas diretamente proporcional ao número de processadores. Atualmente, as técnicas de tolerância a falhas com recuperação retroativa são as mais empregadas em aplicações MPI, principalmente a técnica de checkpoint coordenado. No entanto, previsões afirmam que essa última técnica será inadequada para as arquiteturas emergentes. Em contrapartida, as técnicas de log de mensagens possuem características que as tornam mais apropriadas no novo cenário que se estabelece. O presente trabalho consiste em uma proposta de log de mensagens pessimista remoto com checkpoint não-coordenado e a avaliação de desempenho da comunicação MPI sobre Publish/Subscriber no qual se baseia o log de mensagens. O trabalho compreende: um estudo das técnicas de tolerância a falhas mais empregadas em ambientes de alto desempenho e a motivação para a escolha dessa variante de log de mensagens; a proposta de log de mensagens; uma implementação de comunicação Open MPI sobre OpenAMQ e sua respectiva avaliação de desempenho com comunicação tradicional TCP/IP e com o log de mensagens pessimista local da distribuição do Open MPI. Os benchmarks utilizados foram o NetPIPE, o NAS Parallel Benchmarks e a aplicação Virginia Hydrodynamics (VH-1). / The growing number of processors in parallel architectures at the top of performance rankings allows a higher processing capacity. However, it also brings an increase in the fault rate which is directly proportional to the number of processors. Nowadays, coordinated checkpoint is the most widely used rollback technique for system recovery in the occurrence of faults in MPI applications. Nevertheless, projections point that this technique will be inappropriate for the emerging architectures. On the other hand, message logging seems to be more appropriate to this new scenario. This work consists in a proposal of pessimistic message logging (remote based) with non-coordinated checkpoint and the performance evaluation of an MPI communication mechanism that works over Publish/Subscriber channels in which the proposed message logging is based. The work is organized as following: an study of fault tolerant techniques used in HPC and the motivation for choosing this variant of message logging; a message logging proposal; an implementation of Open MPI communication over OpenAMQ; performance evaluation and comparision with the tradicional TCP/IP communication and a pessimistic message logging (sender based) from Open MPI distribution. The benchmark set is composed of NetPIPE, NAS Parallel Benchmarks and Virginia Hydrodynamics (VH-1).
442

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms / Um sistema de escalonamento dinâmico e tuning em tempo de execução para plataformas desktop heterogêneas de múltiplos núcleos

Binotto, Alécio Pedro Delazari January 2011 (has links)
Atualmente, o computador pessoal (PC) moderno poder ser considerado como um cluster heterogênedo de um nodo, o qual processa simultâneamente inúmeras tarefas provenientes das aplicações. O PC pode ser composto por Unidades de Processamento (PUs) assimétricas, como a Unidade Central de Processamento (CPU), composta de múltiplos núcleos, a Unidade de Processamento Gráfico (GPU), composta por inúmeros núcleos e que tem sido um dos principais co-processadores que contribuiram para a computação de alto desempenho em PCs, entre outras. Neste sentido, uma plataforma de execução heterogênea é formada em um PC para efetuar cálculos intensivos em um grande número de dados. Na perspectiva desta tese, a distribuição da carga de trabalho de uma aplicação nas PUs é um fator importante para melhorar o desempenho das aplicações e explorar tal heterogeneidade. Esta questão apresenta desafios uma vez que o custo de execução de uma tarefa de alto nível em uma PU é não-determinístico e pode ser afetado por uma série de parâmetros não conhecidos a priori, como o tamanho do domínio do problema e a precisão da solução, entre outros. Nesse escopo, esta pesquisa de doutorado apresenta um sistema sensível ao contexto e de adaptação em tempo de execução com base em um compromisso entre a redução do tempo de execução das aplicações - devido a um escalonamento dinâmico adequado de tarefas de alto nível - e o custo de computação do próprio escalonamento aplicados em uma plataforma composta de CPU e GPU. Esta abordagem combina um modelo para um primeiro escalonamento baseado em perfis de desempenho adquiridos em préprocessamento com um modelo online, o qual mantém o controle do tempo de execução real de novas tarefas e escalona dinâmicamente e de modo eficaz novas instâncias das tarefas de alto nível em uma plataforma de execução composta de CPU e de GPU. Para isso, é proposto um conjunto de heurísticas para escalonar tarefas em uma CPU e uma GPU e uma estratégia genérica e eficiente de escalonamento que considera várias unidades de processamento. A abordagem proposta é aplicada em um estudo de caso utilizando uma plataforma de execução composta por CPU e GPU para computação de métodos iterativos focados na solução de Sistemas de Equações Lineares que se utilizam de um cálculo de stencil especialmente concebido para explorar as características das GPUs modernas. A solução utiliza o número de incógnitas como o principal parâmetro para a decisão de escalonamento. Ao escalonar tarefas para a CPU e para a GPU, um ganho de 21,77% em desempenho é obtido em comparação com o escalonamento estático de todas as tarefas para a GPU (o qual é utilizado por modelos de programação atuais, como OpenCL e CUDA para Nvidia) com um erro de escalonamento de apenas 0,25% em relação à combinação exaustiva. / A modern personal computer can be now considered as a one-node heterogeneous cluster that simultaneously processes several applications’ tasks. It can be composed by asymmetric Processing Units (PUs), like the multi-core Central Processing Unit (CPU), the many-core Graphics Processing Units (GPUs) - which have become one of the main co-processors that contributed towards high performance computing - and other PUs. This way, a powerful heterogeneous execution platform is built on a desktop for data intensive calculations. In the perspective of this thesis, to improve the performance of applications and explore such heterogeneity, a workload distribution over the PUs plays a key role in such systems. This issue presents challenges since the execution cost of a task at a PU is non-deterministic and can be affected by a number of parameters not known a priori, like the problem size domain and the precision of the solution, among others. Within this scope, this doctoral research introduces a context-aware runtime and performance tuning system based on a compromise between reducing the execution time of the applications - due to appropriate dynamic scheduling of high-level tasks - and the cost of computing such scheduling applied on a platform composed of CPU and GPUs. This approach combines a model for a first scheduling based on an off-line task performance profile benchmark with a runtime model that keeps track of the tasks’ real execution time and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. For that, it is proposed a set of heuristics to schedule tasks over one CPU and one GPU and a generic and efficient scheduling strategy that considers several processing units. The proposed approach is applied in a case study using a CPU-GPU execution platform for computing iterative solvers for Systems of Linear Equations using a stencil code specially designed to explore the characteristics of modern GPUs. The solution uses the number of unknowns as the main parameter for assignment decision. By scheduling tasks to the CPU and to the GPU, it is achieved a performance gain of 21.77% in comparison to the static assignment of all tasks to the GPU (which is done by current programming models, such as OpenCL and CUDA for Nvidia) with a scheduling error of only 0.25% compared to exhaustive search.
443

Exploiting multiple levels of parallelism and online refinement of unstructured meshes in atmospheric model application

Schepke, Claudio January 2012 (has links)
Previsões meteorológicas para longos períodos de tempo estão se tornando cada vez mais importantes. A preocupação mundial com as consequências da mudança do clima tem estimulado pesquisas para determinar o seu comportamento nas próximas décadas. Ao mesmo tempo, os passos necessários para definir uma melhor modelagem e simulação do clima e/ou tempo estão longe da precisão desejada. Aumentar o refinamento da superfície terrestre e, consequentemente, aumentar o número de pontos discretos (utilizados para a representação da atmosfera) na modelagem climática e precisão das soluções computadas é uma meta que está em conflito com o desempenho das aplicações numéricas. Aplicações que envolvem a interação de longos períodos de tempo e incluem um grande número de operações possuem um tempo de execução inviável para as arquiteturas de computadores tradicionais. Para superar esta situação, um modelo climatológico pode adotar diferentes níveis de refinamento da superfície terrestre, utilizando mais pontos discretos somente em regiões onde uma maior precisão é requerida. Este é o caso de Ocean-Land-AtmosphereModel, que permite o refinamento estático de uma determinada região no início da execução do código. No entanto, um refinamento dinâmico possibilitaria uma melhor compreensão das condições climáticas específicas de qualquer região da superfície terrestre que se tivesse interesse, sem a necessidade de reiniciar a execução da aplicação. Com o surgimento das arquiteturas multi-core e a adoção de GPUs para a computação de propósito geral, existem diferentes níveis de paralelismo. Hoje há paralelismo interno ao processador, entre processadores e entre computadores. Com o objetivo de extrair ao máximo a performance dos computadores atuais, é necessário utilizar todos os níveis de paralelismo disponíveis durante o desenvolvimento de aplicações concorrentes. No entanto, nenhuma interface de programação paralela explora simultaneamente bem os diferentes níveis de paralelismo existentes. Baseado neste contexto, esta tese investiga como explorar diferentes níveis de paralelismo em modelos climatológicos usando interfaces clássicas de programação paralela de forma combinada e como é possível prover refinamento de malhas em tempo de execução para estes modelos. Os resultados obtidos a partir de implementações realizadas mostraram que é possível reduzir o tempo de execução de uma simulação atmosférica utilizando diferentes níveis de paralelismo, através do uso combinado de interfaces de programação paralela. Além disso, foi possível prover maior desempenho na execução de aplicações climatológicas que utilizam refinamento de malhas em tempo de execução. Com isso, uma malha de maior resolução para a representação da atmosfera terrestre pode ser adotada e, consequentemente, as previsões numéricas serão mais precisas. / Weather forecasts for long periods of time has emerged as increasingly important. The global concern with the consequences of climate changes has stimulated researches to determine the climate in coming decades. At the same time the steps needed to better defining the modeling and the simulation of climate/weather is far of the desired accuracy. Upscaling the land surface and consequently to increase the number of points used in climate modeling and the precision of the computed solutions is a goal that conflicts with the performance of numerical applications. Applications that include the interaction of long periods of time and involve a large number of operations become the expectation for results infeasible in traditional computers. To overcome this situation, a climatic model can take different levels of refinement of the Earth’s surface, using more discretized elements only in regions where more precision are required. This is the case of Ocean-Land- Atmosphere Model, which allows the static refinement of a particular region of the Earth in the early execution of the code. However, a dynamic mesh refinement could allow to better understand specific climatic conditions that appear at execution time of any region of the Earth’s surface, without restarting execution. With the introduction of multi-core processors and GPU boards, computers architectures have many parallel layers. Today, there are parallelism inside the processor, among processors and among computers. In order to use the best performance of the computers it is necessary to consider all parallel levels to distribute a concurrent application. However, nothing parallel programming interface abstracts all these different parallel levels. Based in this context, this thesis investigates how to explore different levels of parallelism in climatological models using mixed interfaces of parallel programming and how these models can provide mesh refinement at execution time. The performance results show that is possible to reduce the execution time of atmospheric simulations using different levels of parallelism, through the combined use of parallel programming interfaces. Higher performance for the execution of atmospheric applications that use online mesh refinement was also provided. Therefore, more mesh resolution to describe the Earth’s atmosphere can be adopted, and consequently the numerical forecasts are more accurate.
444

Computação paralela na análise de problemas de engenharia utilizando o Método dos Elementos Finitos

Masuero, Joao Ricardo January 2009 (has links)
O objetivo deste trabalho é estudar algoritmos paralelos para a solução de problemas de Mecânica dos Sólidos, Mecânica dos Fluídos e Interação Fluido-Estrutura empregando o Método dos Elementos Finitos para uso em configurações de memória distribuída e compartilhada. Dois processos para o particionamento da estrutura de dados entre os processadores e divisão de tarefas foram desenvolvidos baseados na aplicação do método de particionamento em faixas e do método da bissecção coordenada recursiva não sobre a geometria da malha mas sim diretamente sobre o sistema de equações, através de reordenações nodais para minimização da largura da banda. Para ordenar a comunicação entre os processadores, foi desenvolvido um algoritmo simples e genérico baseado em uma ordenação circular e alternada que permite a organização eficiente dos processos mesmo em cenários nos quais cada processador precisa trocar dados com todos os demais. Os algoritmos selecionados foram todos do tipo iterativo, por sua adequabilidade ao paralelismo de memória distribuída. Foram desenvolvidos códigos paralelos para o Método dos Gradientes Conjugados utilizado em problemas de Mecânica dos Sólidos, para o esquema explícito de Taylor-Galerkin com um passo e iterações utilizado na simulação de escoamentos compressíveis em regime transônico e supersônico, para o esquema explícito de Taylor- Galerkin com 2 passos para simulação de escoamentos incompressíveis em regime subsônico e para interação fluído-estrutura usando o esquema explícito de dois passos para o fluído e o método implícito de Newmark no contexto do método de estabilização α-Generalizado para a estrutura, com acoplamento particionado. Numerosas configurações foram testadas com problemas tridimensionais utilizando elementos tetraédricos e hexaédricos em clusters temporários e permanentes, homogêneos e heterogêneos, com diferentes tamanhos de problemas, diferentes números de computadores e diferentes velocidades de rede. / Analysis and development of distributed memory parallel algorithms for the solution of Solid Mechanics, Fluid Mechanics and Fluid-Structure Interaction problems using the Finite Element Method is the main goal of this work. Two process for mesh partitioning and task division were developed, based in the Stripwise Partitioning and the Recursive Coordinate Bisection Methods, but applied not over the mesh geometry but over the resultant system of equations through a nodal ordering algorithm for system bandwidth minimization. To schedule the communication tasks in scenarios where each processor must exchange data with all others in the cluster, a simple and generic algorithm based in a circular an alternate ordering was developed. The algorithms selected to be parallelized were of iterative types due to their suitability for distributed memory parallelism. Parallel codes were developed for the Conjugate Gradient Method ( for Solid Mechanics analysis), for the explicit one-step scheme of Taylor-Galerkin method (for transonic and supersonic compressible flow analysis), for the two-step explicit scheme of Taylor-Galerkin method (for subsonic incompressible flow analysis) and for a Fluid-Structure Interaction algorithm using a coupling model based on a partitioned scheme. Explicit two-step scheme of Taylor-Galerkin were employed for the fluid and the implicit Newmark algorithm for the structure. Several configurations were tested for three-dimensional problems using tetrahedral and hexahedral elements in uniform and nonuniform clusters and grids, with several sizes of meshes, numbers of computers and network speeds.
445

Integração de bibliotecas científicas de propósito especial em uma plataforma de componentes paralelos / Integration of special purpose scientific libraries on a platform of parallel components

Ferreira, Davi Morais January 2010 (has links)
FERREIRA, Davi Morais. Integração de bibliotecas científicas de propósito especial em uma plataforma de componentes paralelos. 2010. 145 f. : Dissertação (mestrado) - Universidade Federal do Ceará, Centro de Ciências, Departamento de Computação, Fortaleza-CE, 2010. / Submitted by guaracy araujo (guaraa3355@gmail.com) on 2016-06-16T17:50:44Z No. of bitstreams: 1 2010_dis_dmf.pdf: 1977126 bytes, checksum: 8f6276f7e40d8f3dbdca5deb5a0a8447 (MD5) / Approved for entry into archive by guaracy araujo (guaraa3355@gmail.com) on 2016-06-16T17:51:57Z (GMT) No. of bitstreams: 1 2010_dis_dmf.pdf: 1977126 bytes, checksum: 8f6276f7e40d8f3dbdca5deb5a0a8447 (MD5) / Made available in DSpace on 2016-06-16T17:51:57Z (GMT). No. of bitstreams: 1 2010_dis_dmf.pdf: 1977126 bytes, checksum: 8f6276f7e40d8f3dbdca5deb5a0a8447 (MD5) Previous issue date: 2010 / The contribution of traditional scienti c libraries shows to be consolidated in the construction of high-performance applications. However, such an artifact of development possesses some limitations in integration, productivity in large-scale applications, and exibility for changes in the context of the problem. On the other hand, the development technology based on components recently proposed a viable alternative for the architecture of High-Performance Computing (HPC) applications, which has provided a means to overcome these challenges. Thus we see that the scienti c libraries and programming orientated at components are complementary techniques in the improvement of the development process of modern HPC applications. Accordingly, this work aims to propose a systematic method for the integration of scienti c libraries on a platform of parallel components, HPE (Hash Programming Environment), to o er additional advantageous aspects for the use of components and scienti c libraries to developers of parallel programs that implement high-performance applications. The purpose of this work goes beyond the construction of a simple encapsulation of the library in a component; it aims to provide the bene ts in integration, productivity in large-scale applications, and the exibility for changes in the context of a problem in the use of scienti c libraries. As a way to illustrate and validate the method, we have incorporated the libraries of linear systems solvers to HPE, electing three signi cant representatives: PETSc, Hypre, e SuperLU. / A contribuição das tradicionais bibliotecas cientí cas mostra-se consolidada na construção de aplicações de alto desempenho. No entanto, tal artefato de desenvolvimento possui algumas limitações de integração, de produtividade em aplicações de larga escala e de exibilidade para mudanças no contexto do problema. Por outro lado, a tecnologia de desenvolvimento baseada em componentes, recentemente proposta como alternativa viável para a arquitetura de aplicações de Computação de Alto Desempenho (CAD), tem fornecido meios para superar esses desa os. Vemos assim, que as bibliotecas cientí cas e a programação orientada a componentes são técnicas complementares na melhoria do processo de desenvolvimento de aplicações modernas de CAD. Dessa forma, este trabalho tem por objetivo propor um método sistemático para integração de bibliotecas cientí cas sobre a plataforma de componentes paralelos HPE (Hash Programming Environment ), buscando oferecer os aspectos vantajosos complementares do uso de componentes e de bibliotecas cientí cas aos desenvolvedores de programas paralelos que implementam aplicações de alto desempenho. A proposta deste trabalho vai além da construção de um simples encapsulamento da biblioteca em um componente, visa proporcionar ao uso das bibliotecas cientí cas os benefícios de integração, de produtividade em aplicações de larga escala e da exibilidade para mudanças no contexto do problema. Como forma de exempli car e validar o método, temos incorporado bibliotecas de resolução de sistemas lineares ao HPE, elegendo três representantes significativos: PETSc, Hypre e SuperLU.
446

Um ambiente computacional de alto desempenho para cálculo de deslocamento usando correlação de imagens digitais. / A high-performance computing enviroment for displacement using digital image correlation.

Várady Filho, Christiano Augusto Ferrario 04 April 2016 (has links)
This work proposes a high performance computing environment using digital image correlation techniques to determine physical quantities associated with engineering problems. Software ar- chitecture supports this computing environment, integrating several advanced technologies for calculation of displacement and strain fields of structural elements from testing. The method- ology applies the study of concepts, formulations and techniques for image processing, digital image correlation, high performance computing and software architecture. The methodology also includes specific procedures in a single environment for the evaluation of physical quanti- ties. Among the main procedures used in the presented software architecture, one can cite the digital image correlation techniques known as Full-Field and Subset, non-linear optimization methods and two-dimensional interpolations. In addition, high performance computing strate- gies are included into the computing environment to achieve performance speed-ups on evalu- ating the displacement fields using digital image correlation. Comparisons with Scale Invariant Feature Transform and Q4-DIC are also evaluated. Following, the development of a computer prototype has the purpose of validating the presented high performance environment, allowing the calculation of physical quantities in structural elements through the correlation of digital images. Then, submission of case studies into the prototype validates data acquired from tech- nologies built into the prototype, including quantitative analysis of results and measurement of computational time. / Conselho Nacional de Desenvolvimento Científico e Tecnológico / O presente trabalho apresenta um ambiente computacional de alto desempenho que utiliza téc- nicas de correlação de imagens digitais para determinação de grandezas físicas associadas a problemas de engenharia. Sua arquitetura computacional integra diversas tecnologias avança- das para o cálculo dos campos de deslocamentos de elementos estruturais a partir de ensaios. A metodologia utilizada prevê o estudo de conceitos, formulações e técnicas de arquitetura de software, processamento de imagens, correlação de imagens digitais e computação de alto desempenho, além de incorporar procedimentos específicos em um ambiente único para a de- terminação de grandezas físicas. Dentre os principais procedimentos utilizados na arquitetura apresentada, pode-se citar as abordagens locais e globais de correlação de imagens digitais, mé- todos de otimização não-linear e interpolações bidimensionais. Também são incorporadas ao ambiente computacional estratégias de computação de alto desempenho para alcançar ganhos de performance na determinação dos campos de deslocamentos e deformações usando a corre- lação de imagens digitais. Uma comparação com o método Scale Invariant Feature Transform e com o método Q4-DIC de análise de imagens também são realizadas. Um protótipo compu- tacional é desenvolvido com o objetivo de validar o ambiente de alto desempenho apresentado, permitindo o monitoramento de elementos estruturais através da correlação de imagens digitais. Também são realizados estudos de casos que permitem a verificação de tecnologias incorpora- das ao protótipo apresentado, incluindo análises quantitativas de resultados e medição de tempo computacional.
447

Co-scheduling for large-scale applications : memory and resilience / Ordonnancement concurrent d’applications à grande échelle : mémoire et résilience

Pottier, Loïc 18 September 2018 (has links)
Cette thèse explore les problèmes liés à l'ordonnancement concurrent dans le contexte des applications massivement parallèle, de deux points de vue: le coté mémoire (en particulier la mémoire cache) et le coté tolérance aux fautes.Avec l'avènement récent des architectures dites many-core, tels que les récents processeurs multi-coeurs, le nombre d'unités de traitement augmente de manière importante.Dans ce contexte, les avantages fournis par les techniques d'ordonnancements concurrents ont été démontrés à travers de nombreuses études.L'ordonnancement concurrent, aussi appelé co-ordonnancement, consiste à exécuter les applications de manière concurrente plutôt que les unes après les autres, dans le but d'améliorer le débit global de la plateforme.Mais le partage des ressources peut souvent générer des interférences.Une des solutions pour réduire de manière importante ces interférences est le partitionnement de cache.À travers un modèle théorique, des simulations et des expériences sur une plateforme existante, nous montrons l'utilité et l'importance du co-ordonnancement quand nos stratégies de partitionnement de cache sont utilisées.De plus, avec ce nombre croissant de processeurs, la probabilité d'une panne augmente également.L'efficacité des techniques de co-ordonnancement a été démontrée dans un contexte sans pannes, mais les plateformes massivement parallèles sont confrontées à des pannes fréquentes, et des techniques de tolérance aux fautes doivent être mise en place pour améliorer l'efficacité de ces plateformes.Nous étudions la complexité du problème avec un modèle théorique, nous concevons des heuristiques et nous effectuons un ensemble complet de simulations avec un simulateur de pannes, qui démontre l'efficacité des heuristiques proposées. / This thesis explores co-scheduling problems in the context of large-scale applications with two main focus: the memory side, in particular the cache memory and the resilience side.With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units is increasing.In this context, the benefits of co-scheduling techniques have been demonstrated. Recall that, the main idea behind co-scheduling is to execute applications concurrently rather than in sequence in order to improve the global throughput of the platform.But sharing resources often generates interferences.With the arising number of processing units accessing to the same last-level cache, those interferences among co-scheduled applications becomes critical.In addition, with that increasing number of processors the probability of a failure increases too.Resiliency aspects must be taking into account, specially for co-scheduling because failure-prone resources might be shared between applications.On the memory side, we focus on the interferences in the last-level cache, one solution used to reduce these interferences is the cache partitioning.Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.We also investigate the same problem on a real cache partitioned chip multiprocessors, using the Cache Allocation Technology recently provided by Intel.In a second time, still on the memory side, we study how to model and schedule task graphs on the new many-core architectures, such as Knights Landing architecture.These architectures offer a new level in the memory hierarchy through a new on-packagehigh-bandwidth memory. Current approaches usually do not take intoaccount this new memory level, however new scheduling algorithms anddata partitioning schemes are needed to take advantage of this deepmemory hierarchy.On the resilience, we explore the impact on failures on co-scheduling performance.The co-scheduling approach has been demonstrated in a fault-free context, but large-scale computer systems are confronted by frequent failures, and resilience techniques must be employed for large applications to execute efficiently. Indeed, failures may create severe imbalance between applications, and significantly degrade performance.We aim at minimizing the expected completion time of a set of co-scheduled applications in a failure-prone context by redistributing processors.
448

Towards brain-scale modelling of the human cerebral blood flow : hybrid approach and high performance computing / Vers une modélisation de l’écoulement sanguin cérébral humain à l’échelle du cerveau : approche hybride et calcul haute performance

Peyrounette, Myriam 25 October 2017 (has links)
La microcirculation cérébrale joue un rôle clé dans la physiologie cérébrale. Lors de maladies dégénératives comme celle d’Alzheimer, la détérioration des réseaux microvasculaires (e.g. occlusions et baisse de densité vasculaires) limite l’afflux sanguin vers le cortex. La réduction associée de l’apport en oxygène et nutriments risque de provoquer la mort de neurones. En complément des techniques d’imagerie médicale, la modélisation est un outil précieux pour comprendre l’impact de telles variations structurelles sur l’écoulement sanguin et les transferts de masse. Dans la microcirculation cérébrale, le lit capillaire contient les plus petits vaisseaux (diamètre de 1-10 μm) et présente une structure maillée, au sein du tissu cérébral. C’est le lieu principal des échanges moléculaires entre le sang et les neurones. Le lit capillaire est alimenté et drainé par les arbres artériolaires et veinulaires (diamètre de 10-100 μm). Depuis quelques décennies, les approches “réseau” ont significativement amélioré notre compréhension de l’écoulement sanguin, du transport de masse et des mécanismes de régulation dans la microcirculation cérébrale humaine. Cependant, d’un point de vue numérique, la densité des capillaires limite ces approches à des volumes relativement petits (<100 mm3). Cette contrainte empêche leur application à des échelles cliniques, puisque les techniques d’imagerie médicale permettent d’acquérir des volumes bien plus importants (∼100 cm3), avec une résolution de 1-10 mm. Pour réduire ce coût numérique, nous présentons une approche hybride pour la modélisation de l’écoulement dans laquelle les capillaires sont remplacés par un milieu continu. Cette substitution a du sens puisque le lit capillaire est dense et homogène à partir d’une longueur de coupure de ∼50 μm. Dans ce continuum, l’écoulement est caractérisé par des propriétés effectives (e.g. perméabilité) à l’échelle d’un volume représentatif plus grand. De plus, le continuum est discrétisé par la méthode des volumes finis sur un maillage grossier, ce qui induit un gain numérique important. Les arbres artério- et veinulaires ne peuvent être homogénéisés à cause de leur structure quasi-fractale. Nous appliquons donc une approche “réseau” standard dans les vaisseaux les plus larges. La principale difficulté de l’approche hybride est de développer un modèle de couplage aux points où les vaisseaux artério- et veinulaires sont connectés au continuum. En effet, de forts gradients de pression apparaissent à proximité de ces points, et doivent être homogénéisés proprement à l’échelle du continuum. Ce genre de couplage multi-échelle n’a jamais été introduit dans le contexte de la microcirculation cérébrale. Nous nous inspirons ici du "modèle de puits" développé par Peaceman pour l’ingénierie pétrolière, en utilisant des solutions analytiques du champ des pressions dans le voisinage des points de couplage. Les équations obtenues forment un unique système linéaire à résoudre pour l’ensemble du domaine d’étude. Nous validons l’approche hybride par comparaison avec une approche “réseau” classique, pour des architectures synthétiques simples qui n’impliquent qu’un ou deux couplages, et pour des structures plus complexes qui impliquent des arbres artério- et veinulaires anatomiques avec un grand nombre de couplages. Nous montrons que cette approche est fiable, puisque les erreurs relatives en pression sont faibles (<6 %). Cela ouvre la voie à une complexification du modèle (e.g. hématocrite non uniforme). Dans une perspective de simulations à grande échelle et d’extension au transport de masse, l’approche hybride a été implémentée dans un code C++ conçu pour le calcul haute performance. Ce code a été entièrement parallélisé en utilisant les standards MPI et des librairies spécialisées (e.g. PETSc). Ce travail faisant partie d’un projet plus large impliquant plusieurs collaborateurs, une attention particulière a été portée à l’établissement de stratégies d’implémentation efficaces. / The brain microcirculation plays a key role in cerebral physiology and neuronal activation. In the case of degenerative diseases such as Alzheimer’s, severe deterioration of the microvascular networks (e.g. vascular occlusions) limit blood flow, thus oxygen and nutrients supply, to the cortex, eventually resulting in neurons death. In addition to functional neuroimaging, modelling is a valuable tool to investigate the impact of structural variations of the microvasculature on blood flow and mass transfers. In the brain microcirculation, the capillary bed contains the smallest vessels (1-10 μm in diameter) and presents a mesh-like structure embedded in the cerebral tissue. This is the main place of molecular exchange between blood and neurons. The capillary bed is fed and drained by larger arteriolar and venular tree-like vessels (10-100 μm in diameter). For the last decades, standard network approaches have significantly advanced our understanding of blood flow, mass transport and regulation mechanisms in the human brain microcirculation. By averaging flow equations over the vascular cross-sections, such approaches yield a one-dimensional model that involves much fewer variables compared to a full three-dimensional resolution of the flow. However, because of the high density of capillaries, such approaches are still computationally limited to relatively small volumes (<100 mm3). This constraint prevents applications at clinically relevant scales, since standard imaging techniques only yield much larger volumes (∼100 cm3), with a resolution of 1-10 mm3. To get around this computational cost, we present a hybrid approach for blood flow modelling where the capillaries are replaced by a continuous medium. This substitution makes sense since the capillary bed is dense and space-filling over a cut-off length of ∼50 μm. In this continuum, blood flow is characterized by effective properties (e.g. permeability) at the scale of a much larger representative volume. Furthermore, the domain is discretized on a coarse grid using the finite volume method, inducing an important computational gain. The arteriolar and venular trees cannot be homogenized because of their quasi-fractal structure, thus the network approach is used to model blood flow in the larger vessels. The main difficulty of the hybrid approach is to develop a proper coupling model at the points where arteriolar or venular vessels are connected to the continuum. Indeed, high pressure gradients build up at capillary-scale in the vicinity of the coupling points, and must be properly described at the continuum-scale. Such multiscale coupling has never been discussed in the context of brain microcirculation. Taking inspiration from the Peaceman “well model” developed for petroleum engineering, our coupling model relies on to use analytical solutions of the pressure field in the neighbourhood of the coupling points. The resulting equations yield a single linear system to solve for both the network part and the continuum (strong coupling). The accuracy of the hybrid model is evaluated by comparison with a classical network approach, for both very simple synthetic architectures involving no more than two couplings, and more complex ones, with anatomical arteriolar and venular trees displaying a large number of couplings. We show that the present approach is very accurate, since relative pressure errors are lower than 6 %. This lays the goundwork for introducing additional levels of complexity in the future (e.g. non uniform hematocrit). In the perspective of large-scale simulations and extension to mass transport, the hybrid approach has been implemented in a C++ code designed for High Performance Computing. It has been fully parallelized using Message Passing Interface standards and specialized libraries (e.g. PETSc). Since the present work is part of a larger project involving several collaborators, special care has been taken in developing efficient coding strategies.
449

A study on block flexible iterative solvers with applications to Earth imaging problem in geophysics / Étude de méthodes itératives par bloc avec application à l’imagerie sismique en géophysique

Ferreira Lago, Rafael 13 June 2013 (has links)
Les travaux de ce doctorat concernent le développement de méthodes itératives pour la résolution de systèmes linéaires creux de grande taille comportant de nombreux seconds membres. L’application visée est la résolution d’un problème inverse en géophysique visant à reconstruire la vitesse de propagation des ondes dans le sous-sol terrestre. Lorsque de nombreuses sources émettrices sont utilisées, ce problème inverse nécessite la résolution de systèmes linéaires complexes non symétriques non hermitiens comportant des milliers de seconds membres. Dans le cas tridimensionnel ces systèmes linéaires sont reconnus comme difficiles à résoudre plus particulièrement lorsque des fréquences élevées sont considérées. Le principal objectif de cette thèse est donc d’étendre les développements existants concernant les méthodes de Krylov par bloc. Nous étudions plus particulièrement les techniques de déflation dans le cas multiples seconds membres et recyclage de sous-espace dans le cas simple second membre. Des gains substantiels sont obtenus en terme de temps de calcul par rapport aux méthodes existantes sur des applications réalistes dans un environnement parallèle distribué. / This PhD thesis concerns the development of flexible Krylov subspace iterative solvers for the solution of large sparse linear systems of equations with multiple right-hand sides. Our target application is the solution of the acoustic full waveform inversion problem in geophysics associated with the phenomena of wave propagation through an heterogeneous model simulating the subsurface of Earth. When multiple wave sources are being used, this problem gives raise to large sparse complex non-Hermitian and nonsymmetric linear systems with thousands of right-hand sides. Specially in the three-dimensional case and at high frequencies, this problem is known to be difficult. The purpose of this thesis is to develop a flexible block Krylov iterative method which extends and improves techniques already available in the current literature to the multiple right-hand sides scenario. We exploit the relations between each right-hand side to accelerate the convergence of the overall iterative method. We study both block deflation and single right-hand side subspace recycling techniques obtaining substantial gains in terms of computational time when compared to other strategies published in the literature, on realistic applications performed in a parallel environment.
450

Efficient large electromagnetic simulation based on hybrid TLM and modal approach on grid computing and supercomputer / Parallélisation, déploiement et adaptation automatique de la simulation électromagnétique sur une grille de calcul

Alexandru, Mihai 14 December 2012 (has links)
Dans le contexte des Sciences de l’Information et de la Technologie, un des challenges est de créer des systèmes de plus en plus petits embarquant de plus en plus d’intelligence au niveau matériel et logiciel avec des architectures communicantes de plus en plus complexes. Ceci nécessite des méthodologies robustes de conception afin de réduire le cycle de développement et la phase de prototypage. Ainsi, la conception et l’optimisation de la couche physique de communication est primordiale. La complexité de ces systèmes rend difficile leur optimisation notamment à cause de l’explosion du nombre des paramètres inconnus. Les méthodes et outils développés ces dernières années seront à terme inadéquats pour traiter les problèmes qui nous attendent. Par exemple, la propagation des ondes dans une cabine d’avion à partir des capteurs ou même d’une antenne, vers le poste de pilotage est grandement affectée par la présence de la structure métallique des sièges à l’intérieur de la cabine, voir les passagers. Il faut, donc, absolument prendre en compte cette perturbation pour prédire correctement le bilan de puissance entre l’antenne et un possible récepteur. Ces travaux de recherche portent sur les aspects théoriques et de mise en oeuvre pratique afin de proposer des outils informatiques pour le calcul rigoureux de la réflexion des champs électromagnétiques à l’intérieur de très grandes structures . Ce calcul implique la solution numérique de très grands systèmes inaccessibles par des ressources traditionnelles. La solution sera basée sur une grille de calcul et un supercalculateur. La modélisation électromagnétique des structures surdimensionnées par plusieurs méthodes numériques utilisant des nouvelles ressources informatiques, hardware et software, pour dérouler des calculs performants, représente le but de ce travail. La modélisation numérique est basée sur une approche hybride qui combine la méthode Transmission-Line Matrix (TLM) et l’approche modale. La TLM est appliquée aux volumes homogènes, tandis que l’approche modale est utilisée pour décrire les structures planaires complexes. Afin d’accélérer la simulation, une implémentation parallèle de l’algorithme TLM dans le contexte du paradigme de calcul distribué est proposé. Le sous-domaine de la structure qui est discrétisé avec la TLM est divisé en plusieurs parties appelées tâches, chacune étant calculée en parallèle par des processeurs différents. Pour accomplir le travail, les tâches communiquent entre elles au cours de la simulation par une librairie d’échange de messages. Une extension de l’approche modale avec plusieurs modes différents a été développée par l’augmentation de la complexité des structures planaires. Les résultats démontrent les avantages de la grille de calcul combinée avec l’approche hybride pour résoudre des grandes structures électriques, en faisant correspondre la taille du problème avec le nombre de ressources de calcul utilisées. L’étude met en évidence le rôle du schéma de parallélisation, cluster versus grille, par rapport à la taille du problème et à sa répartition. En outre, un modèle de prédiction a été développé pour déterminer les performances du calcul sur la grille, basé sur une approche hybride qui combine une prédiction issue d’un historique d’expériences avec une prédiction dérivée du profil de l’application. Les valeurs prédites sont en bon accord avec les valeurs mesurées. L’analyse des performances de simulation a permis d’extraire des règles pratiques pour l’estimation des ressources nécessaires pour un problème donné. En utilisant tous ces outils, la propagation du champ électromagnétique à l’intérieur d’une structure surdimensionnée complexe, telle qu’une cabine d’avion, a été effectuée sur la grille et également sur le supercalculateur. Les avantages et les inconvénients des deux environnements sont discutés. / In the context of Information Communications Technology (ICT), the major challenge is to create systems increasingly small, boarding more and more intelligence, hardware and software, including complex communicating architectures. This requires robust design methodologies to reduce the development cycle and prototyping phase. Thus, the design and optimization of physical layer communication is paramount. The complexity of these systems makes them difficult to optimize, because of the explosion in the number of unknown parameters. The methods and tools developed in past years will be eventually inadequate to address problems that lie ahead. Communicating objects will be very often integrated into cluttered environments with all kinds of metal structures and dielectric larger or smaller sizes compared to the wavelength. The designer must anticipate the presence of such barriers in the propagation channel to establish properly link budgets and an optimal design of the communicating object. For example, the wave propagation in an airplane cabin from sensors or even an antenna, towards the cockpit is greatly affected by the presence of the metal structure of the seats inside the cabin or even the passengers. So, we must absolutely take into account this perturbation to predict correctly the power balance between the antenna and a possible receiver. More generally, this topic will address the theoretical and computational electromagnetics in order to propose an implementation of informatics tools for the rigorous calculation of electromagnetic scattering inside very large structures or radiation antenna placed near oversized objects. This calculation involves the numerical solution of very large systems inaccessible by traditional resources. The solution will be based on grid computing and supercomputers. Electromagnetic modeling of oversized structures by means of different numerical methods, using new resources (hardware and software) to realize yet more performant calculations, is the aim of this work. The numerical modeling is based on a hybrid approach which combines Transmission-Line Matrix (TLM) and the mode matching methods. The former is applied to homogeneous volumes while the latter is used to describe complex planar structures. In order to accelerate the simulation, a parallel implementation of the TLM algorithm in the context of distributed computing paradigm is proposed. The subdomain of the structure which is discretized upon TLM is divided into several parts called tasks, each one being computed in parallel by different processors. To achieve this, the tasks communicate between them during the simulation by a message passing library. An extension of the modal approach to various modes has been developped by increasing the complexity of the planar structures. The results prove the benefits of the combined grid computing and hybrid approach to solve electrically large structures, by matching the size of the problem with the number of computing resources used. The study highlights the role of parallelization scheme, cluster versus grid, with respect to the size of the problem and its repartition. Moreover, a prediction model for the computing performances on grid, based on a hybrid approach that combines a historic-based prediction and an application profile-based prediction, has been developped. The predicted values are in good agreement with the measured values. The analysis of the simulation performances has allowed to extract practical rules for the estimation of the required resources for a given problem. Using all these tools, the propagation of the electromagnetic field inside a complex oversized structure such an airplane cabin, has been performed on grid and also on a supercomputer. The advantages and disadvantages of the two environments are discussed.

Page generated in 0.0998 seconds