291

Paralelismo em visão natural e artificial / Parallelism in natural and artificial vision

Bruno, Odemir Martinez 16 June 2000 (has links)
This thesis addresses, in an integrated way, the concept and use of parallelism in natural and artificial vision, with critical discussion of the related areas. It reviews parallelism in the primate visual system and discusses how its principles and solutions motivate the incorporation of parallelism into artificial vision systems. One of the main objectives is to supply the parallelism backbone for the development of the Cyvis-1 project, a proposal of the Cybernetic Vision Research Group (IFSC-USP) for versatile vision with a strong biological motivation, based on the primate visual cortex. To this end, the CVMP (Cybernetic Vision Message Passage) was introduced and implemented: a set of tools for developing parallel vision applications on both distributed systems and multiprocessor machines. Built on object-oriented programming, human-computer interaction, software engineering, and visual programming, the proposal emphasizes simple and friendly development. The CVMP is tested, evaluated, and validated with respect to functionality and usability through the parallel implementation of several computer vision and image processing algorithms (local operators, the Hough transform, and the Fourier transform, among others), which both illustrate the use of the tool and are discussed in terms of architecture and load balancing. Three real applications of parallel computer vision systems implemented with the CVMP are presented, demonstrating the effectiveness of the tool for parallel implementation, usage, and cooperative work among researchers. Two of these applications (the integration of visual attributes in the Cyvis-1 project and a complexity model based on human perception) were developed jointly with other researchers of the Cybernetic Vision Research Group. The third application presents the author's proposal for an automatic recognition system for arboreal plants (Botany).
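The abstract above describes data-parallel implementations of local image operators with attention to load balancing. The following is only a minimal sketch of that general master/worker pattern in Python; the strip decomposition, the 3×3 mean filter, and the use of multiprocessing are illustrative assumptions and do not represent the CVMP interface.

```python
# Illustrative sketch only: a master/worker decomposition for a local image
# operator, in the spirit of the data-parallel vision pipelines described in
# the abstract. The strip partitioning and the 3x3 mean filter are assumptions,
# not the CVMP interface.
import numpy as np
from multiprocessing import Pool

def mean_filter_strip(args):
    strip, pad = args                       # strip already carries halo rows
    h, w = strip.shape
    out = np.empty((h - 2 * pad, w))
    for i in range(pad, h - pad):
        for j in range(w):
            j0, j1 = max(0, j - 1), min(w, j + 2)
            out[i - pad, j] = strip[i - 1:i + 2, j0:j1].mean()
    return out

def parallel_mean_filter(image, workers=4):
    pad = 1
    padded = np.pad(image, ((pad, pad), (0, 0)), mode="edge")
    # Static load balancing: split the rows as evenly as possible.
    bounds = np.linspace(0, image.shape[0], workers + 1, dtype=int)
    tasks = [(padded[b:e + 2 * pad, :], pad) for b, e in zip(bounds[:-1], bounds[1:])]
    with Pool(workers) as pool:             # workers process their strips in parallel
        strips = pool.map(mean_filter_strip, tasks)
    return np.vstack(strips)

if __name__ == "__main__":
    img = np.random.rand(128, 128)
    print(parallel_mean_filter(img).shape)  # (128, 128)
```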
292

Estudo de escalabilidade de servidores baseados em eventos em sistemas multiprocessados: um estudo de caso completo / Scalability study of event-driven servers in multiprocessor systems: a complete case study

Cordeiro, Daniel de Angelis 27 October 2006 (has links)
The explosive growth in the number of Internet users has made software architects reevaluate issues related to the scalability of services deployed on a large scale. Designing software architectures that do not suffer performance degradation as the number of concurrent accesses grows is still a challenge. In this work, we investigate the impact of the operating system on issues related to the performance, parallelization, and scalability of interactive multiplayer games. In particular, we study and extend the interactive multiplayer game QuakeWorld, made publicly available by id Software under the GPL license. We created a parallelization model for the game's distributed simulation and implemented it in the QuakeWorld server, with adaptations that allow the operating system to manage the execution of the generated workload appropriately.
293

Paralelização do cálculo de estruturas de bandas de semicondutores usando o High Performance Fortran / Parallelization of semiconductor band structure calculations using High Performance Fortran

Malara, Rodrigo Daniel 14 January 2005 (has links)
The use of multiprocessor systems to solve problems that demand great computational power has become more and more common. However, converting sequential programs into concurrent ones is still not a trivial task. Among the factors that make this task difficult, we highlight the absence of a single, consolidated paradigm for building parallel computer systems and the existence of several programming platforms for developing concurrent programs. It is still impossible to exempt the programmer from specifying how the problem will be partitioned among the processors. To obtain an efficient parallel program, the programmer must know in depth the aspects that guide the construction of parallel hardware, the architecture on which the software will run, and the chosen concurrent programming platform. This cannot yet be changed; the gains lie in the implementation of the parallel software itself. That task can be laborious and demand much debugging time, since the programming platforms do not let the programmer abstract away from the hardware. There has been a great effort to create tools that optimize this task, allowing the programmer to express the parallelization of a program more easily and succinctly. The present work evaluates the aspects involved in implementing concurrent software using a portability platform called High Performance Fortran, applied to a specific physics problem: the calculation of the band structure of semiconductor heterostructures. The outcome of using this platform was positive. We obtained a performance gain higher than expected and verified that the compiler can be even more efficient than the programmer at parallelizing a program. The initial development cost was not very high and can be amortized over future projects that build on this knowledge, since after the learning phase the parallelization of programs becomes quick and practical. The chosen platform does not allow the parallelization of every kind of problem, only of those that follow the data-parallel paradigm, which nevertheless represent a considerable share of typical physics problems.
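The gains reported above rest on the data-parallel paradigm supported by High Performance Fortran: the same operation applied independently to distributed blocks of data. The sketch below only illustrates that owner-computes pattern in Python; the one-dimensional tight-binding dispersion is a placeholder workload, not the heterostructure band-structure model of the thesis.

```python
# Sketch of the data-parallel (owner-computes) pattern underlying the HPF
# approach described above. The 1D tight-binding dispersion is a placeholder
# workload, not the thesis's band-structure calculation.
import numpy as np
from multiprocessing import Pool

def band_energy(k, t=1.0, a=1.0):
    # Independent per k-point: E(k) = -2 t cos(k a) for a 1D tight-binding chain.
    return -2.0 * t * np.cos(k * a)

if __name__ == "__main__":
    kpoints = np.linspace(-np.pi, np.pi, 1024)
    with Pool(4) as pool:                   # each worker owns a block of k-points
        energies = pool.map(band_energy, kpoints)
    print(min(energies), max(energies))     # band runs from -2t to +2t
```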
294

Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots / GPU-based Semi-Infinite Optimization for Whole-Body Robot Control

Chrétien, Benjamin 08 July 2016 (has links)
A humanoid robot is a complex system with numerous degrees of freedom whose behavior is governed by the nonlinear equations of motion. As a result, planning its motion is a computationally difficult task. In this thesis, we aim at developing a method that can leverage the computing power of GPUs in the context of optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem and show that several avenues can be exploited in the context of parallel computing. Then, we present our approach to the dynamics computation, suitable for highly parallel processing architectures. Next, we propose a many-core GPU implementation of the motion planning problem: the constraints and their gradients are computed in parallel, and the result is fed to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, we end up with a highly parallelizable problem that can take advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
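The structural point made in the abstract is that each constraint and its gradient can be evaluated independently per time interval, so they can be computed in a batched, parallel fashion and handed to a CPU-side solver. The sketch below mimics that split with NumPy vectorization standing in for GPU kernels; the toy constraint g(q) = q² − 1 ≤ 0, the time grid, and the update rule are assumptions, not the thesis's dynamics or solver.

```python
# Sketch of the "batched constraint/gradient evaluation feeding a CPU solver"
# split described above. NumPy vectorization stands in for the GPU kernels;
# the constraint g(q) = q**2 - 1 <= 0 and the update rule are toys.
import numpy as np

def eval_constraints(q):                    # q: one sample per time interval
    g = q ** 2 - 1.0                        # all intervals evaluated at once
    grad = 2.0 * q                          # dg/dq, also independent per interval
    return g, grad

def project_feasible(q, iters=200, step=0.1):
    """Crude CPU-side update consuming the batched constraint values/gradients."""
    for _ in range(iters):
        g, grad = eval_constraints(q)       # batched evaluation (GPU role)
        violated = g > 0.0
        q = np.where(violated, q - step * np.sign(grad) * g, q)
    return q

if __name__ == "__main__":
    q0 = np.linspace(-2.0, 2.0, 11)         # one value per time interval
    print(np.round(project_feasible(q0), 3))  # entries driven toward [-1, 1]
```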
295

Avaliação do uso de computação paralela utilizando uma rede P2P na simulação de dados climáticos: velocidade e direção do vento / Evaluation of the use of parallel computing over a P2P network in the simulation of climate data: wind speed and direction

Serckumecka, Adriano 17 July 2012 (has links)
The main objective of this work is to evaluate distributed systems based on peer-to-peer (P2P) networks, together with parallel computing techniques, to reduce the response times of climate simulations. A probabilistic model for simulating wind data was adopted, and two computational applications were implemented and evaluated. The first was developed with the P2PComp framework, which supports building parallel programs and deploying them on P2P networks. The second was developed without this framework, using the communication infrastructure directly. Climate data from the municipality of Lapa, Paraná, covering the period from 1998 to 2007, were used in the experiments. The results demonstrate the feasibility of using parallel computing on P2P networks for climate simulations. The best results, measured by the speedup obtained, were observed when using a larger number of peers (26) and when running simulations over a long period (100 years); in that case the speedup was approximately 7.
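As a quick worked example of the metric quoted above (a speedup of roughly 7 with 26 peers), the snippet below computes speedup and parallel efficiency from a sequential and a parallel runtime; the runtimes themselves are hypothetical, and only the 26-peer and 7× figures come from the abstract.

```python
# Worked example of the speedup and parallel-efficiency metrics quoted above.
# The runtimes are hypothetical; only the 26 peers and the ~7x speedup factor
# come from the abstract.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, workers):
    return speedup(t_serial, t_parallel) / workers

t1, tp, peers = 700.0, 100.0, 26            # hypothetical seconds
print(f"speedup = {speedup(t1, tp):.1f}")               # 7.0
print(f"efficiency = {efficiency(t1, tp, peers):.2f}")  # ~0.27
```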
296

Desenvolvimento e otimização de um código paralelizado para simulação de escoamentos incompressíveis / Development and optimization of a parallel code for the simulation of incompressible flows

Rogenski, Josuel Kruppa 06 April 2011 (has links)
The objective of the present work is to study the parallelization of algorithms for the solution of partial differential equations. These algorithms are used to generate the numerical solution of the Navier-Stokes equations for a two-dimensional, incompressible, isothermal flow of a Newtonian fluid. The spatial derivatives are calculated using compact finite-difference approximations of high order of accuracy. Since the high-order compact evaluation of spatial derivatives adopted here requires the solution of tridiagonal linear systems, it is important to study the solution of such systems in order to obtain good performance; linear system solves also appear in the numerical solution of the Poisson equation. The results obtained from the solution of the partial differential equations are compared with cases where the analytical solution is known, in order to verify the accuracy of the implemented methods. The results of the parallel Navier-Stokes code for the simulation of incompressible flows are compared with linear stability theory to validate the final code. The performance and speedup of the code are assessed by comparing the total execution time as a function of the number of processing elements.
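The compact finite-difference scheme described above reduces each derivative evaluation to a tridiagonal linear solve, whose serial building block is the Thomas algorithm. The sketch below shows that textbook sequential baseline for reference; the thesis is concerned with parallelizing such solves, which this snippet does not attempt.

```python
# Textbook Thomas algorithm for a tridiagonal system A x = d, the serial
# kernel behind the compact finite-difference derivatives discussed above.
# a: sub-diagonal (len n-1), b: diagonal (len n), c: super-diagonal (len n-1).
import numpy as np

def thomas(a, b, c, d):
    n = len(b)
    cp, dp = np.empty(n - 1), np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                   # forward elimination
        m = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

if __name__ == "__main__":
    n = 5
    a = np.full(n - 1, 1.0); b = np.full(n, 4.0); c = np.full(n - 1, 1.0)
    x_true = np.arange(1.0, n + 1)
    A = np.diag(b) + np.diag(a, -1) + np.diag(c, 1)
    print(np.allclose(thomas(a, b, c, A @ x_true), x_true))   # True
```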
297

Modélisation par la méthode Lattice Boltzmann de la diffusion de chaleur et d’humidité dans des matériaux biosourcés à partir de leur morphologie 3D / Heat and moisture diffusion in bio-based materials from their 3D morphology using Lattice Boltzmann method

Louërat, Mathilde 19 January 2017 (has links)
As the thermal performance of buildings increases, the simulation codes used during design require more and more accurate data on construction materials. Moreover, the use of bio-based materials, which are hygroscopic (their moisture content equilibrates with the ambient moist air), is booming. Their thermal conductivity and mass diffusivity must therefore be accurately characterized. A key factor affecting these properties is the microstructure of the materials. This work is dedicated to predicting the macroscopic properties of spruce and wood fibreboards (heterogeneous and anisotropic materials) from their real 3D morphology, obtained by synchrotron X-ray microtomography, a powerful and non-destructive technique for characterizing the internal structure of materials. Image processing is used to segment the solid and gaseous phases. To model heat and mass diffusion, we chose the Lattice Boltzmann method because of its simple numerical implementation, its suitability for parallel computing, and its easy handling of complex morphologies. The equivalent thermal conductivity and mass diffusivity are calculated in three orthogonal directions for each material. The results highlight the influence of the internal structure and the strong anisotropy of the materials studied (a ratio of 2 between the tangential and longitudinal directions of wood for heat diffusion, and of 30 for mass diffusion). The transverse thermal conductivity of the lightweight board is about 0.04 W m−1 K−1.
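To make the numerical method named above concrete, here is a minimal single-relaxation-time (BGK) Lattice Boltzmann diffusion solver on a D2Q5 lattice with periodic boundaries. The grid size, relaxation time, and initial condition are illustrative assumptions and bear no relation to the tomography-derived wood morphologies of the thesis.

```python
# Minimal BGK Lattice Boltzmann diffusion solver on a D2Q5 lattice with
# periodic boundaries. Grid size, relaxation time and initial condition are
# illustrative assumptions, unrelated to the tomographic geometries above.
import numpy as np

nx, ny, tau, steps = 64, 64, 0.8, 500
e = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]              # D2Q5 velocities
w = np.array([1 / 3, 1 / 6, 1 / 6, 1 / 6, 1 / 6])           # D2Q5 weights

T = np.zeros((nx, ny))
T[nx // 2, ny // 2] = 1.0                                    # initial hot spot
f = w[:, None, None] * T[None, :, :]                         # start at equilibrium

for _ in range(steps):
    rho = f.sum(axis=0)                                      # local temperature
    feq = w[:, None, None] * rho[None, :, :]
    f += (feq - f) / tau                                     # BGK collision
    for i in range(5):                                       # streaming
        f[i] = np.roll(f[i], e[i], axis=(0, 1))

T = f.sum(axis=0)
# For this scheme the diffusivity is D = (tau - 0.5) / 3 in lattice units.
print(T.sum(), T.max())          # total heat is conserved while the peak spreads
```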
298

Análise de execução de aplicações paralelas em grades móveis com restrições de processamento e bateria / Analysis of the execution of parallel applications on mobile grids with processing and battery constraints

Santos, Frederico Cassis Ribeiro 10 March 2016 (has links)
There are currently several proposals for integrating mobile devices into a computational grid, but such environments raise a number of problems. This dissertation focuses on one of them: the limited amount of energy available for executing applications when mobile devices are used as resource providers in a grid that supplies processing power to parallel applications. The work proposes a method for estimating the energy consumption of applications, assuming that they use a set of operations present in the vast majority of parallel applications (mathematical operations and memory allocation). Based on the proposed method, two mobile devices were studied and a representation of their energy consumption was built by running benchmarks and applying regression techniques. To validate the models, two applications were analyzed and their measured energy consumption was compared with the estimates. The model produced results close to the measurements, with estimates between 6% and 14.24% above the measured values.
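The estimation method summarized above amounts to fitting a regression from counts of basic operations to measured energy. A minimal least-squares version of that idea is sketched below; the operation categories, counts, and energy values are invented for illustration and are not the thesis's measurements.

```python
# Minimal least-squares sketch of the idea above: fit energy consumption as a
# linear function of basic-operation counts. All numbers are invented.
import numpy as np

# Columns: [float ops, memory allocations, constant term]; one row per benchmark.
counts = np.array([
    [1e6, 1e3, 1.0],
    [5e6, 2e3, 1.0],
    [2e7, 8e3, 1.0],
    [4e7, 1e4, 1.0],
])
energy_j = np.array([0.8, 3.1, 11.9, 23.5])        # measured energy (J), made up

coef, *_ = np.linalg.lstsq(counts, energy_j, rcond=None)
predicted = counts @ coef
err = np.abs(predicted - energy_j) / energy_j * 100
print("per-operation cost estimates:", coef)
print("relative error (%):", np.round(err, 2))
```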
299

Fault Tolerance in Linear Algebraic Methods using Erasure Coded Computations

Xuejiao Kang (5929862) 16 January 2019 (has links)
As parallel and distributed systems scale to hundreds of thousands of cores and beyond, fault tolerance becomes increasingly important, particularly on systems with limited I/O capacity and bandwidth. Error correcting codes (ECCs) are used in communication systems, where errors arise when bits in a message are silently corrupted; such codes can detect and correct erroneous bits. Erasure codes, an instance of error correcting codes that deal with data erasures, are widely used in storage systems: an erasure code adds redundancy to the data in order to tolerate erasures.

In this thesis, erasure coded computations are proposed as a novel approach to dealing with processor faults in parallel and distributed systems. We first give a brief review of traditional fault tolerance methods, error correcting codes, and erasure coded storage. The benefits and challenges of erasure coded computations with respect to coding scheme, fault models, and system support are also presented.

In the first part of the thesis, I demonstrate the concept of erasure coded computations for linear system solvers. Erasure coding augments a given problem instance with redundant data. This augmented problem is executed in a fault-oblivious manner in a faulty parallel environment. In the event of faults, we show how the true solution can be computed from potentially fault-prone solutions using a computationally inexpensive procedure. Results on diverse linear systems show that our technique has several important advantages: (i) as the hardware platform scales in size and in number of faults, our scheme yields increasing improvement in resource utilization compared to traditional schemes; (ii) the proposed scheme is easy to code, as the core algorithm remains the same; (iii) the general scheme is flexible enough to accommodate a range of computation and communication trade-offs.

We propose a new coding scheme for augmenting the input matrix that satisfies the recovery equations of erasure coding with high probability in the event of random failures. This coding scheme also minimizes fill (the non-zero elements introduced by the coding block), while being amenable to efficient partitioning across processing nodes. Our experimental results show that the scheme adds minimal overhead for fault tolerance, yields excellent parallel efficiency and scalability, and is robust to different fault arrival models and fault rates.

Building on these results, we show how to minimize, to optimality, the overhead associated with our problem augmentation techniques for linear system solvers. Specifically, we present a technique that adaptively augments the problem only when faults are detected. At any point during execution, we only solve a system of the same size as the original input system. This has several advantages in terms of maintaining the size and conditioning of the system, and of adding only the minimal amount of computation needed to tolerate the observed faults. We present in detail the augmentation process, the parallel formulation, and the performance of our method. Specifically, we show that the proposed adaptive fault tolerance mechanism has minimal overhead in terms of FLOP counts with respect to the original solver executing in a non-faulty environment, has good convergence properties, and yields excellent parallel performance.

Based on the promising results for linear system solvers, we apply the concept of erasure coded computation to eigenvalue problems, which arise in many applications, including machine learning and scientific simulations. Erasure coded computation is used to design a fault tolerant eigenvalue solver. The original eigenvalue problem is reformulated into a generalized eigenvalue problem defined on appropriately augmented matrices. We present the augmentation scheme, the necessary conditions on the augmentation blocks, and proofs of the equivalence of the original eigenvalue problem and the reformulated generalized eigenvalue problem. Finally, we show how the eigenvalues can be derived from the augmented system in the event of faults.

We present detailed experiments that demonstrate the excellent convergence properties of our fault tolerant TraceMin eigensolver in the average case. For the worst case, in which the row-column pairs that have the most impact on the eigenvalues are erased, we present a novel scheme that computes the augmentation blocks as the computation proceeds, using estimates of the leverage scores of row-column pairs as they are computed by the iterative process. We demonstrate low overhead, excellent scalability in the number of faults, and robustness to different fault arrival models and fault rates for our method.

In summary, this thesis presents a novel approach to fault tolerance based on erasure coded computations, demonstrates it in the context of important linear algebra kernels, and validates its performance on a diverse set of problems on scalable parallel computing platforms. As parallel systems scale to hundreds of thousands of processing cores and beyond, these techniques present the most scalable fault tolerant mechanisms currently available.
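The thesis builds on classical erasure codes, which add redundant blocks so that erased data can be reconstructed. As background for that starting point only (not the thesis's solver-augmentation scheme), the sketch below shows the simplest such code: a single parity block that recovers any one erased data block.

```python
# Background sketch of a single-parity erasure code: k data blocks plus one
# XOR parity block tolerate the erasure of any one block. This illustrates the
# storage-style erasure coding the thesis starts from, not its augmentation
# scheme for linear solvers.
import numpy as np

def encode(blocks):
    parity = np.bitwise_xor.reduce(blocks, axis=0)
    return np.vstack([blocks, parity[None, :]])

def recover(coded, erased_index):
    survivors = np.delete(coded, erased_index, axis=0)
    return np.bitwise_xor.reduce(survivors, axis=0)   # XOR of the rest restores it

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 256, size=(4, 8), dtype=np.uint8)   # 4 data blocks
    coded = encode(data)
    lost = 2                                                    # erase block 2
    print(np.array_equal(recover(coded, lost), coded[lost]))    # True
```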
300

High performance computing for the discontinuous Galerkin methods

Mukhamedov, Farukh January 2018 (has links)
Discontinuous Galerkin methods form a class of numerical methods for solving partial differential equations that combine features of finite element and finite volume methods. The methods are defined using a weak form of a particular model problem, allowing discontinuities in the discrete trial and test spaces. Using a discontinuous discrete space on a mesh provides flexibility and a compact discretisation pattern, allowing multidomain and multiphysics simulation. Discontinuous Galerkin methods with a higher approximation polynomial order, the so-called p-version, perform better in terms of convergence rate than the low-order h-version with smaller element sizes and a larger mesh. However, the condition number of the Galerkin system grows accordingly, which causes a surge in the required storage, the computational complexity, and the time required for computation. We use the following three approaches to keep the advantages and eliminate the disadvantages.

The first approach is a specific choice of basis functions, which we call C1 polynomials. These ensure that the majority of the integrals over mesh-element edges vanish, which reduces the total number of non-zero elements in the resulting system and decreases the computational complexity without loss of precision. This choice does not affect the number of iterations required by the chosen Conjugate Gradient method compared with other choices of basis functions; it actually decreases the total number of algebraic operations performed.

The second approach is the introduction of suitable preconditioners. In our case, the additive two-layer Schwarz method developed in [4] is considered for the iterative Conjugate Gradient method. This directly affects the spectral condition number of the system matrix and decreases the number of iterations required for the computation. This approach, however, increases the total number of algebraic operations and might require more computing time. To tackle the rise in the number of algebraic operations, we introduce a modified additive two-layer non-overlapping Schwarz method with a multigrid process, which uses a fixed low-order approximation polynomial degree on a coarse grid. We show that this approach is spectrally equivalent to the first preconditioner and requires less computation time.

The third approach is the development of an efficient mathematical framework for a distributed data structure, which allows a high-performance, massively parallel implementation of the discontinuous Galerkin method. We demonstrate that it is possible to exploit properties of the system matrix and of the C1 basis polynomials to optimize the parallel structures. This parallel data structure allows us to parallelize, at the same time, both the matrix-vector multiplication routines of the Conjugate Gradient method and the preconditioner routines at the solver level, which minimizes the transfer ratio across the distributed system. Finally, we combined all three approaches into a framework that allowed us to successfully implement all of the above.
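Since the second approach above hinges on preconditioning the Conjugate Gradient iteration, here is a generic preconditioned CG sketch with a simple Jacobi (diagonal) preconditioner standing in for the additive Schwarz preconditioners of the thesis; the test matrix is an assumed 1D Laplacian with a variable zero-order term, not a discontinuous Galerkin system.

```python
# Generic preconditioned Conjugate Gradient. A Jacobi (diagonal) preconditioner
# stands in for the additive Schwarz preconditioners discussed above; the test
# matrix is an assumption, not a discontinuous Galerkin system.
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, k + 1
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    n = 200
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + np.diag(np.linspace(0.0, 9.0, n))
    b = np.ones(n)
    d = np.diag(A)
    x, iters = pcg(A, b, lambda r: r / d)   # Jacobi preconditioner: M = diag(A)
    print(iters, np.linalg.norm(A @ x - b))
```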
