  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Paralelização do cálculo de estruturas de bandas de semicondutores usando o High Performance Fortran / Semiconductors band structure calculus paralelization using High Performance Fortran

Malara, Rodrigo Daniel 14 January 2005 (has links)
The use of multiprocessor systems to solve problems that demand great computational power has become increasingly common, yet converting sequential programs into concurrent ones is still not a trivial task. Among the factors that make it difficult, we highlight the absence of a single, consolidated paradigm for building parallel computer systems and the existence of several programming platforms for developing concurrent programs. It is still impossible to exempt the programmer from specifying how the problem will be partitioned among the processors: for a parallel program to be efficient, the programmer must know in depth the aspects that guide the construction of parallel hardware, the architecture on which the software will run, and the chosen concurrent programming platform. This cannot yet be changed; where we can gain is in the implementation of the parallel software itself. That task can be laborious and demand much debugging time, because programming platforms do not let the programmer abstract away from the hardware. There has been a great effort to create tools that ease this task, allowing the programmer to express the parallelization of a program more easily and succinctly.
The present work evaluates the aspects involved in implementing concurrent software with a portability platform called High Performance Fortran, applied to a specific physics problem: the calculation of the band structure of semiconductor heterostructures. The outcome was positive: we obtained a performance gain higher than expected and verified that the compiler can be even more efficient than the programmer at parallelizing a program. The initial development cost was not high and can be amortized over future projects that reuse this knowledge, since after the learning phase the parallelization of programs becomes quick and practical. The chosen platform cannot parallelize every kind of problem, only those that follow the data-parallelism paradigm; these nevertheless represent a considerable share of typical physics problems.
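As a hedged illustration (not the thesis' code), the data-parallelism paradigm the abstract refers to amounts to applying the same operation independently to every element of a dataset. The sketch below uses Python with a hypothetical one-band dispersion relation standing in for the real band-structure computation; each k-point is an independent task, which is exactly the structure HPF's data-parallel directives exploit.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def band_energy(k):
    # Hypothetical tight-binding-like dispersion, a stand-in for the
    # real band-structure calculation; each k-point is independent.
    return 2.0 * (1.0 - math.cos(k))

def compute_bands(k_points, workers=4):
    # Data parallelism: the same function applied to every element,
    # with no communication between tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(band_energy, k_points))

k_points = [i * math.pi / 8 for i in range(9)]
energies = compute_bands(k_points)
```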
262

Optimisation semi-infinie sur GPU pour le contrôle corps-complet de robots / GPU-based Semi-Infinite Optimization for Whole-Body Robot Control

Chrétien, Benjamin 08 July 2016 (has links)
A humanoid robot is a complex system with numerous degrees of freedom whose behavior is governed by the nonlinear equations of motion; planning its motion is therefore a computationally demanding task. In this thesis, we aim to develop a method that leverages the computing power of GPUs for optimization-based whole-body motion planning. We first exhibit the properties of the optimization problem and show that several avenues can be exploited for parallel computing. We then present our approach to the dynamics computation, suited to highly parallel processing architectures, and propose a many-core GPU implementation of the motion-planning problem.
Our approach computes the constraints and their gradients in parallel and feeds the results to a nonlinear optimization solver running on the CPU. Because each constraint and its gradient can be evaluated independently for each time interval, the problem is highly parallelizable and can take full advantage of GPUs. We also propose a new parametrization of contact forces adapted to our optimization problem. Finally, we investigate the extension of our work to model predictive control.
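The independence of constraint evaluations across time intervals can be sketched as follows; this is a minimal illustration with an invented scalar constraint, not the thesis' robot dynamics, and it uses NumPy vectorization as a stand-in for GPU threads.

```python
import numpy as np

def constraint(q):
    # Hypothetical scalar constraint on a configuration sample
    # (e.g. a joint-limit margin); the real constraints come from
    # the whole-body dynamics.
    return np.sin(q) + 0.5 * q

def eval_all_intervals(q_samples, h=1e-6):
    # Each time interval is independent, so constraint values and
    # forward-difference gradients are computed element-wise --
    # the structure a GPU kernel would exploit, one thread per interval.
    c = constraint(q_samples)
    grad = (constraint(q_samples + h) - c) / h
    return c, grad

q = np.linspace(0.0, 1.0, 5)
c, g = eval_all_intervals(q)
```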
263

Avaliação do uso de computação paralela utilizando uma rede P2P na simulação de dados climáticos: velocidade e direção do vento / Evaluation of the use of parallel computing over a P2P network in the simulation of climate data: wind speed and direction

Serckumecka, Adriano 17 July 2012 (has links)
The main objective of this work is to evaluate distributed systems based on peer-to-peer (P2P) networks, combined with parallel computing techniques, to reduce the response times of climate simulations. A probabilistic model for simulating wind data was adopted, and two computing applications were implemented and evaluated. The first was developed with the P2PComp framework, which supports building parallel software and deploying it on P2P networks; the second was developed without the framework, using the communication infrastructure directly.
Climate data from the municipality of Lapa, Paraná, covering the period from 1998 to 2007, were used in the experiments. The results demonstrate the feasibility of using parallel computing on P2P networks for climate simulations. The best results, measured by speedup, were observed when using a larger number of peers (26) and when running simulations over a long period (100 years); in that case the speedup obtained was approximately 7.
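For reference, the speedup factor reported above is a simple ratio of execution times; a minimal sketch (with made-up timings, not the thesis' measurements):

```python
def speedup(t_serial, t_parallel):
    # Speedup: how many times faster the parallel run is.
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, workers):
    # Parallel efficiency: speedup normalized by worker count.
    return speedup(t_serial, t_parallel) / workers

# Illustrative numbers only: a serial run of 140 s reduced to 20 s
# on 26 peers gives a speedup of 7, matching the thesis' best case.
s = speedup(140.0, 20.0)
e = efficiency(140.0, 20.0, 26)
```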
264

Desenvolvimento e otimização de um código paralelizado para simulação de escoamentos incompressíveis / Development and optimization of a parallel code for the simulation of incompressible flows

Rogenski, Josuel Kruppa 06 April 2011 (has links)
The objective of the present work is to study the parallelization of algorithms for solving partial differential equations, used here to generate the numerical solution of the Navier-Stokes equations for a two-dimensional, incompressible, isothermal flow of a Newtonian fluid. Spatial derivatives are calculated with compact finite-difference approximations of high order of accuracy.
Since the compact high-order schemes adopted in the present work require the solution of tridiagonal linear systems, it is important to study how to solve these systems efficiently; linear system solution also arises in the numerical solution of the Poisson equation. The results obtained from the solution of the partial differential equations are compared with cases where the analytical solution is known, in order to verify the accuracy of the implemented methods. The results of the parallel Navier-Stokes code for incompressible flows are compared with linear stability theory to validate the final code. The performance and speedup of the code are assessed by comparing the total execution time as a function of the number of processing elements used.
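The tridiagonal systems mentioned above are classically solved with the Thomas algorithm; a sketch follows (sequential, O(n) — the thesis is concerned precisely with the fact that this recurrence is hard to parallelize directly).

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (Thomas algorithm).
    a[0] and c[-1] are unused. Forward elimination then
    back substitution; inherently sequential."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 1D Laplacian-like system whose exact solution is [1, 2, 3].
x = thomas([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4])
```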
265

Modélisation par la méthode Lattice Boltzmann de la diffusion de chaleur et d’humidité dans des matériaux biosourcés à partir de leur morphologie 3D / Heat and moisture diffusion in bio-based materials from their 3D morphology using Lattice Boltzmann method

Louërat, Mathilde 19 January 2017 (has links)
As the thermal performance of buildings increases, the simulation codes used at the design stage require more and more accurate data on construction materials. Moreover, the use of bio-based materials, which are hygroscopic (their moisture content equilibrates with the ambient moist air), is booming; their thermal conductivity and mass diffusivity must therefore be characterized accurately.
A key factor affecting these properties is the microstructure of the materials. This work is dedicated to predicting the macroscopic properties of spruce and wood fibreboards (heterogeneous and anisotropic materials) from their real 3D morphology, obtained by synchrotron X-ray microtomography, a powerful non-destructive technique for characterizing the internal structure of a material. Image processing is used to segment the solid and gaseous phases. The Lattice Boltzmann method is chosen to model heat and mass diffusion because it is simple to implement and to parallelize, and it easily handles complex morphologies. The equivalent thermal conductivities and mass diffusivities are calculated in three orthogonal directions for each material. The results highlight the influence of the internal structure and the strong anisotropy of the materials studied (a ratio of 2 between the tangential and longitudinal directions of wood for heat diffusion, and of 30 for mass diffusion). The transverse thermal conductivity of the lightweight board is about 0.04 W m−1 K−1.
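The collide-and-stream structure that makes the Lattice Boltzmann method easy to implement and parallelize can be sketched in one dimension (D1Q2 lattice, pure diffusion). This is only an illustration of the scheme's shape under simple assumptions, not the thesis' 3D solver; the relaxation time tau and lattice size are arbitrary.

```python
import numpy as np

def lbm_diffusion_1d(rho0, tau=1.0, steps=100):
    # D1Q2 lattice: two populations moving right (+1) and left (-1).
    # Collision relaxes each toward the local equilibrium rho/2;
    # streaming shifts populations to neighbors (periodic boundaries).
    f_r = 0.5 * rho0.copy()
    f_l = 0.5 * rho0.copy()
    for _ in range(steps):
        rho = f_r + f_l
        f_r += (0.5 * rho - f_r) / tau   # collide
        f_l += (0.5 * rho - f_l) / tau
        f_r = np.roll(f_r, 1)            # stream right
        f_l = np.roll(f_l, -1)           # stream left
    return f_r + f_l

rho0 = np.zeros(64)
rho0[32] = 1.0                            # initial concentration spike
rho = lbm_diffusion_1d(rho0, steps=200)
```

Both steps are local (collision) or nearest-neighbor (streaming), which is why the method maps so naturally onto parallel architectures.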
266

Análise de execução de aplicações paralelas em grades móveis com restrições de processamento e bateria / Analysis of the execution of parallel applications using a mobile grid environment

Santos, Frederico Cassis Ribeiro 10 March 2016 (has links)
There are currently several proposals for integrating mobile devices into a computational grid, although such environments introduce various problems. This dissertation focuses on one of them, the limited amount of energy available for executing applications, when using mobile devices as resource providers in a grid that supplies processing power to parallel applications. To that end, this work proposes a method to estimate the energy consumption of applications, considering that they use a set of operations present in the vast majority of parallel applications (mathematical operations and memory allocation). Based on the proposed method, two mobile devices were studied, and a representation of their energy consumption was created by running benchmarks and applying regression techniques. To validate the models, two applications were analyzed and their real energy consumption was compared with the estimated one. The model produced results close to the measured values, with estimates between 6% and 14.24% above the measurements.
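Building an energy model by regression, as described above, can be sketched with ordinary least squares. The operation counts, coefficients, and energy values below are invented for illustration; the thesis fits its models to real benchmark measurements.

```python
import numpy as np

# Hypothetical benchmark data: columns are counts of math operations
# and memory allocations; y is measured energy in joules (invented
# here to follow E = 0.01*math_ops + 0.02*allocs + 0.1 exactly).
X = np.array([[100, 10], [200, 5], [150, 20], [300, 15]], dtype=float)
y = np.array([1.3, 2.2, 2.0, 3.4])

# Fit E = a*math_ops + b*allocs + c by linear least squares.
A = np.column_stack([X, np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

def estimate(math_ops, allocs):
    # Predict the energy of a new workload from its operation counts.
    a, b, c = coeffs
    return a * math_ops + b * allocs + c
```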
267

Fault Tolerance in Linear Algebraic Methods using Erasure Coded Computations

Xuejiao Kang (5929862) 16 January 2019 (has links)
As parallel and distributed systems scale to hundreds of thousands of cores and beyond, fault tolerance becomes increasingly important, particularly on systems with limited I/O capacity and bandwidth. Error correcting codes (ECCs) are used in communication systems, where errors arise when bits are corrupted silently in a message; they can detect and correct the erroneous bits. Erasure codes, an instance of error correcting codes that deal with data erasures, are widely used in storage systems: an erasure code adds redundancy to the data to tolerate erasures.

In this thesis, erasure coded computations are proposed as a novel approach to dealing with processor faults in parallel and distributed systems. We first give a brief review of traditional fault tolerance methods, error correcting codes, and erasure coded storage. The benefits and challenges of erasure coded computations with respect to coding scheme, fault models, and system support are also presented.

In the first part of the thesis, I demonstrate the novel concept of erasure coded computations for linear system solvers. Erasure coding augments a given problem instance with redundant data, and this augmented problem is executed in a fault-oblivious manner in a faulty parallel environment. In the event of faults, we show how the true solution can be computed from potentially fault-prone solutions using a computationally inexpensive procedure. Results on diverse linear systems show that our technique has several important advantages: (i) as the hardware platform scales in size and in number of faults, our scheme yields increasing improvement in resource utilization compared to traditional schemes; (ii) the proposed scheme is easy to code, as the core algorithm remains the same; (iii) the general scheme is flexible enough to accommodate a range of computation and communication trade-offs.

We propose a new coding scheme for augmenting the input matrix that satisfies the recovery equations of erasure coding with high probability in the event of random failures. This coding scheme also minimizes fill (non-zero elements introduced by the coding block) while being amenable to efficient partitioning across processing nodes. Our experimental results show that the scheme adds minimal overhead for fault tolerance, yields excellent parallel efficiency and scalability, and is robust to different fault arrival models and fault rates.

Building on these results, we show how the overhead associated with our problem augmentation techniques for linear system solvers can be minimized, to optimality. Specifically, we present a technique that adaptively augments the problem only when faults are detected; at any point during execution, we only solve a system of the same size as the original input system. This has advantages in maintaining the size and conditioning of the system, and it adds only the minimal amount of computation needed to tolerate the observed faults. We present in detail the augmentation process, the parallel formulation, and the performance of our method, showing that the proposed adaptive fault tolerance mechanism has minimal overhead in FLOP count with respect to the original solver executing in a non-faulty environment, has good convergence properties, and yields excellent parallel performance.

Based on the promising results for linear system solvers, we apply the concept of erasure coded computation to eigenvalue problems, which arise in many applications including machine learning and scientific simulations, and use it to design a fault tolerant eigenvalue solver. The original eigenvalue problem is reformulated into a generalized eigenvalue problem defined on appropriately augmented matrices. We present the augmentation scheme, the necessary conditions on the augmentation blocks, and proofs of equivalence between the original eigenvalue problem and the reformulated generalized eigenvalue problem. Finally, we show how the eigenvalues can be recovered from the augmented system in the event of faults.

We present detailed experiments demonstrating the excellent convergence properties of our fault tolerant TraceMin eigensolver in the average case. For the worst case, in which the row-column pairs with the most impact on the eigenvalues are erased, we present a novel scheme that computes the augmentation blocks as the computation proceeds, using estimates of the leverage scores of row-column pairs as they are produced by the iterative process. We demonstrate low overhead, excellent scalability in the number of faults, and robustness to different fault arrival models and fault rates.

In summary, this thesis presents a novel approach to fault tolerance based on erasure coded computations, demonstrates it in the context of important linear algebra kernels, and validates its performance on a diverse set of problems on scalable parallel computing platforms. As parallel systems scale to hundreds of thousands of processing cores and beyond, these techniques offer some of the most scalable fault tolerance mechanisms currently available.
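The core idea of tolerating erasures through redundancy can be illustrated with the simplest erasure code, a single parity block; the thesis' coding schemes for matrices and solvers are far more involved, and this sketch only shows why one erased block is always recoverable.

```python
def encode_with_parity(blocks):
    # Append one parity block: the XOR of all data blocks.
    parity = 0
    for b in blocks:
        parity ^= b
    return blocks + [parity]

def recover(coded, missing_index):
    # Any single erased block equals the XOR of the survivors,
    # because XOR-ing everything (data + parity) yields zero.
    value = 0
    for i, b in enumerate(coded):
        if i != missing_index:
            value ^= b
    return value

data = [13, 7, 42, 99]
coded = encode_with_parity(data)
```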
268

High performance computing for the discontinuous Galerkin methods

Mukhamedov, Farukh January 2018 (has links)
Discontinuous Galerkin methods form a class of numerical methods for solving partial differential equations that combines features of finite element and finite volume methods. The methods are defined using a weak form of a particular model problem, allowing discontinuities in the discrete trial and test spaces. Using a discontinuous discrete space on the mesh provides flexibility and a compact discretisation pattern, allowing multidomain and multiphysics simulation. Discontinuous Galerkin methods with a higher approximation polynomial order, the so-called p-version, perform better in terms of convergence rate than the low-order h-version with smaller element sizes and a bigger mesh. However, the condition number of the Galerkin system grows accordingly, causing a surge in the required storage, the computational complexity, and the time required for computation. We use the following three approaches to keep the advantages and eliminate the disadvantages. The first approach is a specific choice of basis functions, which we call C1 polynomials, ensuring that the majority of integrals over the edges of the mesh elements vanish. This reduces the total number of non-zero elements in the resulting system and decreases the computational complexity without loss of precision. This approach does not affect the number of iterations required by the chosen Conjugate Gradients method compared with other choices of basis functions; it actually decreases the total number of algebraic operations performed. The second approach is the introduction of suitable preconditioners; in our case, the additive two-layer Schwarz method developed in [4] for the iterative Conjugate Gradients method is considered. This directly affects the spectral condition number of the system matrix and decreases the number of iterations required.
This approach, however, increases the total number of algebraic operations and may require more computation time. To tackle this rise, we introduce a modified additive two-layer non-overlapping Schwarz method with a multigrid process, using a fixed low-order approximation polynomial degree on a coarse grid. We show that this approach is spectrally equivalent to the first preconditioner and requires less computation time. The third approach is the development of an efficient mathematical framework for a distributed data structure, allowing a high-performance, massively parallel implementation of the discontinuous Galerkin method. We demonstrate that the properties of the system matrix and of the C1 polynomial basis can be exploited to optimize the parallel structures. This parallel data structure lets us parallelize, at the same time, both the matrix-vector multiplication routines of the Conjugate Gradients method and the preconditioner routines at the solver level, minimizing the transfer ratio across the distributed system. Finally, we combine all three approaches into a framework in which all of the above are successfully implemented.
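For orientation, a minimal (unpreconditioned) Conjugate Gradients iteration, the solver around which the preconditioning work above revolves, can be sketched as follows; the preconditioned variants discussed in the thesis insert an extra solve into each iteration.

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, max_iter=1000):
    # Solve A x = b for symmetric positive definite A.
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol ** 2:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradients(A, b)
```

The iteration count depends on the spectral condition number of A, which is exactly what the Schwarz preconditioners above are designed to reduce.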
269

Application of GPU Computing to Some Urban Traffic Problems

Jradi, Walid Abdala Rfaei 30 November 2016 (has links)
The present work studies and proposes GPU-based parallel algorithms and implementations for the problem of macroscopic assignment of urban traffic on large-scale networks, with an in-depth investigation of each sub-problem that must be solved efficiently during the traffic assignment process. Among the main contributions of this work are: 1) the first GPU-based algorithm for the enumeration of chordless cycles; 2) a new parallel GPU-based shortest-path algorithm that takes advantage of some common properties of urban traffic networks; 3) a refinement of the parallel reduction implementation proposed by one of the leaders in the GPU market, which resulted in a 2.8x speedup relative to the original version; and 4) a parallel algorithm for the macroscopic traffic assignment problem, 39x faster than the equivalent sequential approach when applied to large-scale networks.
The main goal of this thesis is to contribute to the extension of the PET-Gyn software, proposing efficient GPU data structures and parallel algorithms for faster resolution of two well-known problems in the literature: the Traffic Assignment Problem (TAP) and the enumeration of chordless cycles. When applied to difficult input sets, the experiments performed showed a clear advantage of the parallel algorithms over their sequential versions.
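The parallel reduction mentioned in contribution 3 combines values pairwise in O(log n) steps. A hedged sketch of the access pattern follows; on a GPU every pairing within a step runs on a separate thread, while here the steps are executed sequentially.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    # Pairwise (tree) reduction: at each step, element i absorbs
    # element i + stride, and the stride doubles. log2(n) steps in
    # total; on a GPU each step's pairs are processed concurrently.
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    return data[0]

total = tree_reduce(range(1, 9))   # 1 + 2 + ... + 8
```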
270

Développement et implémentation parallèle de méthodes d'interaction de configurations sélectionnées / Development and parallel implementation of selected configuration interaction methods

Garniron, Yann 03 December 2018 (has links)
This thesis, whose topic is quantum chemistry algorithms, is set in the context of the paradigm shift under way for the past dozen years, in which the usual sequential methods must progressively be replaced by parallel ones. Indeed, since increases in processor frequency now run into physical barriers that are hard to overcome, computational power grows instead through the number of processing units. However, where an increase in frequency mechanically led to faster execution of a code, an increase in the number of cores may run into algorithmic barriers, which can require adapting or even changing the algorithm. Among the methods developed to circumvent this problem are, in particular, Monte Carlo (stochastic) methods, which are intrinsically "embarrassingly parallel": by construction they consist of a multitude of independent tasks and are therefore particularly well suited to massively parallel architectures. In many cases they are also able to produce an approximate result for a fraction of the computational cost of the exact deterministic equivalent. During this thesis, massively parallel implementations of several deterministic quantum chemistry algorithms were realized: CIPSI, Davidson diagonalization, computation of the second-order perturbation, shifted-Bk, and Multi-Reference Coupled Cluster. For some of them, a stochastic component was introduced to improve their efficiency. All of these methods were implemented on a distributed task model over TCP, in which a central process distributes tasks over the network and collects the results.
In other words, slave nodes can be added during the computation from any machine reachable through the Internet. The parallel efficiency of the algorithms implemented in this thesis has been studied, and the program has given rise to numerous applications, in particular to obtain reference energies for difficult molecular systems.
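The central-process task model described in the abstract can be sketched as follows. This is a minimal, single-worker Python illustration; the JSON line protocol, the function names, and the toy squaring task are assumptions for the example, not the thesis code:

```python
# Sketch of a master/worker task model over TCP: a central process
# hands out one task per connection and collects the result on the
# same connection; workers may connect at any time from any host.
import json
import socket
import threading

def run_master(srv, tasks, results):
    remaining = list(tasks)
    while remaining:
        conn, _ = srv.accept()
        with conn, conn.makefile("rw") as f:
            task = remaining.pop(0)
            f.write(json.dumps({"task": task}) + "\n")
            f.flush()
            results[task] = json.loads(f.readline())["result"]
    # Tell the (single, in this sketch) worker that no tasks remain.
    conn, _ = srv.accept()
    with conn, conn.makefile("w") as f:
        f.write(json.dumps({"task": None}) + "\n")
    srv.close()

def run_worker(host, port):
    # Keep requesting tasks until the master reports none are left;
    # the "work" here is a toy squaring job.
    while True:
        with socket.create_connection((host, port)) as conn, \
                conn.makefile("rw") as f:
            msg = json.loads(f.readline())
            if msg["task"] is None:
                return
            f.write(json.dumps({"result": msg["task"] ** 2}) + "\n")
            f.flush()

results = {}
server = socket.create_server(("127.0.0.1", 0))  # ephemeral port
port = server.getsockname()[1]
master = threading.Thread(target=run_master, args=(server, [1, 2, 3], results))
master.start()
run_worker("127.0.0.1", port)
master.join()
print(results)  # {1: 1, 2: 4, 3: 9}
```

A production version would need fault tolerance (reissuing tasks lost to worker crashes) and batching, but the request/compute/reply cycle over plain TCP is the essential structure.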

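As a concrete illustration of why Monte Carlo methods are "embarrassingly parallel", here is a toy pi estimator in Python (an assumed example, unrelated to the quantum chemistry codes above): each task owns an independently seeded RNG, so tasks can be run by any number of workers in any order and the combined estimate is reproducible.

```python
# Each Monte Carlo task is fully independent: it seeds its own RNG and
# counts random points falling inside the quarter disk x^2 + y^2 <= 1.
# The tasks can therefore be distributed over arbitrarily many workers.
import random
from concurrent.futures import ThreadPoolExecutor

def mc_task(seed, n_samples):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(n_tasks=8, n_samples=100_000):
    # The area of the quarter disk is pi/4, so pi ~= 4 * hits / samples.
    with ThreadPoolExecutor() as pool:
        counts = pool.map(mc_task, range(n_tasks), [n_samples] * n_tasks)
    return 4.0 * sum(counts) / (n_tasks * n_samples)

print(estimate_pi())  # close to 3.1416
```

The deterministic equivalent (numerical quadrature) gives an exact answer, but the stochastic version reaches a useful approximation with independent tasks only, which is what makes it attractive on massively parallel hardware.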