  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
121

Calcul haute performance pour la simulation d'interactions fluide-structure / High performance computing for the simulation of fluid-structure interactions

Partimbene, Vincent 25 April 2018
Cette thèse aborde la résolution des problèmes d'interaction fluide-structure par un algorithme consistant en un couplage entre deux solveurs : un pour le fluide et un pour la structure. Pour assurer la cohérence entre les maillages fluide et structure, on considère également une discrétisation de chaque domaine par volumes finis. En raison des difficultés de décomposition du domaine en sous-domaines, nous considérons pour chaque environnement un algorithme parallèle de multi-splitting (ou multi-décomposition) qui correspond à une présentation unifiée des méthodes de sous-domaines avec ou sans recouvrement. Cette méthode combine plusieurs applications de points fixes contractantes et nous montrons que, sous des hypothèses appropriées, chaque application de points fixes est contractante dans des espaces de dimensions finies normés par des normes hilbertiennes et non-hilbertiennes. De plus, nous montrons qu'une telle étude est valable pour les résolutions parallèles synchrones et plus généralement asynchrones de grands systèmes linéaires apparaissant lors de la discrétisation des problèmes d'interaction fluide-structure et peut être étendue au cas où le déplacement de la structure est soumis à des contraintes. Par ailleurs, nous pouvons également considérer l’analyse de la convergence de ces méthodes de multi-splitting parallèles asynchrones par des techniques d’ordre partiel, lié au principe du maximum discret, aussi bien dans le cadre linéaire que dans celui obtenu lorsque les déplacements de la structure sont soumis à des contraintes. Nous réalisons des simulations parallèles pour divers cas test fluide-structure sur différents clusters, en considérant des communications bloquantes et non bloquantes. 
Dans ce dernier cas nous avons eu à résoudre une difficulté d'implémentation dans la mesure où une erreur irrécupérable survenait lors de l'exécution ; cette difficulté a été levée par introduction d’une méthode assurant la terminaison de toutes les communications non bloquantes avant la mise à jour du maillage. Les performances des simulations parallèles sont présentées et analysées. Enfin, nous appliquons la méthodologie présentée précédemment à divers contextes d'interaction fluide-structure de type industriel sur des maillages non structurés, ce qui constitue une difficulté supplémentaire. / This thesis deals with the solution of fluid-structure interaction problems by an algorithm that couples two solvers: one for the fluid and one for the structure. To ensure consistency between the fluid and structure meshes, we also discretize each domain by finite volumes. Because of the difficulties of decomposing the domain into sub-domains, we consider for each environment a parallel multi-splitting algorithm, which gives a unified presentation of sub-domain methods with or without overlapping. This method combines several contracting fixed-point mappings, and we show that, under appropriate assumptions, each fixed-point mapping is contracting in finite-dimensional spaces equipped with Hilbertian and non-Hilbertian norms. In addition, we show that such a study is valid for synchronous and, more generally, asynchronous parallel solutions of the large linear systems arising from the discretization of fluid-structure interaction problems, and that it can be extended to cases where the displacement of the structure is subject to constraints. Moreover, we can also analyze the convergence of these asynchronous parallel multi-splitting methods by partial ordering techniques, linked to the discrete maximum principle, both in the linear setting and in the one obtained when the structure's displacements are subject to constraints.
We carry out parallel simulations for various fluid-structure test cases on different clusters, considering blocking and non-blocking communications. In the latter case, we had to solve an implementation problem: an unrecoverable error occurred during execution. This issue was overcome by introducing a method that ensures the termination of all non-blocking communications prior to the mesh update. The performance of the parallel simulations is presented and analyzed. Finally, we apply the methodology presented above to various industrial-type fluid-structure interaction cases on unstructured meshes, which represents an additional difficulty.
122

Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel / Análise de sincronizações em execuções por escalonamento guloso e aplicações para geração eficiente de números pseudoaleatórios em paralelo / Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail applications à la génération déterministe de nombres pseudo-aléatoires

Mor, Stefano Drimon Kurz January 2015
Nous présentons deux contributions dans le domaine de la programmation parallèle. La première est théorique : nous introduisons l’analyse SIPS, une approche nouvelle pour dénombrer le nombre d’opérations de synchronisation durant l’exécution d’un algorithme parallèle ordonnancé par vol de travail. Basée sur le concept d’horloges logiques, elle nous permet : d’une part de donner de nouvelles majorations de coût en moyenne; d’autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité. La seconde contribution est pragmatique : nous présentons une parallélisation générique d’algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l’exécution. Alternative à l’utilisation d’un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS. Sa caractéristique principale est d’exploiter un générateur séquentiel qui peut “sauter” directement d’un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire. Grâce à l’analyse SIPS, nous montrons qu’en moyenne, lors d’une exécution par vol de travail d’un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d’opérations), ces opérations de saut sont rares. Par-R est comparé au générateur pseudo-aléatoire DotMix écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail. Le surcoût théorique de Par-R se compare favorablement au surcoût de DotMix, ce qui apparaît aussi expérimentalement. De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent. / Nós apresentamos duas contribuições para a área de programação paralela. 
A primeira contribuição é teórica: nós introduzimos a análise SIPS, uma nova abordagem para estimar o número de sincronizações realizadas durante a execução de um algoritmo paralelo. SIPS generaliza o conceito de relógios lógicos para contar o número de sincronizações realizadas por um algoritmo paralelo e é capaz de calcular limites do pior caso mesmo na presença de execuções paralelas não-determinísticas, as quais não são geralmente cobertas por análises no estado-da-arte. Nossa análise nos permite estimar novos limites de pior caso para computações escalonadas pelo popular algoritmo de roubo de tarefas e também projetar programas paralelos e adaptáveis que são mais eficientes. A segunda contribuição é pragmática: nós apresentamos uma estratégia de paralelização eficiente para a geração de números pseudoaleatórios. Como uma alternativa para implementações fixas de componentes de geração aleatória nós introduzimos uma API chamada Par-R, projetada e analisada utilizando-se SIPS. Sua principal ideia é o uso da capacidade de um gerador sequencial R de realizar um “pulo” eficiente dentro do fluxo de números gerados; nós os associamos a operações realizadas pelo escalonador por roubo de tarefas, o qual nossa análise baseada em SIPS demonstra ocorrer raramente em média. Par-R é comparado com o gerador paralelo de números pseudoaleatórios DotMix, escrito para a plataforma de multithreading dinâmico Cilk Plus. A latência de Par-R tem comparação favorável à latência do DotMix, o que é confirmado experimentalmente, mas não requer o uso subjacente fixado de um dado gerador aleatório. / We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm. 
Based on the concept of logical clocks, it allows us, on the one hand, to derive new bounds on the expected number of synchronizations and, on the other hand, to design more efficient parallel programs by dynamically adapting the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a “jump-ahead” directly from one number to another at an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a “very parallel” program (whose depth, or critical path, is very small compared with the work, or number of operations), these jump operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix’s overhead, which is confirmed experimentally, and Par-R does not require a fixed underlying generator.
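The "jump-ahead" operation mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which sequential generator Par-R wraps, so the example below assumes a plain linear congruential generator (LCG), and the names `lcg_step` and `lcg_jump` are hypothetical. It only shows why a jump of distance n can cost O(log n) operations instead of n.

```python
def lcg_step(x, a, c, m):
    """Advance an LCG state by one step: x -> (a*x + c) mod m."""
    return (a * x + c) % m


def lcg_jump(x, n, a, c, m):
    """Return the state reached after n steps, using O(log n) multiplications.

    One LCG step is the affine map f(x) = a*x + c (mod m). Composing f with
    itself n times is again an affine map x -> A*x + C, which is built here
    by repeated squaring. This is an illustrative assumption, not the
    generator or API described in the thesis.
    """
    A, C = 1, 0                # accumulated map, starts as the identity
    step_a, step_c = a, c      # current power of f: f^(2^k)
    while n > 0:
        if n & 1:
            # Apply the current power after the accumulated map.
            A = (step_a * A) % m
            C = (step_a * C + step_c) % m
        # Square the current power: f^(2^k) composed with itself.
        step_c = (step_a * step_c + step_c) % m
        step_a = (step_a * step_a) % m
        n >>= 1
    return (A * x + C) % m
```

In a work-stealing setting, a thief that takes over the second half of a loop could call such a jump once to position its generator, which is consistent with the abstract's remark that jumps are tied to (rare) steal operations.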
123

Implementação eficiente em software de curvas elípticas e emparelhamentos bilineares / Efficient software implementation of elliptic curves and bilinear pairings

Aranha, Diego de Freitas, 1982- 19 August 2018
Orientador: Júlio César Lopez Hernández / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-19T05:47:42Z (GMT). No. of bitstreams: 1 Aranha_DiegodeFreitas_D.pdf: 2545815 bytes, checksum: b630a80d0f8be161e6cb7519072882ed (MD5) Previous issue date: 2011 / Resumo: O advento da criptografia assimétrica ou de chave pública possibilitou a aplicação de criptografia em novos cenários, como assinaturas digitais e comércio eletrônico, tornando-a componente vital para o fornecimento de confidencialidade e autenticação em meios de comunicação. Dentre os métodos mais eficientes de criptografia assimétrica, a criptografia de curvas elípticas destaca-se pelos baixos requisitos de armazenamento para chaves e custo computacional para execução. A descoberta relativamente recente da criptografia baseada em emparelhamentos bilineares sobre curvas elípticas permitiu ainda sua flexibilização e a construção de sistemas criptográficos com propriedades inovadoras, como sistemas baseados em identidades e suas variantes. Porém, o custo computacional de criptossistemas baseados em emparelhamentos ainda permanece significativamente maior do que os assimétricos tradicionais, representando um obstáculo para sua adoção, especialmente em dispositivos com recursos limitados. 
As contribuições deste trabalho objetivam aprimorar o desempenho de criptossistemas baseados em curvas elípticas e emparelhamentos bilineares e consistem em: (i) implementação eficiente de corpos binários em arquiteturas embutidas de 8 bits (microcontroladores presentes em sensores sem fio); (ii) formulação eficiente de aritmética em corpos binários para conjuntos vetoriais de arquiteturas de 64 bits e famílias mais recentes de processadores desktop dotadas de suporte nativo à multiplicação em corpos binários; (iii) técnicas para implementação serial e paralela de curvas elípticas binárias e emparelhamentos bilineares simétricos e assimétricos definidos sobre corpos primos ou binários. Estas contribuições permitiram obter significativos ganhos de desempenho e, conseqüentemente, uma série de recordes de velocidade para o cálculo de diversos algoritmos criptográficos relevantes em arquiteturas modernas que vão de sistemas embarcados de 8 bits a processadores com 8 cores / Abstract: The development of asymmetric or public key cryptography made possible new applications of cryptography such as digital signatures and electronic commerce. Cryptography is now a vital component for providing confidentiality and authentication in communication infrastructures. Elliptic Curve Cryptography is among the most efficient public-key methods because of its low storage and computational requirements. The relatively recent advent of Pairing-Based Cryptography allowed the further construction of flexible and innovative cryptographic solutions like Identity-Based Cryptography and variants. However, the computational cost of pairing-based cryptosystems remains significantly higher than that of traditional public key cryptosystems and is thus an important obstacle to adoption, especially in resource-constrained devices. 
The main contributions of this work aim to improve the performance of curve-based cryptosystems, consisting of: (i) efficient implementation of binary fields in 8-bit microcontrollers embedded in sensor network nodes; (ii) efficient formulation of binary field arithmetic in terms of vector instructions present in 64-bit architectures, and on the recently-introduced native support for binary field multiplication in the latest Intel microarchitecture families; (iii) techniques for serial and parallel implementation of binary elliptic curves and symmetric and asymmetric pairings defined over prime and binary fields. These contributions produced important performance improvements and, consequently, several speed records for computing relevant cryptographic algorithms in modern computer architectures ranging from embedded 8-bit microcontrollers to 8-core processors / Doutorado / Ciência da Computação / Doutor em Ciência da Computação
124

Dynamics of Driven Quantum Systems

Baghery, Mehrdad 15 January 2018
This thesis explores the possibility of using parallel algorithms to calculate the dynamics of driven quantum systems prevalent in atomic physics. In this process, new as well as existing algorithms are considered. The thesis is split into three parts. In the first part an attempt is made to develop a new formalism of the time dependent Schroedinger equation (TDSE) in the hope that the new formalism could lead to a parallel algorithm. The TDSE is written as an eigenvalue problem, the ground state of which represents the solution to the original TDSE. Even though mathematically sound and correct, it turns out the ground state of this eigenvalue problem cannot be easily found numerically, rendering the original hope a false one. In the second part we borrow a Bayesian global optimisation method from the machine learning community in an effort to find the optimum conditions in different systems quicker than textbook optimisation algorithms. This algorithm is specifically designed to find the optimum of expensive functions, and is used in this thesis to 1. maximise the electron yield of hydrogen, 2. maximise the asymmetry in the photo-electron angular distribution of hydrogen, 3. maximise the higher harmonic generation yield within a certain frequency range, 4. generate short pulses via combining higher harmonics generated by hydrogen. In the last part, the phenomenon of dynamic interference (temporal equivalent of the double-slit experiment) is discussed. The necessary conditions are derived from first principles and it is shown where some of the previous analytical and numerical studies have gone wrong; it turns out the choice of gauge plays a crucial role. Furthermore, a number of different scenarios are presented where interference in the photo-electron spectrum is expected to occur.
125

Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor

Karlsson, Andréas January 2010
The Division of Computer Engineering at Linköping University is currently researching the possibility of creating a highly parallel DSP platform that can keep up with the computational needs of upcoming standards for various applications, at low cost and low power consumption. The architecture is called ePUMA and it combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate compute-intensive and parallel DSP kernels. This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse that run on a single co-processor. The kernels are then evaluated to find possible problems with the co-processors' microarchitecture and to suggest solutions to the problems that might exist. The evaluation shows that the performance potential is very good, but a few problems have been identified that cause significant overhead in the kernels. Pipeline mismatches, which occur because different instructions have different pipeline lengths, cause pipeline hazards, and the current solution to this does not allow effective use of the pipeline. In some cases, the single-port memories will cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slowly.
126

Parallelism in Event-Based Computations with Applications in Biology

Bauer, Pavol January 2017
Event-based models find frequent usage in fields such as computational physics and biology as they may contain both continuous and discrete state variables and may incorporate both deterministic and stochastic state transitions. If the state transitions are stochastic, computer-generated random numbers are used to obtain the model solution. This type of event-based computation is also known as Monte Carlo simulation. In this thesis, I study different approaches to executing event-based computations on parallel computers. This ultimately allows users to retrieve their simulation results in a fraction of the original computation time. As system sizes grow continuously or models have to be simulated at longer time scales, this is a necessary approach for current computational tasks. More specifically, I propose several ways to asynchronously simulate such models on parallel shared-memory computers, for example using parallel discrete-event simulation or task-based computing. The particular event-based models studied herein find applications in systems biology, computational epidemiology and computational neuroscience. In the presented studies, the proposed methods allow for high efficiency of the parallel simulation, typically scaling well with the number of computer cores used. As the scaling typically depends on individual model properties, the studies also investigate which quantities have the greatest impact on the simulation performance. Finally, the presented studies include other insights into event-based computations, such as methods for estimating parameter sensitivity in stochastic models and for simulating models that include both deterministic and stochastic state transitions. / UPMARC
127

Implementação em VHDL de uma arquitetura paralela de um código de Reed-Solomon aplicado a Redes OTN / VHDL implementation of parallel architecture of the Reed-Solomon code for OTN networks

Salvador, Arley Henrique, 1979- 27 August 2018
Orientadores: Dalton Soares Arantes, Júlio César Rodrigues Fernandes de Oliveira / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Made available in DSpace on 2018-08-27T18:41:53Z (GMT). No. of bitstreams: 1 Salvador_ArleyHenrique_M.pdf: 3542397 bytes, checksum: 0847a4269c07c0969394ba0b40491987 (MD5) Previous issue date: 2015 / Resumo: Este trabalho apresenta a implementação de uma arquitetura paralela de um código corretor de erros para aplicações em redes ópticas que utilizam a técnica Forward Error Correction (FEC). O algoritmo FEC especificamente tratado neste trabalho é o Reed-Solomon, que é destinado principalmente a sistemas que sofrem influência causada por erros em rajadas somados ao sinal durante a transmissão, o que o torna adequado para transmissões ópticas. São expostas estruturas seriais de codificador e decodificador FEC e as etapas para convertê-las para uma estrutura paralela. Após descrever as etapas de conversão de uma estrutura serial para paralela é apresentada a estrutura do codificador/decodificador FEC Reed-Solomon RS(255,239) com estrutura paralela para operar em redes ópticas a uma taxa de 100 Gbit/s. Esta descrição exemplificativa é o objetivo principal deste trabalho. A implementação paralela do FEC oferece como vantagem a capacidade de processar os dados de forma rápida, permitindo o emprego desta solução em sistemas com altas taxas de dados. Foi elaborado um ambiente de testes com uma aplicação em redes de transporte óptico, ou Optical Transport Network (OTN). Esta funcionalidade consiste de um Transponder, que tem a função de mapear um cliente de 100 Gigabits Ethernet dentro de uma estrutura de quadro destinado a transmissão de dados em redes ópticas. 
Deste modo, pôde-se comprovar os resultados e o desempenho da estrutura proposta / Abstract: This work presents the implementation of a parallel architecture for an error-correcting code, aimed at optical network applications that use Forward Error Correction (FEC). The FEC algorithm addressed in this work is the Reed-Solomon code, which is intended mainly for systems affected by burst errors added to the signal during transmission, which makes it suitable for optical transmissions. Serial FEC encoder and decoder structures are presented, together with the steps to convert them to a parallel structure. As a worked example, an encoder/decoder for the RS(255,239) Reed-Solomon code with a parallel structure, able to operate at a 100 Gbit/s data rate in optical networks, is then presented. The parallel implementation offers higher FEC processing speeds and can therefore handle higher throughputs. A test environment was designed with an application in optical transport networks (OTN). It consists of a transponder that maps a 100 Gigabit Ethernet client into an OTN frame structure. With this setup the expected results for the proposed FEC circuitry could be experimentally verified / Mestrado / Telecomunicações e Telemática / Mestre em Engenharia Elétrica
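To make the RS(255,239) notation concrete, the short sketch below derives the basic figures of a Reed-Solomon code from its (n, k) parameters, using the standard property that a code with n - k parity symbols corrects up to floor((n - k) / 2) symbol errors per codeword. The function name `rs_parameters` and the 8-bit symbol size are assumptions made for illustration; nothing here is taken from the dissertation's VHDL design.

```python
def rs_parameters(n, k, symbol_bits=8):
    """Basic figures of an RS(n, k) code over symbols of `symbol_bits` bits."""
    parity = n - k      # redundant symbols appended to each codeword
    t = parity // 2     # symbol errors correctable per codeword
    return {
        "parity_symbols": parity,
        "correctable_symbol_errors": t,
        "code_rate": k / n,
        "overhead": parity / k,
        # Longest bit burst that can never touch more than t symbols,
        # whatever its alignment (single codeword, no interleaving).
        "guaranteed_burst_bits": (t - 1) * symbol_bits + 1,
    }


params = rs_parameters(255, 239)
```

For RS(255,239) this gives 16 parity symbols and 8 correctable symbol errors per codeword, which is why the code copes well with short error bursts: a burst that corrupts many adjacent bits still damages only a few symbols.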
128

ALGORITHMS FOR DEGREE-CONSTRAINED SUBGRAPHS AND APPLICATIONS

S M Ferdous (11804924) 19 December 2021
A degree-constrained subgraph construction (DCS) problem aims to find an optimal spanning subgraph (with respect to an objective function) subject to certain degree constraints on the vertices. DCS generalizes many combinatorial optimization problems such as Matchings and Edge Covers and has many practical and real-world applications. This thesis focuses on DCS problems where there are only upper bounds or only lower bounds on the degrees, known as b-matching and b-edge cover problems, respectively. We explore linear and submodular functions as the objective functions of the subgraph construction.
The contributions of this thesis involve both the design of new approximation algorithms for these DCS problems and their applications to real-world contexts.
We designed, developed, and implemented several approximation algorithms for DCS problems. Although some of these problems can be solved exactly in polynomial time, the exact algorithms are often expensive, tedious to implement, and have little to no concurrency. In contrast, many of the approximation algorithms developed here run in nearly linear time, are simple to implement, and are concurrent. Using the local dominance framework, we developed the first parallel algorithm for submodular b-matching. For weighted b-edge cover, we improved the classic Greedy algorithm using the lazy evaluation technique. We also propose and analyze several approximation algorithms using the primal-dual linear programming framework and reductions to matching. We evaluate the practical performance of these algorithms through extensive experimental results.
The second contribution of the thesis is to utilize the novel algorithms in real-world applications. We employ submodular b-matching to generate a balanced task assignment for processors to build Fock matrices in the NWChemEx quantum chemistry software.
Our load-balanced assignment results in a four-fold speedup per iteration of the Fock matrix computation and scales to 14,000 cores of the Summit supercomputer at Oak Ridge National Laboratory. Using approximate b-edge cover, we propose the first shared-memory and distributed-memory parallel algorithms for the adaptive anonymity problem. Minimum weighted b-edge cover and maximum weight b-matching are shown to be applicable to constructing graphs from datasets for machine learning tasks. We provide a mathematical optimization framework connecting the graph construction problem to the DCS problem.
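As a rough illustration of what a b-matching is, the sketch below implements the textbook greedy heuristic for maximum-weight b-matching with a linear objective: scan the edges by decreasing weight and keep an edge whenever both endpoints still have spare capacity. This is a generic sequential heuristic chosen for brevity, not the local dominance, lazy greedy, or primal-dual algorithms developed in the thesis, and the function name and input format are assumptions.

```python
def greedy_b_matching(edges, b):
    """Greedy heuristic for maximum-weight b-matching.

    edges: iterable of (u, v, weight) tuples.
    b:     dict mapping each vertex to its degree upper bound.
    Returns the list of selected edges, in the order they were chosen.
    """
    remaining = dict(b)  # spare capacity per vertex
    chosen = []
    # Heaviest edges first; each vertex u accepts at most b[u] edges.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            chosen.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return chosen


edges = [("a", "b", 5), ("b", "c", 4), ("c", "d", 3), ("a", "c", 2)]
matching = greedy_b_matching(edges, {"a": 1, "b": 1, "c": 1, "d": 1})
```

With all bounds equal to 1 the problem reduces to ordinary matching, and the call above selects the edges of weight 5 and 3. Raising the bounds of `b` and `c` to 2 lets the edge of weight 4 be kept as well.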
129

Parallelization of multi-grid methods based on domain decomposition ideas

Jung, M. 30 October 1998
In the paper, the parallelization of multi-grid methods for solving second-order elliptic boundary value problems in two-dimensional domains is discussed. The parallelization strategy is based on a non-overlapping domain decomposition data structure such that the algorithm is well-suited for an implementation on a parallel machine with MIMD architecture. To obtain an algorithm with good parallel performance, it is necessary to have as little communication as possible between the processors. In our implementation, communication is only needed within the smoothing procedures and the coarse-grid solver. The interpolation and restriction procedures can be performed without any communication. New variants of smoothers of Gauss-Seidel type having the same communication cost as Jacobi smoothers are presented. For solving the coarse-grid systems, iterative methods are proposed that are applied to the corresponding Schur complement system. Three numerical examples, namely a Poisson equation, a magnetic field problem, and a plane linear elasticity problem, demonstrate the efficiency of the parallel multi-grid algorithm.
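The difference in communication cost between Jacobi and Gauss-Seidel smoothers comes from their data dependencies, which a one-dimensional sketch can show. The code below assumes the model problem -u'' = f on a uniform grid with spacing h and fixed boundary values; it is a serial illustration with hypothetical function names, not the two-dimensional parallel smoothers proposed in the paper.

```python
def jacobi_sweep(u, f, h, omega=2.0 / 3.0):
    """One weighted Jacobi sweep for -u'' = f with fixed boundary values.

    Every update reads only values from the previous iterate, so in a
    parallel setting neighbour data is exchanged once per sweep.
    """
    new = u[:]
    for i in range(1, len(u) - 1):
        jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        new[i] = (1.0 - omega) * u[i] + omega * jac
    return new


def gauss_seidel_sweep(u, f, h):
    """One lexicographic Gauss-Seidel sweep for the same problem.

    The update of u[i] uses the already updated u[i-1], which creates a
    dependency chain across the grid (and across subdomain boundaries).
    """
    u = u[:]
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u
```

With f = 0 the exact solution is zero, so the iterate itself is the error, and repeated sweeps of either smoother can be seen to drive it down. The Gauss-Seidel loop, however, cannot be split across subdomains without either extra communication or a modified update order, which is the trade-off the abstract refers to.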
130

Dynamics of Driven Quantum Systems: A Search for Parallel Algorithms

Baghery, Mehrdad 24 November 2017
This thesis explores the possibility of using parallel algorithms to calculate the dynamics of driven quantum systems prevalent in atomic physics. In this process, new as well as existing algorithms are considered. The thesis is split into three parts. In the first part an attempt is made to develop a new formalism of the time dependent Schroedinger equation (TDSE) in the hope that the new formalism could lead to a parallel algorithm. The TDSE is written as an eigenvalue problem, the ground state of which represents the solution to the original TDSE. Even though mathematically sound and correct, it turns out the ground state of this eigenvalue problem cannot be easily found numerically, rendering the original hope a false one. In the second part we borrow a Bayesian global optimisation method from the machine learning community in an effort to find the optimum conditions in different systems quicker than textbook optimisation algorithms. This algorithm is specifically designed to find the optimum of expensive functions, and is used in this thesis to 1. maximise the electron yield of hydrogen, 2. maximise the asymmetry in the photo-electron angular distribution of hydrogen, 3. maximise the higher harmonic generation yield within a certain frequency range, 4. generate short pulses via combining higher harmonics generated by hydrogen. In the last part, the phenomenon of dynamic interference (temporal equivalent of the double-slit experiment) is discussed. The necessary conditions are derived from first principles and it is shown where some of the previous analytical and numerical studies have gone wrong; it turns out the choice of gauge plays a crucial role. Furthermore, a number of different scenarios are presented where interference in the photo-electron spectrum is expected to occur.
