61

Parallelization of multi-grid methods based on domain decomposition ideas

Jung, M. 30 October 1998 (has links) (PDF)
In the paper, the parallelization of multi-grid methods for solving second-order elliptic boundary value problems in two-dimensional domains is discussed. The parallelization strategy is based on a non-overlapping domain decomposition data structure, so the algorithm is well-suited for implementation on a parallel machine with MIMD architecture. To obtain an algorithm with good parallel performance, it is necessary to have as little communication as possible between the processors. In our implementation, communication is only needed within the smoothing procedures and the coarse-grid solver. The interpolation and restriction procedures can be performed without any communication. New variants of smoothers of Gauss-Seidel type having the same communication cost as Jacobi smoothers are presented. For solving the coarse-grid systems, iterative methods are proposed that are applied to the corresponding Schur complement system. Three numerical examples, namely a Poisson equation, a magnetic field problem, and a plane linear elasticity problem, demonstrate the efficiency of the parallel multi-grid algorithm.
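For orientation, the sketch below (ours, not the paper's code; the 1D model problem, function names, and parameters are illustrative assumptions) shows a damped-Jacobi smoothing sweep. It illustrates why such smoothers parallelize well under non-overlapping domain decomposition: a Jacobi-type sweep reads only old values, so each subdomain can sweep concurrently after exchanging its interface ("halo") values, and the paper's Gauss-Seidel-type variants keep that same communication cost.

```python
# A minimal sketch (not the paper's implementation) of one damped-Jacobi
# smoothing sweep for the 1D Poisson problem -u'' = f on a uniform grid.
# In a non-overlapping domain decomposition, each process owns a slice of
# `u`; only the interface values at subdomain boundaries would need to be
# exchanged before each sweep.

def jacobi_smooth(u, f, h, omega=0.8, sweeps=1):
    """Damped Jacobi smoothing for -u'' = f with fixed Dirichlet ends."""
    n = len(u)
    for _ in range(sweeps):
        u_old = u[:]                      # Jacobi reads only old values,
        for i in range(1, n - 1):         # so subdomains can sweep
            u[i] = (1 - omega) * u_old[i] + omega * 0.5 * (
                u_old[i - 1] + u_old[i + 1] + h * h * f[i]
            )                             # concurrently after one halo swap
    return u

# Example: smooth a random error on a 17-point grid for f = 0.
import random
h = 1.0 / 16
u = [0.0] + [random.random() for _ in range(15)] + [0.0]
u = jacobi_smooth(u, [0.0] * 17, h, sweeps=3)
```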
62

Train Re-scheduling : A Massively Parallel Approach Using CUDA

Petersson, Anton January 2015 (has links)
Context. Train re-scheduling during disturbances is a time-consuming task. Modified schedules need to be found so that trains can meet at suitable locations and delays are minimized; domino effects are difficult to manage. Commercial optimization software has been shown to find optimal solutions, but modified schedules need to be found quickly, so greedy depth-first algorithms have been designed to find solutions within a limited time-frame. Modern GPUs have a high computational capacity and, with the development of technologies such as CUDA and OpenCL, have become easier to use for computations unrelated to computer graphics. Objectives. We explore the feasibility of running a re-scheduling algorithm developed specifically for this problem on a GPU using the CUDA toolkit. The main objective is to find a way of exploiting the computational capacity of modern GPUs to find better re-scheduling solutions within a limited time-frame. Methods. We develop and adapt a sequential algorithm for use on a GPU and run multiple experiments using 16 disturbance scenarios on the single-tracked iron ore line in northern Sweden. Results. Our implementation succeeds in finding conflict-free re-scheduling solutions for all 16 scenarios. The algorithm visits on average 7 times more nodes per time unit than the sequential CPU algorithm when branching at depth 50, and 4 times more when branching at depth 200. Conclusions. The computational performance of our parallel algorithm is promising, but the approach is not complete: our experiments only show that multiple solution branches can be explored quickly in parallel, not how to construct a high-level algorithm that systematically searches for better schedules within a certain time limit. Further research is needed for that. We also find that multiple threads explore redundant solutions in our approach.
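As a rough illustration (our sketch, not the thesis implementation; `expand`, the binary branching, and the depths are made up), the idea of exploring many solution branches in parallel can be phrased as: run the search serially down to a branching depth, then hand each surviving branch to its own worker. On the GPU each branch would map to a CUDA thread or block; a CPU process pool stands in for that here.

```python
from concurrent.futures import ProcessPoolExecutor

def expand(node):
    """Hypothetical successor function: return child schedules of `node`."""
    return [node + [c] for c in range(2)]           # dummy binary branching

def dfs(node, depth, limit):
    """Greedy depth-first search; follows the best child to the first leaf."""
    if depth == limit:
        return node
    return dfs(expand(node)[0], depth + 1, limit)

def parallel_search(branch_depth, total_depth):
    frontier = [[]]
    for _ in range(branch_depth):                   # serial breadth expansion
        frontier = [c for n in frontier for c in expand(n)]
    with ProcessPoolExecutor() as pool:             # one branch per worker
        return list(pool.map(dfs, frontier,
                             [branch_depth] * len(frontier),
                             [total_depth] * len(frontier)))

if __name__ == "__main__":
    print(len(parallel_search(branch_depth=3, total_depth=6)))   # 8 branches
```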
63

Aplicações de computação paralela em otimização contínua / Applications of parallel computing in continuous optimization

Ricardo Luiz de Andrade Abrantes 22 February 2008 (has links)
In this work we study some concepts of parallel programming, some ways of applying parallel computing to continuous optimization methods, and two methods that involve the use of optimization. The first method we present, called PUMA (Pointwise Unconstrained Minimization Approach), retrieves optical constants and thicknesses of thin films from transmittance data. The retrieval problem is modeled as an inverse problem and solved with the aid of an optimization method. Through the parallelization of PUMA we made feasible the retrieval of optical constants and thicknesses in structures with up to two superposed films. We report the results obtained and discuss the performance of the parallel version and the quality of the results. The second method studied, called PACKMOL, builds initial configurations of molecules for molecular dynamics simulations. The problem of creating an initial configuration of molecules is modeled as a packing problem and solved with the aid of an optimization method. We developed a parallel version of PACKMOL and show the performance gains obtained with the parallelization.
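A hedged sketch of the parallelization pattern involved (ours; the transmittance model, names, and numbers below are toy assumptions, not PUMA's actual model): the objective function in such inverse problems is a sum of per-wavelength misfits, so the spectrum can be split across workers and the partial sums combined.

```python
from concurrent.futures import ProcessPoolExecutor
from math import exp, pi

def model_T(wavelength, thickness, kappa):
    """Toy stand-in for a thin-film transmittance model (not PUMA's)."""
    return exp(-4.0 * pi * kappa * thickness / wavelength)

def partial_misfit(args):
    chunk, thickness, kappa = args
    return sum((meas - model_T(lam, thickness, kappa)) ** 2
               for lam, meas in chunk)

def misfit(data, thickness, kappa, workers=4):
    chunks = [data[i::workers] for i in range(workers)]   # round-robin split
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_misfit,
                            [(c, thickness, kappa) for c in chunks]))

if __name__ == "__main__":
    data = [(lam, 0.8) for lam in range(400, 800, 10)]    # fake spectrum
    print(misfit(data, thickness=100.0, kappa=0.01))
```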
64

A parallel algorithm to solve the mathematical problem "double coset enumeration of S₂₄ over M₂₄"

Harris, Elena Yavorska 01 January 2003 (has links)
This thesis presents and evaluates a new parallel algorithm that computes all single cosets in the double coset M₂₄ P M₂₄, where P is a permutation on n points with a certain cycle structure, and M₂₄ is the Mathieu group related to the Steiner system S(5, 8, 24) as its automorphism group. The purpose of this work is not to replace the existing algorithms, but rather to explore the possibility of extending calculations of single cosets beyond the limits encountered when using currently available methods.
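To make the notion concrete at toy scale (our sketch; the real computation involves the Mathieu group M₂₄ and is vastly larger), double coset enumeration can be phrased as a closure computation: H g H is the smallest set containing g that is closed under multiplication by generators of H on either side.

```python
from collections import deque

def mul(p, q):
    """Compose permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def double_coset(gens, g):
    """Enumerate H*g*H by closure under left/right multiplication."""
    seen, queue = {g}, deque([g])
    while queue:
        x = queue.popleft()
        for s in gens:
            for y in (mul(s, x), mul(x, s)):   # multiply on both sides
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
    return seen

# Toy case in S_3: H = <(0 1)>, g = the 3-cycle (0 1 2).
H_gens = [(1, 0, 2)]
g = (1, 2, 0)
print(sorted(double_coset(H_gens, g)))   # the 4 elements of H g H
```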
65

Efficient Parallel Algorithms and Data Structures Related to Trees

Chen, Calvin Ching-Yuen 12 1900 (has links)
The main contribution of this dissertation is a new paradigm, called the parentheses matching paradigm. We claim that this paradigm is well suited for designing efficient parallel algorithms for a broad class of nonnumeric problems. To demonstrate its applicability, we present three cost-optimal parallel algorithms: for breadth-first traversal of general trees, for sorting a special class of integers, and for coloring an interval graph with the minimum number of colors.
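The core primitive of the paradigm, parentheses matching itself, is easy to state; the sketch below (ours) shows the serial stack version for reference. The paradigm's contribution is that the same matching, applied for instance to the Euler tour of a tree written as a balanced parentheses string ('(' on the way down, ')' on the way up), can be computed cost-optimally in parallel, typically via prefix sums over nesting depths.

```python
def match_parens(s):
    """Return match[i] = index of the parenthesis matching s[i]."""
    match, stack = [0] * len(s), []
    for i, c in enumerate(s):
        if c == '(':
            stack.append(i)            # remember an unmatched opener
        else:
            j = stack.pop()            # closer matches most recent opener
            match[i], match[j] = j, i
    return match

print(match_parens("(()(()))"))        # [7, 2, 1, 6, 5, 4, 3, 0]
```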
66

A Speculative Approach to Parallelization in Particle Swarm Optimization

Gardner, Matthew J. 26 April 2011 (has links) (PDF)
Particle swarm optimization (PSO) has previously been parallelized primarily by distributing the computation corresponding to particles across multiple processors. In this thesis we present a speculative approach to the parallelization of PSO that we refer to as SEPSO. In our approach, we refactor PSO such that the computation needed for iteration t+1 can be done concurrently with the computation needed for iteration t; thus we can perform two iterations of PSO at once. Even with some amount of wasted computation, we show that this approach to parallelization often outperforms the standard parallelization of simply adding particles to the swarm. SEPSO produces results that are exactly equivalent to PSO; this is not a new algorithm or variant, only a new method of parallelization. However, given this new parallelization model we can relax the requirement of exactly reproducing PSO in an attempt to produce better results. We present several such relaxations, including keeping the best speculative position evaluated instead of the one corresponding to the standard behavior of PSO, and speculating several iterations ahead instead of just one. We show that these methods dramatically improve the performance of parallel PSO in many cases, giving speedups of up to six times over previous parallelization techniques.
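For reference, here is a minimal sketch (ours, with the common constriction constants as assumed parameters) of the standard PSO update that SEPSO refactors. The speculative observation is that iteration t+1 depends on iteration t only through which particle updates the best-known positions, so each possible outcome can be evaluated concurrently with iteration t at the cost of some wasted work.

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.729, c1=1.49445, c2=1.49445):
    """One velocity/position update for a single particle, per dimension."""
    new_vel = [w * v
               + c1 * random.random() * (pb - x)    # pull toward personal best
               + c2 * random.random() * (gb - x)    # pull toward global best
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel

pos, vel = [0.5, -1.0], [0.0, 0.0]
print(pso_step(pos, vel, pbest=[0.0, 0.0], gbest=[1.0, 1.0]))
```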
67

Asynchronous Algorithms for Large-Scale Optimization : Analysis and Implementation

Aytekin, Arda January 2017 (has links)
This thesis proposes and analyzes several first-order methods for convex optimization, designed for parallel implementation in shared and distributed memory architectures. The theoretical focus is on designing algorithms that can run asynchronously, allowing computing nodes to execute their tasks with stale information without jeopardizing convergence to the optimal solution. The first part of the thesis focuses on shared memory architectures. We propose and analyze a family of algorithms to solve an unconstrained, smooth optimization problem consisting of a large number of component functions. Specifically, we investigate the effect of information delay, inherent in asynchronous implementations, on the convergence properties of the incremental prox-gradient descent method. Contrary to related proposals in the literature, we establish delay-insensitive convergence results: the proposed algorithms converge under any bounded information delay, and their constant step-size can be selected independently of the delay bound. Then, we shift focus to solving constrained, possibly non-smooth, optimization problems in a distributed memory architecture. This time, we propose and analyze two important families of gradient descent algorithms: asynchronous mini-batching and incremental aggregated gradient descent. In particular, for asynchronous mini-batching, we show that, by suitably choosing the algorithm parameters, one can recover the best-known convergence rates established for delay-free implementations, and expect a near-linear speedup with the number of computing nodes. Similarly, for incremental aggregated gradient descent, we establish global linear convergence rates for any bounded information delay. Extensive simulations and actual implementations of the algorithms in different platforms on representative real-world problems validate our theoretical results.
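A serial simulation can convey the delay model (our sketch; the objective, step size, and delay bound are toy assumptions, and this is plain gradient descent rather than the thesis's prox-gradient or aggregated variants): each update uses the gradient at an iterate up to a bounded number of steps old, mimicking what asynchronous workers would read from shared memory, yet the iterates still converge for a small enough constant step size.

```python
import random

def delayed_gd(x0, step, iters, max_delay):
    """Gradient descent on f(x) = x^2 / 2 using stale gradients."""
    history = [x0]
    for k in range(iters):
        stale = history[max(0, k - random.randint(0, max_delay))]
        grad = stale                      # f'(x) = x for this objective
        history.append(history[-1] - step * grad)
    return history[-1]

print(delayed_gd(x0=10.0, step=0.1, iters=200, max_delay=5))  # close to 0
```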
68

Trial Division : Improvements and Implementations / Trial Division : Förbättringar och Implementationer

Hedenström, Felix January 2017 (has links)
Trial division is possibly the simplest algorithm for factoring numbers. The problem with trial division is that it is slow and wastes computational time on unnecessary tests of division. How can this simple algorithm be sped up while remaining serial? How does this algorithm behave when parallelized? Can a superior serial version and a parallel version be combined into an even more powerful algorithm? To answer these questions, the basics of trial division were researched and improvements were suggested. These improvements were later implemented and tested by measuring the time it took to factorize a given number. A version using a list of primes and multiple threads turned out to be the fastest for numbers larger than 10¹⁰, but was beaten by its serial counterpart when factoring smaller numbers. A problem was detected that caused the parallel versions to have long allocation times, which slowed them down, but did not hinder them much.
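A minimal sketch of the thesis's fastest serial ingredient as we read it (our code; names are ours): trial division that tests only precomputed primes up to √n. A parallel version along the thesis's lines would split the prime list across threads.

```python
from math import isqrt

def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def factor(n):
    """Factor n by trial division over primes up to sqrt(n)."""
    factors = []
    for p in primes_up_to(isqrt(n)):
        while n % p == 0:
            factors.append(p)
            n //= p
    if n > 1:
        factors.append(n)        # remaining cofactor is prime
    return factors

print(factor(2 ** 10 * 3 * 9973))   # ten 2s, one 3, then 9973
```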
69

Integrated compiler optimizations for tensor contractions

Gao, Xiaoyang 07 January 2008 (has links)
No description available.
70

Kinetic Monte Carlo simulations of submonolayer and multilayer epitaxial growth over extended time- and length-scales

Giridhar, Nandipati 23 September 2009 (has links)
No description available.
