61

Parallel algorithms of timetable generation / Parallella algoritmer för att generera scheman.

Antkowiak, Łukasz January 2013
Context: Most variants of the school timetabling problem belong to the class of NP-hard problems; their complexity and practical value make this kind of problem interesting for parallel computing. Objectives: This paper focuses on the Class-Teacher problem with weighted time slots and proves that it is NP-complete. A branch and bound scheme and two approaches to distributing simulated annealing are proposed. An empirical evaluation of the described methods is conducted in an elementary school computer laboratory. Methods: Simulated annealing implementations described in the literature are adapted for the problem and prepared for execution in distributed systems. The empirical evaluation is conducted with real data from a Polish elementary school. Results: The proposed branch and bound scheme scales nearly logarithmically with the number of nodes in the computing cluster. The proposed parallel simulated annealing models tend to increase solution quality. Conclusions: Despite a significant increase in computing power, computer laboratories are still unprepared for heavy computation. The proposed branch and bound method is infeasible on real instances. The Parallel Moves approach tends to provide better solutions at the beginning of execution, but the Multiple Independent Runs approach overtakes it after some time.
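The contrast between the two distribution models in the conclusions is easy to make concrete. Below is a minimal, self-contained Python sketch of the Multiple Independent Runs model, with a toy cost function (minimize x²) and a Gaussian neighbor move standing in for timetable cost and timetable moves; it illustrates the general technique, not the thesis's implementation.

```python
import math
import random
from multiprocessing import Pool

def cost(x):
    return x * x                      # toy objective standing in for timetable cost

def anneal(seed, steps=200_000, t0=10.0, alpha=0.99995):
    """One independent simulated-annealing run with geometric cooling."""
    rng = random.Random(seed)
    state = rng.uniform(-100, 100)
    best = state
    t = t0
    for _ in range(steps):
        cand = state + rng.gauss(0, 1)          # neighbor move
        delta = cost(cand) - cost(state)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            state = cand
            if cost(state) < cost(best):
                best = state
        t *= alpha
    return best

if __name__ == "__main__":
    # Multiple Independent Runs: each worker anneals on its own;
    # the overall best solution across all workers is kept.
    with Pool(4) as pool:
        results = pool.map(anneal, range(4))
    print(min(results, key=cost))
```

In the Parallel Moves model, the workers would instead each propose and evaluate a neighbor of one shared current state, with a single acceptance decision made over the proposals — consistent with the reported behavior: many moves per step helps early, while independent chains diversify better over time.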
62

Bibliotheken zur Entwicklung paralleler Algorithmen

Haase, G., Hommel, T., Meyer, A., Pester, M. 30 October 1998
The purpose of this paper is to supply a summary of library subroutines and functions for parallel MIMD computers. The subroutines have been developed at the University of Chemnitz over the last five years. In detail, they are concerned with vector operations, inter-processor communication, and simple graphic output to workstations. One of the most valuable features is the machine independence of the communication subroutines proposed in this paper for a hypercube topology of the parallel processors (excepting a kernel of only two primitive system-dependent operations). They were implemented and tested for different hardware and operating systems including transputer, nCube, KSR, and PVM. The vector subroutines are optimized by the use of the C language and unrolled loops (BLAS1-like). The paper includes hints for using the libraries with both Fortran and C programs.
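The machine-independent hypercube communication the summary describes follows the standard butterfly pattern: in step k, each processor exchanges data with the neighbor whose number differs in bit k, so a global operation completes in log2(p) steps. A minimal sketch in Python, using mpi4py as a modern stand-in for the paper's PVM/transputer kernels (the library's actual subroutine names are not reproduced here):

```python
from mpi4py import MPI  # stand-in for the paper's system-dependent communication kernel

def hypercube_allreduce_sum(comm, value):
    """Butterfly exchange over a hypercube: after log2(p) pairwise steps,
    every rank holds the global sum. Assumes a power-of-two rank count."""
    rank, size = comm.Get_rank(), comm.Get_size()
    dim = size.bit_length() - 1
    for k in range(dim):
        partner = rank ^ (1 << k)                  # neighbor along dimension k
        value += comm.sendrecv(value, dest=partner, source=partner)
    return value

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    total = hypercube_allreduce_sum(comm, float(comm.Get_rank() + 1))
    print(comm.Get_rank(), total)                  # every rank prints p*(p+1)/2
```

Run with, e.g., `mpiexec -n 8 python hypercube.py`; every rank ends up holding the same global sum.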
63

Parallelization of multi-grid methods based on domain decomposition ideas

Jung, M. 30 October 1998
In the paper, the parallelization of multi-grid methods for solving second-order elliptic boundary value problems in two-dimensional domains is discussed. The parallelization strategy is based on a non-overlapping domain decomposition data structure, so that the algorithm is well-suited for implementation on a parallel machine with MIMD architecture. To obtain an algorithm with good parallel performance it is necessary to have as little communication as possible between the processors. In our implementation, communication is only needed within the smoothing procedures and the coarse-grid solver. The interpolation and restriction procedures can be performed without any communication. New variants of smoothers of Gauss-Seidel type having the same communication cost as Jacobi smoothers are presented. For solving the coarse-grid systems, iterative methods are proposed that are applied to the corresponding Schur complement system. Three numerical examples, namely a Poisson equation, a magnetic field problem, and a plane linear elasticity problem, demonstrate the efficiency of the parallel multi-grid algorithm.
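To make the division of labor concrete, here is a minimal serial two-grid sketch for the 1-D Poisson equation in Python/NumPy. It is not the paper's 2-D MIMD implementation, but the comments mark the only places where the parallel version needs communication (smoothing and the coarse solve); restriction and interpolation are purely local, as stated above.

```python
import numpy as np

def jacobi(u, f, h, sweeps=2, omega=0.8):
    """Damped Jacobi smoother for -u'' = f with Dirichlet BCs.  In the
    parallel version this is a step that needs neighbor communication."""
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * (u[:-2] + u[2:] - 2 * u[1:-1] + h * h * f[1:-1])
    return u

def two_grid(u, f, h):
    """Smooth, restrict the residual, solve coarsely, interpolate, smooth.
    Restriction and interpolation are purely local (no communication)."""
    u = jacobi(u, f, h)
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    # crude stand-in for the paper's Schur-complement coarse solver:
    ec = jacobi(np.zeros_like(r[::2]), r[::2].copy(), 2 * h, sweeps=50)
    e = np.zeros_like(u)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])            # linear interpolation
    return jacobi(u + e, f, h)

n = 64                                    # grid with n+1 points (n even)
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * x)        # exact solution: sin(pi x)
u = np.zeros_like(x)
for _ in range(50):
    u = two_grid(u, f, 1.0 / n)
print(np.max(np.abs(u - np.sin(np.pi * x))))   # error vs. the exact solution
```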
64

Train Re-scheduling : A Massively Parallel Approach Using CUDA

Petersson, Anton January 2015
Context. Train re-scheduling during disturbances is a time-consuming task. Modified schedules need to be found, so that trains can meet in suitable locations and delays are minimized. Domino effects are difficult to manage. Commercial optimization software has been shown to find optimal solutions, but modified schedules need to be found quickly. Therefore, greedy depth-first algorithms have been designed to find solutions within a limited time-frame. Modern GPUs have a high computational capacity, and have become easier to use for computations unrelated to computer graphics with the development of technologies such as CUDA and OpenCL. Objectives. We explore the feasibility of running a re-scheduling algorithm developed specifically for this problem on a GPU using the CUDA toolkit. The main objective is to find a way of exploiting the computational capacity of modern GPUs to find better re-scheduling solutions within a limited time-frame. Methods. We develop and adapt a sequential algorithm for use on a GPU and run multiple experiments using 16 disturbance scenarios on the single-tracked iron ore line in northern Sweden. Results. Our implementation succeeds in finding re-scheduling solutions without conflicts for all 16 scenarios. The algorithm visits on average 7 times more nodes per time unit than the sequential CPU algorithm when branching at depth 50, and 4 times more when branching at depth 200. Conclusions. The computational performance of our parallel algorithm is promising but the approach is not complete. Our experiments only show that multiple solution branches can be explored fast in parallel, but not how to construct a high-level algorithm that systematically searches for better schedules within a certain time limit. Further research is needed for that. We also find that multiple threads explore redundant solutions in our approach.
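The core branching pattern — scoring many candidate branches of the greedy depth-first search concurrently — can be sketched in a few lines. This CPU sketch with a thread pool only illustrates the pattern; it is not the thesis's CUDA implementation, and `expand`/`score` stand for hypothetical problem-specific callbacks (generating and rating candidate partial schedules).

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_greedy_search(root, expand, score):
    """Greedy depth-first re-scheduling: expand the current partial schedule
    into candidate branches, score all branches concurrently (the GPU's job
    in the thesis), and commit to the best-scoring one."""
    node = root
    with ThreadPoolExecutor() as pool:
        while True:
            children = expand(node)          # candidate next decisions
            if not children:
                return node                  # conflict-free schedule reached
            scores = list(pool.map(score, children))
            node = children[scores.index(min(scores))]
```

On a GPU, `score` would be a kernel rating thousands of branches at once; the redundancy noted in the conclusions corresponds to distinct branches converging on the same partial schedule.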
65

Aplicações de computação paralela em otimização contínua / Applications of parallel computing in continuous optimization

Ricardo Luiz de Andrade Abrantes 22 February 2008
In this work we studied some concepts of parallel programming, some ways of applying parallel computing in continuous optimization methods, and two methods that involve the use of optimization. The first method we present, called PUMA (Pointwise Unconstrained Minimization Approach), retrieves optical constants and thicknesses of thin films from transmittance data. The retrieval problem is modeled as an inverse problem and solved with the aid of an optimization method. Through the parallelization of PUMA we made it feasible to retrieve optical constants and thicknesses of structures composed of up to two superposed films. We report the results obtained and discuss the performance of the parallel version and the quality of the retrievals. The second method studied, called PACKMOL, builds initial configurations of molecules for molecular dynamics simulations. The problem of obtaining an initial configuration of molecules is modeled as a packing problem and solved with the aid of an optimization method. We developed a parallel version of PACKMOL and show the performance gains obtained through the parallelization.
66

A parallel algorithm to solve the mathematical problem "double coset enumeration of S₂₄ over M₂₄"

Harris, Elena Yavorska 01 January 2003
This thesis presents and evaluates a new parallel algorithm that computes all single cosets in the double coset M₂₄ P M₂₄, where P is a permutation on n points of a certain cycle structure and M₂₄ is the Mathieu group related to a Steiner system S(5, 8, 24) as its automorphism group. The purpose of this work is not to replace the existing algorithms, but rather to explore the possibility of extending calculations of single cosets beyond the limits encountered with currently available methods.
67

A Speculative Approach to Parallelization in Particle Swarm Optimization

Gardner, Matthew J. 26 April 2011
Particle swarm optimization (PSO) has previously been parallelized primarily by distributing the computation corresponding to particles across multiple processors. In this thesis we present a speculative approach to the parallelization of PSO that we refer to as SEPSO. In our approach, we refactor PSO such that the computation needed for iteration t+1 can be done concurrently with the computation needed for iteration t. Thus we can perform two iterations of PSO at once. Even with some amount of wasted computation, we show that this approach to parallelization in PSO often outperforms the standard parallelization of simply adding particles to the swarm. SEPSO produces results that are exactly equivalent to PSO; this is not a new algorithm or variant, only a new method of parallelization. However, given this new parallelization model we can relax the requirement of exactly reproducing PSO in an attempt to produce better results. We present several such relaxations, including keeping the best speculative position evaluated instead of the one corresponding to the standard behavior of PSO, and speculating several iterations ahead instead of just one. We show that these methods dramatically improve the performance of parallel PSO in many cases, giving speedups of up to six times compared to previous parallelization techniques.
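The speculation pattern itself is compact. A rough Python sketch under stated assumptions: `swarm` is a sequence of particles, and `evaluate` and `update` are hypothetical, picklable callbacks. The real SEPSO refactoring is more involved, but the essence is evaluating iteration t+1 under every possible outcome of iteration t and discarding the branches that did not occur.

```python
from concurrent.futures import ProcessPoolExecutor

def speculative_step(swarm, evaluate, update):
    """Speculate one PSO iteration ahead: while iteration t is evaluated,
    compute iteration t+1 under each assumed outcome of t ("no new global
    best", or "particle i sets the new global best"), then keep the branch
    matching what actually happened; all other branches are wasted work."""
    outcomes = [None] + list(range(len(swarm)))
    with ProcessPoolExecutor() as pool:
        branches = {o: pool.submit(update, swarm, o) for o in outcomes}
        actual = evaluate(swarm)      # iteration t, overlapped with the speculation
    return branches[actual].result()
```

The relaxation mentioned above drops the final bookkeeping: instead of keeping only the branch matching `actual`, keep whichever speculative branch found the best position.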
68

Asynchronous Algorithms for Large-Scale Optimization : Analysis and Implementation

Aytekin, Arda January 2017
This thesis proposes and analyzes several first-order methods for convex optimization, designed for parallel implementation in shared and distributed memory architectures. The theoretical focus is on designing algorithms that can run asynchronously, allowing computing nodes to execute their tasks with stale information without jeopardizing convergence to the optimal solution. The first part of the thesis focuses on shared memory architectures. We propose and analyze a family of algorithms to solve an unconstrained, smooth optimization problem consisting of a large number of component functions. Specifically, we investigate the effect of information delay, inherent in asynchronous implementations, on the convergence properties of the incremental prox-gradient descent method. Contrary to related proposals in the literature, we establish delay-insensitive convergence results: the proposed algorithms converge under any bounded information delay, and their constant step-size can be selected independently of the delay bound. Then, we shift focus to solving constrained, possibly non-smooth, optimization problems in a distributed memory architecture. This time, we propose and analyze two important families of gradient descent algorithms: asynchronous mini-batching and incremental aggregated gradient descent. In particular, for asynchronous mini-batching, we show that, by suitably choosing the algorithm parameters, one can recover the best-known convergence rates established for delay-free implementations, and expect a near-linear speedup with the number of computing nodes. Similarly, for incremental aggregated gradient descent, we establish global linear convergence rates for any bounded information delay. Extensive simulations and actual implementations of the algorithms in different platforms on representative real-world problems validate our theoretical results.
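A minimal shared-memory sketch of the asynchrony being analyzed, in Python with threads: workers read the iterate without synchronization, so each gradient may be computed on stale values, yet updates keep flowing. This illustrates the setting, not the thesis's algorithms, and the constant step size is untuned.

```python
import threading
import numpy as np

def async_sgd(grads, x0, step=1e-2, iters_per_worker=10_000, n_workers=4):
    """grads: list of component-gradient functions f_i'(x).  Workers pick
    components at random, differentiate at a possibly stale read of x, and
    apply updates in place; bounded staleness is what the analysis needs."""
    x = x0.copy()
    lock = threading.Lock()

    def worker(seed):
        r = np.random.default_rng(seed)
        for _ in range(iters_per_worker):
            g = grads[r.integers(len(grads))](x)   # stale, unsynchronized read
            with lock:
                x[:] -= step * g                   # in-place, visible to all threads

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

# Toy problem: minimize sum_i ||x - c_i||^2, whose optimum is the centroid of c_i.
cs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([3.0, 3.0])]
grads = [lambda x, c=c: 2 * (x - c) for c in cs]
print(async_sgd(grads, np.zeros(2)))   # approaches the mean of cs
```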
69

Trial Division : Improvements and Implementations / Trial Division : Förbättringar och Implementationer

Hedenström, Felix January 2017
Trial division is possibly the simplest algorithm for factoring numbers. The problem with trial division is that it is slow and wastes computational time on unnecessary tests of division. How can this simple algorithm be sped up while remaining serial? How does it behave when parallelized? Can a superior serial version and a parallel version be combined into an even more powerful algorithm? To answer these questions, the basics of trial division were researched and improvements were suggested. These improvements were later implemented and tested by measuring the time it took to factorize a given number. A version using a list of primes and multiple threads turned out to be the fastest for numbers larger than 10^10, but was beaten by its serial counterpart when factoring smaller numbers. A problem was detected that gave the parallel versions long allocation times and slowed them down, but this did not hinder them much.
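For concreteness, a short Python sketch of the prime-list improvement the fastest variants build on: precompute primes up to √n with a sieve, so that no composite trial divisors are ever tested. The thesis's threading layer is omitted, and the 10^10 crossover is its measurement, not something this sketch reproduces.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: the precomputed divisor list."""
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [i for i, flag in enumerate(sieve) if flag]

def trial_division(n):
    """Factor n, testing only primes up to sqrt(n) -- no wasted composite trials."""
    factors = []
    for p in primes_up_to(math.isqrt(n)):
        while n % p == 0:
            factors.append(p)
            n //= p
        if n == 1:
            break
    if n > 1:
        factors.append(n)   # remaining cofactor must be prime
    return factors

print(trial_division(600851475143))   # [71, 839, 1471, 6857]
```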
70

Integrated compiler optimizations for tensor contractions

Gao, Xiaoyang 07 January 2008
No description available.
