271

Parallelizing Tabu Search Based Optimization Algorithm on GPUs

Malleypally, Vinaya 14 March 2018 (has links)
Many combinatorial optimization problems, such as the traveling salesman problem, the quadratic assignment problem, and flow shop scheduling, are computationally intractable. Tabu search based simulated annealing (SA) is a stochastic search algorithm that is widely used to solve combinatorial optimization problems. Due to excessive run time, there is a strong demand for a parallel version that can be applied to any problem with minimal modifications. Existing advanced and/or parallel versions of tabu search algorithms are specific to the problem at hand, so their optimizations do not carry over to other problems. In this work, we propose a parallel version of tabu search based SA on the Graphics Processing Unit (GPU) platform. We propose two variants of the algorithm based on where the tabu list is stored (global vs. local). In the first version, the list is stored in global shared memory so that all threads can access it. Multiple random walks in the solution space are carried out, and each walk avoids the moves made by the rest of the walks through its access to the global tabu list, at the expense of more time. In the second version, the list is stored at the block level and is shared only by the threads of that block. Groups of random walks are performed in parallel, and a walk in a group avoids the moves made by the rest of the walks within that group through their shared local tabu list. This version is better than the first in terms of execution time; on the other hand, the first version finds the global optimum more often. We present experimental results for six difficult optimization functions with known global optima. Compared to a CPU implementation with a similar workload, the proposed GPU versions are faster by approximately three orders of magnitude and often find the global optimum.
272

Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

Salinger, Alejandro January 2013 (has links)
Multi-core processors have become the dominant processor architecture, with 2, 4, and 8 cores on a chip widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphics Processing Units (GPUs) have made these an accessible source of parallel processing power in general-purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits. Observing that in practice multi-cores have a small number of cores, we propose a model of low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and a linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case. Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache.
We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting, we introduce the minimum cache usage problem, an extension of classical sequential paging in which algorithms must account for the amount of cache they use. This cache-aware model seeks algorithms with good performance in terms of both faults and the amount of cache used, and has applications in energy-efficient caching and in shared cache scenarios. The wide availability of GPUs has added to the parallel power of multi-cores; however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and a GPU, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms. Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model that allows constant-time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming.
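The flavor of word-level bit-parallelism that the Ultra-Wide Word model scales up can be illustrated with a classic word-RAM technique. The sketch below (plain Python, using arbitrary-precision integers as a stand-in for wide words; not from the thesis itself) is the Shift-And string matcher, where one bitmask update per character advances an entire nondeterministic automaton in parallel:

```python
def shift_and(text, pattern):
    """Bit-parallel (Shift-And) substring search: the automaton state for
    the whole pattern fits in one word and advances with a shift, an OR,
    and an AND per character.  A wider word means a longer pattern (or
    more patterns) handled in the same constant number of operations."""
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)   # bit i set where pattern[i] == c
    state = 0
    goal = 1 << (len(pattern) - 1)              # bit marking a full match
    hits = []
    for pos, c in enumerate(text):
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & goal:
            hits.append(pos - len(pattern) + 1)  # start index of the match
    return hits
```

Python integers are unbounded, so the same code already demonstrates the ultra-wide regime: a 4096-bit `state` costs the same three logical operations per step.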
273

Pricing a Multi-Asset American Option in a Parallel Environment by a Finite Element Method Approach

Kaya, Deniz January 2011 (has links)
There is a need to apply numerical methods to problems that cannot be solved analytically, and as the spatial dimension of the problem increases, the need for computational resources grows exponentially, a phenomenon known as the “curse of dimensionality”. In the Black-Scholes-Merton framework the American option pricing problem has no closed-form solution, and a numerical procedure has to be employed to solve a PDE. The multi-asset American option introduces challenging computational problems, since every added asset increases the dimension of the PDE by one. One way to deal with the curse of dimensionality is through parallelism. Here the finite element method-of-lines is used for pricing a multi-asset American option dependent on up to four assets in a parallel environment. The problem is also solved with the PSOR method, giving an accurate benchmark used for comparison. In finance the put option is one of the most fundamental derivatives, since it is essentially asset-value insurance, and much research in quantitative finance is devoted to accurate and fast pricing techniques for the multi-dimensional case. “What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.” Norbert Wiener. “As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise – by what course of calculation can these results be arrived at by the machine in the shortest time?” Charles Babbage
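The PSOR benchmark mentioned above solves the linear complementarity problem (LCP) that an implicit discretization of the American option inequality produces at each time step. A minimal sketch of projected SOR (dense matrix for clarity; `omega` and `tol` are illustrative choices, not the thesis's settings):

```python
import numpy as np

def psor(A, b, payoff, omega=1.2, tol=1e-10, max_iter=20000):
    """Projected SOR for the linear complementarity problem
        A x >= b,   x >= payoff,   (A x - b) . (x - payoff) = 0,
    the form an implicit finite-difference/FEM time step of the American
    option problem takes: the projection enforces that the option value
    never falls below the early-exercise payoff."""
    x = np.asarray(payoff, float).copy()
    n = len(b)
    for _ in range(max_iter):
        err = 0.0
        for i in range(n):
            # Gauss-Seidel value for component i
            y = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            # over-relax, then project onto the constraint x >= payoff
            xi = max(payoff[i], x[i] + omega * (y - x[i]))
            err = max(err, abs(xi - x[i]))
            x[i] = xi
        if err < tol:
            break
    return x
```

With the constraint effectively disabled (a very negative `payoff`), the iteration reduces to plain SOR on `A x = b`; with a binding constraint it returns the projected solution, which is the property the multi-asset pricing code relies on.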
274

Gas-kinetic Methods For 3-d Inviscid And Viscous Flow Solutions On Unstructured/hybrid Grids

Ilgaz, Murat 01 February 2007 (has links) (PDF)
In this thesis, gas-kinetic methods for inviscid and viscous flow simulations are developed. Initially, finite volume gas-kinetic methods are investigated for 1-D flows as a preliminary study and are discussed in detail from theoretical and numerical points of view. The preliminary results show that the gas-kinetic methods do not produce any unphysical flow phenomena. In particular, the Gas-Kinetic BGK method, which takes particle collisions into account, predicts compressible flows accurately. The Gas-Kinetic BGK method is then extended to the solution of 2-D and 3-D inviscid and viscous flows on unstructured/hybrid grids. The computations are performed in parallel. Various inviscid and viscous test cases are considered, and it is shown that the Gas-Kinetic BGK method predicts both inviscid and viscous flow fields accurately. The use of hybrid grids for viscous flows reduces the overall number of grid cells while still resolving the boundary layers. Parallel computation significantly reduces the run time of the Gas-Kinetic BGK method, which in turn makes it viable for practical aerodynamic flow problems.
275

Accuracy And Efficiency Improvements In Finite Difference Sensitivity Calculations

Ozhamam, Murat 01 December 2007 (has links) (PDF)
The accuracy of finite difference sensitivity calculations is improved by calculating the optimum finite difference interval sizes. In an aerodynamic inverse design algorithm, a compressor cascade geometry is perturbed by shape functions, and finite difference sensitivity derivatives of the flow variables are calculated with respect to the base geometry flow variables. The sensitivity derivatives are used in an optimization code, and a new airfoil is designed that satisfies the given design characteristics. Accurate sensitivities are needed for the optimization process. A method for finding the optimum finite difference interval size is investigated. Convergence error estimation techniques in iterative solutions and second derivative estimations are investigated to facilitate this method. To validate the method, analytical sensitivity calculations of the Euler equations are used and several applications are performed. The efficiency of the finite difference sensitivity calculations is improved by parallel computing. Finite difference sensitivity calculations are independent tasks in an inverse aerodynamic design algorithm and can be computed separately. The sensitivity calculations are performed on parallel processors, and the computing time is decreased.
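The trade-off behind the optimum interval size is the balance between truncation error (which shrinks with the step) and round-off error (which grows as the step shrinks). A minimal sketch for a first-derivative central difference, using the common rule of thumb rather than the thesis's own estimator (magnitude estimates and constants are illustrative):

```python
import math

def central_diff(f, x, h):
    """First-derivative central difference, O(h^2) truncation error."""
    return (f(x + h) - f(x - h)) / (2 * h)

def optimal_step(f_mag, f3_mag, eps=2.2e-16):
    """Minimize the total error model
        e(h) ~ h**2 * |f'''| / 6  +  eps * |f| / h
    (truncation + rounding), which gives
        h* = (3 * eps * |f| / |f'''|) ** (1/3).
    f_mag and f3_mag are estimates of |f(x)| and |f'''(x)|."""
    return (3 * eps * f_mag / max(f3_mag, 1e-300)) ** (1.0 / 3.0)
```

For `f = exp` at `x = 1`, both magnitudes equal `e`, and the balanced step gives roughly ten digits of accuracy, whereas a naively tiny step loses most of them to cancellation; the same effect is why sensitivity derivatives of iteratively converged flow solutions need a carefully chosen interval.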
276

Vessel Segmentation Using Shallow Water Equations

Nar, Fatih 01 May 2011 (has links) (PDF)
This thesis investigates the feasibility of using fluid flow as a deformable model for segmenting vessels in 2D and 3D medical images. Exploiting fluid flow in vessel segmentation is biologically plausible, since vessels naturally provide the medium for blood transportation. Fluid flow is a powerful basis for vessel segmentation because streaming fluid regions can merge and split, providing topological adaptivity. In addition, the fluid can flow through small gaps formed by imaging artifacts, building connections between disconnected areas. In our study, linearized shallow water equations (LSWE) are used due to their simplicity, parallelism, and low computational cost compared to other fluid simulation methods. The method developed herein is validated using synthetic datasets, two clinical datasets, and publicly available simulated datasets containing Magnetic Resonance Angiography (MRA) images, Magnetic Resonance Venography (MRV) images, and retinal angiography images. Depending on image size, speedups of one to two orders of magnitude are obtained with the developed parallel implementation using the Nvidia Compute Unified Device Architecture (CUDA), compared to single-core and multi-core CPU implementations.
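The LSWE couple a height field to two velocity fields through three local update rules, and it is the per-cell independence of this stencil that makes a GPU implementation natural. A sketch of one explicit Euler step on a periodic grid (central differences; constants and boundary handling are illustrative, not the thesis's discretization):

```python
import numpy as np

def lswe_step(h, u, v, H=1.0, g=9.81, dt=0.01, dx=1.0):
    """One explicit Euler step of the linearized shallow water equations
        dh/dt = -H (du/dx + dv/dy),   du/dt = -g dh/dx,   dv/dt = -g dh/dy
    on a periodic grid.  Each output cell depends only on a small stencil
    of neighbors, so all cells can be updated in parallel (one CUDA
    thread per cell in a GPU implementation)."""
    def ddx(a):
        return (np.roll(a, -1, axis=1) - np.roll(a, 1, axis=1)) / (2 * dx)
    def ddy(a):
        return (np.roll(a, -1, axis=0) - np.roll(a, 1, axis=0)) / (2 * dx)
    hn = h - dt * H * (ddx(u) + ddy(v))
    un = u - dt * g * ddx(h)
    vn = v - dt * g * ddy(h)
    return hn, un, vn
```

With periodic boundaries the update conserves the total of `h` (the discrete divergence sums to zero), a quick sanity check on the discretization.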
277

Proceedings of the 4th Many-core Applications Research Community (MARC) Symposium

January 2012 (has links)
In continuation of a successful series of events, the 4th Many-core Applications Research Community (MARC) symposium took place at the HPI in Potsdam on December 8th and 9th 2011. Over 60 researchers from different fields presented their work on many-core hardware architectures, their programming models, and the resulting research questions for the upcoming generation of heterogeneous parallel systems.
278

Optimistic semantic synchronization

Sreeram, Jaswanth 06 October 2011 (has links)
Within the last decade, multi-core processors have become increasingly commonplace, with the power and performance demands of modern real-world programs accelerating this trend. The rapid advancement and adoption of such architectures mean that there is a serious need for programming models that allow the development of correct parallel programs that execute efficiently on these processors. A principal problem in this regard is efficiently synchronizing concurrent accesses to shared memory. Traditional solutions to this problem are either inefficient but programmable (coarse-grained locks) or efficient but not composable and very hard to program and verify (fine-grained locks). Optimistic transactional memory systems provide many of the composability and programmability advantages of coarse-grained locks and good theoretical scaling, but several studies have found that their performance in practice for many programs remains quite poor, primarily because of the high overheads of providing safe optimism. Moreover, current transactional memory models remain rigid: they are not suited for expressing some of the complex thread interactions that are prevalent in modern parallel programs, and the synchronization achieved by these systems is at the physical or memory level. This thesis advocates the position that the memory synchronization problem for threads should be modeled and solved in terms of synchronization of the underlying program values, which have semantics associated with them. It presents optimistic synchronization techniques that instead address the semantic synchronization requirements of a parallel program.
These techniques include methods to (1) enable optimistic transactions to recover from expensive sharing conflicts without discarding all the work made possible by the optimism, (2) enable a hybrid pessimistic-optimistic form of concurrency control that lowers overheads, (3) make synchronization value-aware and semantics-aware, and (4) enable finer-grained consistency rules (than allowed by traditional optimistic TM models), thereby avoiding conflicts that do not enforce any semantic property required by the program. In addition to improving the expressiveness of specific synchronization idioms, all these techniques are effective in improving parallel performance. The thesis formulates these techniques in terms of their purpose, the extensions to the language and the compiler, and the concurrency-control runtime necessary to implement them. It also briefly presents an experimental evaluation of each technique on a variety of modern parallel workloads. These experiments show that the techniques significantly improve parallel performance and scalability over programs using state-of-the-art optimistic synchronization methods.
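The core optimistic pattern the abstract builds on (read without locking, compute speculatively, validate at commit, retry on conflict) can be shown in a small self-contained sketch. This is generic optimistic concurrency control in Python, not the thesis's runtime; all names are illustrative:

```python
import threading

class VersionedCell:
    """A cell supporting optimistic updates: read a (version, value)
    snapshot without locking, compute the new value speculatively, then
    validate the version inside a short commit section and retry if
    another thread committed in the meantime."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()

    def atomic_update(self, fn):
        while True:
            v, val = self.version, self.value   # optimistic, lock-free read
            new_val = fn(val)                   # speculative work, no lock held
            with self._commit_lock:             # short commit section
                if self.version == v:           # validate: no conflicting commit
                    self.value = new_val
                    self.version += 1
                    return new_val
            # validation failed: another thread won the race -> retry
```

Under contention every committed update bumps the version exactly once, so the final version equals the total number of successful commits; the semantic techniques described above generalize this by validating program-level properties instead of raw memory versions.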
279

Automatic Parallel Memory Address Generation for Parallel DSP Computing

Dai, Jiehua January 2008 (has links)
The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for parallel computing in DSP, providing parallel memory addressing efficiently with minimum latency. Parallel programming is made more efficient by using the parallel addressing generator for the parallel vector memory (PVM) proposed in this thesis. However, without the complexities being hidden by a cache, the cost of programming is high. To minimize the programming cost, automatic parallel memory address generation is needed to hide the complexities of memory access.

This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, vector addressing requirements extracted from the behaviour model are matched to a prepared parallel memory addressing template, in order to supply data in parallel from the main memory to the on-chip vector memory.

According to the template and the usage of the main and on-chip parallel vector memories, models for data pre-allocation and permutation in the scratch pad memories of an ASIP can be decided and configured. By exposing the parallel memory accesses of the source code, a memory access flow graph (MFG) is generated. The MFG is then combined with hardware information to match templates in the template library. When a template is matched, the suitable permutation equation is obtained, and the permutation table that includes the target addresses for data pre-allocation and permutation is created. Thus it is possible to automatically generate memory addresses for parallel memory accesses.

A tool achieving this goal, Permutator, has been created; it is implemented in C++ combined with XML. A memory access coding template is selected, and as a result the permutation formulas are specified. The PVM address table can then be generated to perform the data pre-allocation, so that efficient parallel memory access is possible.

The results show that the memory access complexities are hidden by using Permutator, so that the programming cost is reduced. The tool works well in the context where each algorithm, with its related hardware information, corresponds to a template case, so that extra memory cost is eliminated.
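A standard way to obtain the conflict-free parallel access that such addressing templates encode is a skewed storage scheme. The sketch below is a textbook example, not the thesis's template library: element (i, j) of an N×N array is assigned to bank (i + j) mod B, so that the B elements of any row, and of any column, land in B distinct banks and can be fetched in one parallel cycle:

```python
def skewed_bank(i, j, n_banks):
    """Skewed storage: place element (i, j) in bank (i + j) mod B, so
    consecutive elements of a row or of a column never collide in one
    bank (the classic conflict-free scheme for row and column access)."""
    return (i + j) % n_banks

def row_banks(i, n, n_banks):
    """Banks touched when reading row i of an n x n array."""
    return [skewed_bank(i, j, n_banks) for j in range(n)]

def col_banks(j, n, n_banks):
    """Banks touched when reading column j of an n x n array."""
    return [skewed_bank(i, j, n_banks) for i in range(n)]
```

A permutation table of the kind Permutator emits is essentially the precomputed inverse of such a mapping: for each vector access pattern it lists which bank and local address every element must be pre-allocated to.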
280

Behandlung gekrümmter Oberflächen in einem 3D-FEM-Programm für Parallelrechner

Pester, M. 30 October 1998 (has links) (PDF)
The paper presents a method for generating curved surfaces of 3D finite element meshes by mesh refinement, starting with a very coarse grid. This is useful for parallel implementations, where the finest meshes should be computed rather than read from large files. The paper deals with simple geometries such as the sphere, cylinder, and cone, but the method may be extended to more complicated geometries. (with 45 figures)
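The refinement idea can be sketched briefly for the sphere case: each triangle is split into four, and the new edge midpoints are projected radially onto the curved surface, so repeated refinement converges to the sphere rather than to the coarse polyhedron. A Python sketch with illustrative names (the paper's implementation is FEM-specific and runs in parallel):

```python
import numpy as np

def refine_on_sphere(verts, tris, radius=1.0):
    """One refinement step: split every triangle into four by its edge
    midpoints, projecting each new midpoint radially onto the sphere of
    the given radius, so the refined mesh follows the curved surface."""
    verts = [np.asarray(v, float) for v in verts]
    midpoint = {}                       # edge (i, j) -> index of its midpoint

    def mid(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint:
            m = (verts[i] + verts[j]) / 2.0
            m = radius * m / np.linalg.norm(m)   # project onto the sphere
            midpoint[key] = len(verts)
            verts.append(m)
        return midpoint[key]

    out = []
    for a, b, c in tris:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return verts, out
```

Starting from an octahedron (6 vertices, 8 faces), one step yields 18 vertices and 32 faces, all exactly on the sphere, which mirrors the paper's point that only the coarse grid needs to be stored or communicated.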
