101

On the Extensions of the Predictor-Corrector Proximal Multiplier (PCPM) Algorithm and Their Applications

Run Chen (9739499) 15 December 2020 (has links)
Many real-world application problems can be modeled mathematically as constrained convex optimization problems. The scale of such problems can be extremely large, posing significant challenges to traditional centralized algorithms and calling for efficient and scalable distributed algorithms. However, most existing work on distributed optimization has focused on block-separable problems with simple linear constraints, such as consensus-type constraints. The focus of this dissertation is to propose distributed algorithms for (possibly non-separable) large-scale optimization problems with more complicated constraints, using parallel updates (in Jacobi fashion) instead of the sequential updates of Gauss-Seidel iterations. To this end, this dissertation extends the predictor-corrector proximal multiplier (PCPM) method to address three issues that arise when solving a large-scale constrained convex optimization problem: (i) nonlinear coupling constraints; (ii) asynchronous iterative schemes; (iii) a non-separable objective function and coupling constraints.

The idea of the PCPM algorithm is to introduce a predictor variable for the Lagrangian multiplier in order to avoid the augmented term, thereby removing the coupling of block variables while still achieving convergence without restrictive assumptions. Building upon this algorithmic idea, we propose three distributed algorithms: (i) an N-block PCPM algorithm for solving N-block convex optimization problems with both linear and nonlinear coupling constraints; (ii) an asynchronous N-block PCPM algorithm for solving linearly constrained N-block convex optimization problems; (iii) a distributed algorithm, named PC²PM, for solving large-scale convex quadratically constrained quadratic programs (QCQPs). Global convergence is established for each of the three algorithms. Extensive numerical experiments, using various data sets, are conducted on either a single-node computer or a multi-node computer cluster through the message passing interface (MPI). Numerical results demonstrate the efficiency and scalability of the proposed algorithms.

A real application of the N-block PCPM algorithm to electricity capacity expansion models is also studied in this dissertation. A hybrid scenario-node-realization decomposition method, with extended nonanticipativity constraints, is proposed for solving the resulting large-scale optimization problem from a multi-scale, multi-stage stochastic program under various uncertainties with different temporal scales. A technique of orthogonal projection is exploited to simplify the iteration steps, which leads to a simplified N-block PCPM algorithm amenable to massive parallelization during iterations. This algorithm exhibits much more scalable numerical performance than the widely used progressive hedging approach (PHA) for solving stochastic programs.
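For context, a sketch of the classical two-block PCPM iteration (due to Chen and Teboulle) that the N-block extensions above build on, stated for minimizing f(x) + g(y) subject to Ax + By = b with step size γ > 0; this is the standard two-block form, not the dissertation's N-block variants:

```latex
\begin{aligned}
\bar{\lambda}^{k} &= \lambda^{k} + \gamma\,(Ax^{k} + By^{k} - b)
  && \text{(predictor)}\\
x^{k+1} &= \operatorname*{arg\,min}_{x}\; f(x)
  + \langle \bar{\lambda}^{k}, Ax \rangle
  + \tfrac{1}{2\gamma}\lVert x - x^{k} \rVert^{2}\\
y^{k+1} &= \operatorname*{arg\,min}_{y}\; g(y)
  + \langle \bar{\lambda}^{k}, By \rangle
  + \tfrac{1}{2\gamma}\lVert y - y^{k} \rVert^{2}\\
\lambda^{k+1} &= \lambda^{k} + \gamma\,(Ax^{k+1} + By^{k+1} - b)
  && \text{(corrector)}
\end{aligned}
```

Because the predictor multiplier stands in for the augmented quadratic penalty of ADMM-style methods, the x- and y-subproblems share no variables and can be solved in parallel, which is precisely the Jacobi-style decoupling the abstract refers to.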
102

Parallelization of Dataset Transformation with Processing Order Constraints in Python

Gramfors, Dexter January 2016 (has links)
Financial data is often represented as rows of values contained in a dataset. This data needs to be transformed into a common format so that comparison and matching can be performed, which can take a long time for larger datasets. The main goal of this master's thesis is to speed up these transformations through parallelization using Python multiprocessing. The datasets in question consist of rows representing trades, and they are transformed into a common format using rules known as filters. To devise a parallelization strategy, the filters were analyzed for ordering constraints, and the Python profiler cProfile was used to find bottlenecks and potential parallelization points. This analysis resulted in a task-based approach for the implementation, in which the transformation was divided into an initial sequential pre-processing step, a parallel step where chunks of several trade rows were distributed among workers, and a sequential post-processing step. The implementation was tested by transforming four datasets of differing sizes using up to 16 workers, and execution time and memory consumption were measured. The results for the tiny, small, medium, and large datasets showed speedups of 0.5, 2.1, 3.8, and 4.81, respectively, with linearly increasing memory consumption for all datasets. The test transformations were also profiled in order to understand the parallel program's behaviour for the different datasets. The experiments led to the conclusion that dataset size heavily influences the speedup, partly because the sequential parts become less significant. In addition, the large memory increase for larger numbers of workers is noted as a major downside of multiprocessing when using caching mechanisms, as data is duplicated instead of shared. This thesis shows that it is possible to speed up the dataset transformations using chunks of rows as tasks, though the speedup is relatively low.
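A minimal sketch of the task-based decomposition described above, using Python's multiprocessing module; the row format, the filter applied, and the chunk size are illustrative assumptions, not the thesis's actual code:

```python
from multiprocessing import Pool

def transform_row(row):
    # Hypothetical filter: normalize field names into the common format.
    return {key.lower(): value for key, value in row.items()}

def transform_chunk(chunk):
    # One task: a chunk of trade rows, transformed independently of other chunks.
    return [transform_row(row) for row in chunk]

def transform_dataset(rows, workers=4, chunk_size=1000):
    # Sequential pre-processing step: split the dataset into chunks
    # (any ordering-constraint analysis would also happen here).
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Parallel step: chunks are distributed among worker processes.
    with Pool(processes=workers) as pool:
        results = pool.map(transform_chunk, chunks)
    # Sequential post-processing step: flatten the chunks back into one dataset.
    return [row for chunk in results for row in chunk]

if __name__ == "__main__":
    data = [{"Price": 1.0, "Qty": i} for i in range(10_000)]
    print(len(transform_dataset(data)))
```

Note that each worker receives a full copy of its chunk: this duplication is the same effect the thesis identifies as the source of the linear memory growth.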
103

Accelerator-based look-up table for coarse-grained molecular dynamics computations

Gangopadhyay, Ananya 13 May 2019 (has links)
Molecular Dynamics (MD) is a simulation technique widely used by computational chemists and biologists to simulate and observe the physical properties of a system of particles or molecules. The method provides invaluable three-dimensional structural and transport property data for macromolecules that can be used in applications such as the study of protein folding and drug design. The most time-consuming and inefficient routines in MD packages, particularly for large systems, are the ones involving the computation of intermolecular energy and forces for each molecule. Many fully atomistic packages such as CHARMM and NAMD have been refined over the years to improve their efficiency, but simulating complex long-timescale events such as protein folding remains out of reach for atomistic simulations. The consensus view amongst computational chemists and biologists is that the development of a coarse-grained (CG) MD package will make the long timescales required for protein folding simulations possible. The shortcoming of this method remains an inability to produce accurate dynamics and results that are comparable with atomistic simulations. It is the objective of this dissertation to develop a coarse-grained method that is computationally faster than atomistic simulations, while being dynamically accurate enough to produce structural and transport property data comparable to results from the latter. Firstly, the accuracy of the Gay-Berne potential in modelling liquid benzene, in comparison to fully atomistic simulations, was investigated. Following this, the speed of a coarse-grained condensed-phase benzene simulation employing a Gay-Berne potential was compared with that of a fully atomistic simulation. While coarse-graining algorithmically reduces the total number of particles in consideration, the execution time and efficiency scale poorly for large systems. Developers of both fully atomistic and coarse-grained packages have accelerated them using high-performance parallel computing platforms such as multi-core CPU clusters, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). GPUs have especially gained popularity in recent years due to their massively parallel architecture on a single chip, making them a cheaper alternative to a CPU cluster. Their relatively shorter development time also gives them an advantage over FPGAs. NAMD is perhaps the most popular MD package that makes efficient use of a single GPU or a multi-GPU cluster to conduct simulations. The Scientific Computing Research Unit's in-house generalised CG code, the Free Energy Force Induced (FEFI) coarse-grained MD package, was accelerated using a GPU to investigate the achievable speed-up in comparison to the CPU algorithm. To achieve this, a parallel version of the sequential force routine, i.e. the computation of the energy, force and torque per molecule, was developed and implemented on a GPU. The GPU-accelerated FEFI package was then used to simulate benzene, which is almost exclusively governed by van der Waals forces (i.e. dispersion effects), using the parameters for the Gay-Berne potential from a study by Golubkov and Ren, "Generalized coarse-grained model based on point multipole and Gay-Berne potentials". The coarse-grained condensed-phase structural properties, such as the radial and orientational distribution functions, proved to be inaccurate, and the transport properties such as diffusion were significantly worse than in a CHARMM simulation.
From this, it was concluded that the Gay-Berne potential is not able to model the subtle dispersion effects observed in liquid benzene. In place of the analytic Gay-Berne potential, a more accurate approach is to use a multidimensional free-energy-based potential. Using the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) method, a four-dimensional Free Energy Volume (FEV) for two interacting benzene molecules was computed for liquid benzene. The focal point of this dissertation was to use this FEV as the coarse-grained interaction potential in FEFI to conduct CG simulations of condensed-phase liquid benzene. The FEV acts as a numerical potential or Look-Up Table (LUT) from which the interaction energy and the four partial derivatives required to compute the forces and torques can be obtained via numerical methods at each step of the CG MD simulation. A significant component of this dissertation was the development and implementation of four-dimensional LUT routines that use the FEV for accurate condensed-phase coarse-grained simulations. To compute the energy and partial derivatives between the grid points of the surface, an interpolation algorithm was required. A four-dimensional cubic B-spline interpolation was developed because of the method's superior accuracy and resistance to oscillations compared with other polynomial interpolation methods. The algorithm's introduction into the FEFI CG MD package for CPUs, however, exhausted the single-core CPU architecture with the large number of interpolations required at each MD step, making it impractical for the high-throughput interpolation required for MD simulations. The 4D cubic B-spline algorithm and the LUT routine were therefore developed and implemented on a GPU. Following evaluation, the LUT was integrated into the FEFI MD simulation package, and a FEFI CG simulation of liquid benzene was run using the 4D FEV for a benzene molecular pair as the numerical potential. The resulting structural and transport properties outperformed those obtained with the analytical Gay-Berne CG potential, more closely approximating the atomistically predicted properties. The work done in this dissertation demonstrates the feasibility of a coarse-grained simulation using a free energy volume as a numerical potential to accurately simulate dispersion effects, a key feature needed for protein folding.
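A small CPU-side sketch of the kind of look-up-table routine described above: scipy's map_coordinates evaluates a cubic B-spline over a regular grid, and the four partial derivatives are taken here by central finite differences. The grid size, axis meanings, and derivative scheme are illustrative assumptions (the thesis implements the 4D B-spline routine on a GPU):

```python
import numpy as np
from scipy.ndimage import map_coordinates, spline_filter

# Stand-in 4D free energy volume (FEV) on a regular grid; the real axes would be
# the intermolecular separation and three relative-orientation coordinates.
fev = np.random.rand(32, 32, 32, 32)
coeffs = spline_filter(fev, order=3)  # pre-compute B-spline coefficients once

def lut_energy_and_gradient(point, h=1e-3):
    """Energy and the four partial derivatives (in grid units) at a fractional
    grid point, via 4D cubic B-spline interpolation of the LUT."""
    point = np.asarray(point, dtype=float)

    def interp(p):
        return map_coordinates(coeffs, p.reshape(4, 1), order=3,
                               prefilter=False, mode='mirror')[0]

    energy = interp(point)
    grad = np.empty(4)
    for d in range(4):          # central differences along each of the 4 axes
        e = np.zeros(4)
        e[d] = h
        grad[d] = (interp(point + e) - interp(point - e)) / (2 * h)
    return energy, grad

energy, grad = lut_energy_and_gradient([10.2, 5.7, 16.0, 3.3])
```

The interpolated energy feeds the Hamiltonian directly, while the four derivatives supply the forces and torques at each CG MD step, exactly the quantities the FEFI force routine needs.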
104

Large-scale network simulation over heterogeneous computing architecture

Ben Romdhanne, Bilel 16 December 2013 (has links)
Simulation is a primary step in the evaluation of modern networked systems. The scalability and efficiency of such tools, in view of the increasing complexity of emerging networks, is key to deriving valuable results. Discrete event simulation is recognized as the most scalable model, coping with both parallel and distributed architectures. Nevertheless, recent hardware provides new heterogeneous computing resources that can be exploited in parallel. The main scope of this thesis is to provide new mechanisms and optimizations that enable efficient and scalable parallel simulation on heterogeneous computing nodes comprising multicore CPUs and GPUs. To address efficiency, we propose to describe events that differ only in their data as a single entry, reducing the event-management cost. At run time, the proposed hybrid scheduler dispatches and injects events onto the most appropriate computing target, based on the event descriptor and the current load obtained through a feedback mechanism, so that the hardware usage rate is maximized. Results show a significant gain of 100 times compared with traditional CPU-based approaches. To increase the scalability of the system, we propose a new simulation model, denoted general-purpose coordinator-master-worker (GP-CMW), to jointly address the challenges of distributed and parallel simulation at different levels. The performance of a distributed simulation that relies on the GP-CMW architecture tends toward the maximal theoretical efficiency in a homogeneous deployment. The scalability of this simulation model is validated on the largest European GPU-based supercomputer.
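A hedged sketch of the injective event representation and hybrid dispatch described above: events that differ only in their data share one descriptor, so a whole group can be scheduled at once and routed to the CPU or GPU. The batching threshold and routing heuristic are illustrative assumptions, not the thesis's actual policy:

```python
from collections import defaultdict

class HybridScheduler:
    def __init__(self, gpu_threshold=64):
        # (timestamp, descriptor) -> list of per-event data; one queue entry
        # stands for a whole group of events (injective representation).
        self.queue = defaultdict(list)
        self.gpu_threshold = gpu_threshold  # assumed load heuristic

    def schedule(self, timestamp, descriptor, data):
        self.queue[(timestamp, descriptor)].append(data)

    def dispatch(self):
        # Process groups in timestamp order; large homogeneous groups are good
        # candidates for massively parallel GPU execution, small ones for CPU.
        for (timestamp, descriptor), batch in sorted(self.queue.items()):
            target = "gpu" if len(batch) >= self.gpu_threshold else "cpu"
            yield timestamp, descriptor, batch, target
        self.queue.clear()

sched = HybridScheduler()
for node in range(200):
    sched.schedule(1.0, "rx_packet", {"node": node})
for timestamp, descriptor, batch, target in sched.dispatch():
    print(timestamp, descriptor, len(batch), "->", target)
```

Grouping events under one descriptor both shrinks the scheduler's ordering problem and exposes the data parallelism the GPU needs, the two effects the abstract credits for the reported gain.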
105

Real-Time Prediction-Driven Dynamics Simulation to Mitigate Frame Time Variation

Buck, Mackinnon A 01 December 2021 (has links) (PDF)
Real-time physics engines have seen recent performance improvements through techniques like hardware acceleration and artificial intelligence. However, state-of-the-art physics simulation technology fails to account for the variation in simulation complexity over time: sudden increases in contact frequency between simulated bodies can momentarily increase the processing time per frame. To solve this, we present a prediction-driven real-time dynamics method that uses a memory-efficient graph-based state buffer to minimize the cost of mispredictions. This buffer, which is generated by a separate thread running the physics pipeline, allows physics computation to temporarily run slower than real time without affecting the frame rate of the host application. The main thread, whose role in dynamics computation is limited to querying the simulation state and regenerating mispredicted state, sees a significant reduction in time spent per frame on dynamics computation when our multi-threaded prediction pipeline is enabled. Our technique thus enables interactive multimedia applications to increase the computational budget for graphics at no cost perceptible to the end user. Furthermore, our method guarantees determinism and low input latency, making it suitable for competitive games and other real-time interactive applications. We also provide a C++ API to integrate custom game logic with the prediction engine to further minimize the frequency of mispredictions.
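A minimal sketch of the speculate-and-replay idea (in Python for consistency with the other sketches here; the thesis's engine is C++). The step function, prediction depth, and input model are illustrative assumptions, and the actual engine uses a graph-based buffer rather than this flat queue:

```python
import threading
from collections import deque

def step(state, inp):
    # Stand-in deterministic physics step; determinism is what makes
    # replaying from the last confirmed state safe.
    return state + inp

class PredictionBuffer:
    def __init__(self, state, predict_input, depth=8):
        self.lock = threading.Lock()
        self.base = state                  # last confirmed state
        self.predict_input = predict_input
        self.buffer = deque()              # (predicted_input, state) pairs
        self.depth = depth

    def simulate_ahead(self):
        # Worker thread: run the physics pipeline ahead of real time.
        with self.lock:
            if len(self.buffer) >= self.depth:
                return
            tail = self.buffer[-1][1] if self.buffer else self.base
            guess = self.predict_input()
            self.buffer.append((guess, step(tail, guess)))

    def query(self, actual_input):
        # Main thread: cheap on a hit, replay only on a misprediction.
        with self.lock:
            if self.buffer and self.buffer[0][0] == actual_input:
                _, state = self.buffer.popleft()   # prediction was correct
            else:
                self.buffer.clear()                # misprediction: regenerate
                state = step(self.base, actual_input)
            self.base = state
            return state

buf = PredictionBuffer(state=0, predict_input=lambda: 1)
for _ in range(4):
    buf.simulate_ahead()       # normally done by the worker thread
print(buf.query(1))            # hit: consumes a precomputed frame
print(buf.query(5))            # miss: clears the buffer and replays
```

On a hit, the main thread's per-frame cost collapses to a queue check, which is how the dynamics budget is shifted off the frame's critical path.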
106

Epithelium: The lightweight, customizable epithelial tissue simulator

Drag, Melvyn I. 29 May 2015 (has links)
No description available.
107

A High Productivity Framework for Parallel Data Intensive Computing in MATLAB

Panuganti, Rajkiran 26 June 2009 (has links)
No description available.
108

High-Performance Multi-Transport MPI Design for Ultra-Scale InfiniBand Clusters

Koop, Matthew J. 03 September 2009 (has links)
No description available.
109

Optimization Frameworks for Discrete Composite Laminate Stacking Sequences

Adams, David Bruce 23 August 2005 (has links)
Composite panel structure optimization is commonly decomposed into panel optimization subproblems with specified local loads, resulting in manufacturing incompatibilities between adjacent panel designs. Using genetic algorithms to optimize local panel stacking sequences allows panel populations of stacking sequences to evolve in parallel and send migrants to adjacent panels, so as to blend the local panel designs globally. The blending process is accomplished using the edit distance between individuals of a population and the set of migrants from adjacent panels. The objective function evaluating the fitness of designs is modified according to the severity of mismatches detected between neighboring populations. This lays the groundwork for natural evolution toward a blended global solution without leaving the paradigm of genetic algorithms. An additional method applied here for constructing globally blended panel designs uses a parallel decomposition antithetical to that of earlier work: rather than performing concurrent panel genetic optimizations, a single genetic optimization is conducted for the entire structure, with the parallelism solely within the fitness evaluations. A guide-based genetic algorithm approach is introduced to exclusively generate and evaluate valid globally blended designs, utilizing a simple master-slave parallel implementation, implicitly reducing the size of the problem design space and increasing the quality of discovered local optima. / Ph. D.
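A brief sketch of the edit-distance blending penalty described above; the stacking sequences are modeled as strings of ply codes, and the penalty weight and base fitness callback are hypothetical, not taken from the dissertation:

```python
def edit_distance(a, b):
    # Levenshtein distance between two stacking sequences (single-row DP).
    dp = list(range(len(b) + 1))
    for i, ply_a in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, ply_b in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,                # deletion
                                     dp[j - 1] + 1,            # insertion
                                     prev + (ply_a != ply_b))  # substitution
    return dp[-1]

def blended_fitness(design, migrants, base_fitness, penalty=1.0):
    # Penalize a panel design by its mismatch with migrants from adjacent
    # panels, steering evolution toward globally blended stacking sequences.
    mismatch = sum(edit_distance(design, m) for m in migrants)
    return base_fitness(design) - penalty * mismatch

migrants = ["+45-45/90/0", "+45-45/0/0"]
print(blended_fitness("+45-45/90/0", migrants, base_fitness=len))
```

Each panel's GA then maximizes blended_fitness over its own population, so a design matching its neighbours' migrants pays no penalty while a mismatched one is selected against.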
110

Building Maze Solutions with Computational Dreaming

Jackson, Scott Michael 25 July 2014 (has links)
Modern parallel computing techniques are subject to poor scalability: their performance tends to suffer diminishing returns, and even losses, with increasing parallelism. Some methods of intelligent computing, such as neural networks and genetic algorithms, lend themselves well to massively parallel systems but come with drawbacks that can limit their usefulness, such as the requirement of a training phase and/or sensitivity to randomness. This thesis investigates the feasibility of a novel method of intelligent parallel computing by implementing a true multiple instruction stream, single data stream (MISD) computing system that is theoretically nearly perfectly scalable. Computational dreaming (CD) is inspired by the structure and dreaming process of the human brain. It examines previously observed input data during a 'dream phase' and is able to develop and select a simplified model to use during the day phase of computation. Using mazes as an example problem space, a CD simulator was developed and successfully used to demonstrate the viability and robustness of CD. Experiments focused on CD viability resulted in the CD system solving 15% of mazes (ranging from small and simple to large and complex), compared with 2.2% solved by random model selection. Results also showed that approximately 50% of successful solutions generated match those that would be generated by algorithms such as depth-first search and Dijkstra's algorithm. Experiments focusing on robustness performed repeated trials with identical parameters and demonstrated that CD achieves such results consistently, solving over 32% of mazes across 10 trials compared with only 3.6% solved by random model selection. A significant finding is that CD does not get stuck on local minima, always converging on a solution model. Thus, CD has the potential to enable significant contributions to computing by potentially finding elegant solutions to, for example, NP-hard or previously intractable problems. / Master of Science
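A compact sketch of the dream-phase/day-phase split described above; the model interface (a callable returning a solution path or None) and the fallback solver are illustrative assumptions, not the thesis's implementation:

```python
def dream_phase(observed_mazes, candidate_models):
    # Dream phase: replay the day's observed mazes against every candidate
    # simplified model and keep the one that solves the most of them.
    def solved_count(model):
        return sum(1 for maze in observed_mazes if model(maze) is not None)
    return max(candidate_models, key=solved_count)

def day_phase(maze, model, fallback):
    # Day phase: apply the selected simplified model; fall back (e.g. to an
    # exhaustive search) only when the model fails on this maze.
    return model(maze) or fallback(maze)
```

In a MISD realization, each instruction stream would host one candidate model while the single data stream broadcasts the observed mazes, so the dream-phase evaluation above is the part that runs in parallel across streams.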
