Global ETD Search

1	Optimizing parallel simulation of multi-core system Dong, Zhenjiang 27 May 2016 Multi-core CPU design is a recent trend, and we believe it will continue in the near future. Researchers and industry architects use simulation to evaluate their designs and gain confidence before manufacturing the actual products. Because modern multi-core systems are complex, traditional sequential simulation can become a bottleneck in terms of execution time. To handle this complexity, Parallel Discrete Event Simulation (PDES) programs are employed. A PDES program with well-designed partitioning schemes, synchronization algorithms, and other optimizations can take advantage of parallel hardware and achieve scalability for the simulation of multi-core systems. The objective of this dissertation is to design, develop, test, and evaluate a variety of technologies to improve the performance and efficiency of parallel simulation of multi-core systems. These technologies include a general guide for partitioning schemes, an efficient front-end for timing-directed simulation, and a new conservative synchronization algorithm. / Parallel simulation Multicore-system
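To make the idea of conservative synchronization concrete, here is a minimal sketch of a simple global-window (YAWNS-style) conservative scheme. It is a generic textbook technique, not the new algorithm proposed in the dissertation. It assumes every message is stamped at least `lookahead` after the event that sends it; `run_conservative` and its `handler` callback are hypothetical names used only for illustration.

```python
import heapq
import itertools


def run_conservative(initial_events, num_lps, lookahead, end_time, handler):
    """Windowed conservative PDES sketch.

    initial_events: iterable of (lp, timestamp, payload).
    handler(lp, t, payload) returns a list of (dst_lp, delay, payload) messages;
    every delay must be >= lookahead (the assumed lookahead guarantee).
    Returns the processed events as (lp, timestamp, payload) tuples.
    """
    seq = itertools.count()  # tie-breaker so payloads are never compared
    queues = [[] for _ in range(num_lps)]
    for lp, t, payload in initial_events:
        heapq.heappush(queues[lp], (t, next(seq), payload))
    log = []
    while True:
        heads = [q[0][0] for q in queues if q]
        if not heads or min(heads) >= end_time:
            break
        # Any message sent during this window is stamped >= min(heads) + lookahead,
        # so no new arrival can precede an event below this bound.
        bound = min(heads) + lookahead
        outbox = []
        for lp, q in enumerate(queues):  # LPs are independent here: parallelizable
            while q and q[0][0] < bound and q[0][0] < end_time:
                t, _, payload = heapq.heappop(q)
                log.append((lp, t, payload))
                for dst, delay, msg in handler(lp, t, payload):
                    if delay < lookahead:
                        raise ValueError("message violates the lookahead guarantee")
                    outbox.append((dst, t + delay, msg))
        # Barrier: deliver messages only after every LP has finished the window.
        for dst, t, msg in outbox:
            heapq.heappush(queues[dst], (t, next(seq), msg))
    return log
```

For example, a token passed around a ring of three LPs with a delay of 2.0 and a lookahead of 1.0 is processed at times 0, 2, 4, 6, and 8 before `end_time=10.0`. Larger lookahead widens each window and reduces the number of barriers, which is why partitioning and lookahead matter so much for conservative PDES.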
2	Adaptive techniques for BSP Time Warp Low, Malcolm Yoke Hean January 2002 Parallel simulation is a well-developed technique for executing large and complex simulation models to obtain simulation output for analysis within an acceptable time frame. The main contribution of this thesis is the development of adaptive techniques to improve the consistency, performance, and resilience of BSP Time Warp as a general-purpose parallel simulation protocol. We first study the problem of risk hazards in BSP Time Warp optimistic simulation protocols. Successive refinements to the BSP Time Warp protocol are carried out to eliminate simulation execution errors caused by different risk hazards. We show that these refinements can be incorporated into the BSP Time Warp protocol with minimal performance degradation. We next propose an adaptive scheme for the BSP Time Warp algorithm that automatically throttles the number of events executed per superstep. We show that the scheme, operating in a shared-memory environment, can minimize computation load imbalance and rollback overhead at the expense of higher synchronization cost. The next contribution of this thesis is a study of techniques for dynamic load balancing and process migration for Time Warp on a cluster of workstations. We propose dynamic load-balancing algorithms for BSP Time Warp that seek to balance both computation and communication workloads, optimize lookahead between processors, and manage interruptions from external workloads. Finally, we propose an adaptive technique for BSP Time Warp that automatically varies the number of processors used for parallel computation based on the characteristics of the underlying parallel computing platform and the simulation workload. / 003 Parallel simulation
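The trade-off behind throttling events per superstep can be sketched as a small feedback controller. The rule below (halve the cap when too much work is rolled back, otherwise grow it additively) is a hypothetical AIMD rule chosen for illustration; the class name, parameters, and thresholds are assumptions, not the thesis's scheme.

```python
class SuperstepThrottle:
    """Hypothetical adaptive cap on events executed per BSP superstep.

    Halves the cap when the rollback fraction of the last superstep exceeds
    `target`, and otherwise grows it additively (an illustrative AIMD rule).
    """

    def __init__(self, limit=64, minimum=1, maximum=4096, target=0.1, step=8):
        self.limit = limit
        self.minimum = minimum
        self.maximum = maximum
        self.target = target
        self.step = step

    def update(self, executed, rolled_back):
        """Record one superstep's statistics and return the next cap."""
        ratio = rolled_back / executed if executed else 0.0
        if ratio > self.target:
            # Too much wasted work: execute fewer events before the next barrier.
            self.limit = max(self.minimum, self.limit // 2)
        else:
            # Little waste: allow more events per superstep, so fewer barriers.
            self.limit = min(self.maximum, self.limit + self.step)
        return self.limit
```

A smaller cap limits how far a processor can run ahead and therefore how much work a rollback can undo, but it also means more supersteps and more barrier synchronizations, which mirrors the cost noted in the abstract.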
3	Hardware-based Parallel Simulation of Flexible Manufacturing Systems Xu, Dong 27 August 2001 This research explores a hardware-based parallel simulation mechanism that can dramatically improve the speed of simulating flexible manufacturing systems (FMS) by applying appropriate enabling hardware technologies. Hardware-based parallel simulation refers to running a simulation on a multi-microprocessor integrated circuit board, called the simulator, which is designed specifically to simulate a particular FMS. The board is composed of a collection of micro-emulators capable of mimicking the operation of FMS equipment such as machining centers, transporters, and load/unload stations. To design candidate architectures for the board, a mapping technique based on the physical layout information of an FMS is applied. Under this mapping method, the simulation model is decomposed into a cluster of micro-emulators on the board, where each workstation is represented by one micro-emulator. Three potential architectures for the proposed simulator are studied: a bus-based architecture, a shared-memory-based architecture, and a parallel I/O port-based architecture. To provide a suitable parallel computing platform, a prototype simulator combining the shared-memory and parallel I/O port architectures is physically built. In addition to the hardware simulator, a time scaling simulation method is developed for execution on the proposed simulator. The method uses the on-board digital clock to synchronize the parallel simulation running on different microprocessors. The advantage of time scaling is that simulation events are naturally ordered consistently with the real events. As a result, the entangled waiting required by conservative parallel simulation methods is avoided, reducing synchronization overhead and the risk of deadlock.
Experiments on the prototype simulator show that the time scaling simulation method, combined with the unique hardware features of the FMS-specific simulator, achieves a large speedup compared with conventional software-based simulation methods. / Ph. D.
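The time scaling idea can be sketched in software: a shared clock advances in ticks, each tick stands for a fixed slice of simulated time, and each emulator fires its own events once the clock reaches their timestamps, without waiting on messages from other emulators. This is a minimal sketch under simplifying assumptions: the event lists are precomputed (real emulators would generate events dynamically), and `time_scaled_run` and `sim_time_per_tick` are hypothetical names.

```python
def time_scaled_run(emulator_events, sim_time_per_tick):
    """Fire each emulator's events when a shared clock reaches their time.

    emulator_events: dict mapping emulator name -> list of (sim_time, label).
    sim_time_per_tick: simulated time represented by one tick of the clock.
    Returns (tick, emulator, sim_time, label) tuples in firing order.
    """
    pending = {name: sorted(events) for name, events in emulator_events.items()}
    fired = []
    tick = 0
    while any(pending.values()):
        now = tick * sim_time_per_tick
        # On the board, every micro-emulator would perform this check
        # concurrently; none waits on the others, only on the shared clock.
        for name, events in pending.items():
            while events and events[0][0] <= now:
                sim_time, label = events.pop(0)
                fired.append((tick, name, sim_time, label))
        tick += 1
    return fired
```

With `sim_time_per_tick=0.5`, events at simulated times 1.0, 1.5, and 2.5 on two emulators fire at ticks 2, 3, and 5, so the global firing order follows the timestamps. Events closer together than one tick fire in the same tick, so the tick length bounds the ordering resolution.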
4	Development of a Parallel Electrostatic PIC Code for Modeling Electric Propulsion Pierru, Julien 23 September 2005 This thesis presents the parallel version of Coliseum, the Air Force Research Laboratory plasma simulation framework. The parallel code was designed to run large simulations on the world's fastest supercomputers as well as on home-made clusters. Plasma simulations are extremely computationally intensive because they require tracking millions of particles and solving field equations over large domains. This new parallel version will allow Coliseum to simulate spacecraft-plasma interactions in domains large enough to reproduce space conditions. The parallel code ran on two of the world's fastest supercomputers: the NASA JPL Cosmos supercomputer, ranked 37th on the TOP500 list, and Virginia Tech's System X, ranked 7th. DRACO, the Virginia Tech PIC module for Coliseum, was modified with parallel algorithms to create a fully parallel PIC code, and a parallel solver was added to DRACO. The solver uses a Gauss-Seidel method with SOR acceleration on a red-black checkerboard scheme. Timing results were obtained on the JPL Cosmos supercomputer to determine the efficiency of the parallel code. Although communication overhead limits the code's parallel efficiency, the speedup obtained greatly reduces the time required to run the simulations: a speedup of 51 was reached on 128 processors. The parallel code was also used to simulate the plume expansion of an ion thruster array composed of three NSTAR thrusters. Results showed that the multiple beams merge to form a single plume similar to the plume created by a single ion thruster. / Master of Science / parallel simulation MPI particle in cell
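The red-black Gauss-Seidel/SOR scheme mentioned above can be illustrated on the 2D Poisson equation $\nabla^2\phi = f$ (for electrostatics, $f = -\rho/\varepsilon_0$) with a 5-point stencil and Dirichlet boundaries. This is a generic serial sketch, not DRACO's solver; the function name, grid layout, and default relaxation factor are assumptions.

```python
def red_black_sor(f, phi, h, omega=1.7, iters=500):
    """Solve lap(phi) = f on a square grid with red-black SOR.

    f, phi: equally sized lists of lists; phi holds the initial guess and its
    boundary values act as fixed Dirichlet conditions (never modified).
    h: grid spacing. omega: over-relaxation factor (1 < omega < 2).
    """
    n = len(phi)
    for _ in range(iters):
        # Red points ((i + j) even) first, then black points ((i + j) odd).
        # With a 5-point stencil, points of one colour only have neighbours of
        # the other colour, so each half-sweep can be split across processors.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # Plain Gauss-Seidel value from the discrete Laplacian.
                    gs = 0.25 * (phi[i + 1][j] + phi[i - 1][j]
                                 + phi[i][j + 1] + phi[i][j - 1]
                                 - h * h * f[i][j])
                    # SOR: move past the Gauss-Seidel value by a factor omega.
                    phi[i][j] += omega * (gs - phi[i][j])
    return phi
```

In a distributed version, each rank would own a subdomain and exchange one layer of ghost cells after each colour's half-sweep; that exchange is the kind of communication overhead that limits parallel efficiency.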
5	An Adaptive Time Window Algorithm for Large Scale Network Emulation Kodukula, Surya Ravikiran 07 February 2002 With the continuing growth of the Internet and network protocols, there is a need for Protocol Development Environments. Simulation environments such as ns and OPNET require protocol code to be rewritten in a discrete event model. Direct Code Execution Environments (DCEE) solve the Verification and Validation problems by supporting the execution of unmodified protocol code in a controlled environment. Open Network Emulator (ONE) is a system supporting Direct Code Execution in a parallel environment, allowing unmodified protocol code to run on top of a parallel simulation layer capable of simulating complex network topologies. Traditional approaches to Parallel Discrete Event Simulation (PDES) fall broadly into two categories. Conservative approaches process an event only after it has been established that handling it cannot cause a causality error. Optimistic approaches allow causality errors and provide a means of restoring state (i.e., rollback). All standard PDES approaches either rely on assumptions about the event patterns present in the system or cannot be applied to ONE because their analysis is restricted to simplified models such as queues and Petri nets. The Adaptive Time Window algorithm is a bounded optimistic parallel simulation algorithm that can change its degree of optimism as the degree of causality in the network changes. The optimism at any instant is bounded by an amount of virtual time called the time window. The algorithm assumes the efficient rollback capabilities provided by the 'Weaves' framework. The algorithm is reactive: it responds to changes in the degree of causality in the system by adjusting the length of its time window.
Once sufficient history has been gathered, the algorithm responds to increasing causality in the system by shrinking its time window (a more conservative approach) and enlarges the window (a more optimistic approach) during idle periods. The problem of splitting the entire simulation run into time windows of arbitrary length so that the total number of rollbacks in the system is minimized is NP-complete. The Adaptive Time Window algorithm is compared against offline greedy approaches to this NP-complete problem, called Oracle Computations. The total number of rollbacks and the total execution time for the Adaptive Time Window algorithm were comparable to those of the Oracle Computations. / Master of Science / Parallel Simulation Time Window Emulation Optimistic Algorithm
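The bounded-optimism rule and its adaptation can be sketched as a small controller: an event may execute only if its timestamp lies within `window` of global virtual time (GVT), and the window shrinks when rollbacks appear and grows after a quiet history. The halving/doubling rule, the history length, and all names here are hypothetical choices for illustration, not the thesis's exact policy.

```python
from collections import deque


class AdaptiveTimeWindow:
    """Hypothetical adaptive time window for bounded optimistic execution."""

    def __init__(self, window=10.0, min_window=0.1, max_window=1000.0, history=4):
        self.window = window
        self.min_window = min_window
        self.max_window = max_window
        self.history = deque(maxlen=history)  # rollbacks seen in recent periods

    def may_execute(self, timestamp, gvt):
        # Optimism is bounded: only events within `window` of GVT may run.
        return timestamp < gvt + self.window

    def end_period(self, rollbacks):
        """Record one period's rollback count and return the new window."""
        self.history.append(rollbacks)
        if len(self.history) < self.history.maxlen:
            return self.window  # not enough history gathered yet
        if rollbacks > 0:
            # Causality is rising: shrink toward conservative execution.
            self.window = max(self.min_window, self.window / 2)
        elif sum(self.history) == 0:
            # Quiet history: grow toward optimistic execution.
            self.window = min(self.max_window, self.window * 2)
        return self.window
```

Requiring a full history before adapting keeps a single noisy period from swinging the window, at the cost of reacting more slowly to real changes in causality.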
6	Large-Scale Simulation of Neural Networks with Biophysically Accurate Models on Graphics Processors Wang, Mingchao 2012 May 1900 Efficient simulation of large-scale mammalian brain models provides a crucial computational means for understanding complex brain functions and neuronal dynamics. However, such tasks are hindered by significant computational complexity. In this work, we address the computational challenge of simulating large-scale neural networks based on the most biophysically accurate Hodgkin-Huxley (HH) neuron models. Unlike simpler phenomenological spiking models, HH models allow one to directly associate the observed network dynamics with their underlying biological and physiological causes, but at a significantly higher computational cost. We exploit recent commodity massively parallel graphics processors (GPUs) to alleviate this cost in HH-model-based neural network simulation. We develop lookup-table-based HH model evaluation and efficient parallel implementation strategies geared toward higher arithmetic intensity and minimal thread divergence. Furthermore, we adopt and develop advanced multi-level numerical integration techniques well suited to the intricate dynamical and stability characteristics of HH models. On a commodity GPU card with 240 streaming processors, for a neural network with one million neurons and 200 million synaptic connections, the presented GPU neural network simulator is about 600X faster than a basic serial CPU-based simulator, 28X faster than the CPU implementation of the proposed techniques, and only two to three times slower than GPU-based simulation using simpler spiking models. / brain functions neuronal dynamics parallel simulation graphics processors
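Lookup-table evaluation replaces the exponentials in the HH gating rate functions with a precomputed table indexed by membrane voltage plus linear interpolation. The sketch below uses the standard HH sodium-activation rates $\alpha_m$ and $\beta_m$ (voltages in mV, rest near $-65$ mV); the table range, step size, and `RateTable` class are assumptions, and a GPU version would keep the table in fast on-chip or texture memory rather than a Python list.

```python
import math


def alpha_m(v):
    """Standard HH sodium activation opening rate (1/ms), v in mV."""
    x = v + 40.0
    if abs(x) < 1e-9:
        return 1.0  # limit of 0.1 * x / (1 - exp(-x / 10)) as x -> 0
    return 0.1 * x / -math.expm1(-x / 10.0)


def beta_m(v):
    """Standard HH sodium activation closing rate (1/ms), v in mV."""
    return 4.0 * math.exp(-(v + 65.0) / 18.0)


class RateTable:
    """Precomputed rate function sampled on a uniform voltage grid."""

    def __init__(self, fn, v_min=-100.0, v_max=50.0, step=0.1):
        self.v_min = v_min
        self.step = step
        count = int(round((v_max - v_min) / step)) + 1
        self.values = [fn(v_min + k * step) for k in range(count)]

    def __call__(self, v):
        pos = (v - self.v_min) / self.step
        # Clamp the segment index; voltages outside the table are linearly
        # extrapolated from the first or last segment.
        k = min(max(int(math.floor(pos)), 0), len(self.values) - 2)
        frac = pos - k
        return self.values[k] + frac * (self.values[k + 1] - self.values[k])
```

Every neuron then performs the same index computation and two memory reads per rate, which keeps threads in a warp on the same code path instead of branching through different exponential evaluations.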
7	Partitioning of Urban Transportation Networks Utilizing Real-World Traffic Parameters for Distributed Simulation in SUMO Ahmed, Md Salman, Hoque, Mohammad A. 27 January 2017 This paper describes a partitioning algorithm for real-world transportation networks that incorporates previously unaccounted-for parameters such as signalized traffic intersections, road segment length, traffic density, number of lanes, and inter-partition communication overhead due to the migration of vehicles from one partition to another. We also describe our hypothetical framework for distributed simulation of the partitioned road network in SUMO, in which a master controller, currently under development using the TraCI APIs and the MPI library, coordinates the parallel simulation and the synchronization between the sub-networks generated by our proposed algorithm. / METIS MPI network partition OSM parallel simulation SUMO TraCI
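One way to see how traffic parameters enter a partitioner is to score a candidate partition of the road graph (intersections as nodes, road segments as weighted edges) by its edge cut and load imbalance, the two quantities a METIS-style partitioner trades off. In this sketch the weighting (expected vehicles per segment as length × lanes × density) and both function names are hypothetical illustrations, not the paper's formula; node weights could, for instance, count signalized intersections more heavily.

```python
def road_weight(length_m, lanes, density_veh_per_km_lane):
    """Hypothetical load estimate: expected number of vehicles on a segment."""
    return (length_m / 1000.0) * lanes * density_veh_per_km_lane


def partition_quality(edges, node_weights, parts, k):
    """Score a k-way partition of a road graph.

    edges: list of (u, v, weight) road segments between intersections.
    node_weights: dict intersection -> computational weight.
    parts: dict intersection -> partition id in range(k).
    Returns (edge_cut, imbalance): the total weight of segments whose endpoints
    lie in different partitions (a proxy for vehicle migration traffic) and
    the heaviest partition's load divided by the average load.
    """
    loads = [0.0] * k
    for node, weight in node_weights.items():
        loads[parts[node]] += weight
    edge_cut = sum(w for u, v, w in edges if parts[u] != parts[v])
    imbalance = max(loads) / (sum(loads) / k)
    return edge_cut, imbalance
```

On a chain A-B-C-D whose middle segment carries little traffic, cutting that light segment yields a small cut and perfect balance, while cutting a busy segment raises both the expected migration traffic and the imbalance.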
8	Performance and Power Optimization of Parallel Discrete Event Simulations Using DVFS Child, Ryan 08 October 2012 No description available. / Computer Engineering DVFS Time Warp parallel simulation many-core processors
9	Modeling Complex Forest Ecology in a Parallel Computing Infrastructure Mayes, John 08 1900 Effective stewardship of forest ecosystems makes it imperative to measure, monitor, and predict the dynamic changes of forest ecology. Measuring and monitoring provide a picture of a forest's current state and the data needed to formulate predictive models. However, societal and natural events alter the course of a forest's development, so a simulation environment that takes these events into account will facilitate forest management. In this thesis, we describe an efficient parallel implementation of a land cover use model, Mosaic, and discuss the development efforts to incorporate spatial interaction and succession dynamics into the model. To evaluate the performance of our implementation, an extensive set of simulation experiments was carried out using a dataset representing the H.J. Andrews Forest in the Oregon Cascades. Results indicate that our parallel model achieves a significant reduction in simulation execution time compared with uni-processor simulations. / Forest ecology -- Computer simulation. Simulation parallel simulation forest simulation semi-Markov
10	Understanding Multicore Performance : Efficient Memory System Modeling and Simulation Sandberg, Andreas January 2014 To increase performance, modern processors employ complex techniques such as out-of-order pipelines and deep cache hierarchies. While this increasing complexity has paid off in performance, it has made it harder to accurately predict the effects of hardware/software optimizations in such systems. Traditional microarchitectural simulators typically execute code 10 000×–100 000× slower than native execution, which leads to three problems. First, high simulation overhead makes it hard to use microarchitectural simulators for tasks such as software optimization, where rapid turn-around is required. Second, when multiple cores share the memory system, the resulting performance is sensitive to how memory accesses from the different cores interleave. This requires applications to be simulated multiple times with different interleavings to estimate their performance distribution, which is rarely feasible with today's simulators. Third, the high overhead limits the size of the applications that can be studied. This is usually addressed by simulating only a relatively small number of instructions near the start of an application, at the risk of reporting unrepresentative results. In this thesis we demonstrate three strategies to accurately model multicore processors without the overhead of traditional simulation. First, we show how microarchitecture-independent memory access profiles can be used to drive automatic cache optimizations and to qualitatively classify an application's last-level cache behavior. Second, we demonstrate how high-level performance profiles, which can be measured on existing hardware, can be used to model the behavior of a shared cache.
Unlike previous models, ours predicts the effective amount of cache available to each application and the resulting performance distribution due to different interleavings, without requiring a processor model. Third, to model future systems, we build an efficient sampling simulator. By using native execution to fast-forward between samples, we reach new samples much faster than a single sample can be simulated. This enables us to simulate multiple samples in parallel, resulting in almost linear scalability and a maximum simulation rate close to native execution. / CoDeR-MP / UPMARC Computer Architecture Simulation Modeling Sampling Caches Memory Systems gem5 Parallel Simulation Virtualization Sampling Multicore
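A classic example of a microarchitecture-independent memory access profile is the stack-distance histogram: for each access, the number of distinct addresses touched since the previous access to the same address. From a single profile, the miss ratio of a fully associative LRU cache of any size follows directly. This sketch shows that generic technique only, not the specific profiling method used in the thesis; the function names are illustrative.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first touch)."""
    stack = []  # most recently used address at the end
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Distinct addresses accessed since the last use of `addr`.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)  # cold (compulsory) miss
        stack.append(addr)
    return distances


def lru_miss_ratio(distances, cache_lines):
    """Miss ratio of a fully associative LRU cache holding `cache_lines` lines."""
    misses = sum(1 for d in distances if d is None or d >= cache_lines)
    return misses / len(distances)
```

For the cyclic trace `a b c a b c`, every reuse has stack distance 2, so a 3-line cache misses only the three cold accesses (ratio 0.5) while a 2-line cache misses every access (ratio 1.0). The linear `list.index` search keeps the sketch short; real profilers use tree-based or sampled reuse-distance estimates.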
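Because each sample is simulated from its own starting state, samples are independent and can run in parallel; the overall estimate is a mean with a confidence interval. The sketch below assumes a hypothetical `simulate(start, length)` callback that fast-forwards to instruction `start` and then simulates `length` instructions in detail, returning the sample's CPI. It uses evenly spaced samples and a normal-approximation 95% interval; these are illustrative choices, not necessarily the thesis's sampling parameters.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def sampled_cpi(total_instructions, num_samples, sample_length, simulate, workers=4):
    """Estimate mean CPI from evenly spaced, independently simulated samples.

    simulate(start, length) is a hypothetical callback: fast-forward to
    instruction `start`, run detailed simulation for `length` instructions,
    and return that sample's CPI.
    Returns (mean_cpi, half_width), where half_width is an approximate 95%
    confidence half-interval (normal approximation).
    """
    interval = total_instructions // num_samples
    starts = [k * interval for k in range(num_samples)]
    # Samples do not depend on each other, so they can be simulated in
    # parallel; a real simulator would use processes or separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cpis = list(pool.map(lambda s: simulate(s, sample_length), starts))
    mean = statistics.mean(cpis)
    if len(cpis) < 2:
        return mean, math.inf
    half_width = 1.96 * statistics.stdev(cpis) / math.sqrt(len(cpis))
    return mean, half_width
```

If fast-forwarding is close to native speed, the time to reach all sample points is small, and total run time is dominated by the detailed samples, which is what allows near-linear scaling with the number of workers.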
