121

Performance Optimization of Ice Sheet Simulation Models : Examining ways to speed up simulations, enabling for upscaling with more data

Brink, Fredrika January 2023 (has links)
This study examines how simulation models written in Python can be performance-optimized, in the sense of executing faster and enabling upscaling with more data. To meet this aim, two models simulating the Greenland ice sheet are studied. The simulation of ice sheets is an important part of glaciology and climate change research. By following an iterative spiral model of software development and evolution with a focus on bottlenecks, it is possible to optimize the most time-consuming code sections. Several iterations of tools and techniques suitable for Python code are applied, such as introducing libraries, changing data structures, and improving code hygiene. Once the models are optimized, upscaling with a new dataset called CARRA, created from a combination of observations and modelled outcomes, is studied. The results indicate that the most effective optimization approach is to use the Numba library to compile critical code sections to machine code and to parallelize the simulations using Joblib. Depending on the data used and the size and granularity of the simulations, speedups between 1.5 and 3.2 times are achieved. When simulating CARRA data, the optimized code still yields faster simulations. However, the outcome demonstrates that differences exist between the ice sheets simulated from the dataset initially used and from the CARRA data. Even though the CARRA dataset yields a different glaciological result, the overall changes in the ice sheet are similar to those shown in the initial dataset simulations. The CARRA dataset could possibly be used for getting an overview of what is happening to the ice sheet, but not for detailed analyses where exact numbers are needed.
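The two optimizations the abstract names, Numba JIT compilation of a hot loop and Joblib parallelization of independent simulations, can be sketched as follows. The kernel, function names, and parameter values here are illustrative, not taken from the thesis, and both libraries are treated as optional so the sketch stays runnable without them.

```python
# Sketch of the two techniques named above: JIT-compiling a numerical hot
# loop with Numba and running independent simulations in parallel with
# Joblib. Pure-Python fallbacks keep the sketch self-contained; the melt
# kernel is a toy stand-in, not the thesis's model.

try:
    from numba import njit  # compiles the decorated function to machine code
except ImportError:
    def njit(func):          # no-op fallback for this sketch
        return func

try:
    from joblib import Parallel, delayed

    def run_parallel(func, args):
        # Threads suffice for a sketch; Numba-compiled kernels can release
        # the GIL, which is what makes the threading backend viable.
        return Parallel(n_jobs=2, prefer="threads")(
            delayed(func)(a) for a in args)
except ImportError:
    def run_parallel(func, args):
        return [func(a) for a in args]

@njit
def melt_step(thickness, melt_rate, n_steps):
    """Toy per-cell mass-balance loop standing in for a hot code section."""
    for _ in range(n_steps):
        thickness = max(0.0, thickness - melt_rate)
    return thickness

def simulate_cell(thickness):
    return melt_step(thickness, 0.001, 1000)

results = run_parallel(simulate_cell, [100.0, 50.0, 0.5])
print(results)
```

With Numba installed, the first call pays a one-time compilation cost and subsequent calls run as machine code, which is consistent with the speedups the abstract reports for long-running simulations.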
122

Parallelization of boolean operations for CAD Software using WebGPU / Parallelisering av CAD Mjukvara på Webben med WebGPU

Helmrich, Max, Käll, Linus January 2023 (has links)
This project is about finding ways to improve the performance of a Computer-Aided Design (CAD) application running in the web browser. With the new WebGPU Web API, it is now possible to use the GPU to accelerate calculations for CAD applications on the web. In this project, we investigated whether using the GPU could yield significant performance improvements and whether they are worth implementing. Typical tasks for a CAD application are split and union, which find intersections between shapes and combine them in geometry, and which we parallelized during this project. Our final implementation uses lazy evaluation and the HistoPyramid data structure to compete with a state-of-the-art line-sweep-based algorithm called Polygon Clipping. Although the Polygon Clipping intersection is still faster than our implementations in most cases, we found that WebGPU can still give significant performance boosts.
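The HistoPyramid mentioned above is a reduction pyramid of partial sums used on GPUs for stream compaction: deciding, in parallel, where each surviving element lands in the output. The following pure-Python sketch only illustrates the data structure (input length assumed a power of two); in the project described here this logic would live in WebGPU compute shaders, and none of these function names come from the thesis.

```python
# CPU sketch of the HistoPyramid idea: build a pyramid of partial sums over
# a 0/1 predicate array, then walk it top-down to find each surviving
# element's input position. Illustrative only; real use is GPU-side.

def build_pyramid(flags):
    """Bottom level is the predicate array; each level above sums pairs."""
    levels = [list(flags)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is the total number of survivors

def extract(levels, index):
    """Find the input position of output element `index` by descending."""
    pos = 0
    for level in range(len(levels) - 1, 0, -1):
        left = levels[level - 1][2 * pos]  # survivor count in the left child
        if index < left:
            pos = 2 * pos
        else:
            index -= left
            pos = 2 * pos + 1
    return pos

def compact(data, keep):
    """Stream compaction: keep elements satisfying `keep`, in input order."""
    flags = [1 if keep(x) else 0 for x in data]
    levels = build_pyramid(flags)
    return [data[extract(levels, i)] for i in range(levels[-1][0])]

print(compact([3, -1, 4, -1, 5, -9, 2, 6], lambda x: x > 0))
```

Each output element's traversal is independent of the others, which is what makes the structure attractive for massively parallel hardware.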
123

A Distributed Memory Implementation of LOCI

George, Thomas 14 December 2001 (has links)
Distributed memory systems have gained immense popularity due to their favorable price/performance ratios. This study seeks to reduce the complexities involved in developing parallel applications for distributed memory systems. The Loci system is a coordination framework developed to eliminate most of the accidental complexities involved in numerical simulation software development. A distributed memory version of Loci is developed, then tested and validated using a finite-rate chemically reacting flow solver built in the sequential Loci framework. The application developed in the original sequential version of Loci was parallelized with minimal changes to its source code. A comparison with results from the original sequential version confirms a correct implementation. Performance measurements indicate that an efficient implementation has been achieved.
124

The Self-Optimizing Inverse Methodology for Material Parameter Identification and Distributed Damage Detection

Weaver, Josh 29 May 2015 (has links)
No description available.
125

Tools for Performance Optimizations and Tuning of Affine Loop Nests

Hartono, Albert January 2009 (has links)
No description available.
126

Interior Penalty Discontinuous Galerkin Finite Element Method for the Time-Domain Maxwell's Equations

Dosopoulos, Stylianos 22 June 2012 (has links)
No description available.
127

Design and Analysis of A Parallelized Electrically Controlled Droplet Generating Device

ZHU, CHAO 10 1900 (has links)
Microdroplets find use in a variety of applications ranging from chemical synthesis to biological analysis. However, commercial use of microdroplets has been stymied in many applications, as current devices lack one or more critical features such as precise and dynamic control of the droplet size, high throughput, and easy fabrication. This work involves the design, fabrication, and characterization of a microdroplet-generating device that uses low-cost fabrication, allows dynamic control of the droplet size, and achieves parallelized droplet generation for high throughput.

Dynamic droplet size control by a DC electric field has been demonstrated with the device. By varying the potential from 300 V to 1000 V, the droplet size can be changed from 140 microns to around 40 microns; the transition of the droplet size takes just a few seconds. Parallelized droplet generation has also been demonstrated. The standard deviation of the droplet size is lower than 4% for the three-capillary device and lower than 6% for the five-capillary device under different operating conditions. The highest throughput, 0.75 mL/hour, is achieved on the five-capillary device. It has been shown that the proposed device performs better than existing PDMS-based parallel droplet-generating devices. A theoretical model of the droplet-generating process has also been developed, which is able to predict the droplet size at various potentials. The theoretical results are in good agreement with the experimental ones. / Master of Applied Science (MASc)
128

Generalizing the Utility of Graphics Processing Units in Large-Scale Heterogeneous Computing Systems

Xiao, Shucai 03 July 2013 (has links)
Today, heterogeneous computing systems are widely used to meet the increasing demand for high-performance computing. These systems commonly use powerful and energy-efficient accelerators to augment general-purpose processors (i.e., CPUs). The graphics processing unit (GPU) is one such accelerator. Originally designed solely for graphics processing, GPUs have evolved into programmable processors that can deliver massive parallel processing power for general-purpose applications. Using SIMD (Single Instruction Multiple Data) based components as building units, the current GPU architecture is well suited for data-parallel applications where the execution of each task is independent. With the delivery of programming models such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), programming GPUs has become much easier than before. However, developing and optimizing an application on a GPU is still a challenging task, even for well-trained computing experts. Such programming tasks will be even more challenging in large-scale heterogeneous systems, particularly in the context of utility computing, where GPU resources are used as a service. These challenges are largely due to the limitations in the current programming models: (1) there are no natively supported intra- and inter-GPU cooperative mechanisms; (2) current programming models only support the utilization of GPUs installed locally; and (3) to use GPUs on another node, application programs need to explicitly call application programming interface (API) functions for data communication. To reduce the mapping efforts and to better utilize the GPU resources, we investigate generalizing the utility of GPUs in large-scale heterogeneous systems with GPUs as accelerators. We generalize the utility of GPUs through the transparent virtualization of GPUs, which can enable applications to view all GPUs in the system as if they were installed locally. 
As a result, all GPUs in the system can be used as local GPUs. Moreover, GPU virtualization is a key capability to support the notion of "GPU as a service." Specifically, we propose the virtual OpenCL (or VOCL) framework for the transparent virtualization of GPUs. To achieve good performance, we optimize and extend the framework in three aspects: (1) optimize VOCL by reducing the data transfer overhead between the local node and remote node; (2) propose GPU synchronization to reduce the overhead of switching back and forth if multiple kernel launches are needed for data communication across different compute units on a GPU; and (3) extend VOCL to support live virtual GPU migration for quick system maintenance and load rebalancing across GPUs. With the above optimizations and extensions, we thoroughly evaluate VOCL along three dimensions: (1) show the performance improvement for each of our optimization strategies; (2) evaluate the overhead of using remote GPUs via several microbenchmark suites as well as a few real-world applications; and (3) demonstrate the overhead as well as the benefit of live virtual GPU migration. Our experimental results indicate that VOCL can generalize the utility of GPUs in large-scale systems at a reasonable virtualization and migration cost. / Ph. D.
129

Parallel implementation and application of particle scale heat transfer in the Discrete Element Method

Amritkar, Amit Ravindra 25 July 2013 (has links)
Dense fluid-particulate systems are widely encountered in the pharmaceutical, energy, environmental and chemical processing industries. Prediction of the heat transfer characteristics of these systems is challenging. Use of a high-fidelity Discrete Element Method (DEM) for particle-scale simulations coupled to Computational Fluid Dynamics (CFD) requires large simulation times and limits application to small particulate systems. The overall goal of this research is to develop and implement parallelization techniques which can be applied to large systems with O(10^5-10^6) particles to investigate particle-scale heat transfer in rotary kiln and fluidized bed environments. The strongly coupled CFD and DEM calculations are parallelized using the OpenMP paradigm, which provides the flexibility needed for the multimodal parallelism encountered in fluid-particulate systems. The fluid calculation is parallelized using domain decomposition, whereas N-body decomposition is used for DEM. It is shown that OpenMP-CFD with the first-touch policy, appropriate thread affinity and careful tuning scales as well as MPI up to 256 processors on a shared memory SGI Altix. To implement DEM in the OpenMP framework, ghost particle transfers between grid blocks, which consume a substantial amount of time in DEM, are eliminated by a suitable global mapping of the multi-block data structure. The global mapping, together with enforcing perfect particle load balance across OpenMP threads, results in computational times 2-5 times faster than an equivalent MPI implementation. Heat transfer studies are conducted in a rotary kiln as well as in a fluidized bed equipped with a single horizontal tube heat exchanger. Two cases, one with mono-disperse 2 mm particles rotating at 20 RPM and another with a poly-disperse distribution ranging from 1-2.8 mm and rotating at 1 RPM, are investigated. 
It is shown that heat transfer to the mono-disperse 2 mm particles is dominated by convective heat transfer from the thermal boundary layer that forms on the heated surface of the kiln. In the second case, during the first 24 seconds, the heat transfer to the particles is dominated by conduction to the larger particles that settle at the bottom of the kiln. The results compare reasonably well with experiments. In the fluidized bed, the highly energetic transitional flow and thermal field in the vicinity of the tube surface, and the limits placed on the grid size by the volume-averaged nature of the governing equations, result in gross underprediction of the heat transfer coefficient at the tube surface. It is shown that the inclusion of a subgrid stress model and the application of an LES wall function (WMLES) at the tube surface improves the prediction to within ± 20% of the experimental measurements. / Ph. D.
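The "perfect particle load balance across OpenMP threads" described above amounts to a static, even split of the particle array so that no thread's share differs from another's by more than one particle. The actual implementation is an OpenMP code, not Python; the helper below is only a sketch of the partitioning arithmetic, with names of my own choosing.

```python
# Sketch of perfectly balanced static partitioning: split N particles
# across W workers into contiguous [start, end) ranges whose sizes differ
# by at most one. Illustrative only; the thesis's DEM code uses OpenMP.

def partition_particles(n_particles, n_workers):
    """Return one (start, end) index range per worker."""
    base, extra = divmod(n_particles, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        count = base + (1 if w < extra else 0)  # first `extra` workers get +1
        ranges.append((start, start + count))
        start += count
    return ranges

print(partition_particles(10, 3))
```

Because every worker's range is computable from `n_particles` and its own index alone, the split needs no communication, which is what makes it cheap to enforce on every time step.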
130

Balancing of Parallel U-Shaped Assembly Lines with Crossover Points

Rattan, Amanpreet 06 September 2017 (has links)
This research introduces parallel U-shaped assembly lines with crossover points. Crossover points are connecting points between two parallel U-shaped lines making the lines interdependent. The assembly lines can be employed to manufacture a variety of products belonging to the same product family. This is achieved by utilizing the concepts of crossover points, multi-line stations, and regular stations. The binary programming formulation presented in this research can be employed for any scenario (e.g. task times, cycle times, and the number of tasks) in the configuration that includes a crossover point. The comparison of numerical problem solutions based on the proposed heuristic approach with the traditional approach highlights the possible reduction in the quantity of workers required. The conclusion from this research is that a wider variety of products can be manufactured at the same capital expense using parallel U-shaped assembly lines with crossover points, leading to a reduction in the total number of workers. / M. S.
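The core constraint in the line-balancing problem above is that each station's total task time must not exceed the cycle time. As a toy illustration of that constraint, the sketch below greedily packs tasks, taken in a fixed precedence-feasible order, into the fewest consecutive stations. This simple heuristic is my own illustration, not the thesis's binary programming formulation, and it does not model crossover points or multi-line stations.

```python
# Toy assembly-line balancing: pack tasks (in a fixed precedence-feasible
# order) into stations so no station's load exceeds the cycle time.
# Greedy sketch only; the thesis solves a richer binary program.

def assign_stations(task_times, cycle_time):
    """Return a list of stations, each a list of task indices."""
    stations, current, load = [], [], 0.0
    for i, t in enumerate(task_times):
        if t > cycle_time:
            raise ValueError("a single task exceeds the cycle time")
        if load + t > cycle_time:  # current station is full; open a new one
            stations.append(current)
            current, load = [], 0.0
        current.append(i)
        load += t
    stations.append(current)
    return stations

print(assign_stations([4, 3, 5, 2, 6], cycle_time=8))
```

Fewer stations translate directly into fewer required workers, which is the quantity the research above seeks to reduce.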
