321

Towards Simulations of Binary Neutron Star Mergers and Core-Collapse Supernovae with GenASiS

Budiardja, Reuben Donald 01 August 2010 (has links)
This dissertation describes the current version of GenASiS and reports recent progress in its development. GenASiS is a new computational astrophysics code built for large-scale, multi-dimensional computer simulations of astrophysical phenomena, with primary emphasis on simulations of neutron star mergers and core-collapse supernovae. Neutron star mergers are of high interest to the astrophysics community because they should be a prodigious source of gravitational waves and are among the most promising candidates for gravitational wave detection. Neutron star mergers are also thought to be associated with the production of short-duration, hard-spectrum gamma-ray bursts, though the mechanism is not well understood. In contrast, core-collapse supernovae with massive progenitors are associated with long-duration, soft-spectrum gamma-ray bursts, with the 'collapsar' hypothesis as the favored mechanism. Of equal interest is the mechanism of core-collapse supernovae themselves, which has been at the forefront of many research efforts for the better part of half a century but remains only a partially solved mystery. In addition, supernovae, and possibly neutron star mergers, are thought to be sites of the r-process nucleosynthesis responsible for producing many of the heavy elements. Until we have a proper understanding of these events, we will have only a limited understanding of the origin of the elements. These questions provide some of the scientific motivations and guidelines for the development of GenASiS. In this document the equations and numerical scheme for Newtonian and relativistic magnetohydrodynamics are presented. A new FFT-based parallel solver for Poisson's equation in GenASiS is described. Adaptive mesh refinement in GenASiS, and a novel multigrid-based way to solve Poisson's equation on a mesh with refinement, are also presented. Following these descriptions, results of neutron star merger simulations with GenASiS are shown, including their evolution and the gravitational wave signals and spectra they generate. In the context of core-collapse supernovae, we explore the capacity of the stationary shock instability to generate magnetic fields, starting from a weak, stationary, and radial magnetic field in an initially spherically symmetric fluid configuration that models the stalled shock in the post-bounce supernova environment. Our results show that the magnetic energy can be amplified by almost four orders of magnitude. The amplification mechanisms for the magnetic fields are then explained.
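As a rough illustration of the kind of gravity solve mentioned above, the sketch below solves Poisson's equation for the gravitational potential on a periodic grid with a single FFT. It is a minimal, single-node NumPy example under assumed periodic boundaries and uniform grid spacing; it is not the parallel FFT solver or the multigrid-on-refined-mesh solver implemented in GenASiS, and the function name and default units are mine.

```python
# Minimal single-node sketch of an FFT-based Poisson solve on a periodic grid.
# Illustration only, not the parallel solver described in the thesis.
import numpy as np

def poisson_fft(rho, dx=1.0, G=1.0):
    """Solve laplacian(phi) = 4*pi*G*rho with periodic boundaries."""
    n = rho.shape[0]                       # assume a cubic n x n x n grid
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    rho_hat = np.fft.fftn(rho)
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k2 > 0.0                     # leave the k=0 (mean) mode at zero
    phi_hat[nonzero] = -4.0 * np.pi * G * rho_hat[nonzero] / k2[nonzero]
    return np.real(np.fft.ifftn(phi_hat))
```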
322

Algorithms for Advection on Hybrid Parallel Computers

White, James Buford, III 01 May 2011 (has links)
Current climate models have a limited ability to increase spatial resolution because numerical stability requires the time step to decrease. I describe initial experiments with two independent but complementary strategies for attacking this "time barrier". First I describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. My test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. I present results for Fortran implementations using various combinations of MPI, OpenMP, and CUDA, with and without overlap of computation and communication. Second I describe a semi-Lagrangian method for tracer transport that is stable for arbitrary Courant numbers, along with a parallel implementation discretized on the cubed sphere. It shows optimal accuracy at Courant numbers of 10-20, more than an order of magnitude higher than explicit methods allow. Finally I describe the development and stability analyses of the time integrators and advection methods I used for my experiments. I develop explicit single-step methods with stability up to Courant numbers of one in each dimension, hybrid explicit-implicit methods with stability for arbitrary Courant numbers, and interpolation operators that enable the arbitrary stability of semi-Lagrangian methods.
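To make the stability contrast concrete, the following 1-D, single-thread sketch (a deliberately reduced version of the 3-D MPI/OpenMP/CUDA experiments described above) contrasts an explicit first-order upwind step, stable only for Courant numbers up to one, with a semi-Lagrangian step, which remains stable for arbitrary Courant numbers. The function names and the linear-interpolation choice are mine, not the thesis's.

```python
# Toy 1-D comparison on a periodic domain: explicit upwind (stable for C <= 1)
# versus a semi-Lagrangian step (stable for arbitrary Courant number C).
import numpy as np

def upwind_step(q, courant):
    """One explicit first-order upwind step, velocity > 0, C = u*dt/dx."""
    return q - courant * (q - np.roll(q, 1))

def semi_lagrangian_step(q, courant):
    """Trace each grid point back by C cells and interpolate linearly."""
    n = q.size
    depart = np.arange(n) - courant          # departure points in index space
    i0 = np.floor(depart).astype(int)
    w = depart - i0                          # interpolation weight in [0, 1)
    return (1.0 - w) * q[i0 % n] + w * q[(i0 + 1) % n]
```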
323

A Heterogeneous, Purpose Built Computer Architecture For Accelerating Biomolecular Simulation

Madill, Christopher Andre 09 June 2011 (has links)
Molecular dynamics (MD) is a powerful computer simulation technique providing atomistic resolution across a broad range of time scales. In the past four decades, researchers have harnessed the exponential growth in computer power and applied it to the simulation of diverse molecular systems. Although MD simulations are playing an increasingly important role in biomedical research, sampling limitations imposed by both hardware and software constraints establish a de facto upper bound on the size and length of MD trajectories. While simulations are currently approaching the hundred-thousand-atom, millisecond-timescale mark using large-scale computing centres optimized for general-purpose data processing, many interesting research topics are still beyond the reach of practical computational biophysics efforts. The purpose of this work is to design a high-speed MD machine which outperforms standard simulators running on commodity hardware or on large computing clusters. In pursuance of this goal, an MD-specific computer architecture is developed which tightly couples the fast processing power of Field-Programmable Gate Array (FPGA) computer chips with a network of high-performance CPUs. The development of this architecture is a multi-phase approach. Core MD algorithms are first analyzed and deconstructed to identify the computational bottlenecks governing the simulation rate. High-speed, parallel algorithms are subsequently developed to perform the most time-critical components in MD simulations on specialized hardware much faster than is possible with general-purpose processors. Finally, the functionality of the hardware accelerators is expanded into a fully-featured MD simulator through the integration of novel parallel algorithms running on a network of CPUs. The developed architecture enabled the construction of various prototype machines running on a variety of hardware platforms which are explored throughout this thesis. Furthermore, simulation models are developed to predict the rate of acceleration using different architectural configurations and molecular systems. With initial acceleration efforts focused primarily on expensive van der Waals and Coulombic force calculations, an architecture was developed whereby a single machine achieves the performance equivalent of an 88-core InfiniBand-connected network of CPUs. Finally, a methodology to successively identify and accelerate the remaining time-critical aspects of MD simulations is developed. This design leads to an architecture with a projected performance equivalent of nearly 150 CPU cores, enabling supercomputing performance in a single computer chassis, plugged into a standard wall socket.
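The time-critical components singled out above are the pairwise non-bonded force evaluations (van der Waals and Coulombic). The NumPy sketch below is a naive O(N^2) version of that kernel, with no cutoff radius, neighbour lists, or periodic boundaries, and with scalar Lennard-Jones parameters; it only illustrates the arithmetic the thesis offloads to FPGA hardware, and the function name and unit conventions are assumptions of mine.

```python
# Naive O(N^2) non-bonded force kernel (Lennard-Jones + Coulomb), the kind of
# loop the thesis accelerates. No cutoff, neighbour list, or PBC here.
import numpy as np

def nonbonded_forces(pos, charge, epsilon, sigma, coulomb_k=1.0):
    n = pos.shape[0]
    forces = np.zeros_like(pos)
    for i in range(n):
        rij = pos[i] - pos                     # vectors from every atom j to atom i
        r2 = np.sum(rij * rij, axis=1)
        r2[i] = np.inf                         # skip self-interaction
        inv_r2 = 1.0 / r2
        sr6 = (sigma**2 * inv_r2) ** 3         # (sigma/r)^6
        # Force magnitudes divided by r, so multiplying by rij gives vectors:
        lj = 24.0 * epsilon * (2.0 * sr6**2 - sr6) * inv_r2
        coul = coulomb_k * charge[i] * charge * inv_r2 * np.sqrt(inv_r2)
        forces[i] = np.sum((lj + coul)[:, None] * rij, axis=0)
    return forces
```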
324

Reduced-Order Modeling of Multiscale Turbulent Convection: Application to Data Center Thermal Management

Rambo, Jeffrey D. 27 March 2006 (has links)
Data centers are computing infrastructure facilities used by industries with large data processing needs, and the rapid increase in power density of high-performance computing equipment has caused many thermal issues in these facilities. Systems-level thermal management requires modeling and analysis of complex fluid flow and heat transfer processes across several decades of length scales. Conventional computational fluid dynamics and heat transfer techniques for such systems are severely limited as a design tool because their large model sizes render parameter sensitivity studies and optimization impractically slow. The traditional proper orthogonal decomposition (POD) methodology has been reformulated to construct physics-based models of turbulent flows and forced convection. Orthogonal-complement POD subspaces were developed to parametrize inhomogeneous boundary conditions and greatly extend the use of the existing POD methodology beyond prototypical flows with fixed parameters. A flux-matching procedure was devised to overcome the limitations of Galerkin projection methods for the Reynolds-averaged Navier-Stokes equations and greatly improve the computational efficiency of the approximate solutions. An implicit coupling procedure was developed to link the temperature and velocity fields and further extend the low-dimensional modeling methodology to conjugate forced convection heat transfer. The overall reduced-order modeling framework was able to reduce numerical models containing 10^5 degrees of freedom (DOF) down to fewer than 20 DOF, while still retaining greater than 90% accuracy over the domain. Rigorous a posteriori error bounds were formulated by using the POD subspace to partition the error contributions, and dual residual methods were used to show that the flux-matching procedure is a computationally superior approach for low-dimensional modeling of steady turbulent convection. To efficiently model large-scale systems, individual reduced-order models were coupled using flow network modeling as the component interconnection procedure. The development of handshaking procedures between low-dimensional component models lays the foundation to quickly analyze and optimize the modular systems encountered in electronics thermal management. This modularized approach can also serve as a skeletal structure to allow the efficient integration of highly specialized models across disciplines and significantly advance simulation-based design.
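For readers unfamiliar with POD, the snapshot form of the decomposition takes only a few lines: collect flow-field snapshots as columns of a matrix, subtract the mean, and take an SVD; the leading left singular vectors are the POD modes. The NumPy sketch below shows just that basic step; the orthogonal-complement subspaces, flux matching, and conjugate heat transfer coupling described above are not represented, and the 90% energy threshold is an illustrative choice.

```python
# Minimal snapshot-POD sketch: dominant modes of a data set via the SVD.
import numpy as np

def pod_modes(snapshots, energy_target=0.90):
    """snapshots: array of shape (n_dof, n_snapshots), one field per column."""
    mean = snapshots.mean(axis=1, keepdims=True)
    fluctuations = snapshots - mean
    U, s, _ = np.linalg.svd(fluctuations, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, energy_target)) + 1   # modes for target energy
    return mean, U[:, :k], s[:k]

def reconstruct(field, mean, modes):
    """Project a field onto the retained modes and reconstruct it."""
    a = modes.T @ (field - mean.ravel())                  # modal coefficients
    return mean.ravel() + modes @ a
```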
325

Linear Static Analysis Of Large Structural Models On Pc Clusters

Ozmen, Semih 01 July 2009 (has links) (PDF)
This research focuses on implementing and improving a parallel solution framework for the linear static analysis of large structural models on PC clusters. The framework consists of two separate programs: the first is responsible for preparing data for the parallel solution, which involves partitioning, workload balancing, and equation numbering; the second is a fully parallel finite element program that utilizes a substructure-based solution approach with direct solvers. The first step of data preparation is partitioning the structure into substructures. After creating the initial substructures, the estimated imbalance of the substructures is adjusted by iteratively transferring nodes from the slower substructures to the faster ones. Once the final substructures are created, the solution phase is initiated. Each processor assembles its substructure's stiffness matrix and condenses it to the interfaces. The interface equations are then solved in parallel with a block-cyclic dense matrix solver. After computing the interface unknowns, each processor calculates the internal displacements and element stresses or forces. Comparative tests were done to demonstrate the performance of the solution framework.
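The per-substructure condensation step described above is, in dense form, a Schur complement of the stiffness matrix with respect to the internal degrees of freedom. The NumPy sketch below illustrates that step for a single substructure; it is a serial, dense illustration only (the framework above uses partitioned models, parallel direct solvers, and a block-cyclic interface solver), and the function names are mine.

```python
# Sketch of substructure condensation: eliminate internal DOFs so only the
# interface equations remain (a Schur complement), then recover internal DOFs.
import numpy as np

def condense(K, f, internal, interface):
    """Partition K, f by index lists and condense onto the interface DOFs."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    Kbi = K[np.ix_(interface, internal)]
    Kbb = K[np.ix_(interface, interface)]
    Kii_inv_Kib = np.linalg.solve(Kii, Kib)
    Kii_inv_fi = np.linalg.solve(Kii, f[internal])
    K_cond = Kbb - Kbi @ Kii_inv_Kib          # condensed interface stiffness
    f_cond = f[interface] - Kbi @ Kii_inv_fi  # condensed interface load
    return K_cond, f_cond

def recover_internal(K, f, internal, interface, u_interface):
    """Back-substitute interface displacements to get internal displacements."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    return np.linalg.solve(Kii, f[internal] - Kib @ u_interface)
```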
326

Integrating algorithmic and systemic load balancing strategies in parallel scientific applications

Ghafoor, Sheikh Khaled, January 2003 (has links)
Thesis (M.S.)--Mississippi State University, Department of Computer Science and Engineering. Title from title screen. Includes bibliographical references.
327

Pricing of American Options by Adaptive Tree Methods on GPUs

Lundgren, Jacob January 2015 (has links)
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches to deploying the program onto GPU architectures. The performance results of the approaches are compared to those of a central processing unit reference implementation, and to each other. In particular, an approach is described in which subtrees are designated to be calculated in parallel, allowing overlapping elements to be computed more than once. Among the examined methods, this attains the best performance in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
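As background for the lattice methods discussed above, the sketch below prices an American put on a plain Cox-Ross-Rubinstein binomial tree by backward induction with an early-exercise check at every node. It is a serial NumPy baseline with no dividends, no adaptivity, and no GPU parallelism, so it corresponds to the simplest starting point rather than to the assembled algorithm described in the thesis.

```python
# Plain Cox-Ross-Rubinstein binomial tree for an American put (no dividends,
# no adaptive refinement, no GPU parallelism).
import numpy as np

def american_put_crr(S0, K, r, sigma, T, steps):
    dt = T / steps
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)        # risk-neutral up probability
    disc = np.exp(-r * dt)
    # Terminal stock prices and payoffs.
    S = S0 * u ** np.arange(steps, -1, -1) * d ** np.arange(0, steps + 1)
    V = np.maximum(K - S, 0.0)
    # Backward induction with an early-exercise check at each level.
    for n in range(steps - 1, -1, -1):
        S = S0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
        V = disc * (p * V[:-1] + (1.0 - p) * V[1:])
        V = np.maximum(V, K - S)
    return V[0]

# Example call: american_put_crr(100.0, 100.0, 0.05, 0.2, 1.0, 500)
```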
328

Autonomic Cloud Resource Management

Tunc, Cihan January 2015 (has links)
The power consumption of data centers and cloud systems increased almost threefold between 2007 and 2012. Traditional resource allocation methods are typically designed with high performance as the primary objective, to support peak resource requirements. However, it has been shown that server utilization is between 12% and 18%, while power consumption is close to that at peak load. Hence, there is a pressing need for devising sophisticated resource management approaches. State-of-the-art dynamic resource management schemes typically rely on only a single resource such as core count, core speed, memory, disk, or network. There is a lack of fundamental research on methods addressing dynamic management of multiple resources and properties with the objective of allocating just enough resources for each workload to meet quality-of-service requirements while optimizing for power consumption. The main focus of this dissertation is to simultaneously manage power and performance for large cloud systems. The objective of this research is to develop a framework for performance and power management and investigate a general methodology for integrated autonomic cloud management. In this dissertation, we developed an autonomic management framework based on a novel data structure, AppFlow, used for modeling current and near-term future cloud application behavior. We have developed the following capabilities for the performance and power management of cloud computing systems: 1) online modeling and characterization of cloud application behavior and resource requirements; 2) prediction of application behavior to proactively optimize its operations at runtime; 3) a holistic optimization methodology for performance and power using the number of cores, CPU frequency, and memory size; and 4) autonomic cloud management to support dynamic changes in VM configurations at runtime to simultaneously optimize multiple objectives including performance, power, and availability. We validated our approach using the RUBiS benchmark (emulating eBay) on an IBM HS22 blade server. Our experimental results showed that our approach can lead to a significant reduction in power consumption: up to 87% compared to a static resource allocation strategy, 72% compared to an adaptive frequency scaling strategy, and 66% compared to a multi-resource management strategy.
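The "just enough resources" objective above amounts, at each control interval, to choosing the lowest-power configuration whose predicted performance still meets the service target. The toy sketch below illustrates only that selection logic; the configurations, power figures, and latencies are invented for illustration, and the actual framework derives its predictions from the AppFlow model at runtime rather than from a fixed table.

```python
# Toy "just enough resources" selection: cheapest configuration meeting the SLO.
# All numbers below are made up for illustration.
def pick_configuration(configs, latency_slo_ms):
    feasible = [c for c in configs if c["predicted_latency_ms"] <= latency_slo_ms]
    if not feasible:
        return max(configs, key=lambda c: c["cores"])     # fall back to the largest VM
    return min(feasible, key=lambda c: c["predicted_power_w"])

configs = [
    {"cores": 2, "freq_ghz": 1.6, "mem_gb": 4,  "predicted_power_w": 55,  "predicted_latency_ms": 240},
    {"cores": 4, "freq_ghz": 2.4, "mem_gb": 8,  "predicted_power_w": 90,  "predicted_latency_ms": 120},
    {"cores": 8, "freq_ghz": 2.4, "mem_gb": 16, "predicted_power_w": 150, "predicted_latency_ms": 95},
]
print(pick_configuration(configs, latency_slo_ms=150))    # -> the 4-core configuration
```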
330

Storage and aggregation for fast analytics systems

Amur, Hrishikesh 13 January 2014 (has links)
Computing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components:
1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., newly crawled web pages, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system, called the WriteBuffer (WB) Tree, that provides up to 30× higher write performance and similar read performance compared to current high-performance systems.
2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation.
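The common thread in both components above is write optimization: absorb incoming updates in memory and push them to slower storage in large sorted batches. The sketch below shows only that generic idea with a flat list of sorted runs; it is not the WB Tree or the CBT (there is no tree of buffers, no compression, no compaction, and no aggregation), and the class and parameter names are mine.

```python
# Minimal write-buffered key-value sketch: absorb writes in memory and flush
# them as sorted runs, turning random writes into sequential batches.
import bisect

class BufferedStore:
    def __init__(self, flush_threshold=4):
        self.memtable = {}            # in-memory buffer of recent writes
        self.runs = []                # flushed, sorted (key, value) runs
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):           # newest flushed run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```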
