321 |
Accelerated many-body protein side-chain repacking using GPUs: application to proteins implicated in hearing loss. Tollefson, Mallory RaNae, 15 December 2017.
With recent advances and cost reductions in next-generation sequencing (NGS), the amount of genetic sequence data is increasing rapidly. However, before patient-specific genetic information reaches its full potential to advance clinical diagnostics, the immense degree of genetic heterogeneity that contributes to human disease must be more fully understood. For example, although large numbers of genetic variations are discovered during clinical use of NGS, annotating and understanding the impact of such coding variations on protein phenotype remains a bottleneck (i.e., what is the molecular mechanism behind a deafness phenotype?). Fortunately, computational methods are emerging that can be used to efficiently study protein coding variants and thereby overcome the bottleneck brought on by rapid adoption of clinical sequencing.
To study proteins via physics-based computational algorithms, high-quality 3D structural models are essential. These protein models can be obtained using a variety of numerical optimization methods that operate on physics-based potential energy functions. Accurate protein structures serve as input to downstream variation analysis algorithms. In this work, we applied a novel amino acid side-chain optimization algorithm, which operates on an advanced model of atomic interactions (the AMOEBA polarizable force field), to a set of 164 protein structural models implicated in deafness. The resulting models were evaluated with the MolProbity structure validation tool. MolProbity scores were originally calibrated to predict the quality of the X-ray diffraction data used to generate a given protein model (a score of 1.0 Å or lower indicates a model built from high-quality data, while a score of 4.0 Å or higher reflects relatively poor data). In this work, the side-chain optimization algorithm improved the mean MolProbity score from 2.65 Å (42nd percentile) to nearly atomic resolution at 1.41 Å (95th percentile). However, side-chain optimization with the AMOEBA many-body potential function is computationally expensive. Thus, a second contribution of this work is a parallelization scheme that utilizes NVIDIA graphics processing units (GPUs) to accelerate the side-chain repacking algorithm. With one GPU, our side-chain optimization algorithm achieved a 25-fold speed-up compared to using two Intel Xeon E5-2680v4 central processing units (CPUs). We expect the GPU acceleration scheme to lessen demand on computing resources dedicated to protein structure optimization efforts and thereby dramatically expand the number of protein structures available to aid in the interpretation of missense variations associated with deafness.
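The AMOEBA many-body repacking algorithm itself is not reproduced here, but the combinatorial search that such repacking must perform can be illustrated with a small, self-contained sketch: simulated annealing over a discrete rotamer library under a synthetic pairwise energy model. The residue count, rotamer count, and all energies below are randomly generated placeholders, not AMOEBA terms.

```python
# Minimal sketch of discrete side-chain repacking by simulated annealing.
# Not the AMOEBA many-body algorithm; a synthetic pairwise energy model is
# used purely to illustrate the combinatorial search being accelerated.
import numpy as np

rng = np.random.default_rng(0)
n_res, n_rot = 50, 8                       # residues and rotamers per residue (assumed)

# Synthetic energy tables: a self energy for every rotamer and a pairwise term
# for every pair of rotamer choices at two different residues.
e_self = rng.normal(size=(n_res, n_rot))
e_pair = rng.normal(scale=0.3, size=(n_res, n_res, n_rot, n_rot))
e_pair = (e_pair + e_pair.transpose(1, 0, 3, 2)) / 2      # enforce symmetry

def total_energy(assign):
    """Sum of self energies plus pairwise energies over distinct residue pairs."""
    e = e_self[np.arange(n_res), assign].sum()
    for i in range(n_res):
        for j in range(i + 1, n_res):
            e += e_pair[i, j, assign[i], assign[j]]
    return e

assign = rng.integers(n_rot, size=n_res)   # random initial rotamer assignment
energy = total_energy(assign)

for temperature in np.geomspace(5.0, 0.05, 20000):
    i, r = rng.integers(n_res), rng.integers(n_rot)
    old = assign[i]
    if r == old:
        continue
    # Energy change from switching residue i to rotamer r (vectorized over j).
    delta = e_self[i, r] - e_self[i, old]
    delta += (e_pair[i, :, r, :][np.arange(n_res), assign]
              - e_pair[i, :, old, :][np.arange(n_res), assign]).sum()
    delta -= e_pair[i, i, r, old] - e_pair[i, i, old, old]   # drop the j == i term
    # Metropolis acceptance: always take downhill moves, sometimes uphill ones.
    if delta < 0 or rng.random() < np.exp(-delta / temperature):
        assign[i] = r
        energy += delta

print(f"annealed packing energy: {energy:.2f} "
      f"(recomputed: {total_energy(assign):.2f})")
```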
|
322 |
Towards Simulations of Binary Neutron Star Mergers and Core-Collapse Supernovae with GenASiS. Budiardja, Reuben Donald, 01 August 2010.
This dissertation describes the current version of GenASiS and reports recent progress in its development. GenASiS is a new computational astrophysics code built for large-scale, multi-dimensional computer simulations of astrophysical phenomena, with primary emphasis on simulations of neutron star mergers and core-collapse supernovae. Neutron star mergers are of high interest to the astrophysics community because they should be a prodigious source of gravitational waves and are among the most promising candidates for gravitational wave detection. Neutron star mergers are also thought to be associated with the production of short-duration, hard-spectrum gamma-ray bursts, though the mechanism is not well understood. In contrast, core-collapse supernovae with massive progenitors are associated with long-duration, soft-spectrum gamma-ray bursts, with the 'collapsar' hypothesis as the favored mechanism. Of equal interest is the mechanism of core-collapse supernovae themselves, which has been at the forefront of many research efforts for the better part of half a century but remains a partially solved mystery. In addition, supernovae, and possibly neutron star mergers, are thought to be sites of the r-process nucleosynthesis responsible for producing many of the heavy elements. Until we have a proper understanding of these events, we will have only a limited understanding of the origin of the elements. These questions provide some of the scientific motivations and guidelines for the development of GenASiS. In this document the equations and numerical scheme for Newtonian and relativistic magnetohydrodynamics are presented. A new FFT-based parallel solver for Poisson's equation in GenASiS is described. Adaptive mesh refinement in GenASiS, and a novel way to solve Poisson's equation on a refined mesh based on a multigrid algorithm, are also presented. Following these descriptions, results of simulations of neutron star mergers with GenASiS, including their evolution and the gravitational wave signals and spectra they generate, are shown. In the context of core-collapse supernovae, we explore the capacity of the stationary shock instability to generate magnetic fields, starting from a weak, stationary, and radial magnetic field in an initially spherically symmetric fluid configuration that models the stalled shock in the post-bounce supernova environment. Our results show that the magnetic energy can be amplified by almost four orders of magnitude. The amplification mechanisms for the magnetic fields are then explained.
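As an illustration of the spectral approach, the following is a minimal sketch of an FFT-based Poisson solve on a periodic grid. It is a generic textbook method, not GenASiS code; the grid size, box length, and single-mode test density are assumptions chosen so the result can be checked against an analytic solution.

```python
# Minimal sketch of an FFT-based Poisson solver on a periodic grid, in the
# spirit of (but not taken from) the GenASiS solver described above:
# solve  laplacian(phi) = 4*pi*G*rho  spectrally.
import numpy as np

G, n, L = 1.0, 64, 1.0                     # code-unit constant, grid size, box length
x = np.linspace(0.0, L, n, endpoint=False)
dx = L / n
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")

# Single-mode test density (zero mean, so a periodic solution exists);
# its analytic potential is known and lets us check the solver.
rho = np.cos(2.0 * np.pi * X / L)

k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)  # angular wavenumbers
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
k2 = KX**2 + KY**2 + KZ**2
k2[0, 0, 0] = 1.0                          # placeholder; mean mode handled below

phi_hat = -4.0 * np.pi * G * np.fft.fftn(rho) / k2
phi_hat[0, 0, 0] = 0.0                     # fix the arbitrary additive constant
phi = np.fft.ifftn(phi_hat).real

phi_exact = -4.0 * np.pi * G * rho / (2.0 * np.pi / L) ** 2
print("max error vs. analytic solution:", np.abs(phi - phi_exact).max())
```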
|
323 |
Algorithms for Advection on Hybrid Parallel Computers. White, James Buford, III, 01 May 2011.
Current climate models have a limited ability to increase spatial resolution because numerical stability requires the time step to decrease. I describe initial experiments with two independent but complementary strategies for attacking this "time barrier". First I describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. My test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. I present results for Fortran implementations using various combinations of MPI, OpenMP, and CUDA, with and without overlap of computation and communication. Second I describe a semi-Lagrangian method for tracer transport that is stable for arbitrary Courant numbers, along with a parallel implementation discretized on the cubed sphere. It shows optimal accuracy at Courant numbers of 10 to 20, more than an order of magnitude higher than explicit methods allow. Finally I describe the development and stability analyses of the time integrators and advection methods used in these experiments. I develop explicit single-step methods with stability up to Courant numbers of one in each dimension, hybrid explicit-implicit methods with stability for arbitrary Courant numbers, and interpolation operators that enable the arbitrary stability of semi-Lagrangian methods.
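A one-dimensional toy version of semi-Lagrangian advection makes the unconditional stability concrete: each grid point is traced back along the flow and the tracer is interpolated at the departure point. The sketch below is not the cubed-sphere implementation; the grid size, velocity, Courant number, and linear interpolation are all illustrative assumptions.

```python
# Minimal 1-D semi-Lagrangian advection with periodic boundaries, run at a
# Courant number far beyond the stability limit of explicit Eulerian schemes.
import numpy as np

n, L, u = 200, 1.0, 1.0                    # grid points, domain length, velocity
dx = L / n
x = np.arange(n) * dx
dt = 15.3 * dx / u                         # Courant number 15.3
q = np.exp(-((x - 0.3) ** 2) / 0.002)      # initial tracer: a Gaussian bump

def semi_lagrangian_step(q):
    """Trace each grid point back along the flow, interpolate the tracer there."""
    x_depart = (x - u * dt) % L            # departure points, wrapped periodically
    s = x_depart / dx
    idx = np.floor(s).astype(int)
    w = s - idx                            # linear interpolation weight in [0, 1)
    return (1.0 - w) * q[idx % n] + w * q[(idx + 1) % n]

for _ in range(40):
    q = semi_lagrangian_step(q)

# After 40 steps the bump has travelled 40 * u * dt = 3.06 domain lengths and
# should sit near x = 0.36, broadened only by interpolation error.
print("tracer max:", round(float(q.max()), 3),
      "at x =", round(float(x[np.argmax(q)]), 3))
```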
|
324 |
A Heterogeneous, Purpose Built Computer Architecture For Accelerating Biomolecular Simulation. Madill, Christopher Andre, 09 June 2011.
Molecular dynamics (MD) is a powerful computer simulation technique providing atomistic resolution across a broad range of time scales. Over the past four decades, researchers have harnessed the exponential growth in computer power and applied it to the simulation of diverse molecular systems. Although MD simulations play an increasingly important role in biomedical research, sampling limitations imposed by both hardware and software constraints establish a de facto upper bound on the size and length of MD trajectories. While simulations are currently approaching the hundred-thousand-atom, millisecond-timescale mark using large-scale computing centres optimized for general-purpose data processing, many interesting research topics remain beyond the reach of practical computational biophysics efforts.
The purpose of this work is to design a high-speed MD machine that outperforms standard simulators running on commodity hardware or on large computing clusters. In pursuit of this goal, an MD-specific computer architecture is developed which tightly couples the fast processing power of Field-Programmable Gate Array (FPGA) chips with a network of high-performance CPUs. The development of this architecture is a multi-phase approach. Core MD algorithms are first analyzed and deconstructed to identify the computational bottlenecks governing the simulation rate. High-speed, parallel algorithms are then developed to perform the most time-critical components of MD simulations on specialized hardware much faster than is possible with general-purpose processors. Finally, the functionality of the hardware accelerators is expanded into a fully featured MD simulator through the integration of novel parallel algorithms running on a network of CPUs.
The developed architecture enabled the construction of various prototype machines running on a variety of hardware platforms, which are explored throughout this thesis. Furthermore, simulation models are developed to predict the rate of acceleration for different architectural configurations and molecular systems. With initial acceleration efforts focused primarily on the expensive van der Waals and Coulombic force calculations, an architecture was developed whereby a single machine achieves the performance equivalent of an 88-core InfiniBand-connected network of CPUs. Finally, a methodology to successively identify and accelerate the remaining time-critical aspects of MD simulations is developed. This design leads to an architecture with a projected performance equivalent of nearly 150 CPU cores, enabling supercomputing performance in a single computer chassis plugged into a standard wall socket.
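As a point of reference for the bottleneck being accelerated, the sketch below shows a plain all-pairs nonbonded force kernel (Lennard-Jones plus Coulomb). It is not the FPGA pipeline described in the thesis; the coordinates, charges, and parameters are synthetic, and there is no cutoff, neighbour list, or periodic boundary handling.

```python
# Reference sketch of the nonbonded (van der Waals + Coulomb) force kernel
# that dominates MD run time; the all-pairs loop is exactly the part a
# purpose-built accelerator pipelines in hardware.
import numpy as np

rng = np.random.default_rng(1)
n = 256
pos = rng.uniform(0.0, 20.0, size=(n, 3))   # coordinates in angstroms (assumed)
q = rng.choice([-0.5, 0.5], size=n)         # partial charges (assumed)
eps, sigma, coulomb_k = 0.2, 3.2, 332.06    # LJ parameters, Coulomb constant (assumed)

def nonbonded_forces(pos, q):
    """O(N^2) Lennard-Jones + Coulomb forces, with no cutoff or neighbour list."""
    forces = np.zeros_like(pos)
    for i in range(len(pos) - 1):
        rij = pos[i + 1:] - pos[i]          # vectors to all later particles
        r2 = np.einsum("ij,ij->i", rij, rij)
        inv_r2 = 1.0 / r2
        sr6 = (sigma * sigma * inv_r2) ** 3
        # Force magnitudes divided by r, so they can scale the rij vectors.
        f_lj = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) * inv_r2
        f_coul = coulomb_k * q[i] * q[i + 1:] * inv_r2 * np.sqrt(inv_r2)
        fij = (f_lj + f_coul)[:, None] * rij
        forces[i] -= fij.sum(axis=0)        # Newton's third law
        forces[i + 1:] += fij
    return forces

f = nonbonded_forces(pos, q)
print("net force (should be ~0):", np.abs(f.sum(axis=0)).max())
```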
|
325 |
Reduced-Order Modeling of Multiscale Turbulent Convection: Application to Data Center Thermal Management. Rambo, Jeffrey D., 27 March 2006.
Data centers are computing infrastructure facilities used by industries with large data processing needs, and the rapid increase in power density of high-performance computing equipment has caused many thermal issues in these facilities. Systems-level thermal management requires modeling and analysis of complex fluid flow and heat transfer processes across several decades of length scales. Conventional computational fluid dynamics and heat transfer techniques for such systems are severely limited as a design tool because their large model sizes render parameter sensitivity studies and optimization impractically slow.
The traditional proper orthogonal decomposition (POD) methodology has been reformulated to construct physics-based models of turbulent flows and forced convection. Orthogonal complement POD subspaces were developed to parametrize inhomogeneous boundary conditions and greatly extend the use of the existing POD methodology beyond prototypical flows with fixed parameters. A flux matching procedure was devised to overcome the limitations of Galerkin projection methods for the Reynolds-averaged Navier-Stokes equations and greatly improve the computational efficiency of the approximate solutions. An implicit coupling procedure was developed to link the temperature and velocity fields and further extend the low-dimensional modeling methodology to conjugate forced convection heat transfer. The overall reduced-order modeling framework was able to reduce numerical models containing 10^5 degrees of freedom (DOF) down to fewer than 20 DOF, while still retaining greater than 90% accuracy over the domain.
Rigorous a posteriori error bounds were formulated by using the POD subspace to partition the error contributions, and dual residual methods were used to show that the flux matching procedure is a computationally superior approach for low-dimensional modeling of steady turbulent convection.
To efficiently model large-scale systems, individual reduced-order models were coupled using flow network modeling as the component interconnection procedure. The development of handshaking procedures between low-dimensional component models lays the foundation for quickly analyzing and optimizing the modular systems encountered in electronics thermal management. This modularized approach can also serve as a skeletal structure for the efficient integration of highly specialized models across disciplines and significantly advance simulation-based design.
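The core POD step, projecting high-dimensional snapshot data onto a small energy-ranked basis, can be sketched as follows. The snapshot data here are synthetic, and the thesis' flux matching, boundary-condition, and coupling procedures are not reproduced; only the basic basis extraction and reconstruction are shown.

```python
# Minimal sketch of proper orthogonal decomposition (POD) for model reduction.
import numpy as np

rng = np.random.default_rng(2)
n_dof, n_snap = 2000, 40                   # full-model DOF and snapshot count (assumed)

# Synthetic snapshot matrix: a few smooth spatial "modes" with random
# parameter-dependent amplitudes, plus small noise.
x = np.linspace(0.0, 1.0, n_dof)
modes_true = np.stack([np.sin((k + 1) * np.pi * x) for k in range(5)], axis=1)
amps = rng.normal(size=(5, n_snap)) * np.array([10, 5, 2, 1, 0.5])[:, None]
snapshots = modes_true @ amps + 0.01 * rng.normal(size=(n_dof, n_snap))

# POD via thin SVD of the snapshot matrix; retain enough modes to capture
# 99.9% of the snapshot "energy" (squared singular values).
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1
basis = U[:, :r]                           # reduced basis, n_dof x r

# A new full-order field is approximated by its projection onto the basis:
# r coefficients stand in for n_dof values.
field = modes_true @ rng.normal(size=5)
coeffs = basis.T @ field                   # reduced coordinates
field_rom = basis @ coeffs                 # reconstruction

rel_err = np.linalg.norm(field - field_rom) / np.linalg.norm(field)
print(f"kept {r} of {n_snap} modes; reconstruction error = {rel_err:.2e}")
```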
|
326 |
Linear Static Analysis of Large Structural Models on PC Clusters. Ozmen, Semih, 01 July 2009.
This research focuses on implementing and improving a parallel solution framework for the linear static analysis of large structural models on PC clusters. The framework consists of two separate programs: the first is responsible for preparing data for the parallel solution, which involves partitioning, workload balancing, and equation numbering; the second is a fully parallel finite element program that utilizes a substructure-based solution approach with direct solvers.
The first step of data preparation is partitioning the structure into substructures. After creating the initial substructures, the estimated imbalance of the substructures is adjusted by iteratively transferring nodes from the slower substructures to the faster ones. Once the final substructures are created, the solution phase is initiated. Each processor assembles its substructure's stiffness matrix and condenses it to the interfaces. The interface equations are then solved in parallel with a block-cyclic dense matrix solver. After computing the interface unknowns, each processor calculates the internal displacements and element stresses or forces. Comparative tests were performed to demonstrate the performance of the solution framework.
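The condensation step can be illustrated on a single synthetic substructure: the stiffness matrix is reduced to its interface degrees of freedom via a Schur complement, the interface system is solved, and internal displacements are recovered by back-substitution. This sketch omits the parallel assembly and the block-cyclic dense solver; the matrix sizes and values are arbitrary.

```python
# Minimal sketch of static condensation (Schur complement) for one substructure.
import numpy as np

rng = np.random.default_rng(3)
n_int, n_bnd = 12, 4                       # internal and interface DOF (assumed)
n = n_int + n_bnd

# Synthetic symmetric positive-definite "stiffness" matrix and load vector.
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)
f = rng.normal(size=n)

ii, bb = slice(0, n_int), slice(n_int, n)  # internal / interface index blocks
Kii, Kib, Kbi, Kbb = K[ii, ii], K[ii, bb], K[bb, ii], K[bb, bb]
fi, fb = f[ii], f[bb]

# Condense to the interface: Schur complement and condensed load.
Kii_inv_Kib = np.linalg.solve(Kii, Kib)
Kii_inv_fi = np.linalg.solve(Kii, fi)
K_cond = Kbb - Kbi @ Kii_inv_Kib
f_cond = fb - Kbi @ Kii_inv_fi

# Interface solve (done in parallel across processors in the framework),
# then back-substitution for the internal displacements.
u_b = np.linalg.solve(K_cond, f_cond)
u_i = Kii_inv_fi - Kii_inv_Kib @ u_b

# Check against a direct solve of the full system.
u_full = np.linalg.solve(K, f)
print("max difference vs. monolithic solve:",
      np.abs(np.concatenate([u_i, u_b]) - u_full).max())
```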
|
327 |
Integrating algorithmic and systemic load balancing strategies in parallel scientific applications. Ghafoor, Sheikh Khaled, January 2003.
Thesis (M.S.), Mississippi State University, Department of Computer Science and Engineering. Title from title screen. Includes bibliographical references.
|
328 |
Pricing of American Options by Adaptive Tree Methods on GPUs. Lundgren, Jacob, January 2015.
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, providing a foundation for the investigation of several approaches to deploying the program onto GPU architectures. The performance results of these approaches are compared to those of a central processing unit (CPU) reference implementation, and to each other. In particular, an approach is described that designates subtrees to be calculated in parallel, allowing redundant calculation of overlapping elements. Among the examined methods, this attains the best performance in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
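For context, the sketch below prices an American put on a standard (non-adaptive) Cox-Ross-Rubinstein binomial tree with backward induction and an early-exercise check at each node. It is the textbook CPU baseline rather than the adaptive, discrete-dividend GPU algorithm of the thesis, and all market parameters are assumed.

```python
# Minimal lattice pricer for an American put: CRR binomial tree with
# backward induction and an early-exercise check at every node.
import numpy as np

def american_put_crr(s0, strike, rate, sigma, maturity, n_steps):
    dt = maturity / n_steps
    u = np.exp(sigma * np.sqrt(dt))         # up factor
    d = 1.0 / u
    p = (np.exp(rate * dt) - d) / (u - d)   # risk-neutral up probability
    disc = np.exp(-rate * dt)

    # Asset prices and payoffs at maturity (n_steps + 1 terminal nodes).
    j = np.arange(n_steps + 1)
    prices = s0 * u**j * d**(n_steps - j)
    values = np.maximum(strike - prices, 0.0)

    # Backward induction with the American early-exercise condition.
    for step in range(n_steps - 1, -1, -1):
        j = np.arange(step + 1)
        prices = s0 * u**j * d**(step - j)
        continuation = disc * (p * values[1:step + 2] + (1.0 - p) * values[:step + 1])
        values = np.maximum(continuation, strike - prices)
    return values[0]

# Example: at-the-money one-year American put (illustrative parameters).
print("price:", american_put_crr(s0=100.0, strike=100.0, rate=0.05,
                                 sigma=0.2, maturity=1.0, n_steps=500))
```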
|
329 |
Autonomic Cloud Resource Management. Tunc, Cihan, January 2015.
The power consumption of data centers and cloud systems increased almost threefold between 2007 and 2012. Traditional resource allocation methods are typically designed for high performance as the primary objective, to support peak resource requirements. However, it has been shown that server utilization is typically between 12% and 18%, while power consumption remains close to that at peak load. Hence, there is a pressing need for more sophisticated resource management approaches. State-of-the-art dynamic resource management schemes typically rely on only a single resource, such as core count, core speed, memory, disk, or network. There is a lack of fundamental research on methods for dynamically managing multiple resources and properties with the objective of allocating just enough resources for each workload to meet quality-of-service requirements while optimizing for power consumption. The main focus of this dissertation is to simultaneously manage power and performance for large cloud systems. The objective of this research is to develop a framework for performance and power management and to investigate a general methodology for integrated autonomic cloud management. In this dissertation, we developed an autonomic management framework based on a novel data structure, AppFlow, used for modeling current and near-term future cloud application behavior. We have developed the following capabilities for the performance and power management of cloud computing systems: 1) online modeling and characterization of cloud application behavior and resource requirements; 2) prediction of application behavior to proactively optimize operations at runtime; 3) a holistic optimization methodology for performance and power using the number of cores, CPU frequency, and memory amount; and 4) an autonomic cloud management capability that supports dynamic changes in VM configurations at runtime to simultaneously optimize multiple objectives, including performance, power, and availability. We validated our approach using the RUBiS benchmark (which emulates eBay) on an IBM HS22 blade server. Our experimental results showed that our approach can reduce power consumption by up to 87% compared to a static resource allocation strategy, 72% compared to an adaptive frequency scaling strategy, and 66% compared to a multi-resource management strategy.
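The flavor of the multi-resource optimization (item 3 above) can be conveyed with a small sketch that searches over core count, CPU frequency, and memory and picks the lowest-power configuration predicted to meet a response-time target. The performance and power models below are toy analytic placeholders, not the AppFlow-based models of the dissertation.

```python
# Minimal sketch of a multi-resource configuration search: minimize predicted
# power subject to meeting a latency target.  All models here are placeholders.
from itertools import product

CORES = [1, 2, 4, 8]
FREQS_GHZ = [1.2, 1.8, 2.4, 3.0]
MEM_GB = [2, 4, 8, 16]

def predicted_latency_ms(load_rps, cores, freq, mem):
    """Toy model: latency falls with compute capacity, plus a memory penalty."""
    capacity = cores * freq * 150.0            # requests/s the config can absorb
    return 20.0 * load_rps / capacity + 40.0 / mem

def predicted_power_w(cores, freq, mem):
    """Toy model: dynamic power ~ cores * freq^3, plus a per-GB memory cost."""
    return 10.0 + 4.0 * cores * freq**3 + 0.8 * mem

def pick_configuration(load_rps, latency_target_ms):
    feasible = []
    for cores, freq, mem in product(CORES, FREQS_GHZ, MEM_GB):
        if predicted_latency_ms(load_rps, cores, freq, mem) <= latency_target_ms:
            feasible.append((predicted_power_w(cores, freq, mem), cores, freq, mem))
    if not feasible:
        return max(CORES), max(FREQS_GHZ), max(MEM_GB)   # fall back to max config
    _, cores, freq, mem = min(feasible)                  # lowest predicted power
    return cores, freq, mem

# Re-evaluated periodically at runtime as the predicted workload changes.
for load in (50, 200, 800):
    print(load, "req/s ->", pick_configuration(load, latency_target_ms=60.0))
```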
|