321

A Heterogeneous, Purpose Built Computer Architecture For Accelerating Biomolecular Simulation

Madill, Christopher Andre 09 June 2011 (has links)
Molecular dynamics (MD) is a powerful computer simulation technique providing atomistic resolution across a broad range of time scales. In the past four decades, researchers have harnessed the exponential growth in computer power and applied it to the simulation of diverse molecular systems. Although MD simulations are playing an increasingly important role in biomedical research, sampling limitations imposed by both hardware and software constraints establish a de facto upper bound on the size and length of MD trajectories. While simulations are currently approaching the hundred-thousand-atom, millisecond-timescale mark using large-scale computing centres optimized for general-purpose data processing, many interesting research topics are still beyond the reach of practical computational biophysics efforts. The purpose of this work is to design a high-speed MD machine which outperforms standard simulators running on commodity hardware or on large computing clusters. To this end, an MD-specific computer architecture is developed which tightly couples the fast processing power of Field-Programmable Gate Array (FPGA) computer chips with a network of high-performance CPUs. The development of this architecture is a multi-phase approach. Core MD algorithms are first analyzed and deconstructed to identify the computational bottlenecks governing the simulation rate. High-speed, parallel algorithms are subsequently developed to perform the most time-critical components of MD simulations on specialized hardware much faster than is possible with general-purpose processors. The functionality of the hardware accelerators is then expanded into a fully featured MD simulator through the integration of novel parallel algorithms running on a network of CPUs. The developed architecture enabled the construction of various prototype machines, running on a variety of hardware platforms, which are explored throughout this thesis. Furthermore, simulation models are developed to predict the rate of acceleration for different architectural configurations and molecular systems. With initial acceleration efforts focused primarily on the expensive van der Waals and Coulombic force calculations, an architecture was developed whereby a single machine achieves the performance equivalent of an 88-core InfiniBand-connected network of CPUs. Finally, a methodology to successively identify and accelerate the remaining time-critical aspects of MD simulations is developed. This design leads to an architecture with a projected performance equivalent of nearly 150 CPU cores, enabling supercomputing performance in a single computer chassis plugged into a standard wall socket.
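For readers unfamiliar with why the van der Waals and Coulombic terms dominate MD runtime, the sketch below shows a naive O(N²) non-bonded force loop in Python. It is a generic textbook kernel under illustrative units and parameters, not the thesis's FPGA implementation.

```python
import numpy as np

def nonbonded_forces(pos, q, eps, sigma, coulomb_k=138.935):
    """Naive O(N^2) Lennard-Jones (van der Waals) + Coulomb forces.

    pos:   (N, 3) particle positions
    q:     (N,) partial charges
    eps, sigma: Lennard-Jones parameters (shared by all pairs for simplicity)
    coulomb_k:  Coulomb constant in MD-style units (illustrative value)
    """
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r2 = r_vec @ r_vec
            inv_r2 = 1.0 / r2
            sr6 = (sigma * sigma * inv_r2) ** 3
            # F = -dU/dr projected along r_vec, with the 1/r factors folded in
            f_lj = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) * inv_r2
            f_coul = coulomb_k * q[i] * q[j] * inv_r2 / np.sqrt(r2)
            f = (f_lj + f_coul) * r_vec
            forces[i] += f   # Newton's third law: equal and opposite
            forces[j] -= f
    return forces
```

Every pair is visited once per time step, which is why this inner loop is the natural first target for hardware acceleration.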
322

Reduced-Order Modeling of Multiscale Turbulent Convection: Application to Data Center Thermal Management

Rambo, Jeffrey D. 27 March 2006 (has links)
Data centers are computing infrastructure facilities used by industries with large data processing needs, and the rapid increase in power density of high-performance computing equipment has caused many thermal issues in these facilities. Systems-level thermal management requires modeling and analysis of complex fluid flow and heat transfer processes across several decades of length scales. Conventional computational fluid dynamics and heat transfer techniques for such systems are severely limited as a design tool because their large model sizes render parameter sensitivity studies and optimization impractically slow. The traditional proper orthogonal decomposition (POD) methodology has been reformulated to construct physics-based models of turbulent flows and forced convection. Orthogonal complement POD subspaces were developed to parametrize inhomogeneous boundary conditions and greatly extend the use of the existing POD methodology beyond prototypical flows with fixed parameters. A flux matching procedure was devised to overcome the limitations of Galerkin projection methods for the Reynolds-averaged Navier-Stokes equations and greatly improve the computational efficiency of the approximate solutions. An implicit coupling procedure was developed to link the temperature and velocity fields and further extend the low-dimensional modeling methodology to conjugate forced convection heat transfer. The overall reduced-order modeling framework was able to reduce numerical models containing 10^5 degrees of freedom (DOF) down to fewer than 20 DOF, while still retaining greater than 90% accuracy over the domain. Rigorous a posteriori error bounds were formulated by using the POD subspace to partition the error contributions, and dual residual methods were used to show that the flux matching procedure is a computationally superior approach for low-dimensional modeling of steady turbulent convection. To efficiently model large-scale systems, individual reduced-order models were coupled using flow network modeling as the component interconnection procedure. The development of handshaking procedures between low-dimensional component models lays the foundation to quickly analyze and optimize the modular systems encountered in electronics thermal management. This modularized approach can also serve as a skeletal structure to allow the efficient integration of highly specialized models across disciplines and significantly advance simulation-based design.
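As a rough illustration of the snapshot POD step described above, the following Python sketch extracts a low-dimensional basis from a snapshot matrix via the SVD. The function name and energy threshold are illustrative; the flux-matching coefficient solve that the abstract emphasizes is not reproduced here.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9):
    """Snapshot proper orthogonal decomposition.

    snapshots: (n_dof, n_snap) array, one flow-field snapshot per column.
    Returns the mean field and the leading POD modes capturing the
    requested fraction of the snapshot "energy" (variance).
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    fluct = snapshots - mean                   # POD acts on fluctuations
    u, s, _ = np.linalg.svd(fluct, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)       # cumulative energy fraction
    k = int(np.searchsorted(cum, energy)) + 1  # smallest basis meeting it
    return mean, u[:, :k]                      # k is typically << n_dof

# A reduced solution is then sought as mean + modes @ coeffs, with the
# coefficients determined (per the abstract) by flux matching rather
# than Galerkin projection.
```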
323

Linear Static Analysis Of Large Structural Models On PC Clusters

Ozmen, Semih 01 July 2009 (has links) (PDF)
This research focuses on implementing and improving a parallel solution framework for the linear static analysis of large structural models on PC clusters. The framework consists of two separate programs: the first is responsible for preparing data for the parallel solution, which involves partitioning, workload balancing, and equation numbering. The second program is a fully parallel finite element program that utilizes a substructure-based solution approach with direct solvers. The first step of data preparation is partitioning the structure into substructures. After creating the initial substructures, the estimated imbalance of the substructures is adjusted by iteratively transferring nodes from the slower substructures to the faster ones. Once the final substructures are created, the solution phase is initiated. Each processor assembles its substructure's stiffness matrix and condenses it to the interfaces. The interface equations are then solved in parallel with a block-cyclic dense matrix solver. After computing the interface unknowns, each processor calculates the internal displacements and element stresses or forces. Comparative tests were done to demonstrate the performance of the solution framework.
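The condensation step can be made concrete with a small sketch. Partitioning a substructure's stiffness matrix into internal (i) and interface (b) blocks, condensation is the Schur complement K_bb - K_bi K_ii^{-1} K_ib. The dense NumPy code below is a minimal illustration of that identity, not the direct solver used in the thesis.

```python
import numpy as np

def condense_to_interface(K, f, internal, interface):
    """Static condensation of one substructure onto its interface DOFs.

    Eliminates internal unknowns, leaving the Schur complement system:
        K_cond = K_bb - K_bi K_ii^-1 K_ib
        f_cond = f_b  - K_bi K_ii^-1 f_i
    """
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    Kbi = K[np.ix_(interface, internal)]
    Kbb = K[np.ix_(interface, interface)]
    K_cond = Kbb - Kbi @ np.linalg.solve(Kii, Kib)
    f_cond = f[interface] - Kbi @ np.linalg.solve(Kii, f[internal])
    return K_cond, f_cond

def recover_internal(K, f, internal, interface, u_b):
    """Back-substitute interface displacements to recover internal ones."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    return np.linalg.solve(Kii, f[internal] - Kib @ u_b)
```

Each processor performs this condensation independently; only the (much smaller) condensed interface systems must then be solved jointly, which is what makes the substructure approach parallel-friendly.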
324

Integrating algorithmic and systemic load balancing strategies in parallel scientific applications

Ghafoor, Sheikh Khaled, January 2003 (has links)
Thesis (M.S.)--Mississippi State University, Department of Computer Science and Engineering. Title from title screen. Includes bibliographical references.
325

Pricing of American Options by Adaptive Tree Methods on GPUs

Lundgren, Jacob January 2015 (has links)
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches for deploying the program onto GPU architectures. The performance results of the approaches are compared to that of a central processing unit (CPU) reference implementation, and to each other. In particular, an approach of designating subtrees to be calculated in parallel, by allowing multiple calculation of overlapping elements, is described. Among the examined methods, this attains the best performance results in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
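As a point of reference for the backward induction being accelerated, here is a minimal single-threaded Cox-Ross-Rubinstein pricer for an American put in Python. It is a textbook baseline only: the discrete-dividend handling and lattice adaptivity that are the thesis's focus are omitted.

```python
import numpy as np

def american_put_crr(s0, strike, r, sigma, t, n):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree."""
    dt = t / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = np.exp(-r * dt)
    # terminal stock prices and payoffs at step n
    j = np.arange(n + 1)
    s = s0 * u**j * d**(n - j)
    v = np.maximum(strike - s, 0.0)
    # backward induction with an early-exercise check at every node
    for step in range(n - 1, -1, -1):
        j = np.arange(step + 1)
        s = s0 * u**j * d**(step - j)
        cont = disc * (p * v[1:] + (1 - p) * v[:-1])
        v = np.maximum(strike - s, cont)
    return v[0]

# Example call (parameters are illustrative):
# american_put_crr(s0=100.0, strike=100.0, r=0.05, sigma=0.2, t=1.0, n=1000)
```

The per-step dependence of each node on its two children is what makes naive parallelization hard, and motivates the thesis's scheme of recomputing overlapping elements so that subtrees can be processed independently on the GPU.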
326

Autonomic Cloud Resource Management

Tunc, Cihan January 2015 (has links)
The power consumption of data centers and cloud systems increased almost threefold between 2007 and 2012. Traditional resource allocation methods are typically designed with high performance as the primary objective, to support peak resource requirements. However, it has been shown that server utilization is between 12% and 18%, while power consumption remains close to peak-load levels. Hence, there is a pressing need for more sophisticated resource management approaches. State-of-the-art dynamic resource management schemes typically rely on only a single resource, such as the number of cores, core speed, memory, disk, or network. There is a lack of fundamental research on methods for the dynamic management of multiple resources and properties with the objective of allocating just enough resources for each workload to meet quality of service requirements while optimizing power consumption. The main focus of this dissertation is to simultaneously manage power and performance for large cloud systems. The objective of this research is to develop a framework for performance and power management and investigate a general methodology for integrated autonomic cloud management. In this dissertation, we developed an autonomic management framework based on a novel data structure, AppFlow, used for modeling current and near-term future cloud application behavior. We have developed the following capabilities for the performance and power management of cloud computing systems: 1) online modeling and characterization of cloud application behavior and resource requirements; 2) prediction of application behavior to proactively optimize its operation at runtime; 3) a holistic optimization methodology for performance and power using the number of cores, CPU frequency, and memory amount; and 4) an autonomic cloud manager that supports dynamic changes in VM configurations at runtime to simultaneously optimize multiple objectives, including performance, power, and availability. We validated our approach using the RUBiS benchmark (which emulates eBay) on an IBM HS22 blade server. Our experimental results showed that our approach can lead to a significant reduction in power consumption: up to 87% compared to a static resource allocation strategy, 72% compared to an adaptive frequency scaling strategy, and 66% compared to a multi-resource management strategy.
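To make the "just enough resources" idea concrete, the sketch below shows a toy planning step that picks the lowest-power VM configuration predicted to meet a latency target, searching over cores, frequency, and memory jointly. All names are hypothetical and do not reflect the AppFlow framework's actual interfaces; the predictors would come from the kind of online models the abstract describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VMConfig:
    cores: int
    freq_ghz: float
    mem_gb: int

def plan(predict_latency, predict_power, slo_ms, grid):
    """Pick the cheapest configuration whose predicted latency meets the SLO.

    predict_latency, predict_power: callables VMConfig -> float, supplied
    by the (hypothetical) online application model.
    """
    feasible = [c for c in grid if predict_latency(c) <= slo_ms]
    if not feasible:
        # SLO unreachable on this grid: fall back to the largest config
        return max(grid, key=lambda c: (c.cores, c.freq_ghz, c.mem_gb))
    return min(feasible, key=predict_power)

# Illustrative search grid over three resource dimensions:
grid = [VMConfig(c, f, m) for c in (2, 4, 8)
        for f in (1.2, 2.0, 2.6) for m in (4, 8, 16)]
```

A real controller would rerun this plan step inside a monitor-predict-act loop as the workload model is updated at runtime.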
328

Storage and aggregation for fast analytics systems

Amur, Hrishikesh 13 January 2014 (has links)
Computing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components: 1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., newly crawled web pages, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system, called the WriteBuffer (WB) Tree, that provides up to 30× higher write performance and similar read performance compared to current high-performance systems. 2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation.
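The WB Tree's internals are specific to the dissertation, but the general write-optimization principle it builds on (absorb writes cheaply in memory, spill immutable sorted runs, merge on read) can be sketched in a few lines of Python. This toy structure is illustrative only and is not the WB Tree algorithm.

```python
import bisect

class WriteOptimizedMap:
    """Toy log-structured key-value map: buffered writes, merged reads."""

    def __init__(self, buffer_limit=1024):
        self.buffer = {}        # in-memory buffer absorbs writes at O(1)
        self.runs = []          # immutable sorted runs, newest first
        self.buffer_limit = buffer_limit

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            # spill the buffer as a sorted run (on disk in a real system)
            self.runs.insert(0, sorted(self.buffer.items()))
            self.buffer = {}

    def get(self, key):
        if key in self.buffer:            # freshest data first
            return self.buffer[key]
        for run in self.runs:             # then newest-to-oldest runs
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The trade-off shown here is the one the abstract targets: writes never touch existing sorted data, so write throughput stays high, while reads pay for consulting multiple runs.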
329

Approche par la simulation pour la gestion de ressources / Simulation approach for resource management

Poquet, Millian 19 December 2017 (has links)
Computing platforms keep multiplying and are growing in both size and complexity. Numerous challenges remain in building the next generations of platforms, but exploiting those platforms is a challenge in itself. Constraints such as energy consumption, data movement, and resilience risk becoming dominant concerns in how platforms are managed, adding to their already considerable complexity, especially with the convergence of the different types of distributed platforms. Resource and Jobs Management Systems (RJMSs) are critical middleware at the core of these platforms that allow users to exploit their resources. They must evolve to make the best use of computing platforms while complying with these new constraints. Each evolution ideally requires many iterations, but conducting them in vivo is unreasonable given the costs involved. Simulation is far cheaper and is generally preferred for such studies, but particular caution must be taken when drawing conclusions from simulation, as ill-suited models may lead to invalid results. The first contribution of this thesis is a modular simulation methodology for studying RJMSs and their evolution realistically, together with the resulting simulator, named Batsim. The main idea is to strongly separate the simulation from the decision-making algorithms. This allows a separation of concerns, since any algorithm can benefit from a validated simulation offering multiple levels of realism (features, accuracy of the models). The methodology also simplifies moving new policies into production, since code from both production RJMSs and academic prototypes can be studied in the same context. Batsim is used in the second part of this thesis, which focuses on online, non-clairvoyant resource management policies that combine performance and energy objectives. Several algorithms are first proposed and analyzed for respecting an energy budget over a given time period. The thesis then explores, more generally, the energy/performance trade-offs obtainable with node shutdown techniques.
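As a small worked example of the node-shutdown trade-off mentioned at the end, the rule below powers a node off only when the expected idle period makes the shutdown/boot energy overhead worthwhile. All power and timing figures are invented for illustration and are not the thesis's models.

```python
def should_shutdown(expected_idle_s, p_idle_w=95.0, p_off_w=10.0,
                    transition_s=120.0, p_transition_w=170.0):
    """Shut a node down only if doing so saves energy over the idle period.

    expected_idle_s: predicted idle duration (s)
    p_idle_w:        idle power draw if the node stays on (W)
    p_off_w:         residual power draw while off (W)
    transition_s:    combined shutdown + boot time (s)
    p_transition_w:  power draw during shutdown/boot (W)
    """
    e_stay_on = p_idle_w * expected_idle_s
    e_cycle = (p_transition_w * transition_s
               + p_off_w * max(expected_idle_s - transition_s, 0.0))
    return e_cycle < e_stay_on
```

Rules of this shape make the performance/energy tension explicit: shutting down too eagerly wastes energy on transitions and delays jobs arriving during boot, while never shutting down burns idle power.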
330

Dynamic load-balancing: a new strategy for weather forecast models

Rodrigues, Eduardo Rocha January 2011 (has links)
Weather forecasting models are computationally intensive applications, and they are traditionally executed on parallel machines. However, some issues prevent these models from fully exploiting the available computing power. One such issue is load imbalance, i.e., the uneven distribution of load across the processors of the parallel machine. Since weather models are typically synchronous applications, that is, all tasks synchronize at every time-step, the execution time is determined by the slowest task. The causes of such imbalance are either static (e.g., topography) or dynamic (e.g., shortwave radiation, moving thunderstorms). Various techniques, often embedded in the application's source code, have been used to address both sources. However, these techniques are inflexible and hard to use in legacy codes. In this thesis, we explore the concept of processor virtualization for dynamically balancing the load in weather models. This means that the domain is over-decomposed into more tasks than there are available processors. Assuming that many tasks can be safely executed on a single processor, each processor is put in charge of a set of tasks. In addition, the system can migrate some of them from overloaded processors to underloaded ones when it detects load imbalance. This approach has the advantage of decoupling the application from the load balancing strategy. Our objective is to show that processor virtualization can be applied to weather models as long as an appropriate migration strategy is used. Our proposal takes into account the communication pattern of the application in addition to the load of each processor. In this text, we present the techniques used to minimize the amount of change needed to apply processor virtualization to a real-world application. Furthermore, we analyze the effects of the frequency at which the load balancer is invoked and of the threshold that activates rebalancing, and we propose an automatic strategy to find an optimal threshold to trigger load balancing. These strategies are centralized and work well for moderately large machines; for larger machines, we present a fully distributed algorithm and analyze its performance. As a case study, we demonstrate the effectiveness of our approach for dynamically balancing the load in Brams, a mesoscale weather forecasting model based on MPI parallelization. We chose this model because it presents a considerable load imbalance caused by localized thunderstorms. In addition, we analyze how other effects of processor virtualization can improve performance.
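A toy version of the centralized strategy described above: over-decomposed tasks are migrated from overloaded to underloaded processors once imbalance exceeds a trigger threshold. The communication-aware term and the fully distributed variant from the thesis are omitted; the threshold value and names are illustrative.

```python
def rebalance(task_loads, assignment, n_procs, threshold=1.1):
    """Greedy threshold-triggered load balancer.

    task_loads: dict task -> measured load
    assignment: dict task -> current processor (mutated in place)
    Returns the list of migrations performed as (task, src, dst).
    """
    proc_load = [0.0] * n_procs
    for t, p in assignment.items():
        proc_load[p] += task_loads[t]
    avg = sum(proc_load) / n_procs
    migrations = []
    # consider heaviest tasks first, moving them off overloaded processors
    for t, p in sorted(assignment.items(), key=lambda kv: -task_loads[kv[0]]):
        if proc_load[p] > threshold * avg:
            dst = min(range(n_procs), key=proc_load.__getitem__)
            if proc_load[dst] + task_loads[t] < proc_load[p]:
                proc_load[p] -= task_loads[t]
                proc_load[dst] += task_loads[t]
                assignment[t] = dst
                migrations.append((t, p, dst))
    return migrations
```

The threshold plays the role discussed in the abstract: invoking migration on every tiny imbalance wastes time moving tasks, while too high a threshold leaves the slowest processor dictating the time-step.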
