81 |
Introducing enhanced fully-adaptive routing decisions within Torus-Mesh and hypercube interconnect networks
Lydick, Christopher L., January 1900
Master of Science / Department of Electrical and Computer Engineering / Don M. Gruenbacher / Methods for communicating within an interconnection network, or fabric of connections between nodes, can be as diverse as the applications that use them. Because traffic loads on these networks are dynamic, fully-adaptive routing algorithms have been shown to exploit locality while balancing loads and softening the effects of hot-spots. One issue that has been overlooked is the impact of data traveling along the periphery of a selected minimal routable quadrant (MRQ) within these fully-adaptive algorithms. As data aligns with the destination in the x, y, and z dimensions, for instance, it then traverses the periphery of an MRQ. For each dimension in which this occurs, the data has one fewer choice for routing around hot-spots that could appear later along the path. Weighting the next-hop decision to avoid the periphery of the selected MRQ leaves the data more options for avoiding hot-spots. This work introduces one hybridized routing algorithm that borrows heavily from CQR (an efficient and stable fully-adaptive algorithm). Enhanced CQR with Periphery Avoidance attempts to weight the next-hop routing decision using both output queues and proximity to the periphery of the MRQ. This fully-adaptive algorithm is tested using simulations and a laboratory research cluster with a USB interconnect in the hypercube topology. It is also compared against other static, oblivious, and adaptive algorithms. Thor's Tack Hammer, the Kansas State University research cluster, is also benchmarked and discussed as an inexpensive and dependable parallel system.
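The idea of weighting a next hop by both output-queue occupancy and closeness to the MRQ periphery can be illustrated with a minimal sketch. This is not the thesis's algorithm: the cost formula, the `periphery_weight` parameter, and the function name are assumptions chosen for illustration, and the sketch covers a plain mesh without torus wraparound.

```python
def choose_next_hop(current, dest, queue_len, periphery_weight=2.0):
    """Pick the dimension to move in next, or None if already at dest.

    current, dest: coordinate tuples, e.g. (x, y, z).
    queue_len: output-queue occupancy per dimension (hypothetical input).
    periphery_weight: assumed tuning knob; larger values avoid the
        MRQ periphery more strongly.
    """
    best_dim, best_cost = None, float("inf")
    for dim, (c, d) in enumerate(zip(current, dest)):
        remaining = abs(d - c)
        if remaining == 0:
            continue  # already aligned in this dimension; no minimal hop here
        # A small remaining offset means the hop moves toward alignment,
        # i.e. toward the periphery, so it is penalised more heavily.
        cost = queue_len[dim] + periphery_weight / remaining
        if cost < best_cost:
            best_dim, best_cost = dim, cost
    return best_dim
```

With equal queues the sketch prefers the dimension with the largest remaining offset, which keeps the most routing choices open. A congested queue can still override that preference.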
|
82 |
Building Reliable and Cost-Effective Storage Systems for High-Performance Computing Datacenters
Qiao, Zhi, 08 1900
In this dissertation, I first incorporate declustered redundant array of independent disks (RAID) technology into the existing system, maximizing the aggregated recovery I/O and accelerating post-failure remediation. My analytical model confirms that the accelerated data recovery stage significantly improves storage reliability. Then I present a proactive data protection framework that augments storage availability and reliability. It uses failure prediction methods to efficiently rescue data on drives before failures occur, which significantly reduces storage downtime and lowers the risk of nested failures. Finally, I investigate how an active storage system enables energy-efficient computing. I explore an emerging storage device, the Ethernet drive, to offload data-intensive workloads from the host to the drives and process the data there. This not only minimizes data movement and power usage but also enhances data availability and storage scalability. In summary, my dissertation research provides intelligence at the drive, storage node, and system levels to tackle the rising reliability challenge in modern HPC datacenters. The results indicate that this novel storage paradigm cost-effectively improves storage scalability, availability, and reliability.
|
83 |
Application-aware resource management for datacenters / Applikationsmedveten resurshantering för datacenter
Souza, Abel Pinto Coelho de, January 2018
High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and making predictions. Even a single datacenter server has thousands of hardware and software metrics, known as Key Performance Indicators (KPIs), that individually and in aggregate can give insight into the performance, robustness, and efficiency of the datacenter and the provisioned applications. At the datacenter level, the number of KPIs is even higher. The fast-growing interest in datacenter management from both the public and industry, together with the rapid expansion in scale and complexity of datacenter resources and the services provided on them, has made monitoring, profiling, controlling, and provisioning compute resources dynamically at runtime a challenging and complex task. Commonly, correlations of application KPIs, like response time and throughput, with resource capacities show that the runtime systems (e.g., containers or virtual machines) used to provision these applications do not utilize available resources efficiently. This reduces datacenter efficiency, which in turn results in higher operational costs and longer waiting times for results. The goal of this thesis is to develop tools and autonomic techniques for improving datacenter operations, management, and utilization, while improving and/or minimizing impacts on application performance. To this end, we make use of application resource descriptors to create a library that dynamically adjusts the amount of resources used, enabling elasticity for scientific workflows in HPC datacenters. For mission-critical applications, high availability is of great concern since these services must be kept running even in the event of system failures.
By modeling and correlating specific resource counters, like CPU, memory, and network utilization, with the number of runtime synchronizations, we present adaptive mechanisms to dynamically select which fault-tolerance mechanism to use. Likewise, for scientific applications we propose a hybrid extensible architecture for dual-level scheduling of data-intensive jobs in HPC infrastructures, allowing operational simplification and on-boarding of new types of applications, and achieving greater job throughput with higher overall datacenter efficiency.
|
84 |
Computation of a Damping Matrix for Finite Element Model Updating
Pilkey, Deborah F., 26 April 1998
The characterization of damping is important in making accurate predictions of both the true response and the frequency response of any device or structure dominated by energy dissipation. Modeling damping matrices and verifying them experimentally is challenging because damping cannot be determined via static tests, as mass and stiffness can. Furthermore, damping is more difficult to determine from dynamic measurements than natural frequency. However, damping is extremely important in formulating predictive models of structures. In addition, damping matrix identification may be useful in diagnostics or health monitoring of structures.
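A short worked equation may make the point about static tests concrete. It assumes the standard linear second-order model with viscous damping. The symbols $M$, $C$, $K$ (mass, damping, and stiffness matrices), $x$ (displacement), and $f$ (applied load) are conventional notation and are not taken from the abstract.

```latex
\begin{aligned}
\text{dynamic:}\quad & M\ddot{x}(t) + C\dot{x}(t) + Kx(t) = f(t) \\
\text{static } (\dot{x} = \ddot{x} = 0)\text{:}\quad & Kx = f
\end{aligned}
```

Under this assumed model, $C$ multiplies only the velocity, so it drops out of the static equation. A static load-deflection test therefore carries information about $K$ but none about $C$.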
The objective of this work is to find a robust, practical procedure to identify damping matrices. All aspects of the damping identification procedure are investigated. The procedures for damping identification presented herein are based on prior knowledge of the finite element or analytical mass matrices and measured eigendata. Alternatively, a procedure is based on knowledge of the mass and stiffness matrices and the eigendata. With this in mind, an exploration into model reduction and updating is needed to make the problem more complete for practical applications. Additionally, high performance computing is used as a tool to deal with large problems. High Performance Fortran is exploited for this purpose. Finally, several examples, including one experimental example, are used to illustrate the use of these new damping matrix identification algorithms and to explore their robustness. / Ph. D.
|
85 |
General Resource Management for Computationally Demanding Scientific Software
Xinchen Guo (13965024), 17 October 2022
<p>Many scientific problems contain nonlinear systems of equations that require multiple iterations to reach converged results. This software pattern follows the bulk synchronous parallel model. In that sense, an iteration is a superstep, which includes computation on local data, global communication to update data for the next iteration, and synchronization between iterations. In modern HPC environments, MPI is used to distribute data and OpenMP is used to accelerate the computation on each portion of the data. More MPI processes increase the cost of communication and synchronization, whereas more OpenMP threads increase the overhead of multithreading. A proper combination of MPI and OpenMP is critical to accelerate each superstep. Proper orchestration of MPI processes and OpenMP threads is also needed to use the underlying hardware resources efficiently.</p>
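The search space behind "a proper combination of MPI and OpenMP" can be sketched by enumerating the ways to split one node's cores between processes and threads. This is only an illustrative helper. The function name and the assumption that every core is used by exactly one thread are not from the abstract, and choosing among the layouts still requires profiling.

```python
def hybrid_layouts(cores_per_node):
    """List (mpi_ranks, omp_threads) pairs that exactly fill one node.

    Assumes every core runs exactly one OpenMP thread and that all
    ranks on the node get the same number of threads.
    """
    return [
        (ranks, cores_per_node // ranks)
        for ranks in range(1, cores_per_node + 1)
        if cores_per_node % ranks == 0
    ]
```

For an 8-core node this yields four candidate layouts, from one rank with eight threads to eight single-threaded ranks.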
<p>Purdue’s multi-purpose nanodevice simulation tool NEMO5 uses MPI to distribute the computation of independent spectral points. The computation of each spectral point is accelerated with OpenMP threads. A few examples of resource utilization optimizations are presented. One type of simulation applies the non-equilibrium Green’s function method to accurately predict drug molecules. Our profiling results suggest the optimum combination has more MPI processes and fewer OpenMP threads. However, NEMO5’s memory usage has large spikes for each spectral point. This behavior limits how many spectral points can be calculated concurrently, because HPC nodes lack swap space to guard against out-of-memory failures.</p>
<p>A distributed resource management framework is proposed and developed to manage memory and CPU usage automatically and dynamically. The concurrent calculation of spectral points is pipelined to avoid simultaneous peak memory usage. This allows more MPI processes and fewer OpenMP threads for higher parallel efficiency. Automatic CPU usage adjustment also reduces the time needed to fill and drain the calculation pipeline. The resource management framework requires minimal code intrusion and successfully speeds up the calculation. It can also be generalized to other simulation software.</p>
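The effect of pipelining on peak memory can be shown with a small sketch. It is a toy model, not the framework described above. The per-step memory profile, the fixed `stagger` offset, and the function name are assumptions made for illustration.

```python
def peak_memory(profile, n_tasks, stagger):
    """Peak combined memory of n_tasks identical tasks.

    profile: memory used by one task at each time step (hypothetical units).
    stagger: number of steps by which each task starts after the previous
        one; 0 means all tasks start together.
    """
    total_steps = len(profile) + stagger * (n_tasks - 1)
    peak = 0
    for t in range(total_steps):
        in_use = 0
        for k in range(n_tasks):
            step = t - k * stagger  # position of task k within its own profile
            if 0 <= step < len(profile):
                in_use += profile[step]
        peak = max(peak, in_use)
    return peak
```

For a profile of `[1, 1, 5, 1]` with three tasks, starting all tasks together stacks the three spikes, while a one-step stagger lets each spike overlap only with low-memory phases of the other tasks. The price is a longer pipeline fill and drain time.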
|
86 |
Redistribution of Tensors for Distributed Contractions
Nikam, Akshay Machhindra, 02 June 2014
No description available.
|
87 |
A Novel System for Wireless Robotic Surgery Through the Use of Ultrasonic Tracking Coupled with Advanced Modeling Techniques
Lilly, Bradford R., 09 July 2012
No description available.
|
88 |
QoS In Parallel Job Scheduling
Islam, Mohammad Kamrul, 11 September 2008
No description available.
|
89 |
Scalable Job Startup and Inter-Node Communication in Multi-Core InfiniBand Clusters
Sridhar, Jaidev Krishna, 02 September 2009
No description available.
|
90 |
Efficient Run-time Support For Global View Programming of Linked Data Structures on Distributed Memory Parallel Systems
Larkins, Darrell Brian, 30 July 2010
No description available.
|