About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Interactive Supercomputing with MIT Matlab

Husbands, Parry; Isbell, Charles Lee, Jr.; Edelman, Alan. 28 July 1998 (has links)
This paper describes MITMatlab, a system that enables users of supercomputers or networked PCs to work on large data sets within Matlab transparently. MITMatlab is based on the Parallel Problems Server (PPServer), a standalone 'linear algebra server' that provides a mechanism for running distributed-memory algorithms on large data sets. The PPServer and MITMatlab enable high-performance interactive supercomputing. With such a tool, researchers can now use Matlab as more than a prototyping tool for experimenting with small problems. Instead, MITMatlab makes it possible to visualize and operate interactively on large data sets. This has implications not only in supercomputing, but also for Artificial Intelligence applications such as Machine Learning, Information Retrieval and Image Processing.
2

Scalability-Driven Approaches to Key Aspects of the Message Passing Interface for Next Generation Supercomputing

Zounmevo, Ayi Judicael 23 May 2014 (has links)
The Message Passing Interface (MPI), which dominates the supercomputing programming environment, is used to orchestrate and fulfill communication in High Performance Computing (HPC). How far HPC programs can scale depends in large part on the ability to achieve fast communication and to overlap communication with computation or with other communication. This dissertation proposes a new asynchronous solution to the nonblocking Rendezvous protocol used between pairs of processes to transfer large payloads. Beyond enforcing communication/computation overlap in a comprehensive way, the proposal improves on existing network device-agnostic asynchronous solutions by being memory-scalable and by avoiding brute-force strategies. Overlapping communication and computation is important, but each communication is also expected to incur minimal latency; the processing of the queues that hold messages pending reception inside the MPI middleware must therefore be fast. Currently, that processing slows down as program scale grows. This research presents a novel scalability-driven message queue whose processing skips altogether large portions of queue items that are deterministically guaranteed to lead to unfruitful searches. Because it is largely insensitive to program size, the proposed message queue maintains very good performance while exhibiting a low, flattening memory-footprint growth pattern. Due to the blocking nature of its required synchronizations, the one-sided communication model of MPI creates both communication/computation and communication/communication serializations. This research fixes these issues and the latency-related inefficiencies documented for MPI one-sided communication by proposing completely nonblocking and non-serializing versions of those synchronizations. The improvements, intended for consideration in a future MPI standard, also allow new classes of programs to be expressed more efficiently in MPI. Finally, a persistent distributed service is designed over MPI to show its impact at large scales beyond communication-only activities. MPI is analyzed in situations of resource exhaustion, partial failure, and heavy use of internal objects for communicating and non-communicating routines. Important scalability issues are revealed and solution approaches are put forth. / Thesis (Ph.D., Electrical & Computer Engineering), Queen's University, 2014.
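As background for the overlap this dissertation targets, the following minimal C sketch (not taken from the thesis) shows communication/computation overlap with standard nonblocking MPI point-to-point calls. Whether the transfer actually progresses during the computation depends on the MPI library's asynchronous progress support, which is precisely the gap the proposed Rendezvous solution addresses; the buffer size, tag, and ring-style peer choice are arbitrary.

```c
/* Minimal sketch of communication/computation overlap with nonblocking MPI.
 * Real overlap depends on the library's asynchronous progress engine. */
#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

static void local_compute(double *x, int n) {
    for (int i = 0; i < n; i++) x[i] = x[i] * 1.000001 + 1.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    double *work    = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { sendbuf[i] = rank; work[i] = i; }

    int peer = (rank + 1) % size;          /* simple ring exchange */
    MPI_Request reqs[2];

    /* Post the large transfers first ... */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... then do independent computation while they (ideally) progress. */
    local_compute(work, N);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    free(sendbuf); free(recvbuf); free(work);
    MPI_Finalize();
    return 0;
}
```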
3

Data-flow vs control-flow for extreme level computing

Evripidou, P.; Kyriacou, Costas. January 2013 (has links)
This paper challenges the current thinking for building High Performance Computing (HPC) systems, which is based on sequential computing (the von Neumann model), by proposing novel systems based on the dynamic data-flow model of computation. The switch to multi-core chips brought parallel processing into the mainstream; the computing industry and research community were forced into this switch because they hit the power and memory walls. Will the same happen with HPC? The United States, through its DARPA agency, commissioned a study in 2007 to determine what kinds of technologies would be needed to build an exaflop computer. The head of the study was very pessimistic about the possibility of having an exaflop computer in the foreseeable future. We believe that many of the findings that caused the pessimistic outlook were due to the limitations of the sequential model. A paradigm shift may be needed in order to achieve affordable exascale-class supercomputers.
4

Compiler-assisted staggered checkpointing

Norman, Alison Nicholas 23 November 2010 (has links)
To make progress in the face of failures, long-running parallel applications need to save their state, known as a checkpoint. Unfortunately, current checkpointing techniques are becoming untenable on large-scale supercomputers. Many applications checkpoint all processes simultaneously, a technique that is easy to implement but often saturates the network and file system, causing a significant increase in checkpoint overhead. This thesis introduces compiler-assisted staggered checkpointing, where processes checkpoint at different places in the application text, thereby reducing contention for the network and file system. This checkpointing technique is algorithmically challenging, since the number of possible solutions is enormous and the number of desirable solutions is small, but we have developed a compiler algorithm that both places staggered checkpoints in an application and ensures that the solution is desirable. The algorithm successfully places staggered checkpoints in parallel applications configured to use tens of thousands of processes. For our benchmarks, it finds and places useful recovery lines that are up to 37% faster, across all configurations, than recovery lines where all processes write their data at approximately the same time. We also analyze the success of staggered checkpointing by investigating the application and system characteristics for which it reduces network and file system contention; we find that for many configurations, staggered checkpointing reduces both checkpointing time and overall execution time. To perform these analyses, we developed an event-driven simulator for large-scale systems that estimates the behavior of the network, global file system, and local hardware using predictive models. The simulator allows us to accurately study applications that have thousands of processes; on average it predicts execution times at 83% of their measured value.
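To make the contention argument concrete, the toy C sketch below staggers checkpoint writes across groups of MPI ranks so the parallel file system never sees every process at once. It illustrates the general idea only and is not the thesis's compiler algorithm, which instead places checkpoints at different points in the application text; the group size, file naming, and barrier-based pacing are assumptions.

```c
/* Toy illustration: stagger checkpoint I/O by rank group to spread
 * file-system load. Not the compiler-assisted placement from the thesis. */
#include <mpi.h>
#include <stdio.h>

static void write_checkpoint(int rank, const double *state, int n) {
    char path[64];
    snprintf(path, sizeof(path), "ckpt_rank%05d.dat", rank);   /* assumed naming */
    FILE *f = fopen(path, "wb");
    if (f) { fwrite(state, sizeof(double), n, f); fclose(f); }
}

void staggered_checkpoint(const double *state, int n, int group_size) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int ngroups = (size + group_size - 1) / group_size;

    /* Groups take turns writing so the file system is never hit by
     * all processes simultaneously. */
    for (int g = 0; g < ngroups; g++) {
        if (rank / group_size == g)
            write_checkpoint(rank, state, n);
        MPI_Barrier(MPI_COMM_WORLD);   /* crude pacing between groups */
    }
}
```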
5

Development of a Graphical User Interface for Intensive Computing

Schumann, Merten 21 June 1995 (has links) (PDF)
Development of a WWW-based graphical user interface for submitting jobs to the DQS batch system.
6

HPC scheduling in a brave new world

Gonzalo P., Rodrigo January 2017 (has links)
Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems’ memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems’ processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows' turnaround time without over-allocating resources. / Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR); we used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.
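One piece this thesis analyzes is fairshare prioritization. As a hedged illustration, the sketch below computes a fair-share priority factor of the form 2^(-usage/share), in the spirit of Slurm's classic fair-share factor; the usage and share values are invented, and this formula is only one common formulation, not the mechanism studied in the thesis.

```c
/* Illustrative fair-share factor: accounts that have used less than their
 * entitled share get a factor above 0.5, heavy users decay toward 0. */
#include <math.h>
#include <stdio.h>

/* normalized_usage: fraction of machine resources the account has consumed,
 * normalized_share: fraction of the machine it is entitled to. */
double fairshare_factor(double normalized_usage, double normalized_share) {
    if (normalized_share <= 0.0) return 0.0;
    return pow(2.0, -normalized_usage / normalized_share);
}

int main(void) {
    /* An account entitled to 10% of the machine that has used 5%, 10%, 20%: */
    printf("%.3f %.3f %.3f\n",
           fairshare_factor(0.05, 0.10),
           fairshare_factor(0.10, 0.10),
           fairshare_factor(0.20, 0.10));
    return 0;
}
```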
7

Isolation of Temporary Storage in High Performance Computing via Linux Namespacing

Satchwell, Steven Tanner 01 June 2018 (has links)
Per-job isolation of temporary file storage in High Performance Computing (HPC) environments provides benefits in security, efficiency, and administration. HPC system administrators can use the mount_isolation Slurm task plugin to improve security by isolating temporary files where no isolation previously existed. The mount_isolation plugin also increases efficiency by removing obsolete temporary files immediately after each job terminates, freeing valuable disk space in the HPC environment for other jobs. These two improvements reduce the amount of work system administrators must expend to ensure temporary files are removed in a timely manner. Previous temporary file removal solutions were removal on reboot, manual removal, or removal through a Slurm epilog script. The epilog script was the most effective of these, allowing files to be removed in a timely manner. However, HPC users can have multiple supercomputing jobs running concurrently, and temporary files generated by these concurrent or overlapping jobs are only deleted by the epilog script when all jobs run by that user on the compute node have completed. Even when the user has only one running job, the temporary directory may still contain temporary files from many previously executed jobs, taking up valuable temporary storage on the compute node. The mount_isolation plugin isolates these temporary files on a per-job basis, allowing prompt removal of obsolete files regardless of job overlap.
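For readers unfamiliar with the underlying mechanism, the following C sketch shows how Linux mount namespaces can give a process a private /tmp. It illustrates the namespacing primitive only and is not the mount_isolation plugin's actual code; the tmpfs size and mode options are assumptions, the call requires CAP_SYS_ADMIN, and error handling is abbreviated.

```c
/* Sketch of per-process /tmp isolation via a private mount namespace.
 * The tmpfs (and everything written into it) vanishes with the namespace. */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <stdio.h>
#include <stdlib.h>

int isolate_tmp(void) {
    /* New mount namespace for this process and its children. */
    if (unshare(CLONE_NEWNS) != 0) { perror("unshare"); return -1; }

    /* Keep our mount changes from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
        perror("mount MS_PRIVATE"); return -1;
    }

    /* Fresh, per-namespace tmpfs over /tmp (size/mode are assumptions). */
    if (mount("tmpfs", "/tmp", "tmpfs", 0, "size=1g,mode=1777") != 0) {
        perror("mount tmpfs"); return -1;
    }
    return 0;
}

int main(void) {
    if (isolate_tmp() != 0) return EXIT_FAILURE;
    /* The job's command would be exec'd here, inside the namespace. */
    return EXIT_SUCCESS;
}
```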
8

High Performance Computing as a Combination of Machines and Methods and Programming

Tadonki, Claude 16 May 2013 (has links) (PDF)
High Performance Computing (HPC) aims at providing reasonably fast computing solutions to both scientific and real-life technical problems. Many efforts have indeed been made on the way to powerful supercomputers, in both generic and customized configurations. However, whatever their current and future breathtaking capabilities, supercomputers work by brute force and deterministic steps, while the human mind works by a few strokes of brilliance. Thus, in order to take significant advantage of hardware advances, we need powerful methods to solve problems, together with highly skillful programming efforts and relevant frameworks. The advent of multicore architectures is noteworthy in HPC history because it brought the underlying concept of multiprocessing into common consideration and changed the landscape of standard computing. At a larger scale, there is a keen desire to build or host frontline supercomputers; the yearly Top500 ranking nicely illustrates and orchestrates this supercomputer saga. For many years, computers have been falling in price while gaining processing power, often strengthened by specialized accelerator units. What commonly springs to mind when it comes to HPC is thus raw computer capability. However, this availability of increasingly fast computers has changed the rules of scientific discovery and has motivated the consideration of challenging applications. We are therefore routinely at the door of large-scale problems, and most of the time the speed of calculation by itself is no longer sufficient: the real concern of HPC users is time-to-output. We need to study each important aspect in the critical path between inputs and outputs and keep striving to reach the expected level of performance. This is the main concern of the viewpoints and achievements reported in this book. The document is organized into five chapters articulated around our main contributions. The first chapter depicts the landscape of supercomputers, comments on the need for tremendous processing speed, and analyzes the main trends in supercomputing. The second chapter deals with solving large-scale combinatorial problems through a mixture of continuous and discrete optimization methods; we describe the main generic approaches and present an important framework on which we have been working so far. The third chapter is devoted to accelerated computing; we discuss the motivations and the issues, and describe three case studies from our contributions. In chapter four, we address the topic of energy minimization in a formal way and present our method based on a mathematical programming approach. Chapter five discusses hybrid supercomputing; we examine technical issues with hierarchical shared memories and illustrate hybrid coding through a large-scale linear algebra implementation on a supercomputer.
9

Performance analysis and modeling of GYRO

Lively, Charles Wesley, III 30 October 2006 (has links)
Efficient execution of scientific applications requires an understanding of how system features impact the performance of the application. Performance models provide significant insight into the performance relationships between an application and the system used for execution; in particular, models can be used to predict the relative performance of different systems used to execute an application. Recently, a significant effort has been devoted to gaining a more detailed understanding of the performance characteristics of a fusion reaction application, GYRO. GYRO is a plasma-physics application used to gain a better understanding of the interaction of ions and electrons in fusion reactions. In this thesis, we use the well-known Prophesy system to analyze and model the performance of GYRO across various supercomputer platforms. Using processor partitioning, we determine that utilizing the smallest number of processors per node is the most effective processor configuration for executing the application. Further, we explore trends in kernel coupling values across platforms to understand how the kernels of GYRO interact. Experiments are conducted on the supercomputers Seaborg and Jacquard at the DOE National Energy Research Scientific Computing Center and on the DataStar P655 and P690 systems at the San Diego Supercomputer Center. Across all four platforms, our results show that utilizing one processor per node (ppn) yields better performance than full or half ppn usage. Our experimental results also show that using kernel coupling to model and predict the performance of GYRO is more accurate than summation: on average, kernel coupling provides prediction estimates with less than 7% error. The performance relationship between kernel coupling values and the sharing of information throughout the GYRO application is explored by examining the global communication within the application and data locality.
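As a hedged illustration of why coupling-aware prediction can beat plain summation, the toy C program below compares the two using invented kernel times and coupling values. The specific combination formula is an assumption for illustration only and is not the Prophesy model used in the thesis.

```c
/* Illustrative numbers only (not GYRO measurements). A coupling value
 * c_ij = t_ij / (t_i + t_j) captures whether two adjacent kernels run
 * faster (c < 1) or slower (c > 1) together than their standalone sum,
 * e.g. due to cache reuse or contention. Plain summation ignores this. */
#include <stdio.h>

int main(void) {
    double t[3] = {2.0, 3.0, 1.5};   /* standalone kernel times (s), assumed */
    double c[2] = {0.90, 1.10};      /* coupling of kernel pairs (0,1), (1,2), assumed */

    double summation = t[0] + t[1] + t[2];

    /* One simple coupling-based estimate: scale each adjacent pair's
     * contribution by its coupling value (reduces to summation when c = 1). */
    double coupled = c[0] * (t[0] + t[1]) + c[1] * (t[1] + t[2]) - t[1];

    printf("summation estimate: %.2f s\n", summation);
    printf("coupling  estimate: %.2f s\n", coupled);
    return 0;
}
```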
10

The Case For Hardware Overprovisioned Supercomputers

Patki, Tapasya January 2015 (has links)
Power management is one of the most critical challenges on the path to exascale supercomputing. High Performance Computing (HPC) centers today are designed to be worst-case power provisioned, leading to two main problems: limited application performance and under-utilization of procured power. In this dissertation we introduce hardware overprovisioning: a novel, flexible design methodology for future HPC systems that addresses the aforementioned problems and leads to significant improvements in application and system performance under a power constraint. We first establish that choosing the right configuration based on application characteristics when using hardware overprovisioning can improve application performance under a power constraint by up to 62%. We conduct a detailed analysis of the infrastructure costs associated with hardware overprovisioning and show that it is an economically viable supercomputing design approach. We then develop RMAP (Resource MAnager for Power), a power-aware, low-overhead, scalable resource manager for future hardware overprovisioned HPC systems. RMAP addresses the issue of under-utilized power by using power-aware backfilling and improves job turnaround times by up to 31%. This dissertation opens up several new avenues for research in power-constrained supercomputing as we venture toward exascale, and we conclude by enumerating these.
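The configuration-selection idea behind hardware overprovisioning can be illustrated with a toy C sketch: under a fixed site power budget, compare candidate node-count/power-cap pairs and keep the one with the best predicted performance. All numbers are invented and the performance model is a placeholder, not RMAP or the dissertation's analysis.

```c
/* Toy configuration search under a power budget (all values invented). */
#include <stdio.h>

struct config { int nodes; double watts_per_node; double predicted_perf; };

int main(void) {
    const double power_budget = 100000.0;   /* watts, assumed */
    struct config candidates[] = {
        { 500, 200.0, 1.00 },   /* fewer nodes at full power         */
        { 800, 125.0, 1.35 },   /* more nodes, each capped lower     */
        {1000, 100.0, 1.20 },   /* scaling losses start to dominate  */
    };
    int n = sizeof(candidates) / sizeof(candidates[0]);

    int best = -1;
    for (int i = 0; i < n; i++) {
        double draw = candidates[i].nodes * candidates[i].watts_per_node;
        if (draw <= power_budget &&
            (best < 0 || candidates[i].predicted_perf > candidates[best].predicted_perf))
            best = i;
    }
    if (best >= 0)
        printf("pick %d nodes capped at %.0f W/node\n",
               candidates[best].nodes, candidates[best].watts_per_node);
    return 0;
}
```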
