411

Data services: bringing I/O processing to petascale

Abbasi, Mohammad Hasan 08 July 2011 (has links)
The increasing size of high performance computing systems, and the associated increase in the volume of generated data, have resulted in an I/O bottleneck for these applications. This bottleneck is further exacerbated by the imbalance between the growth of processing capability and that of storage capability, due mainly to the power and cost requirements of scaling the storage. This thesis introduces data services, a new abstraction which provides significant benefits for data-intensive applications. Data services combine low-overhead data movement with flexible placement of data manipulation operations to address the I/O challenges of leadership-class scientific applications. The impact of asynchronous data movement on application runtime is minimized by utilizing novel server-side data movement schedulers to avoid contention-related jitter in application communication. Additionally, the JITStager component is presented. Utilizing dynamic code generation and flexible code placement, the JITStager allows data services to be executed as a pipeline extending from the application to storage. It is shown in this thesis that data services can add new functionality to the application without a significant negative impact on performance.
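To make the abstraction concrete, the following is a minimal sketch of a staging pipeline in the spirit of data services (the API, names, and scheduling policy here are hypothetical illustrations, not the thesis's implementation): compute processes hand buffers off asynchronously, a mover thread defers transfers while the application is communicating to avoid contention-related jitter, and an optional operator placed in the pipeline transforms data on its way to storage.

```python
# Hypothetical sketch of an asynchronous staging service; not the thesis's code.
import queue
import threading
import time

def write_to_storage(buf):
    pass  # stand-in for the actual I/O path

class StagingService:
    def __init__(self, transform=None):
        self.pending = queue.Queue()
        self.transform = transform            # data manipulation placed in the pipeline
        self.app_communicating = threading.Event()
        threading.Thread(target=self._mover, daemon=True).start()

    def submit(self, buf):
        # Low-overhead, asynchronous handoff: the application does not block on I/O.
        self.pending.put(buf)

    def _mover(self):
        while True:
            buf = self.pending.get()
            while self.app_communicating.is_set():
                time.sleep(0.001)             # back off to avoid contention jitter
            if self.transform:
                buf = self.transform(buf)     # operator may run anywhere in the pipeline
            write_to_storage(buf)

svc = StagingService(transform=lambda b: b)   # identity transform as a placeholder
svc.submit(b"timestep-0042 output")
time.sleep(0.05)                              # let the mover drain before exit
```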
412

Coordinated system level resource management for heterogeneous many-core platforms

Gupta, Vishakha 24 August 2011 (has links)
A challenge posed by future computer architectures is the efficient exploitation of their many and sometimes heterogeneous computational cores. This challenge is exacerbated by the multiple facilities for data movement and sharing across cores resident on such platforms. To answer the question of how systems software should treat heterogeneous resources, this dissertation describes an approach that (1) creates a common manageable pool for all the resources present in the platform, and then (2) provides virtual machines (VMs) with multiple 'personalities', flexibly mapped to and efficiently run on the heterogeneous underlying hardware. A VM's personality is its execution context on each of the different types of processing resources usable by the VM. We provide mechanisms for making such platforms manageable and evaluate coordinated scheduling policies for mapping different VM personalities onto heterogeneous hardware. Toward that end, this dissertation contributes technologies that include (1) restructuring hypervisor and system functions to create high-performance environments that enable flexibility of execution and data sharing, (2) scheduling and other resource management infrastructure for supporting diverse application needs and heterogeneous platform characteristics, and (3) hypervisor-level policies to permit efficient and coordinated resource usage and sharing. Experimental evaluations on multiple heterogeneous platforms, such as one composed of x86-based cores with attached NVIDIA accelerators and others with asymmetric elements on chip, demonstrate the utility of the approach and its ability to efficiently host diverse applications and resource management methods.
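A toy illustration of coordinated scheduling over VM personalities (the types and the greedy policy are assumed for illustration; the dissertation's hypervisor-level mechanisms are far richer): each VM advertises an execution context per resource type, and the scheduler maps VMs to the heterogeneous resources that benefit them most.

```python
# Hypothetical sketch of personality-based VM placement; not the dissertation's code.
from dataclasses import dataclass, field

@dataclass
class Personality:
    resource_type: str        # e.g. "x86", "gpu", "asymmetric-core"
    expected_speedup: float   # relative to an x86 baseline (assumed metric)

@dataclass
class VM:
    name: str
    personalities: list = field(default_factory=list)

def schedule(vms, free_slots):
    """Greedy coordinated mapping: VMs with the most to gain are placed first."""
    placement = {}
    for vm in sorted(vms, key=lambda v: -max(p.expected_speedup for p in v.personalities)):
        for p in sorted(vm.personalities, key=lambda p: -p.expected_speedup):
            if free_slots.get(p.resource_type, 0) > 0:
                free_slots[p.resource_type] -= 1
                placement[vm.name] = p.resource_type
                break
    return placement

vms = [VM("sim", [Personality("gpu", 8.0), Personality("x86", 1.0)]),
       VM("web", [Personality("x86", 1.0)])]
print(schedule(vms, {"gpu": 1, "x86": 4}))  # {'sim': 'gpu', 'web': 'x86'}
```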
413

High-performance computer system architectures for embedded computing

Lee, Dongwon 26 August 2011 (has links)
The main objective of this thesis is to propose new methods for designing high-performance embedded computer system architectures. To achieve this goal, three major components of multi-processor embedded systems - multi-core processing elements (PEs), DRAM main memory systems, and on/off-chip interconnection networks - are examined in turn. The first section of this thesis presents architectural enhancements to graphics processing units (GPUs), one type of multi- or many-core PE, for improving the performance of embedded applications. An embedded application is first mapped onto GPUs to explore the design space, and architectural enhancements to existing GPUs are then proposed to improve the throughput of the embedded application. The second section proposes high-performance buffer mapping methods for DSP multi-processor systems that exploit useful features of DRAM main memory systems. The memory wall problem becomes increasingly severe in multiprocessor environments because of communication and synchronization overheads. To alleviate the memory wall problem, this section exploits the bank concurrency and page-mode access of DRAM main memory systems to increase the performance of multiprocessor DSP systems. The final section presents a network-centric Turbo decoder and network-centric FFT processors. In the era of multi-processor systems, the interconnection network is another performance bottleneck. To handle heavy communication traffic, this section applies a crossbar switch - one of the indirect networks - to the parallel Turbo decoder, and a mesh topology to the parallel FFT processors. When designing the mesh FFT processors, a very different approach is taken to improve performance: optical fiber is used as a new interconnection medium.
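An illustrative sketch of the bank-aware buffer mapping idea (the address layout and parameters are assumed, not the thesis's exact scheme): placing concurrently accessed buffers in distinct DRAM banks lets their accesses overlap, while row-aligning each buffer keeps consecutive accesses within an open row to exploit page mode.

```python
# Hypothetical bank-aware buffer placement; geometry values are assumptions.
ROW_BYTES, NUM_BANKS = 2048, 8

def map_buffers(buffer_sizes):
    """Round-robin buffers across banks, row-aligned within each bank."""
    next_row = [0] * NUM_BANKS
    placement = []
    for i, size in enumerate(buffer_sizes):
        bank = i % NUM_BANKS                      # distinct banks -> bank concurrency
        rows = -(-size // ROW_BYTES)              # ceiling: rows this buffer occupies
        placement.append((bank, next_row[bank]))  # (bank, starting row)
        next_row[bank] += rows                    # row alignment -> page-mode hits
    return placement

print(map_buffers([4096, 1024, 8192]))  # [(0, 0), (1, 0), (2, 0)]
```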
414

Parallel algorithms for direct blood flow simulations

Rahimian, Abtin 21 February 2012 (has links)
Fluid mechanics of blood can be well approximated by a mixture model of a Newtonian fluid and deformable particles representing the red blood cells. Experimental and theoretical evidence suggests that the deformation and rheology of red blood cells is similar to that of phospholipid vesicles. Vesicles and red blood cells are both area-preserving closed membranes that resist bending. Beyond red blood cells, vesicles can be used to investigate the behavior of cell membranes, intracellular organelles, and viral particles. Given the importance of vesicle flows, in this thesis we focus on efficient numerical methods for such problems: we present computationally scalable algorithms for the simulation of dilute suspensions of deformable vesicles in two and three dimensions. Our method is based on the boundary integral formulation of Stokes flow. We present new schemes for simulating the three-dimensional hydrodynamic interactions of large numbers of vesicles with viscosity contrast. The algorithms incorporate a stable time-stepping scheme, high-order spatiotemporal discretizations, spectral preconditioners, and a reparametrization scheme capable of resolving extreme mesh distortions in dynamic simulations. The associated linear systems are solved in optimal time using spectral preconditioners. The highlights of our numerical scheme are that (i) the physics of vesicles is faithfully represented by using nonlinear solid mechanics to capture the deformations of each cell, (ii) the long-range, N-body, hydrodynamic interactions between vesicles are accurately resolved using the fast multipole method (FMM), and (iii) our time-stepping scheme is unconditionally stable for the flow of single and multiple vesicles with viscosity contrast, and its computational cost per simulation unit time is comparable to or less than that of an explicit scheme. We report scaling of our algorithms to simulations with millions of vesicles on thousands of computational cores.
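For reference, the single-layer representation underlying such boundary integral Stokes solvers can be written as follows (standard form for a viscosity-matched suspension; the notation is assumed here rather than taken from the thesis): the membrane velocity is the background flow plus a single-layer potential of the interfacial force density f over the membrane γ, with G the 3D Stokeslet kernel.

```latex
u(\mathbf{x}) = u_\infty(\mathbf{x})
  + \int_{\gamma} G(\mathbf{x},\mathbf{y})\,\mathbf{f}(\mathbf{y})\,\mathrm{d}\gamma(\mathbf{y}),
\qquad
G(\mathbf{x},\mathbf{y}) = \frac{1}{8\pi\mu}
  \left(\frac{\mathbf{I}}{r} + \frac{\mathbf{r}\otimes\mathbf{r}}{r^{3}}\right),
\quad \mathbf{r} = \mathbf{x}-\mathbf{y},\; r = \lVert\mathbf{r}\rVert .
```

Evaluating this integral for all vesicles is the N-body sum that the FMM accelerates.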
415

Advanced Memory Data Structures for Scalable Event Trace Analysis

Knüpfer, Andreas 17 April 2009 (has links) (PDF)
The thesis presents a contribution to the analysis and visualization of computational performance based on event traces, with a particular focus on parallel programs and High Performance Computing (HPC). Event traces contain detailed information about specified incidents (events) during the run-time of programs and allow minute investigation of dynamic program behavior, various performance metrics, and possible causes of performance flaws. Due to long-running and highly parallel programs and very fine detail resolutions, event traces can accumulate huge amounts of data, which become a challenge for interactive as well as automatic analysis and visualization tools. The thesis proposes a method of exploiting redundancy in the event traces in order to reduce the memory requirements and the computational complexity of event trace analysis. The sources of redundancy are repeated segments of the original program, either through iterative or recursive algorithms or through SPMD-style parallel programs, which produce equal or similar repeated event sequences. The data reduction technique is based on the novel Complete Call Graph (CCG) data structure, which allows domain-specific data compression for event traces in a combination of lossless and lossy methods. All deviations due to lossy data compression can be controlled by constant bounds. The compression of the CCG data structure is incorporated in the construction process, such that at no point do substantial uncompressed parts have to be stored. Experiments with real-world example traces reveal the potential for very high data compression. The results range from factors of 3 to 15 for small-scale compression with minimum deviation of the data to factors > 100 for large-scale compression with moderate deviation. Based on the CCG data structure, new algorithms for the most common evaluation and analysis methods for event traces are presented, which require no explicit decompression. By avoiding repeated evaluation of formerly redundant event sequences, the computational effort of the new algorithms can be reduced to the same extent as the memory consumption. The thesis includes a comprehensive discussion of the state of the art and related work, a detailed presentation of the design of the CCG data structure, an elaborate description of algorithms for construction, compression, and analysis of CCGs, and an extensive experimental validation of all components.
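A toy rendering of the CCG compression idea (the field names and bucketing rule are simplifications assumed for illustration): call subtrees are hash-consed so that identical subtrees share one node, and durations are bucketed by a constant bound so that nearly identical subtrees can also be shared, giving exactly the lossy-with-bounded-deviation behavior described above.

```python
# Simplified sketch of Complete Call Graph node sharing; not the thesis's code.
class CCGBuilder:
    def __init__(self, max_dev=0):
        self.max_dev = max_dev    # constant bound on lossy timestamp deviation
        self.nodes, self.table = [], {}

    def node(self, func, duration, children=()):
        # Bucket durations so values within max_dev of each other can share a node.
        key = (func, children,
               duration if self.max_dev == 0 else duration // (self.max_dev + 1))
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append((func, duration, children))
        return self.table[key]    # repeated subtrees collapse to one node id

b = CCGBuilder(max_dev=2)
leaf = b.node("compute", 100)
root = b.node("iterate", 205, (leaf, b.node("compute", 101)))  # 101 shares 100's node
print(len(b.nodes))  # 2 distinct nodes despite 3 recorded calls
```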
416

Modeling of an adaptive parallel system with malleable applications in a distributed computing environment

Ghafoor, Sheikh Khaled, January 2007 (has links)
Thesis (Ph.D.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
417

Intelligent text recognition system on a heterogeneous multi-core processor cluster: a performance profile and architecture exploration

Ritholtz, Lee. January 2009 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2009. / Includes bibliographical references.
418

Simulations of pulsatile flow through bileaflet mechanical heart valves using a suspension flow model to assess blood damage

Yun, Brian Min 08 June 2015 (has links)
Defective or diseased native valves have been replaced by bileaflet mechanical heart valves (BMHVs) for many years. However, severe complications still exist, and thus blood damage that occurs in BMHV flows must be well understood. The aim of this research is to numerically study platelet damage that occurs in BMHV flows. The numerical suspension flow method combines lattice-Boltzmann fluid modeling with the external boundary force method. This method is validated as a general suspension flow solver, and then validated against experimental BMHV flow data. Blood damage is evaluated for a physiologic adult case of BMHV flow and then for BMHVs with pediatric sizing and flow conditions. Simulations reveal intricate, small-scale BMHV flow features, and the presence of turbulence in BMHV flow. The results suggest a shift from previous evaluations of instantaneous flow to the determination of long-term flow recirculation regions when assessing thromboembolic potential. Sharp geometries that may induce these recirculation regions should be avoided in device design. Simulations for predictive assessment of pediatric sized valves show increased platelet damage values for potential pediatric valves. However, damage values do not exceed platelet activation thresholds, and highly damaged platelets are found far from the valve. Thus, the increased damage associated with resized valves is not such that pediatric valve development should be hindered. This method can also be used as a generic tool for future evaluation of novel prosthetic devices or cardiovascular flow problems.
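Platelet damage in studies of this kind is commonly accumulated along computed platelet trajectories using a shear-stress power law; one common form (the exact measure and constants used in this thesis are not reproduced here, and C, α, and β are empirical) is

```latex
\mathrm{BDI} \;=\; \sum_{i} C\,\tau_i^{\,\alpha}\,\Delta t_i^{\,\beta},
```

where τᵢ is the scalar shear stress a platelet experiences during the time interval Δtᵢ, and the sum runs over the intervals of its trajectory.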
419

Design by transformation : from domain knowledge to optimized program generation

Marker, Bryan Andrew 20 June 2014 (has links)
Expert design knowledge is essential to develop a library of high-performance software. This includes how to implement and parallelize domain operations, how to optimize implementations, and estimates of which implementation choices are best. An expert repeatedly applies his knowledge, often in a rote and tedious way, to develop all of the related functionality expected from a domain-specific library. Expert knowledge is hard to gain and is easily lost over time when an expert forgets or when a new engineer starts developing code. The domain of dense linear algebra (DLA) is a prime example with software that is so well designed that much of experts' important work has become tediously rote in many ways. In this dissertation, we demonstrate how one can encode design knowledge for DLA so it can be automatically applied to generate code as an expert would or to generate better code. Further, the knowledge is encoded for perpetuity, so it can be reused to make implementing functionality on new hardware easier or it can be used to teach how software is designed to a non-expert. We call this approach to software engineering (encoding expert knowledge and automatically applying it) Design by Transformation (DxT).
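A minimal sketch of the DxT idea (the rule encoding and cost model are hypothetical stand-ins, not the prototype system described in the dissertation): designs are sequences of domain operations, expert knowledge is encoded as refinement rules with modeled costs, and a generator mechanically applies the rules an expert would otherwise apply by hand.

```python
# Hypothetical encoding of refinement rules with a cost model; operation and
# rule names are invented for illustration.
rules = {
    # abstract op -> candidate refinements with modeled costs
    "GEMM": [(["GEMM-blocked"], 1.0), (["GEMM-naive"], 10.0)],
    "TRSM": [(["TRSM-blocked"], 1.0)],
}

def refine(design):
    """Replace each abstract op with its cheapest known refinement."""
    out, cost = [], 0.0
    for op in design:
        best_ops, best_cost = min(rules.get(op, [([op], 0.0)]), key=lambda rc: rc[1])
        out += best_ops
        cost += best_cost
    return out, cost

print(refine(["TRSM", "GEMM"]))  # (['TRSM-blocked', 'GEMM-blocked'], 2.0)
```

Because the rules persist independently of any one expert, retargeting to new hardware amounts to adding refinements and updating costs rather than rewriting the library by hand.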
420

A computational framework for the solution of infinite-dimensional Bayesian statistical inverse problems with application to global seismic inversion

Martin, James Robert, Ph. D. 18 September 2015 (has links)
Quantifying uncertainties in large-scale forward and inverse PDE simulations has emerged as a central challenge facing the field of computational science and engineering. The promise of modeling and simulation for prediction, design, and control cannot be fully realized unless uncertainties in models are rigorously quantified, since this uncertainty can potentially overwhelm the computed result. While statistical inverse problems can be solved today for smaller models with a handful of uncertain parameters, this task is computationally intractable using contemporary algorithms for complex systems characterized by large-scale simulations and high-dimensional parameter spaces. In this dissertation, I address issues regarding the theoretical formulation, numerical approximation, and algorithms for solution of infinite-dimensional Bayesian statistical inverse problems, and apply the entire framework to a problem in global seismic wave propagation. Classical (deterministic) approaches to solving inverse problems attempt to recover the “best-fit” parameters that match given observation data, as measured in a particular metric. In the statistical inverse problem, we go one step further and return not only a point estimate of the best medium properties, but also a complete statistical description of the uncertain parameters. The result is a posterior probability distribution that describes our state of knowledge after learning from the available data, and provides a complete description of parameter uncertainty. In this dissertation, a computational framework for such problems is described that wraps around existing forward solvers for a given physical problem, provided they are appropriately equipped. A collection of tools, insights, and numerical methods may then be applied to solve the problem and interrogate the resulting posterior distribution, which describes our final state of knowledge. We demonstrate the framework with numerical examples, including inference of a heterogeneous compressional wavespeed field for a problem in global seismic wave propagation with 10⁶ parameters.
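The statistical formulation contrasted with the deterministic one above can be summarized by Bayes' rule. For a parameter-to-observable map F, observations d_obs with additive Gaussian noise of covariance Γ_noise, and prior π_prior (standard form; the notation is assumed here), the posterior is

```latex
\pi_{\text{post}}(m \mid d_{\text{obs}})
  \;\propto\;
  \exp\!\left(-\tfrac{1}{2}\,\bigl\lVert F(m) - d_{\text{obs}} \bigr\rVert^{2}_{\Gamma_{\text{noise}}^{-1}}\right)
  \pi_{\text{prior}}(m).
```

Its maximum a posteriori point recovers the classical regularized “best-fit” solution, while the spread of the posterior quantifies the remaining parameter uncertainty.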
