51

Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system

Shelton, Benjamin H. 31 May 2013 (has links)
As manufacturers introduce new machines with more cores, more NUMA-like architectures, and more tightly integrated heterogeneous processors, the traditional abstraction of a monolithic OS running on an SMP system is encountering new challenges. One proposed path forward is the multikernel operating system. Previous efforts have shown promising results both in scalability and in support for heterogeneity. However, one effort's source code is not freely available (FOS), and the other effort is not self-hosting and does not support a majority of existing applications (Barrelfish). In this thesis, we present Popcorn, a Linux-based multikernel operating system. While Popcorn was a group effort, the boot layer code and the memory partitioning code are the author's work, and we present them in detail here. To our knowledge, we are the first to support multiple instances of the Linux kernel on a 64-bit x86 machine and to support more than 4 kernels running simultaneously. We demonstrate that existing subsystems within Linux can be leveraged to meet the design goals of a multikernel OS. Taking this approach, we developed a fast inter-kernel network driver and messaging layer. We demonstrate that the network driver can share a 1 Gbit/s link without degraded performance and that, in combination with guest kernels, it meets or exceeds the performance of SMP Linux with an event-based web server. We evaluate the messaging layer with microbenchmarks and conclude that it performs well given the limitations of current x86-64 hardware. Finally, we use the messaging layer to provide live process migration between cores. / Master of Science
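The abstract does not describe the messaging layer's internals. As a rough illustration of the general idea behind shared-memory messaging between kernel instances on one machine, the sketch below shows a minimal single-producer/single-consumer ring buffer; the slot count, slot size, and method names are assumptions for illustration only, not Popcorn's actual data structures.

```python
# Minimal SPSC ring buffer sketch: the kind of structure an inter-kernel
# shared-memory messaging layer can be built on. All sizes and names here
# are illustrative assumptions, not Popcorn's design.

class RingBuffer:
    def __init__(self, slots=64, slot_size=128):
        self.slots = slots                        # fixed-size message slots
        self.slot_size = slot_size                # bytes per slot
        self.buf = bytearray(slots * slot_size)   # would live in shared memory
        self.head = 0                             # next slot the consumer reads
        self.tail = 0                             # next slot the producer writes

    def send(self, payload: bytes) -> bool:
        """Producer side: copy a message into the next free slot, if any."""
        if (self.tail + 1) % self.slots == self.head:
            return False                          # buffer full; caller retries
        payload = payload[:self.slot_size]        # truncate to one slot
        start = self.tail * self.slot_size
        self.buf[start:start + self.slot_size] = bytes(self.slot_size)  # clear slot
        self.buf[start:start + len(payload)] = payload
        self.tail = (self.tail + 1) % self.slots
        return True

    def receive(self):
        """Consumer side: pop the oldest message, or None if empty."""
        if self.head == self.tail:
            return None
        start = self.head * self.slot_size
        msg = bytes(self.buf[start:start + self.slot_size]).rstrip(b"\x00")
        self.head = (self.head + 1) % self.slots
        return msg

rb = RingBuffer()
rb.send(b"migrate pid 1234")
print(rb.receive())   # b'migrate pid 1234'
```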
52

Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems using a Simulation-based Methodology

Sukhija, Nitin 09 May 2015 (has links)
Large-scale systems provide a powerful computing platform for solving large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of these computing environments can degrade the performance of the scientific applications executing on them. Load imbalance, arising from a variety of sources such as application, algorithmic, and systemic variations, is one of the major contributors to this degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. It is therefore of paramount importance to study integrated scheduling and fault-tolerance mechanisms that guarantee applications deployed on these systems are resilient to failures. Recently, several research initiatives have started to address the issue of resilience; however, these efforts have focused mainly on system-level resilience, with less emphasis on resilience at the application level. It is therefore increasingly important to extend the concept of resilience to application-level scheduling techniques, establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved with a comprehensive modeling framework for evaluating the resiliency of such techniques on heterogeneous computing systems, assessing the impact of failures and workloads in an integrated way. This dissertation presents an experimental methodology based on discrete event simulation for analyzing and evaluating the resilience of scheduling scientific applications on high performance computing systems. With the aid of this methodology, a wide class of dependencies between application and computing system is captured in a deterministic model that quantifies the performance impact expected from changes in application and system characteristics. The results obtained with the proposed simulation-based performance prediction framework enable an introspective design and investigation of scheduling heuristics, supporting reasoning about how best to optimize often antagonistic objectives such as minimizing application makespan and maximizing reliability.
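To make the idea of simulating scheduling under failures concrete, here is a toy discrete-event sketch: tasks run on nodes, nodes fail at random, and work lost to a failure is re-queued and re-run, yielding a makespan under failures. The failure rate, task durations, and recovery policy are illustrative assumptions, not the dissertation's actual models.

```python
import heapq
import random

# Toy discrete-event simulation of a task schedule under random node failures.
# All parameters and the re-run-on-failure policy are illustrative assumptions.

def simulate(num_nodes=4, num_tasks=20, mtbf=50.0, task_time=5.0, seed=1):
    rng = random.Random(seed)
    pending = list(range(num_tasks))
    running = {}                                   # node -> (task, start time)
    events = []                                    # (time, kind, node, tag)
    completed, now = 0, 0.0

    def start(node, t):
        """If the node is idle and work remains, start the next task on it."""
        if pending and node not in running:
            task = pending.pop()
            running[node] = (task, t)
            heapq.heappush(events, (t + task_time, "done", node, t))

    for n in range(num_nodes):
        heapq.heappush(events, (rng.expovariate(1 / mtbf), "fail", n, None))
        start(n, 0.0)

    while completed < num_tasks:
        now, kind, node, tag = heapq.heappop(events)
        if kind == "done" and node in running and running[node][1] == tag:
            completed += 1                         # the in-flight task finished
            del running[node]
        elif kind == "fail":
            if node in running:                    # in-flight work is lost
                pending.append(running.pop(node)[0])
            heapq.heappush(events, (now + rng.expovariate(1 / mtbf), "fail", node, None))
        start(node, now)                           # idle node picks up work

    return now                                     # makespan under failures

print(f"simulated makespan: {simulate():.1f} time units")
```

Sweeping parameters such as the mean time between failures or the number of nodes in a loop over `simulate()` illustrates, in miniature, the kind of integrated failure-and-workload analysis the methodology performs at full scale.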
53

A Qualitative Method for Dynamic Transport Selection in Heterogeneous Wireless Environments

Duffin, Heidi R. 23 August 2004 (has links) (PDF)
Computing devices are commonly equipped with multiple transport technologies such as IrDA, Bluetooth, and WiFi. Transport-switching technologies, such as Quality of Transport (QoT), take advantage of this heterogeneity to keep network sessions active as users move in and out of range of various transports or as the networking environment changes. During an active session, the goal is to keep the device connected over the best transport currently available. To accomplish this, this thesis introduces a two-phase decision-making protocol. In phase one, intra-device prioritization, users indicate the relative importance of criteria such as speed, power, service charge, or signal range through a comprehensive user interface. QoT-enabled devices process this information with the prioritized soft constraint satisfaction (PSCS) scoring function to ascertain the transport that best meets the user's needs. The second phase, inter-device negotiation, facilitates two QoT-enabled devices in agreeing on a unified selection of the best transport. This phase uses a modified version of the PSCS scoring function based on the preferences of both users. Additionally, devices may utilize multiple transports simultaneously to more accurately meet user demands. The PSCS scoring function considers pairs of transports and calculates the ratio that will yield the desired performance. Another set of functions, also presented in this thesis, is then used to achieve the desired performance level despite the potential introduction of additional overhead.
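The abstract does not give the PSCS formulation itself. As a hedged illustration of the general idea of ranking transports against user-weighted criteria, the sketch below uses a normalized weighted sum; the criteria, example figures, weights, and aggregation rule are all assumptions, not the thesis's actual scoring function.

```python
# Illustrative weighted scoring of candidate transports against user
# priorities. Criteria values, weights, and the weighted-sum aggregation
# are assumptions for illustration, not the PSCS function from the thesis.

TRANSPORTS = {
    #            Mbit/s        mW            meters
    "IrDA":      {"speed": 4,  "power": 50,  "range": 1},
    "Bluetooth": {"speed": 3,  "power": 100, "range": 10},
    "WiFi":      {"speed": 54, "power": 800, "range": 100},
}

# User priorities (higher = more important). Power is a cost, so its
# normalized value is inverted before weighting.
WEIGHTS = {"speed": 0.5, "power": 0.3, "range": 0.2}

def score(transport: dict) -> float:
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        values = [t[criterion] for t in TRANSPORTS.values()]
        lo, hi = min(values), max(values)
        norm = (transport[criterion] - lo) / (hi - lo) if hi > lo else 1.0
        if criterion == "power":              # lower power draw is better
            norm = 1.0 - norm
        total += weight * norm
    return total

ranked = sorted(TRANSPORTS, key=lambda name: score(TRANSPORTS[name]), reverse=True)
print("ranking:", ranked, "-> selected:", ranked[0])
```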
54

ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing

Winkleblack, Scott Kenneth Swinkleb 01 June 2014 (has links) (PDF)
GenSel is a genetic selection analysis tool used to determine which genetic markers are informational for a given trait. Performing genetic selection related analyses is a time consuming and computationally expensive task. Due to an expected increase in the number of genotyped individuals, analysis times will increase dramatically. Therefore, optimization efforts must be made to keep analysis times reasonable. This thesis focuses on optimizing one of GenSel’s underlying algorithms for heterogeneous computing. The resulting algorithm exposes task-level parallelism and data-level parallelism present but inaccessible in the original algorithm. The heterogeneous computing solution, ReGen, outperforms the optimized CPU implementation achieving a 1.84 times speedup.
55

FPGA Based Complete SAT Solver

Kannan, Sai Surya January 2022 (has links)
No description available.
56

Accelerating Component-Based Dataflow Middleware with Adaptivity and Heterogeneity

Hartley, Timothy D. R. 25 July 2011 (has links)
No description available.
57

An Adaptive Framework for Managing Heterogeneous Many-Core Clusters

Rafique, Muhammad Mustafa 21 October 2011 (has links)
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit an excellent price-to-performance ratio as well as much-needed energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating them into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction set architectures, memory hierarchies, and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically IBM Cell and programmable GPUs, in distributed settings for data-intensive computing and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and the capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture-based software engineering approach and provide mixin-layer-based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management compared to conventional, hand-tuned approaches to programming and managing accelerator-based heterogeneous clusters. / Ph. D.
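The framework's actual scheduling considers richer workload characteristics than can be shown here; the sketch below only illustrates the basic idea of splitting map tasks in proportion to per-node capability. The node names and relative throughput figures are hypothetical.

```python
# Illustrative capability-proportional split of map tasks across asymmetric
# compute nodes. Node names and throughputs are assumptions; the framework
# described in the dissertation weighs additional workload characteristics.

def distribute(num_tasks: int, capabilities: dict) -> dict:
    """Assign tasks proportionally to each node's relative throughput."""
    total = sum(capabilities.values())
    shares = {node: int(num_tasks * cap / total) for node, cap in capabilities.items()}
    # Hand out any remainder (from rounding down) to the fastest nodes first.
    leftover = num_tasks - sum(shares.values())
    for node in sorted(capabilities, key=capabilities.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares

# Hypothetical relative throughputs for a mixed cluster.
cluster = {"cpu-node": 1.0, "cell-node": 3.5, "gpu-node": 6.0}
print(distribute(100, cluster))   # e.g. {'cpu-node': 9, 'cell-node': 33, 'gpu-node': 58}
```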
58

Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing

Helal, Ahmed Elmohamadi Mohamed 29 January 2020 (has links)
In the last decade, there have been tectonic shifts in computer hardware as sequential CPU performance has reached its physical limits. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis and modeling and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications which suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, such an adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications. / Doctor of Philosophy / Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver several orders of magnitude speedup compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources. To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling the performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD) and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over the optimized parallel implementations.
59

Enabling the use of Heterogeneous Computing for Bioinformatics

Bijanapalli Chakri, Ramakrishna 02 October 2013 (has links)
The huge amount of information encoded in DNA sequences and the growing interest in uncovering new discoveries have spurred efforts to accelerate DNA sequencing and alignment. Heterogeneous systems, which combine different types of computational units, have gained new prominence in high-performance computing in recent years. However, the expertise in multiple domains and the skills required to program these systems hinder bioinformaticians from rapidly deploying their applications onto them. This work attempts to make a heterogeneous system, the Convey HC-1, with an x86-based host processor and an FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic-programming-based Smith-Waterman kernel is implemented in hardware, which is able to achieve a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on the Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to general-purpose graphics processing units (GP-GPUs), achieving a peak throughput of 9.89 GCUPS on an NVIDIA GTX580 GPU. Second, a well-known graphical programming tool, LabVIEW, is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science
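For readers unfamiliar with the kernel being accelerated, the sketch below shows the standard Smith-Waterman local-alignment recurrence with a linear gap penalty; each evaluation of a matrix cell is one "cell update" in the GCUPS metric. The match, mismatch, and gap scores are illustrative choices, not parameters from the thesis.

```python
# Reference Smith-Waterman local alignment score with a linear gap penalty.
# The hardware and GPU kernels compute many of these cell updates in parallel.

def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment can restart
                          H[i - 1][j - 1] + sub,  # match/mismatch (diagonal)
                          H[i - 1][j] + gap,      # gap in b (up)
                          H[i][j - 1] + gap)      # gap in a (left)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GGTTGACTA", "TGTTACGG"))   # best local alignment score
```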
60

Performance Modeling, Optimization, and Characterization on Heterogeneous Architectures

Panwar, Lokendra Singh 21 October 2014 (has links)
Today, heterogeneous computing has truly reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-scale scientific applications. These advancements, however, introduce new challenges to the scientific community, such as selecting the best processor for an application, devising effective performance optimization strategies, and maintaining performance portability across architectures. In this thesis, we present our techniques and approach to address some of these significant issues. First, we present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Use cases of this technique include scheduling or migrating GPU workloads over a heterogeneous cluster with different types of GPUs. We then present our approach to accelerating a seismology modeling application that is based on the finite difference method (FDM), using MPI and CUDA over a hybrid CPU+GPU cluster. We describe the generic computational complexities involved in porting such applications to GPUs and present our strategy for efficient performance optimization and characterization. We also show how performance modeling can be used to reason about and drive the hardware-specific optimizations on the GPU. The performance evaluation of our approach delivers a maximum speedup of 23-fold with a single GPU and 33-fold with dual GPUs per node over the serial version of the application, which in turn results in a many-fold speedup when coupled with the MPI distribution of the computation across the cluster. We also study the efficacy of GPU-integrated MPI, with MPI-ACC as an example implementation, in a seismology modeling application and discuss the lessons learned. / Master of Science
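As a back-of-the-envelope illustration of how a per-node GPU speedup composes with MPI-level distribution across a cluster, the sketch below multiplies the two factors under an assumed parallel efficiency. All figures are hypothetical, not measurements from the thesis.

```python
# Hypothetical composition of per-node accelerator speedup with MPI scaling.
# All numbers are illustrative assumptions, not results from the thesis.

def projected_speedup(serial_time_s: float, gpu_speedup: float,
                      num_nodes: int, parallel_efficiency: float) -> float:
    """Speedup over the serial run when each node's share is GPU-accelerated."""
    per_node_time = serial_time_s / gpu_speedup          # one accelerated node doing all the work
    cluster_time = per_node_time / (num_nodes * parallel_efficiency)
    return serial_time_s / cluster_time

# e.g. a 33x dual-GPU node speedup, 16 nodes, 85% MPI scaling efficiency
print(f"{projected_speedup(3600.0, 33.0, 16, 0.85):.0f}x over the serial run")
```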
