• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 318
  • 189
  • 134
  • 56
  • 45
  • 32
  • 4
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 874
  • 874
  • 874
  • 391
  • 387
  • 350
  • 349
  • 328
  • 325
  • 319
  • 319
  • 316
  • 314
  • 313
  • 313
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Failure Prediction using Machine Learning in a Virtualised HPC System and application

Bashir, Mohammed, Awan, Irfan U., Ugail, Hassan, Muhammad, Y. 21 March 2019 (has links)
Yes / Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Traditional existing fault-tolerance strategies such as regular check-pointing and replication are not adequate because of the emerging complexities of high performance computing systems. This necessitates the importance of having an effective as well as proactive failure management approach in place aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future pattern of behaviours makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison based tests on the prediction accuracy. The primary algorithms we considered are the Support Vector Machine (SVM), Random Forest(RF), k-Nearest Neighbors (KNN), Classi cation and Regression Trees (CART) and Linear Discriminant Analysis (LDA). Experimental results indicates that the average prediction accuracy of our model using SVM when predicting failure is 90% accurate and effective compared to other algorithms. This f inding implies that our method can effectively predict all possible future system and application failures within the system. / Petroleum Technology Development Fund (PTDF) funding support under the OSS scheme with grant number (PTDF/E/OSS/PHD/MB/651/14)
72

Using Workload Characterization to Guide High Performance Graph Processing

Hassan, Mohamed Wasfy Abdelfattah 24 May 2021 (has links)
Graph analytics represent an important application domain widely used in many fields such as web graphs, social networks, and Bayesian networks. The sheer size of the graph data sets combined with the irregular nature of the underlying problem pose a significant challenge for performance, scalability, and power efficiency of graph processing. With the exponential growth of the size of graph datasets, there is an ever-growing need for faster more power efficient graph solvers. The computational needs of graph processing can take advantage of the FPGAs' power efficiency and customizable architecture paired with CPUs' general purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential for supporting performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits. This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community, and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to cooperatively work in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers. The introduction of High Level Synthesis (HLS) tools allowed FPGAs to be accessible to the masses but they are yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process. Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources. / Doctor of Philosophy / Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while hyper links between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern is dependent on the data input and impossible to predict, which renders CPUs' sophisticated caching policies useless to performance. With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
73

HPC-based Parallel Algorithms for Generating Random Networks and Some Other Network Analysis Problems

Alam, Md Maksudul 06 December 2016 (has links)
The advancement of modern technologies has resulted in an explosive growth of complex systems, such as the Internet, biological, social, and various infrastructure networks, which have, in turn, contributed to the rise of massive networks. During the past decade, analyzing and mining of these networks has become an emerging research area with many real-world applications. The most relevant problems in this area include: collecting and managing networks, modeling and generating random networks, and developing network mining algorithms. In the era of big data, speed is not an option anymore for the effective analysis of these massive systems, it is an absolute necessity. This motivates the need for parallel algorithms on modern high-performance computing (HPC) systems including multi-core, distributed, and graphics processor units (GPU) based systems. In this dissertation, we present distributed memory parallel algorithms for generating massive random networks and a novel GPU-based algorithm for index searching. This dissertation is divided into two parts. In Part I, we present parallel algorithms for generating massive random networks using several widely-used models. We design and develop a novel parallel algorithm for generating random networks using the preferential-attachment model. This algorithm can generate networks with billions of edges in just a few minutes using a medium-sized computing cluster. We develop another parallel algorithm for generating random networks with a given sequence of expected degrees. We also design a new a time and space efficient algorithmic method to generate random networks with any degree distributions. This method has been applied to generate random networks using other popular network models, such as block two-level Erdos-Renyi and stochastic block models. Parallel algorithms for network generation pose many nontrivial challenges such as dependency on edges, avoiding duplicate edges, and load balancing. We applied novel techniques to deal with these challenges. All of our algorithms scale very well to a large number of processors and provide almost linear speed-up. Dealing with a large number of networks collected from a variety of fields requires efficient management systems such as graph databases. Finding a record in those databases is very critical and typically is the main bottleneck for performance. In Part II of the dissertation, we develop a GPU-based parallel algorithm for index searching. Our algorithm achieves the fastest throughput ever reported in the literature for various benchmarks. / Ph. D.
74

High Performance Computing Issues in Large-Scale Molecular Statics Simulations

Pulla, Gautam 02 June 1999 (has links)
Successful application of parallel high performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools which create an environment that aids sharing of high performance hardware resources and limits losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code and also a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance and availability. / Master of Science
75

Efficient in-situ workflows for time-critical applications on heterogeneous ecosystems Item

Feng Li (16627272) 21 July 2023 (has links)
<p>In-situ workflows are a special class of scientific workflows, where different component applications (such as simulation, visualization, analysis) run concurrently, and data flows continuously between components during the whole workflow lifetime. Traditionally, simulations write large amounts of output data to persistent storage, which are later read for future analysis/visualization. In comparison, in-situ workflows allow analysis/visualization components to consume simulation data while the simulations are still running and thus reduce the I/O overhead. There are recent research works that focus on providing data transport libraries to help compose a group of applications into an integral in-situ workflow. However, only a few ``performance-oriented'' studies exist for in-situ workflows, and most of these works focus on workflows with simple structures (e.g., single producer and single consumer), also without consideration of heterogeneous environments for in-situ workflows. Being able to efficiently utilize heterogeneous computing resources such as multiple Clouds and HPCs can significantly accelerate real-world in-situ workflows, and benefit applications that require both significant computation power and real-time outputs(e.g., identifying abnormal patterns in fluid dynamics). The goal of this dissertation is to provide resource planning algorithms and runtime support, to improve in-situ workflow performance on heterogeneous environments.</p> <p><br></p> <p>This dissertation first investigates the emerging applications of in-situ workflows, which usually include parallel simulation, visualization, and analysis components. Two representative real-world in-situ workflows are studied in details-- a real-time CFD machine learning/visualization workflow and a wildfire spreading workflow. These workflows showcase the capability of in-situ workflows: e.g.,  decoupled and accelerated computation and fast near-real-time response time, however, there is a lack of resource planning and runtime support for general in-situ workflows. For resource planning, I first formulate the optimization problem, and then design and implement a heuristic algorithm called ``SNL'' (Scheduled-Neighbor-Lookup). SNL considers the pipelined execution pattern of in-situ workflows, and guides the resource planning of complex in-situ workflows to achieve higher workflow throughput. For the runtime support, I design and implement the ``INSTANT'' runtime framework, a runtime framework to configure, plan, launch, and monitor in-situ workflows for distributed computing environments. INSTANT provides intuitive interfaces to compose abstract in-situ workflows, manages in-site and cross-site data transfers with ADIOS2, and supports resource planning using profiled performance data. Experiments with the two use cases show that INSTANT can efficiently streamline the orchestration of complex in-situ workflows, and the resource planning capability allows INSTANT to plan and carry out fast workflow execution at different computing resource availabilities.</p>
76

High performance computing for irregular algorithms and applications with an emphasis on big data analytics

Green, Oded 22 May 2014 (has links)
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication, present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combing algorithmic optimizations and utilization of massively multithreaded architectures, accelerator such as GPUs, and distributed systems. My research focuses upon designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitates understanding real-world graph properties such as the small world property, shrinking network diameter, power law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range systems of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of the parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidths, data management, load balancing, scheduling, control flow models and more. This thesis deals with these issues and more.
77

Sélection de caractéristiques stables pour la segmentation d'images histologiques par calcul haute performance / Robust feature selection for histology images through high performance computing

Bouvier, Clément 18 January 2019 (has links)
L’histologie produit des images à l’échelle cellulaire grâce à des microscopes optiques très performants. La quantification du tissu marqué comme les neurones s’appuie de plus en plus sur des segmentations par apprentissage automatique. Cependant, l’apprentissage automatique nécessite une grande quantité d’informations intermédiaires, ou caractéristiques, extraites de la donnée brute multipliant d’autant la quantité de données à traiter. Ainsi, le nombre important de ces caractéristiques est un obstacle au traitement robuste et rapide de séries d’images histologiques. Les algorithmes de sélection de caractéristiques pourraient réduire la quantité d’informations nécessaires mais les ensembles de caractéristiques sélectionnés sont peu reproductibles. Nous proposons une méthodologie originale fonctionnant sur des infrastructures de calcul haute-performance (CHP) visant à sélectionner des petits ensembles de caractéristiques stables afin de permettre des segmentations rapides et robustes sur des images histologiques acquises à très haute-résolution. Cette sélection se déroule en deux étapes : la première à l’échelle des familles de caractéristiques. La deuxième est appliquée directement sur les caractéristiques issues de ces familles. Dans ce travail, nous avons obtenu des ensembles généralisables et stables pour deux marquages neuronaux différents. Ces ensembles permettent des réductions significatives des temps de traitement et de la mémoire vive utilisée. Cette méthodologie rendra possible des études histologiques exhaustives à haute-résolution sur des infrastructures CHP que ce soit en recherche préclinique et possiblement clinique. / In preclinical research and more specifically in neurobiology, histology uses images produced by increasingly powerful optical microscopes digitizing entire sections at cell scale. Quantification of stained tissue such as neurons relies on machine learning driven segmentation. However such methods need a lot of additional information, or features, which are extracted from raw data multiplying the quantity of data to process. As a result, the quantity of features is becoming a drawback to process large series of histological images in a fast and robust manner. Feature selection methods could reduce the amount of required information but selected subsets lack of stability. We propose a novel methodology operating on high performance computing (HPC) infrastructures and aiming at finding small and stable sets of features for fast and robust segmentation on high-resolution histological whole sections. This selection has two selection steps: first at feature families scale (an intermediate pool of features, between space and individual feature). Second, feature selection is performed on pre-selected feature families. In this work, the selected sets of features are stables for two different neurons staining. Furthermore the feature selection results in a significant reduction of computation time and memory cost. This methodology can potentially enable exhaustive histological studies at a high-resolution scale on HPC infrastructures for both preclinical and clinical research settings.
78

Adaptive Power Management for Autonomic Resource Configuration in Large-scale Computer Systems

Zhang, Ziming (Software engineer) 08 1900 (has links)
In order to run and manage resource-intensive high-performance applications, large-scale computing and storage platforms have been evolving rapidly in various domains in both academia and industry. The energy expenditure consumed to operate and maintain these cloud computing infrastructures is a major factor to influence the overall profit and efficiency for most cloud service providers. Moreover, considering the mitigation of environmental damage from excessive carbon dioxide emission, the amount of power consumed by enterprise-scale data centers should be constrained for protection of the environment.Generally speaking, there exists a trade-off between power consumption and application performance in large-scale computing systems and how to balance these two factors has become an important topic for researchers and engineers in cloud and HPC communities. Therefore, minimizing the power usage while satisfying the Service Level Agreements have become one of the most desirable objectives in cloud computing research and implementation. Since the fundamental feature of the cloud computing platform is hosting workloads with a variety of characteristics in a consolidated and on-demand manner, it is demanding to explore the inherent relationship between power usage and machine configurations. Subsequently, with an understanding of these inherent relationships, researchers are able to develop effective power management policies to optimize productivity by balancing power usage and system performance. In this dissertation, we develop an autonomic power-aware system management framework for large-scale computer systems. We propose a series of techniques including coarse-grain power profiling, VM power modelling, power-aware resource auto-configuration and full-system power usage simulator. These techniques help us to understand the characteristics of power consumption of various system components. Based on these techniques, we are able to test various job scheduling strategies and develop resource management approaches to enhance the systems' power efficiency.
79

Virtualized resource management in high performance fabric clusters

Ranadive, Adit Uday 07 January 2016 (has links)
Providing performance and isolation guarantees for applications running in virtualized datacenter environments requires continuous management of the underlying physical resources. For communication- and I/O-intensive applications running on such platforms, the management methods must adequately deal with the shared use of the high-performance fabrics these applications require. In particular, new classes of latency-sensitive and data-intensive workloads running in virtualized environments rely on emerging fabrics like 40+Gbps Ethernet and InfiniBand/RoCE with support for RDMA, VMM-bypass and hardware-level virtualization (SR-IOV). However, the benefits provided by these technology advances are offset by several management constraints: (i) the inability of the hypervisor to monitor the VMs’ usage of these fabrics can affect the platform’s ability to provide isolation and performance guarantees, (ii) the hypervisor cannot provide fine-grained I/O provisioning or perform management decisions for VMs, thus reducing the degree of consolidation that can be supported on the platforms, and (iii) without such support it is harder to integrate these fabrics into emerging cloud computing platforms and datacenter fabric management solutions. This is made particularly challenging for workloads spanning multiple VMs, utilizing physical resources distributed across multiple server nodes and the interconnection fabric. This thesis addresses the problem of realizing a flexible, dynamic resource management system for virtualized platforms with high performance fabrics. We make the following key contributions: (i) A lightweight monitoring tool, IBMon, integrated with the hypervisor to monitor VMs’ use of RDMA-enabled virtualized interconnects, using memory introspection techniques. (ii) The design and construction of a resource management system that leverages IBMon to provide latency-sensitive applications performance guarantees. This system is built on microeconomic principles of supply and demand and can be deployed on a per-node (Resource Exchange) or a multi-node (Distributed Resource Exchange) basis. Fine-grained resource allocations can be enforced through several mechanisms, including CPU capping or fabric-level congestion control. (iii) Sphinx, a fabric management solution that leverages Resource Exchange to orchestrate network and provide latency proportionality for consolidated workloads, based on user/application-specified policies. (iv) Implementation and experimental evaluation using InfiniBand clusters virtualized with the Xen or KVM hypervisor, managed via the OpenFloodlight SDN controller, and using representative data-intensive and latency-sensitive benchmarks.
80

Parallel explicit FEM algorithms using GPU's

Banihashemi, Seyed Parsa 07 January 2016 (has links)
The Explicit Finite Element Method is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular, General Purpose Graphical Processing Units (GPGPU's) now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms which are then redesigned for GPU's and implemented. The performance of these algorithms is assessed and a new asynchronous variational integrator spatial decomposition (AVISD) algorithm is developed which is flexible and encompasses all other methods and can be tuned based for a user-defined problem and the performance of the user's computer. The mesh-aware performance of the proposed explicit finite element algorithm is studied and verified by implementation. The current research also introduces the use of a Particle Swarm Optimization method to tune the performance of the proposed algorithm automatically given a finite element mesh and the performance characteristics of a user's computer. For this purpose, a time performance model is developed which depends on the finite element mesh and the machine performance. This time performance model is then used as an objective function to minimize the run-time cost. Also, based on the performance model provided in this research and predictions about the changes in GPU's in the near future, the performance of the AVISD method is predicted for future machines. Finally, suggestions and insights based on these results are proposed to help facilitate future explicit FEM development.

Page generated in 0.0481 seconds