91 |
Applying Polyhedral Transformation to Fortran Programs / Gururaghavendran, Ashwin, 31 March 2011 (has links)
No description available.
|
92 |
A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures / Jiang, Wei, 27 August 2012 (has links)
No description available.
|
93 |
Failure Prediction using Machine Learning in a Virtualised HPC System and Application / Bashir, Mohammed, Awan, Irfan U., Ugail, Hassan, Muhammad, Y., 21 March 2019 (has links)
Yes / Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Existing fault-tolerance strategies such as regular checkpointing and replication are no longer adequate given the emerging complexity of high performance computing systems. This makes it important to have an effective and proactive failure management approach in place, aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future patterns of behaviour makes it possible to predict potential system failures more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on prediction accuracy. The primary algorithms we considered are the Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Classification and Regression Trees (CART), and Linear Discriminant Analysis (LDA). Experimental results indicate that our model achieves an average prediction accuracy of 90% with SVM, outperforming the other algorithms. This finding implies that our method can effectively predict future system and application failures within the system. / Petroleum Technology Development Fund (PTDF) funding support under the OSS scheme with grant number (PTDF/E/OSS/PHD/MB/651/14)
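The classifier comparison the abstract describes can be sketched with scikit-learn; the data, features, and model settings below are invented for illustration and do not reflect the thesis's actual dataset or preprocessing:

```python
# Hypothetical illustration: compare the five algorithms named in the
# abstract on a binary failure-prediction task. Synthetic data only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier  # CART
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                      # stand-in system metrics
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # stand-in failure labels

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    # 5-fold cross-validated accuracy, as a comparison-based test.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

In a real deployment the feature matrix would hold monitored system metrics over time and the labels would mark observed failures.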
|
94 |
Using Workload Characterization to Guide High Performance Graph Processing / Hassan, Mohamed Wasfy Abdelfattah, 24 May 2021 (has links)
Graph analytics is an important application domain widely used in many fields, such as web graphs, social networks, and Bayesian networks. The sheer size of graph datasets combined with the irregular nature of the underlying problem poses a significant challenge for the performance, scalability, and power efficiency of graph processing. With the exponential growth in the size of graph datasets, there is an ever-growing need for faster, more power-efficient graph solvers. The computational needs of graph processing can take advantage of FPGAs' power efficiency and customizable architecture paired with CPUs' general-purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential to support performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits.
This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community, and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to cooperatively work in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers.
The introduction of High Level Synthesis (HLS) tools made FPGAs accessible to the masses, but HLS designs are yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process. Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources. / Doctor of Philosophy / Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while the hyperlinks between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis, among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern depends on the input data and is impossible to predict, which renders CPUs' sophisticated caching policies useless for performance.
With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
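As a rough illustration of the kind of CPU-FPGA workload distribution the abstract describes, one can imagine a per-device cost model and a greedy assignment; the cost formulas, constants, and kernel names below are entirely invented and stand in for the dissertation's actual performance models:

```python
# Hypothetical sketch: choose a device per graph kernel from a toy cost
# model, in the spirit of the workload-distribution framework described
# above. All constants are invented for illustration.
def predict_time(device, n_edges, irregularity):
    """Toy model: predicted seconds to run a kernel on a device."""
    if device == "cpu":
        # CPUs tolerate irregular access via caches, but stream slower.
        return n_edges * 2e-9 * (1.0 + 0.2 * irregularity)
    else:  # "fpga"
        # FPGAs stream regular workloads well, pay more for irregularity.
        return n_edges * 1e-9 * (1.0 + 2.0 * irregularity)

def assign(kernels):
    """Greedily place each kernel on the device with lower predicted time."""
    return {
        name: min(("cpu", "fpga"), key=lambda d: predict_time(d, e, irr))
        for name, (e, irr) in kernels.items()
    }

plan = assign({
    "bfs":  (10_000_000, 0.9),   # highly irregular traversal
    "spmv": (10_000_000, 0.1),   # regular streaming kernel
})
print(plan)
```

Under these invented constants the irregular kernel lands on the CPU and the streaming kernel on the FPGA, matching the intuition in the abstract.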
|
95 |
Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures / Lyerly, Robert Frantz, 24 June 2014 (has links)
The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory, and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means of increasing application performance, from smartphones to servers. These various architectures are designed for different types of applications: traditional "big" CPUs (like the Intel Xeon) are optimized for low latency, while other architectures (such as the NVIDIA Tesla K20x) are optimized for high throughput. These architectures have different tradeoffs and different performance profiles, meaning fantastic performance gains for the right types of applications. However, applications that are ill-suited for a given architecture may experience significant slowdown; therefore, it is imperative that applications are scheduled onto the correct processor.
In order to perform this scheduling, applications must be analyzed to determine their execution characteristics. Traditionally this application-to-hardware mapping was determined statically by the programmer. However, this requires intimate knowledge of the application and underlying architecture, and precludes load-balancing by the system. We demonstrate and empirically evaluate a system for automatically scheduling compute kernels by extracting program characteristics and applying machine learning techniques. We develop a machine learning process that is system-agnostic, and works for a variety of contexts (e.g. embedded, desktop/workstation, server). Finally, we perform scheduling in a workload-aware and workload-adaptive manner for these compute kernels. / Master of Science
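A minimal sketch of workload-aware scheduling in this spirit: each kernel carries per-architecture runtime predictions (as a trained model might supply), and the scheduler places it where the predicted finish time, including the backlog already assigned to that device, is smallest. All kernel names and numbers are hypothetical:

```python
# Hypothetical sketch: workload-aware placement of compute kernels given
# per-device runtime predictions. Not the thesis's actual scheduler.
def schedule(kernels, devices):
    """kernels: list of (name, {device: predicted_runtime}).
    devices: list of device names. Returns {kernel_name: device}."""
    busy = {d: 0.0 for d in devices}   # backlog already queued per device
    placement = {}
    for name, runtimes in kernels:
        # Choose the device minimizing backlog + predicted runtime,
        # so a fast device does not become a bottleneck under load.
        dev = min(devices, key=lambda d: busy[d] + runtimes[d])
        busy[dev] += runtimes[dev]
        placement[name] = dev
    return placement

plan = schedule(
    [("k1", {"cpu": 5.0, "gpu": 1.0}),
     ("k2", {"cpu": 5.0, "gpu": 1.0}),
     ("k3", {"cpu": 2.0, "gpu": 6.0})],
    ["cpu", "gpu"],
)
print(plan)
```

Note that a purely static mapping would send every kernel to its individually fastest device; accounting for backlog is what makes the placement workload-adaptive.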
|
96 |
High Performance Computing Issues in Large-Scale Molecular Statics Simulations / Pulla, Gautam, 02 June 1999 (has links)
Successful application of parallel high performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools which create an environment that aids sharing of high performance hardware resources and limits losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code and also a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance and availability. / Master of Science
|
97 |
Scalable and Productive Data Management for High-Performance Analytics / Youssef, Karim Yasser Mohamed Yousri, 07 November 2023 (has links)
Advancements in data acquisition technologies across different domains, from genome sequencing to satellite and telescope imaging to large-scale physics simulations, are leading to an exponential growth in dataset sizes. Extracting knowledge from this wealth of data enables scientific discoveries at unprecedented scales. However, the sheer volume of the gathered datasets is a bottleneck for knowledge discovery. High-performance computing (HPC) provides a scalable infrastructure to extract knowledge from these massive datasets. However, multiple data management performance gaps exist between big data analytics software and HPC systems. These gaps arise from multiple factors, including the tradeoff between performance and programming productivity, data growth at a faster rate than memory capacity, and the high storage footprints of data analytics workflows. This dissertation bridges these gaps by combining productive data management interfaces with application-specific optimizations of data parallelism, memory operations, and storage management. First, we address the performance-productivity tradeoff by leveraging Spark and optimizing input data partitioning. Our solution optimizes programming productivity while achieving comparable performance to the Message Passing Interface (MPI) for scalable bioinformatics. Second, we address the operating system's kernel limitations for out-of-core data processing by autotuning memory management parameters in userspace. Finally, we address I/O and storage efficiency bottlenecks in data analytics workflows that iteratively and incrementally create and reuse persistent data structures such as graphs, data frames, and key-value datastores. / Doctor of Philosophy / Advancements in various fields, like genetics, satellite imaging, and physics simulations, are generating massive amounts of data. Analyzing this data can lead to groundbreaking scientific discoveries. However, the sheer size of these datasets presents a challenge.
High-performance computing (HPC) offers a solution to process and understand this data efficiently. Still, several issues hinder the performance of big data analytics software on HPC systems. These problems include finding the right balance between performance and ease of programming, dealing with the challenges of handling massive amounts of data, and optimizing storage usage. This dissertation focuses on three areas to improve high-performance data analytics (HPDA). Firstly, it demonstrates how using Spark and optimized data partitioning can optimize programming productivity while achieving similar scalability as the Message Passing Interface (MPI) for scalable bioinformatics. Secondly, it addresses the limitations of the operating system's memory management for processing data that is too large to fit entirely in memory. Lastly, it tackles the efficiency issues related to input/output operations and storage when dealing with data structures like graphs, data frames, and key-value datastores in iterative and incremental workflows.
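The input-partitioning idea can be sketched as a heuristic that derives a Spark partition count from input size and cluster geometry instead of accepting the default; the rule and its constants below are illustrative assumptions, not the dissertation's actual tuning logic:

```python
# Hypothetical sketch: pick a partition count for a Spark job from the
# input size and cluster geometry. Heuristic constants are invented.
def choose_partitions(input_bytes, num_executors, cores_per_executor,
                      target_partition_bytes=128 * 1024 * 1024):
    """Balance per-partition size against available parallelism."""
    total_cores = num_executors * cores_per_executor
    by_size = -(-input_bytes // target_partition_bytes)  # ceil division
    # A few tasks per core keeps all executors busy despite skew.
    by_parallelism = 3 * total_cores
    return max(by_size, by_parallelism)

# 100 GiB of sequence data on 10 executors x 8 cores:
n = choose_partitions(100 * 1024**3, 10, 8)
print(n)  # 800 size-based partitions vs 240 by parallelism -> 800
```

The result would then be passed to an API such as Spark's `repartition` or a read call's partition hint; which call applies depends on the data source.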
|
98 |
Evaluating and Enhancing FAIR Compliance in Data Resource Portal Development / Yiqing Qu (18437745), 01 May 2024 (has links)
<p dir="ltr">The arrival of the big-data era has created a critical need for improved scientific data management. Motivated by the evolution and significance of the FAIR principles in contemporary research, this study focuses on the development and evaluation of a FAIR-compliant data resource portal. The challenge lies in translating the abstract FAIR principles into actionable technological implementations and evaluating the result. After selecting a baseline, the study aims to benchmark against and outperform existing FAIR-compliant data resource portals. The proposed approach includes an assessment of existing portals, the interpretation of the FAIR principles into practical design considerations, and the integration of modern technologies into the implementation. With a FAIR-ness evaluation framework designed and applied to the implementation, the study evaluated and improved the FAIR compliance of the data resource portal. Specifically, the study identified the need for improved persistent identifiers, comprehensive descriptive metadata, enhanced metadata access methods, and adherence to community standards and formats. Evaluation of the resulting portal showed a significant improvement in FAIR compliance, which in turn enhanced data discoverability, usability, and overall management in academic research.</p>
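A FAIR-ness evaluation of this kind can be imagined as a checklist applied to each dataset record; the check names, criteria, and scoring below are invented for illustration and are not the study's actual framework:

```python
# Hypothetical sketch: score a dataset record against a FAIR-style
# checklist. Checks and thresholds are invented for illustration.
CHECKS = {
    # Findable: a resolvable persistent identifier is present.
    "persistent_identifier": lambda r: r.get("doi") is not None,
    # Findable: record carries reasonably rich descriptive metadata.
    "rich_metadata": lambda r: len(r.get("metadata", {})) >= 5,
    # Accessible: metadata retrievable via a standard open protocol.
    "standard_access": lambda r: r.get("access_protocol") in {"https", "oai-pmh"},
    # Interoperable/Reusable: data in a community-accepted format.
    "community_format": lambda r: r.get("format") in {"csv", "json", "netcdf"},
}

def fair_score(record):
    """Fraction of checks the record passes, in [0.0, 1.0]."""
    return sum(check(record) for check in CHECKS.values()) / len(CHECKS)

record = {
    "doi": "10.1234/example",
    "metadata": {"title": 1, "creator": 1, "date": 1, "license": 1, "subject": 1},
    "access_protocol": "https",
    "format": "json",
}
print(fair_score(record))  # 1.0
```

A real framework would weight sub-principles differently and verify claims (e.g. actually resolving the identifier) rather than trusting record fields.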
|
99 |
HPC-based Parallel Algorithms for Generating Random Networks and Some Other Network Analysis Problems / Alam, Md Maksudul, 06 December 2016 (has links)
The advancement of modern technologies has resulted in an explosive growth of complex systems, such as the Internet, biological, social, and various infrastructure networks, which have, in turn, contributed to the rise of massive networks. During the past decade, analyzing and mining of these networks has become an emerging research area with many real-world applications. The most relevant problems in this area include: collecting and managing networks, modeling and generating random networks, and developing network mining algorithms. In the era of big data, speed is not an option anymore for the effective analysis of these massive systems, it is an absolute necessity. This motivates the need for parallel algorithms on modern high-performance computing (HPC) systems including multi-core, distributed, and graphics processor units (GPU) based systems. In this dissertation, we present distributed memory parallel algorithms for generating massive random networks and a novel GPU-based algorithm for index searching.
This dissertation is divided into two parts. In Part I, we present parallel algorithms for generating massive random networks using several widely-used models. We design and develop a novel parallel algorithm for generating random networks using the preferential-attachment model. This algorithm can generate networks with billions of edges in just a few minutes using a medium-sized computing cluster. We develop another parallel algorithm for generating random networks with a given sequence of expected degrees. We also design a new time- and space-efficient algorithmic method to generate random networks with any degree distribution. This method has been applied to generate random networks using other popular network models, such as the block two-level Erdős-Rényi and stochastic block models. Parallel algorithms for network generation pose many nontrivial challenges, such as dependency among edges, avoiding duplicate edges, and load balancing. We applied novel techniques to deal with these challenges. All of our algorithms scale very well to a large number of processors and provide almost linear speed-up.
Dealing with a large number of networks collected from a variety of fields requires efficient management systems such as graph databases. Finding a record in those databases is very critical and typically is the main bottleneck for performance. In Part II of the dissertation, we develop a GPU-based parallel algorithm for index searching. Our algorithm achieves the fastest throughput ever reported in the literature for various benchmarks. / Ph. D. / The advancement of modern technologies has resulted in an explosive growth of complex systems, such as the Internet, biological, social, and various infrastructure networks, which have, in turn, contributed to the rise of massive networks. During the past decade, analyzing and mining of these networks has become an emerging research area with many real-world applications. The most relevant problems in this area include: collecting and managing networks, modeling and generating random networks, and developing network mining algorithms. As the networks are massive in size, we need faster algorithms for the quick and effective analysis of these systems. This motivates the need for parallel algorithms on modern high-performance computing (HPC) based systems. In this dissertation, we present HPC-based parallel algorithms for generating massive random networks and managing large scale network data.
This dissertation is divided into two parts. In Part I, we present parallel algorithms for generating massive random networks using several widely-used models, such as the preferential attachment model, the Chung-Lu model, the block two-level Erdős-Rényi model and the stochastic block model. Our algorithms can generate networks with billions of edges in just a few minutes using a medium-sized HPC-based cluster. We applied novel load balancing techniques to distribute workloads equally among the processors. As a result, all of our algorithms scale very well to a large number of processors and provide almost linear speed-up. In Part II of the dissertation, we develop a parallel algorithm for finding records by given keys. Dealing with the large amount of network data collected from a variety of fields requires efficient database management systems such as graph databases. Finding a record in those databases is very critical and typically is the main bottleneck for performance. Our algorithm achieves the fastest data lookup throughput ever reported in the literature for various benchmarks.
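The preferential-attachment growth rule underlying the first family of generators can be sketched sequentially; the dissertation's contribution is the distributed-memory parallelization of this process, which is not shown here:

```python
# Illustrative sequential sketch of preferential attachment. Sampling a
# uniform entry from `targets`, which holds one slot per edge endpoint,
# is exactly degree-proportional vertex selection.
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph to n vertices; each new vertex attaches up to m
    edges to existing vertices chosen proportionally to their degree."""
    rng = random.Random(seed)
    targets = list(range(m))   # seed vertices, one slot each
    edges = []
    for v in range(m, n):
        # A set drops duplicate picks, so v gains at most m distinct edges.
        chosen = {rng.choice(targets) for _ in range(m)}
        for u in chosen:
            edges.append((v, u))
            targets.extend((v, u))  # both endpoints gain a degree slot
    return edges

g = preferential_attachment(1000, 3)
print(len(g))  # close to 3 * (1000 - 3), minus duplicate-pick losses
```

The parallel versions must resolve exactly the dependencies visible here: each step reads the degree distribution produced by all earlier steps, which is why naive partitioning across processors does not work.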
|
100 |
NAS benchmark evaluation of HKU cluster of workstations / 麥志華, Mak, Chi-wah, January 1999 (has links)
published_or_final_version / abstract / toc / Computer Science / Master / Master of Philosophy
|