41 |
Computational Parameter Selection and Simulation of Complex Sphingolipid Pathway Metabolism
Henning, Peter Allen 22 May 2006 (has links)
Systems biology is an emerging field of study that seeks to provide systems-level understanding of biological systems by integrating high-throughput biological data into predictive computational models. The integrative nature of this field stands in sharp contrast to the reductionist methods that have been employed since the advent of molecular biology. Systems biology investigates not only the individual components of a biological system, such as metabolic pathways, organelles, and signaling cascades, but also the relationships and interactions between those components, in the hope that an understandable model of the entire system can eventually be developed. Experts have hailed this field as a potentially vital technology for revolutionizing the pharmaceutical development process in the post-genomic era. This work provides not only a systems biology investigation into the principles governing de novo sphingolipid metabolism but also an examination of the computational obstacles that arise in converting high-throughput data into an insightful model.
|
42 |
Design and Implementation of High Performance Algorithms for the (n,k)-Universal Set Problem
Luo, Ping 14 January 2010 (has links)
The k-path problem is to find a simple path of length k in a graph. This problem is NP-complete and has applications in bioinformatics, such as detecting signaling pathways in protein interaction networks and matching biological subnetworks. Existing implementations solve the problem for k up to 13. The fastest implementation has running time O^*(4.32^k), which is slower than the best known algorithm, whose running time is O^*(4^k). Implementing the best known algorithm for the k-path problem requires constructing an (n,k)-universal set.
In this thesis, we study practical algorithms for constructing (n,k)-universal sets. We propose six algorithm variants to handle the increasing computational time and memory space needed as k grows from 3 to 8, together with two major empirical techniques that cut the time and space tremendously yet still generate good results. For k=7, the size of the universal set found by our algorithm is 1576; for k=8, it is 4611.
We implement the proposed algorithms with the OpenMP parallel interface and construct universal sets for k=3, 4, ..., 8. Our experiments show that our algorithms for the (n,k)-universal set problem exhibit very good parallelism and hence shed light on a future MPI implementation.
Ours is the first implementation effort for the (n,k)-universal set problem. We share this effort by proposing an extensible universal set construction and retrieval system, which integrates the construction algorithms and the universal sets they produce. The sets are stored in a centralized database, and an interface is provided to access the database easily.
(n,k)-universal sets have been applied to many other NP-complete problems, such as the set splitting problem and matching and packing problems. The small (n,k)-universal sets constructed by us will significantly reduce the time needed to solve those problems.
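As a concrete illustration of the object being constructed (not of the thesis's own algorithms), the sketch below gives a brute-force definition check and a naive randomized construction of an (n,k)-universal set: a family of 0/1 vectors over n positions is (n,k)-universal if, for every k-subset of positions, its projections cover all 2^k patterns. The function names and the randomized strategy are illustrative assumptions, and this approach is feasible only for small n and k.

```python
import itertools
import random

def is_universal(family, n, k):
    # A family of 0/1 vectors of length n is (n,k)-universal if, for every
    # k-subset of positions, the projections onto those positions realize
    # all 2**k possible 0/1 patterns.
    for positions in itertools.combinations(range(n), k):
        seen = {tuple(vec[p] for p in positions) for vec in family}
        if len(seen) < 2 ** k:
            return False
    return True

def random_universal_set(n, k, seed=0):
    # Naive randomized construction: keep adding random 0/1 vectors until
    # the family passes the definition check above (small n, k only).
    rng = random.Random(seed)
    family = []
    while not is_universal(family, n, k):
        family.append(tuple(rng.randint(0, 1) for _ in range(n)))
    return family

if __name__ == "__main__":
    fam = random_universal_set(n=10, k=3)
    print(f"(10,3)-universal set of size {len(fam)}")
```

A randomized construction like this typically yields fairly large families; the sizes quoted in the abstract come from the thesis's more carefully engineered constructions.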
|
43 |
none
Yang, Yung-an 23 June 2008 (has links)
When a company's employee turnover rate is too high, it not only increases costs, hurts employee morale, and leaves a negative impression on customers, but also harms corporate performance in the long run. This research uses a qualitative research technique to study the human resource management practices of four interviewed companies, seeking to identify the best practices they have in common that reduce employee turnover or keep it within an acceptable level.
After interviewing the four companies, two in the service industry and two in the high-tech manufacturing industry, this research analyzed their human resource management practices and identified five best practices they have in common that help these companies keep employee turnover under control. The research therefore concludes that for any company, whether it operates in the service industry or in high-tech manufacturing, and whether its organizational culture is performance-oriented or paternalistic, recruiting employees through diversified channels, selecting them by personality, and tightly linking the performance appraisal, reward, promotion, and training and development systems together create a synergy that helps reduce employee turnover or keep it at an acceptable level.
|
44 |
Memory management for high-performance applications
Berger, Emery David. January 2002 (has links) (PDF)
Thesis (Ph. D.)--University of Texas at Austin, 2002. / Vita. Includes bibliographical references. Available also from UMI Company.
|
45 |
Assessment of open-source software for high-performance computing
Rapur, Gayatri. January 2003 (has links) (PDF)
Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
|
46 |
Power and performance modeling for high-performance computing algorithms
Choi, Jee Whan 08 June 2015 (has links)
The overarching goal of this thesis is to provide an algorithm-centric approach to analyzing the relationship between time, energy, and power. This research is aimed at algorithm designers and performance tuners so that they may be able to make decisions on how algorithms should be designed and tuned depending on whether the goal is to minimize time or to minimize energy on current and future systems.
First, we present a simple analytical cost model for energy and power. Assuming a simple von Neumann architecture with a two-level memory hierarchy, this model predicts energy and power for algorithms using just a few simple parameters, such as the number of floating point operations (FLOPs or flops) and the amount of data moved (bytes or words). Using highly optimized microbenchmarks and a small number of test platforms, we show that although this model uses only a few simple parameters, it is, nevertheless, accurate.
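As an illustration of this kind of operation-count cost model (a minimal sketch, not the thesis's actual model or parameter values), the snippet below predicts time and energy from flop and byte counts, assuming perfect overlap of computation and memory traffic plus a constant background power term; every machine parameter shown is a made-up placeholder that would in practice come from microbenchmarks.

```python
def predict_time_energy(flops, bytes_moved,
                        time_per_flop, time_per_byte,      # seconds per op
                        energy_per_flop, energy_per_byte,  # joules per op
                        constant_power):                   # watts
    # Time: assume compute and memory traffic overlap perfectly.
    time = max(flops * time_per_flop, bytes_moved * time_per_byte)
    # Energy: per-operation energies plus constant (idle/leakage) power over time.
    energy = (flops * energy_per_flop
              + bytes_moved * energy_per_byte
              + constant_power * time)
    return time, energy

# Hypothetical machine parameters and a kernel moving 1 GB at 2 flops/byte.
t, e = predict_time_energy(flops=2e9, bytes_moved=1e9,
                           time_per_flop=1e-11, time_per_byte=5e-11,
                           energy_per_flop=50e-12, energy_per_byte=500e-12,
                           constant_power=10.0)
print(f"predicted time: {t:.3f} s, predicted energy: {e:.3f} J")
```

Dividing predicted energy by predicted time gives average power, and comparing an algorithm's flop-to-byte ratio against a machine's time and energy balance points is the kind of comparison the roofline-style visualization in the next paragraph captures.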
We can also visualize this model using energy “arch lines,” analogous to the “rooflines” in time. These “rooflines in energy” allow users to easily assess and compare different algorithms’ intensities in energy and time to various target systems’ balances in energy and time. This visualization of our model gives us many interesting insights, and as such, we refer to our analytical model as the energy roofline model.
Second, we present the results of our microbenchmarking study of time, energy, and power costs of computation and memory access of several candidate compute-node building blocks of future high-performance computing (HPC) systems. Over a dozen server-, desktop-, and mobile-class platforms that span a range of compute and power characteristics were evaluated, including x86 (both conventional and Xeon Phi accelerator), ARM, graphics processing units (GPU), and hybrid (AMD accelerated processing units (APU) and other system-on-chip (SoC)) processors.
The purpose of this study was twofold: first, to extend the validation of the energy roofline model to a more comprehensive set of target systems, showing that the model works well independent of system hardware and microarchitecture; and second, to improve the model by uncovering and remedying potential shortcomings, such as incorporating the effects of power “capping,” a multi-level memory hierarchy, and different implementation strategies on power and performance.
Third, we incorporate dynamic voltage and frequency scaling (DVFS) into the energy roofline model to explore its potential for saving energy. Rather than the more traditional approach of using DVFS to reduce energy, whereby a “slack” in computation is used as an opportunity to dynamically cycle down the processor clock, the energy roofline model can be used to determine precisely how the time and energy costs of different operations, both compute and memory, change with respect to frequency and voltage settings. This information can be used to target a specific optimization goal, whether that be time, energy, or a combination of both.
In the final chapter of this thesis, we use our model to predict the energy dissipation of a real application running on a real system. The fast multipole method (FMM) kernel was executed on the GPU component of the Tegra K1 SoC under various frequency and voltage settings, and a breakdown of instructions and data access patterns was collected via performance counters. The total energy dissipation of FMM was then calculated as a weighted sum of these instruction counts and their associated energy costs. Across eight different voltage and frequency settings and eight different algorithm-specific input parameters per setting, for a total of 64 test cases, the energy roofline model predicted total energy dissipation to within 6.2% of actual energy measurements, with a standard deviation of 4.7%.
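The weighted-sum prediction described above can be sketched in a few lines. The operation categories and per-operation energies below are hypothetical placeholders, not values from the thesis; in the study they come from performance counters and microbenchmarks at each voltage/frequency setting.

```python
def weighted_sum_energy(instruction_counts, energy_per_op):
    # Total energy = sum over operation kinds of (count * per-operation energy).
    return sum(count * energy_per_op[kind]
               for kind, count in instruction_counts.items())

# Hypothetical counter readings and per-operation energies (joules) at one
# voltage/frequency setting.
counts = {"fp32_fma": 4.2e9, "dram_bytes": 1.1e9, "shared_mem_bytes": 3.0e9}
costs = {"fp32_fma": 20e-12, "dram_bytes": 120e-12, "shared_mem_bytes": 8e-12}
print(f"predicted energy: {weighted_sum_energy(counts, costs):.3f} J")
```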
Despite its simplicity and its foundation on the first principles of algorithm analysis, the energy roofline model has proven to be both practical and accurate for real applications running on a real system. As such, it can be an invaluable tool for algorithm designers and performance tuners with which they can more precisely analyze the impact of their design decisions on both performance and energy efficiency.
|
47 |
xBFT : Byzantine fault tolerance with high performance, low cost, and aggressive fault isolation
Kotla, Ramakrishna Rao, 1976- 24 September 2012 (has links)
We are increasingly relying on online services to store, access, share, and disseminate critical information from anywhere and at all times. Such services include email, digital storage, photos, video, health and financial services, etc. With increasing evidence of non-fail-stop failures in practical systems, the Byzantine fault tolerant (BFT) state machine replication technique is becoming increasingly attractive for building highly reliable services that tolerate such failures. However, existing Byzantine fault tolerant techniques fall short of providing high availability, high performance, and long-term data durability guarantees at a competitive replication cost. In this dissertation, we present BFT replication techniques that facilitate the design and implementation of such highly reliable services by providing high availability, high performance, and high durability with competitive replication cost (hardware, software, network, management).
First, we propose CBASE, a BFT state machine replication architecture that leverages application-level parallelism to improve the throughput of the replicated system by identifying and executing independent requests concurrently. Traditional state machine replication based BFT techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the same total order to ensure consistency across replicas. Our architecture thus provides a general way to exploit application parallelism in order to provide high throughput without compromising correctness.
Second, we present Zyzzyva, an efficient BFT agreement protocol that uses speculation to significantly reduce the performance overhead and replication cost of BFT state machine replication. In Zyzzyva, replicas respond to a client’s request without first running an expensive three-phase commit protocol to reach agreement on the order in which the request must be processed. Instead, they optimistically adopt the order proposed by the primary and respond immediately to the client. Replicas can thus become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima.
Third, we design and implement SafeStore, a distributed storage system designed to maintain long-term data durability despite conventional hardware and software faults, environmental disruptions, and administrative failures caused by human error or malice. The architecture of SafeStore is based on fault isolation, which SafeStore applies aggressively along administrative, physical, and temporal dimensions by spreading data across autonomous storage service providers (SSPs). SafeStore also performs an efficient end-to-end audit of SSPs to detect data loss quickly and improve data durability by reducing the mean time to repair (MTTR). SafeStore offers durable storage with cost, performance, and availability competitive with traditional storage systems.
We evaluate these techniques by implementing BFT replication libraries and further demonstrate the practicality of these approaches by implementing an NFS-based replicated file system (CBASE-FS) and a durable storage system (SafeStore-FS). / text
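To make the speculative fast path concrete, here is a minimal sketch of the client-side decision rule in a Zyzzyva-style protocol. It is simplified: it omits view changes, commit-certificate contents, and timeout handling, and the function name is invented for illustration. With n = 3f + 1 replicas, a request completes immediately on the fast path only when all replicas agree speculatively.

```python
def client_decision(num_matching_speculative_replies: int, f: int) -> str:
    # Sketch of a Zyzzyva-style client with n = 3f + 1 replicas overall.
    n = 3 * f + 1
    if num_matching_speculative_replies == n:
        # Fast path: every replica speculatively executed in the same order.
        return "request complete"
    if num_matching_speculative_replies >= 2 * f + 1:
        # Two-phase path: assemble a commit certificate from 2f + 1 matching
        # replies, send it to the replicas, and wait for 2f + 1 acknowledgments.
        return "collect and distribute commit certificate"
    # Too few matching replies: suspect a faulty primary and retransmit the
    # request to all replicas, which may eventually trigger a view change.
    return "retransmit to all replicas"
```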
|
48 |
A run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster
Choi, Yuk-ming, 蔡育明 January 2013 (has links)
The era of big data has led to problems of unprecedented scale and complexity that challenge the computing capability of conventional computer systems. One way to address the computational and communication challenges of such demanding applications is to incorporate non-conventional hardware accelerators such as FPGAs into existing systems. By providing a mix of FPGAs and conventional CPUs as computing resources in a heterogeneous cluster, a distributed computing environment can be built that addresses the needs of both compute-intensive and data-intensive applications. However, utilizing heterogeneous clusters requires application developers to have comprehensive knowledge of both hardware and software. To help programmers easily take advantage of the synergy between hardware and software, an easy-to-use framework that virtualizes the underlying FPGA computing resources of the heterogeneous cluster is therefore motivated.
In this work, a heterogeneous cluster consisting of both FPGAs and CPUs was built, and a framework for managing multiple FPGAs across the cluster was designed. The major contribution of the framework is an abstraction layer between the application developer and the underlying FPGA computing resources, which improves overall design productivity. An inter-FPGA communication system was implemented so that gateware executing on FPGAs can communicate with each other autonomously, without involving the CPUs. Furthermore, to demonstrate a real-life application on the heterogeneous cluster, a generic k-means clustering application was implemented using the MapReduce programming model.
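For reference, k-means maps naturally onto MapReduce: the map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid. The short sketch below illustrates one iteration of that decomposition; it is a generic Python illustration, not the thesis's FPGA gateware or its Hadoop baseline.

```python
import random

def kmeans_map(points, centroids):
    # Map: emit (nearest centroid index, point) for every input point.
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        yield nearest, p

def kmeans_reduce(assignments, k, dim):
    # Reduce: average the points assigned to each centroid index.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for i, p in assignments:
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    return [[s / max(counts[i], 1) for s in sums[i]] for i in range(k)]

# One MapReduce iteration over random 2-D points with k = 4 clusters.
points = [(random.random(), random.random()) for _ in range(1000)]
centroids = random.sample(points, 4)
centroids = kmeans_reduce(kmeans_map(points, centroids), k=4, dim=2)
print(centroids)
```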
The implementation of the k-means application on multiple FPGAs was compared with a software-only version run on a Hadoop multi-core computer cluster. The performance results show that the FPGA version outperforms the Hadoop version across various parameters. An in-depth study of the communication bottleneck present in the system was also carried out, with a number of experiments specifically designed to benchmark the performance of each I/O channel. The study shows that the major source of the I/O bottleneck lies in the communication between the host system and the FPGA. This gives insight into programming considerations for potential applications on the cluster as well as into improvements to the framework. Moreover, the benefit of using multiple FPGAs was investigated through a series of experiments. Compared with putting all mappers on a single FPGA, distributing the same number of mappers across more FPGAs was found to provide a tradeoff between FPGA resources and I/O performance. / published_or_final_version / Electrical and Electronic Engineering / Master / Master of Philosophy
|
49 |
Memory management for high-performance applications
Berger, Emery David 28 August 2008 (links)
Not available / text
|
50 |
Algorithmic techniques for the micron automata processor
Roy, Indranil 21 September 2015 (has links)
Our research is the first in-depth study of the use of the Micron Automata Processor, a novel re-configurable streaming co-processor purpose-built to execute thousands of non-deterministic finite automata (NFA) in parallel. By design, this processor is well suited to accelerating applications that need to find all occurrences of thousands of complex string patterns in the input data. We have validated this by implementing two such applications, one from network security and the other from bioinformatics, both of which are significantly faster than their state-of-the-art counterparts. Our research has also widened the scope of the applications that can be accelerated by this processor by finding ways to quickly program any generic graph into it and then search for hard-to-find features such as maximal cliques and Hamiltonian paths. These applications and algorithms have yielded valuable design inputs for the next generation of the chip, which is currently in the design phase. We hope that this work paves the way to the early adoption of this upcoming architecture and to efficient solutions of some currently computationally challenging problems.
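For intuition about the execution model described above, the sketch below is a software analogue of many automata running in parallel over a byte stream: every pattern's automaton advances on each input symbol, and every completed match is reported with its end position. It is deliberately simplified to exact string patterns (the hardware supports richer NFA transitions), and the function name is illustrative.

```python
def multi_pattern_match(text, patterns):
    # Software analogue of parallel automata: track every partial match as
    # (pattern_index, chars_matched) and advance all of them on each symbol.
    matches = []
    active = set()
    for pos, ch in enumerate(text):
        next_active = set()
        # Existing partial matches plus a fresh start state for every pattern.
        for i, j in active | {(idx, 0) for idx in range(len(patterns))}:
            if patterns[i][j] == ch:
                if j + 1 == len(patterns[i]):
                    matches.append((i, pos))       # pattern i ends at position pos
                else:
                    next_active.add((i, j + 1))
        active = next_active
    return matches

# Example: report (pattern index, end position) for every occurrence.
print(sorted(multi_pattern_match("abcbcd", ["abc", "bc", "bcd"])))
```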
|