21 |
A run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster / Choi, Yuk-ming, 蔡育明 January 2013
The era of big data has led to problems of unprecedented scale and complexity that challenge the computing capability of conventional computer systems. One way to address the computational and communication challenges of such demanding applications is to incorporate non-conventional hardware accelerators such as FPGAs into existing systems. By providing a mix of FPGAs and conventional CPUs as computing resources in a heterogeneous cluster, a distributed computing environment can be built that meets the needs of both compute-intensive and data-intensive applications. However, utilizing heterogeneous clusters requires application developers to have comprehensive knowledge of both hardware and software. To help programmers take advantage of the synergy between hardware and software, this work motivates an easy-to-use framework for virtualizing the underlying FPGA computing resources of the heterogeneous cluster.
In this work, a heterogeneous cluster consisting of both FPGAs and CPUs was built, and a framework for managing multiple FPGAs across the cluster was designed. The major contribution of the framework is an abstraction layer between the application developer and the underlying FPGA computing resources, intended to improve overall design productivity. An inter-FPGA communication system was implemented so that gateware executing on different FPGAs can communicate with each other independently of the CPU. Furthermore, to demonstrate a real-life application on the heterogeneous cluster, a generic k-means clustering application was implemented using the MapReduce programming model.
The implementation of the k-means application on multiple FPGAs was compared with a software-only version running on a Hadoop multi-core computer cluster. The performance results show that the FPGA version outperforms the Hadoop version across various parameters. An in-depth study of the communication bottleneck present in the system was also carried out, with a number of experiments specifically designed to benchmark the performance of each I/O channel. The study shows that the major I/O bottleneck lies in the communication between the host system and the FPGA. This gives insight into programming considerations for potential applications on the cluster as well as possible improvements to the framework. Moreover, the benefit of multiple FPGAs was investigated through a series of experiments. Compared with putting all mappers on a single FPGA, distributing the same number of mappers across more FPGAs was found to provide a tradeoff between FPGA resources and I/O performance. / published_or_final_version / Electrical and Electronic Engineering / Master / Master of Philosophy
|
22 |
Memory management for high-performance applications / Berger, Emery David 28 August 2008
Not available / text
|
23 |
Efficient communication subsystem for cluster computing / Lee, Chun-ming, 李俊明 January 1998
published_or_final_version / Computer Science / Master / Master of Philosophy
|
24 |
System level design issues for high performance SIMD architectures / Allen, James D. 05 1900
No description available.
|
25 |
Design of a high performance and high availability distributed storage system : a dissertation presented to the faculty of the Graduate School, Tennessee Technological University / Ou, Li, January 2006
Thesis (Ph.D.)--Tennessee Technological University, 2006. / Bibliography: leaves 112-122.
|
26 |
Accelerating a medical 3D brain MRI analysis algorithm using a high-performance reconfigurable computer / Koo, Jahyun J. January 1900
Thesis (M. Eng.). / Written for the Dept. of Electrical and Computer Engineering. Title from title page of PDF (viewed 2008/01/14). Includes bibliographical references.
|
27 |
xBFT Byzantine fault tolerance with high performance, low cost, and aggressive fault isolation / Kotla, Ramakrishna Rao, January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2008. / Vita. Includes bibliographical references.
|
28 |
Efficient communication subsystem for cluster computing / Lee, Chun-ming, January 1998
Thesis (M. Phil.)--University of Hong Kong, 1999. / Includes bibliographical references (leaves 89-95).
|
29 |
Stressed-eye analysis and jitter separation for high-speed serial links / Radhakrishnan, Nitin, January 2009
Thesis (M.S.)--Missouri University of Science and Technology, 2009. / Vita. The entire thesis text is included in file. Title from title screen of thesis/dissertation PDF file (viewed November 17, 2009). Includes bibliographical references (p. 61-62).
|
30 |
On performance improvement of restricted bandwidth multimedia conferencing over packet switched networks / ElGebaly, Hani H. 08 September 2017
Advances in computer technology such as faster processors, better data compression schemes, and cheaper audio and video devices have made it possible to integrate multimedia into the computing environment. Desktop conferencing evolved as a plausible result of this multimedia revolution. The bandwidth granted for these conferencing applications is restricted in most cases by the speed of the modem device connected to the network.
Poor performance of multimedia conferencing over the Internet can be attributed to two main factors: locally and remotely induced effects. Local effects are induced by bandwidth sharing between different media components, operating system limitations, or poor design. Remote effects include all Internet-related problems such as unfairness, non-guaranteed quality of service, and congestion. Both effects are addressed in this study and some solutions are proposed. The primary goal is to maintain audio quality and prevent video from degrading audio performance.
We study the characteristics of video and audio traffic sources of conferencing applications following the H.323 set of standards defined by the International Telecommunication Union (ITU). The media traffic uses the Real-time Transport Protocol (RTP) and the User Datagram Protocol (UDP) as its transport over the IP network protocol. Tradeoffs involved in the choice of multimedia traffic parameters are presented. Our measurements were carried out on audio and video codecs defined in the G.723.1 and H.263 specifications, respectively, both drafted by the ITU.
This dissertation investigates traffic multiplexing issues at the host, and the interaction of conferencing media components as they are multiplexed locally in a shared bandwidth transport medium. Lack of appropriate multiplexing algorithms can lead to one or more media components oversubscribing to the shared bandwidth and penalizing other participants. These local effects can contribute significantly to traffic delay or abuse of the network bandwidth. We propose the “bit rate adjuster” (BRA) algorithm and use it for regulating media flow. The algorithm compensates for local video effects induced by packet preparation or processing to allow for better audio performance. A new performance qualifier is introduced and used in the evaluation process.
On the remote side, we investigate reactive mechanisms used to recover from media flow performance degradation caused by shared bandwidth traffic effects. We review feedback mechanisms based on the RTP Control Protocol (RTCP) and uncover its limitations for applications connected to the Internet through narrow bandwidth pipes. We propose an alternative approach that predicts and prevents the loss of audio packets before it occurs, based on local computation of audio jitter. We also propose a mechanism that recovers audio traffic from jitter and latency effects introduced by the Internet shared medium. These approaches improve the audio performance significantly in multimedia conferencing sessions. / Graduate
|