About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A performance study of multithreading

Kwak, Hantak. 07 December 1998
As the performance gap between processor and memory grows, memory latency will become a major bottleneck to achieving high processor utilization. Multithreading has emerged as one of the most promising techniques for tolerating memory latency by exploiting thread-level parallelism. The question, however, remains how effective multithreading is at tolerating memory latency. Because powerful microprocessors, high-speed networks, and software infrastructure systems are now readily available, a cost-effective parallel machine is often realized as a network of workstations. We therefore examine the feasibility and effectiveness of multithreading in a networked computing environment. We also propose the Multithreaded Virtual Processor (MVP) model as a means of integrating the multithreaded programming paradigm with a modern superscalar processor that supports fast context switching and thread scheduling. To validate this idea, we developed a simulator by combining a POSIX-compliant Pthreads package with SimpleScalar, a generic superscalar simulator, extended with support for multithreading. The simulator is a powerful workbench for studying how future superscalar designs and thread management should be modified to better support multithreading. Our studies with MVP show that, in general, the performance improvement comes not only from tolerating memory latency but also from data sharing among threads. / Graduation date: 1999
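The latency-tolerance argument above can be made concrete with a toy cycle-count model. This is a minimal sketch under stated assumptions, not the MVP simulator or SimpleScalar: `utilization` and its parameters are hypothetical, every thread alternates a fixed-length compute run with a fixed-latency memory access, and each switch to a different context costs a fixed number of cycles.

```python
from collections import deque


def utilization(n_threads: int, run_len: int, latency: int,
                switch_cost: int, horizon: int = 100_000) -> float:
    """Fraction of cycles spent on useful work in a toy blocked-multithreading model.

    Hypothetical model (not the MVP simulator): each thread computes for
    `run_len` cycles, then issues a memory access that stalls it for
    `latency` cycles. The processor runs one thread at a time and spends
    `switch_cost` cycles whenever it switches to a different thread.
    """
    ready = deque(range(n_threads))  # contexts that can run now
    wake_at = {}                     # stalled context -> cycle its load returns
    current = None                   # context that ran most recently
    cycle = useful = 0
    while cycle < horizon:
        # Contexts whose memory access has completed become ready again.
        for t, when in list(wake_at.items()):
            if when <= cycle:
                ready.append(t)
                del wake_at[t]
        if not ready:
            cycle += 1               # no ready context: the processor idles
            continue
        t = ready.popleft()
        if t != current:
            cycle += switch_cost     # context-switch penalty
        current = t
        cycle += run_len             # run until the next memory access
        useful += run_len
        wake_at[t] = cycle + latency # context stalls on memory
    return useful / cycle


for n in (1, 2, 4, 8, 10):
    print(n, round(utilization(n, run_len=10, latency=90, switch_cost=0), 2))
```

With a 10-cycle run length and a 90-cycle latency, one context keeps the processor busy 10% of the time; utilization grows linearly (0.1, 0.2, 0.4, 0.8) until ten contexts cover the latency completely and reach 1.0. A nonzero `switch_cost` caps saturated utilization at `run_len / (run_len + switch_cost)`, which is why fast context switching matters. The model does not capture the data-sharing effect the abstract reports.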
2

Static lock allocation

Halpert, Richard L. January 1900
Thesis (M.Sc.). / Written for the School of Computer Science. Title from title page of PDF (viewed 2008/12/05). Includes bibliographical references.
3

Exploiting thread-level parallelism on simultaneous multithreaded processors

Lo, Jack Lee-jay. January 1998
Thesis (Ph. D.)--University of Washington, 1998. / Vita. Includes bibliographical references (p. [144]-155).
4

A study of hardware/software multithreading

Carlson, Ryan L. 04 June 1998
As computer design advances, two important trends have surfaced: the exploitation of parallelism and designing against memory latency. The Multithreaded Virtual Processor (MVP) addresses both. Based on a standard superscalar core, the MVP exploits Instruction-Level Parallelism (ILP) and, using multithreading, further exploits Thread-Level Parallelism (TLP) in program code. By combining hardware and software multithreading techniques into a new hybrid model, the MVP uses fast hardware context switching together with both hardware and software scheduling. The resulting processor can exploit long-latency memory operations to increase parallelism while introducing minimal software overhead and minimal changes to the hardware design. This thesis explores the MVP model and simulator and presents results that illustrate MVP's effectiveness and support its inclusion in future processor designs. Additionally, the thesis shows that MVP's effectiveness is governed by four main considerations: (1) the data set size relative to the cache size, (2) the number of hardware contexts/threads supported, (3) the amount of locality within the data sets, and (4) the amount of exploitable parallelism within the algorithms. / Graduation date: 1999
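Considerations (1) and (3) concern cache behavior. As a hypothetical illustration (not MVP's memory model), this sketch sweeps a working set repeatedly through a direct-mapped cache and reports the hit rate; the cache geometry, the strides, and the names `hit_rate` and `sweep` are assumptions chosen for the example.

```python
def hit_rate(addresses, num_lines: int, line_size: int) -> float:
    """Hit rate of a direct-mapped cache with `num_lines` lines of `line_size` bytes."""
    tags = [None] * num_lines       # tag stored in each line; None means empty
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block that holds this byte
        index = block % num_lines   # direct-mapped: each block has exactly one line
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: fetch the block, evicting the old one
    return hits / len(addresses)


def sweep(working_set: int, stride: int, passes: int = 4):
    """Byte addresses touched by `passes` sequential sweeps over `working_set` bytes."""
    return [a for _ in range(passes) for a in range(0, working_set, stride)]


# Hypothetical 4 KiB cache: 64 lines of 64 bytes.
for size in (2048, 8192):           # working set half / twice the cache size
    for stride in (8, 64):          # 8 B: reuse within a line; 64 B: none
        print(size, stride, hit_rate(sweep(size, stride), num_lines=64, line_size=64))
```

With these parameters the 2 KiB working set, which fits, gives hit rates of 0.96875 (stride 8) and 0.75 (stride 64), while the 8 KiB working set gives 0.875 and 0.0, because each pass evicts its own blocks before they are reused. Those extra misses are the latency that additional hardware contexts would have to hide.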
5

CDP: a multithreaded implementation of a network communication protocol on the Cyclops-64 multithreaded architecture

Gan, Ge. January 2007
Thesis (M.S.)--University of Delaware, 2006. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
6

Run-time loop parallelization with efficient dependency checking on GPU-accelerated platforms

Zhang, Chenggang (张呈刚). January 2011
General-purpose computing on graphics processing units (GPGPU) has attracted considerable attention recently. Exciting results have been reported in using GPUs to accelerate applications in domains such as scientific simulation, data mining, bioinformatics, and computational finance. However, GPUs can so far accelerate only data-parallel loops whose parallelism is statically analyzable. Loops with dynamic parallelism (e.g., array accesses through subscripted subscripts), an important pattern in many general-purpose applications, cannot be parallelized on GPUs with existing technologies. Run-time loop parallelization using Thread-Level Speculation (TLS) has been proposed in the literature to parallelize loops whose dependencies cannot be analyzed statically. However, most existing TLS systems are designed for multiprocessor/multi-core CPUs. GPUs differ fundamentally from CPUs in both hardware architecture and execution model, so previous TLS designs either do not work or are inefficient when ported to GPUs. This thesis presents GPU-TLS, a runtime system designed to support speculative loop parallelization on GPUs. The design of GPU-TLS addresses several key problems encountered when adapting TLS to GPUs: (1) To reduce the likelihood of mis-speculation, a deferred-update memory versioning scheme is adopted to avoid mis-speculations caused by inter-iteration WAR and WAW dependencies. A technique named intra-warp value forwarding is proposed to respect some inter-iteration RAW dependencies, which further reduces the likelihood of mis-speculation. (2) An incremental speculative execution scheme is designed to exploit partial parallelism within loops. This avoids excessive re-execution and reduces the mis-speculation penalty. (3) Dependency checking among thousands of speculative GPU threads imposes a large overhead and can easily become the performance bottleneck. To lower this overhead, we design several efficient dependency checking schemes, named PRW+BDC, SW, SR, SRW+EDC, and SRW+LDC. (4) We devise a novel parallel commit scheme to avoid the overhead of the serial commit phase found in most existing TLS designs. We have carried out extensive experiments on two platforms with different NVIDIA GPUs, using both a synthetic loop that can simulate loops with different characteristics and several loops from real-life applications. Test results show that the proposed intra-warp value forwarding and eager dependency checking techniques improve performance for almost all kinds of loop patterns. We observe that, compared with the other dependency checking schemes, SR and SW achieve better performance in most cases. The results also show that the proposed parallel commit scheme is especially useful for loops with a large write set and a small number of inter-iteration WAW dependencies. Overall, GPU-TLS achieves speedups ranging from 5 to 105 for loops with dynamic parallelism. / published_or_final_version / Computer Science / Master / Master of Philosophy
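Two of the TLS mechanisms in this abstract can be illustrated with a small sequential Python model. It is a sketch under stated assumptions, not GPU-TLS: the function names (`run_iteration`, `speculative_loop`, `sequential_loop`, `make_body`) are hypothetical, a "round" of `batch` iterations stands in for a set of concurrently running GPU threads, and only two ideas are modeled. The first is deferred-update memory versioning: each iteration buffers its writes and reads only committed memory, so inter-iteration WAR and WAW dependencies cannot cause mis-speculation. The second is in-order RAW checking that commits the longest violation-free prefix and re-executes from the first violating iteration, a simple stand-in for incremental re-execution. Intra-warp value forwarding, the named dependency checking schemes, and the parallel commit are not modeled.

```python
def run_iteration(i, memory, body):
    """Run iteration i speculatively against committed memory (deferred update)."""
    reads = set()     # addresses read from committed memory
    writes = {}       # private write buffer: address -> value

    def load(addr):
        if addr in writes:          # read after own write: no cross-iteration dependency
            return writes[addr]
        reads.add(addr)
        return memory[addr]

    def store(addr, value):
        writes[addr] = value        # buffered, not visible to other iterations

    body(i, load, store)
    return reads, writes


def speculative_loop(n, memory, body, batch=8):
    """Execute `batch` iterations per round, then commit the violation-free prefix."""
    memory = list(memory)           # committed state
    start = rounds = 0
    while start < n:
        rounds += 1
        end = min(start + batch, n)
        # Stand-in for the parallel phase: every iteration sees only committed memory.
        results = [run_iteration(i, memory, body) for i in range(start, end)]
        # In-order dependency check: iteration k mis-speculates if it read an
        # address written by an earlier iteration in the same round (RAW).
        written, ok = set(), 0
        for reads, writes in results:
            if reads & written:
                break
            written.update(writes.keys())
            ok += 1
        # Commit the prefix in iteration order, so the last writer wins (WAW).
        for _, writes in results[:ok]:
            for addr, value in writes.items():
                memory[addr] = value
        start += ok                 # re-execute from the first violating iteration
    return memory, rounds


def sequential_loop(n, memory, body):
    """Reference semantics: plain in-order execution."""
    memory = list(memory)

    def load(addr):
        return memory[addr]

    def store(addr, value):
        memory[addr] = value

    for i in range(n):
        body(i, load, store)
    return memory


def make_body(r, w):
    """Loop body with subscripted subscripts: a[w[i]] = a[r[i]] + 1."""
    def body(i, load, store):
        store(w[i], load(r[i]) + 1)
    return body


# Independent iterations: a[i] = a[i + 16] + 1.
body = make_body(r=[i + 16 for i in range(16)], w=list(range(16)))
spec, rounds = speculative_loop(16, range(32), body)
print(spec == sequential_loop(16, range(32), body), rounds)

# Fully dependent chain: a[i + 1] = a[i] + 1.
body = make_body(r=list(range(16)), w=[i + 1 for i in range(16)])
spec, rounds = speculative_loop(16, [0] * 17, body)
print(spec == sequential_loop(16, [0] * 17, body), rounds)
```

The first loop has no cross-iteration reads, so both rounds of eight commit in full and it prints `True 2`; in the second, every iteration reads the value its predecessor writes, so only one iteration commits per round and it prints `True 16`. The first iteration of each round always commits, which guarantees forward progress.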
7

Breaking away from the OS shadow: a program execution model aware thread virtual machine for multicore architectures

Cuvillo, Juan del. January 2008
Thesis (Ph.D.)--University of Delaware, 2008. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
8

Compiler optimization of value communication for thread-level speculation

Zhai, Antonia. January 1900
Thesis (Ph. D.)--Carnegie Mellon University, 2005. / "January 13, 2005." Includes bibliographical references.
9

Instruction fetching, scheduling, and forwarding in a dynamic multithreaded processor

Browning, Adam W. January 1900
Thesis (M.S.)--Oregon State University, 2007. / Printout. Includes bibliographical references (leaves 36-37). Also available on the World Wide Web.
10

Parallelization and performance optimization of bioinformatics and biomedical applications targeted to advanced computer architectures

Niu, Yanwei. January 2005
Thesis (Ph.D.)--University of Delaware, 2005. / Principal faculty advisors: Kenneth E. Barner and Guang Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
