• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 472
  • 85
  • 80
  • 20
  • 10
  • 7
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 3
  • 2
  • Tagged with
  • 832
  • 832
  • 186
  • 173
  • 161
  • 124
  • 121
  • 119
  • 117
  • 101
  • 95
  • 94
  • 83
  • 83
  • 79
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Workload balancing in parallel video encoding

朱啓祥, Chu, Kai-cheung. January 2000 (has links)
published_or_final_version / Electrical and Electronic Engineering / Master / Master of Philosophy
112

Communication centric, multi-core, fine-grained processor architecture

Chadwick, Gregory Andrew January 2013 (has links)
No description available.
113

Architectures and limits of GPU-CPU heterogeneous systems

Wong, Henry Ting-Hei 11 1900 (has links)
As we continue to be able to put an increasing number of transistors on a single chip, the answer to the perpetual question of what the best processor we could build with the transistors is remains uncertain. Past work has shown that heterogeneous multiprocessor systems provide benefits in performance and efficiency. This thesis explores heterogeneous systems composed of a traditional sequential processor (CPU) and highly parallel graphics processors (GPU). This thesis presents a tightly-coupled heterogeneous chip multiprocessor architecture for general-purpose non-graphics computation and a limit study exploring the potential benefits of GPU-like cores for accelerating a set of general-purpose workloads. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with GMA X4500 GPU cores. Pangaea introduces a resource partitioning of the GPU, where 3D graphics-specific hardware is removed to reduce area or add more processing cores, and a 3-instruction extension to the IA32 ISA that supports fast communication between CPU and GPU by building user-level interrupts on top of existing cache coherency mechanisms. By removing graphics-specific hardware on a 65 nm process, the area saved is equivalent to 9 GPU cores, while the power saved is equivalent to 5 cores. Our FPGA prototype shows thread spawn latency improvements from thousands of clock cycles to 26. A set of non-graphics workloads demonstrate speedups of up to 8.8x. This thesis also presents a limit study, where we measure the limit of algorithm parallelism in the context of a heterogeneous system that can be usefully extracted from a set of general-purpose applications. We measure sensitivity to the sequential performance (register read-after-write latency) of the low-cost parallel cores, and latency and bandwidth of the communication channel between the two cores. Using these measurements, we propose system characteristics that maximize area and power efficiencies. As in previous limit studies, we find a high amount of parallelism. We show, however, that the potential speedup on GPU-like systems is low (2.2x - 12.7x) due to poor sequential performance. Communication latency and bandwidth have comparatively small performance effects (<25%). Optimal area efficiency requires a lower-cost parallel processor while optimal power efficiency requires a higher-performance parallel processor than today's GPUs.
114

A parallelism, instruction throughput, and cycle time model of computer architectures

Taha, Tarek M. 12 1900 (has links)
No description available.
115

Implementation structures for message passing operating system architectures

Brass, Opal Ann 08 1900 (has links)
No description available.
116

Architectures and limits of GPU-CPU heterogeneous systems

Wong, Henry Ting-Hei 11 1900 (has links)
As we continue to be able to put an increasing number of transistors on a single chip, the answer to the perpetual question of what the best processor we could build with the transistors is remains uncertain. Past work has shown that heterogeneous multiprocessor systems provide benefits in performance and efficiency. This thesis explores heterogeneous systems composed of a traditional sequential processor (CPU) and highly parallel graphics processors (GPU). This thesis presents a tightly-coupled heterogeneous chip multiprocessor architecture for general-purpose non-graphics computation and a limit study exploring the potential benefits of GPU-like cores for accelerating a set of general-purpose workloads. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with GMA X4500 GPU cores. Pangaea introduces a resource partitioning of the GPU, where 3D graphics-specific hardware is removed to reduce area or add more processing cores, and a 3-instruction extension to the IA32 ISA that supports fast communication between CPU and GPU by building user-level interrupts on top of existing cache coherency mechanisms. By removing graphics-specific hardware on a 65 nm process, the area saved is equivalent to 9 GPU cores, while the power saved is equivalent to 5 cores. Our FPGA prototype shows thread spawn latency improvements from thousands of clock cycles to 26. A set of non-graphics workloads demonstrate speedups of up to 8.8x. This thesis also presents a limit study, where we measure the limit of algorithm parallelism in the context of a heterogeneous system that can be usefully extracted from a set of general-purpose applications. We measure sensitivity to the sequential performance (register read-after-write latency) of the low-cost parallel cores, and latency and bandwidth of the communication channel between the two cores. Using these measurements, we propose system characteristics that maximize area and power efficiencies. As in previous limit studies, we find a high amount of parallelism. We show, however, that the potential speedup on GPU-like systems is low (2.2x - 12.7x) due to poor sequential performance. Communication latency and bandwidth have comparatively small performance effects (<25%). Optimal area efficiency requires a lower-cost parallel processor while optimal power efficiency requires a higher-performance parallel processor than today's GPUs.
117

Performance improvement through predicated execution in VLIW machines / Morteza Biglari-Abhari.

Biglari-Abhari, Morteza January 2000 (has links)
Bibliography: leaves 136-153. / xiv, 153 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Investigates techniques to achieve performance improvement in Very Long Instruction Word machines through predicated execution. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 2000
118

Braids out-of-order performance with almost in-order complexity /

Tseng, Francis, January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
119

Low-energy instruction cache architecture /

Ali, Kashif. January 2006 (has links)
Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 96-107). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19752
120

Performance improvement through predicated execution in VLIW machines /

Biglari-Abhari, Morteza. January 2000 (has links) (PDF)
Thesis (Ph.D.) -- University of Adelaide, Dept. of Electrical and Electronic Engineering, 2000. / Bibliography: leaves 136-153.

Page generated in 0.0616 seconds