Global ETD Search

111	Communication centric, multi-core, fine-grained processor architecture Chadwick, Gregory Andrew January 2013 (has links) No description available. 004
112	Architectures and limits of GPU-CPU heterogeneous systems Wong, Henry Ting-Hei 11 1900 (has links) As we continue to be able to put an increasing number of transistors on a single chip, the answer to the perpetual question of what the best processor we could build with the transistors is remains uncertain. Past work has shown that heterogeneous multiprocessor systems provide benefits in performance and efficiency. This thesis explores heterogeneous systems composed of a traditional sequential processor (CPU) and highly parallel graphics processors (GPU). This thesis presents a tightly-coupled heterogeneous chip multiprocessor architecture for general-purpose non-graphics computation and a limit study exploring the potential benefits of GPU-like cores for accelerating a set of general-purpose workloads. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with GMA X4500 GPU cores. Pangaea introduces a resource partitioning of the GPU, where 3D graphics-specific hardware is removed to reduce area or add more processing cores, and a 3-instruction extension to the IA32 ISA that supports fast communication between CPU and GPU by building user-level interrupts on top of existing cache coherency mechanisms. By removing graphics-specific hardware on a 65 nm process, the area saved is equivalent to 9 GPU cores, while the power saved is equivalent to 5 cores. Our FPGA prototype shows thread spawn latency improvements from thousands of clock cycles to 26. A set of non-graphics workloads demonstrate speedups of up to 8.8x. This thesis also presents a limit study, where we measure the limit of algorithm parallelism in the context of a heterogeneous system that can be usefully extracted from a set of general-purpose applications. We measure sensitivity to the sequential performance (register read-after-write latency) of the low-cost parallel cores, and latency and bandwidth of the communication channel between the two cores. Using these measurements, we propose system characteristics that maximize area and power efficiencies. As in previous limit studies, we find a high amount of parallelism. We show, however, that the potential speedup on GPU-like systems is low (2.2x - 12.7x) due to poor sequential performance. Communication latency and bandwidth have comparatively small performance effects (<25%). Optimal area efficiency requires a lower-cost parallel processor while optimal power efficiency requires a higher-performance parallel processor than today's GPUs. Multicore processors Computer architecture GPU
113	A parallelism, instruction throughput, and cycle time model of computer architectures Taha, Tarek M. 12 1900 (has links) No description available. Computer architecture Microprocessors Design and construction
114	Implementation structures for message passing operating system architectures Brass, Opal Ann 08 1900 (has links) No description available. Operating systems (Computers) Computer architecture
115	Architectures and limits of GPU-CPU heterogeneous systems Wong, Henry Ting-Hei 11 1900 (has links) As we continue to be able to put an increasing number of transistors on a single chip, the answer to the perpetual question of what the best processor we could build with the transistors is remains uncertain. Past work has shown that heterogeneous multiprocessor systems provide benefits in performance and efficiency. This thesis explores heterogeneous systems composed of a traditional sequential processor (CPU) and highly parallel graphics processors (GPU). This thesis presents a tightly-coupled heterogeneous chip multiprocessor architecture for general-purpose non-graphics computation and a limit study exploring the potential benefits of GPU-like cores for accelerating a set of general-purpose workloads. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with GMA X4500 GPU cores. Pangaea introduces a resource partitioning of the GPU, where 3D graphics-specific hardware is removed to reduce area or add more processing cores, and a 3-instruction extension to the IA32 ISA that supports fast communication between CPU and GPU by building user-level interrupts on top of existing cache coherency mechanisms. By removing graphics-specific hardware on a 65 nm process, the area saved is equivalent to 9 GPU cores, while the power saved is equivalent to 5 cores. Our FPGA prototype shows thread spawn latency improvements from thousands of clock cycles to 26. A set of non-graphics workloads demonstrate speedups of up to 8.8x. This thesis also presents a limit study, where we measure the limit of algorithm parallelism in the context of a heterogeneous system that can be usefully extracted from a set of general-purpose applications. We measure sensitivity to the sequential performance (register read-after-write latency) of the low-cost parallel cores, and latency and bandwidth of the communication channel between the two cores. Using these measurements, we propose system characteristics that maximize area and power efficiencies. As in previous limit studies, we find a high amount of parallelism. We show, however, that the potential speedup on GPU-like systems is low (2.2x - 12.7x) due to poor sequential performance. Communication latency and bandwidth have comparatively small performance effects (<25%). Optimal area efficiency requires a lower-cost parallel processor while optimal power efficiency requires a higher-performance parallel processor than today's GPUs. Multicore processors Computer architecture GPU
116	Performance improvement through predicated execution in VLIW machines / Morteza Biglari-Abhari. Biglari-Abhari, Morteza January 2000 (has links) Bibliography: leaves 136-153. / xiv, 153 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Investigates techniques to achieve performance improvement in Very Long Instruction Word machines through predicated execution. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 2000 Compilers (Computer programs) Computer architecture
117	Braids out-of-order performance with almost in-order complexity / Tseng, Francis, January 1900 (has links) Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
118	Low-energy instruction cache architecture / Ali, Kashif. January 2006 (has links) Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 96-107). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19752
119	Performance improvement through predicated execution in VLIW machines / Biglari-Abhari, Morteza. January 2000 (has links) (PDF) Thesis (Ph.D.) -- University of Adelaide, Dept. of Electrical and Electronic Engineering, 2000. / Bibliography: leaves 136-153.
120	Mapping and localisation for mobile robots / Yang, Tun Yong, January 1900 (has links) Thesis (M. App. Sc.)--Carleton University, 2004. / Includes bibliographical references (p. 185-200). Also available in electronic format on the Internet.

Search results