Global ETD Search

501	Exploiting Parallelism in GPUs Hechtman, Blake Alan January 2014 (has links) Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications can benefit from a GPU's high computational throughput and memory bandwidth. However, many of these applications are irregular and, as a result, require synchronization and scheduling that are commonly believed to perform poorly on GPUs. The basic building block of synchronization and scheduling is memory consistency, which is therefore the first place to look for improving performance on irregular applications. In this thesis, we approach the programmability of irregular applications on GPUs by thinking across traditional boundaries of the compute stack. We think about architecture, microarchitecture, and runtime systems from the programmer's perspective. To this end, we study architectural memory consistency on future GPUs with cache coherence. In addition, we design a GPU memory system microarchitecture that can support fine-grain and coarse-grain synchronization without sacrificing throughput. Finally, we develop a task runtime that embraces the GPU microarchitecture to perform well on the fork/join parallelism desired by many programmers. Overall, this thesis contributes non-intuitive solutions to improve the performance and programmability of irregular applications from the programmer's perspective. / Dissertation Computer engineering Computer science Cache Coherence GPU Memory Consistency Task Parallelism
502	Performance and Power Optimizations for Highly Reliable Caches Azizabadifarahani, Seyedmostafa 13 November 2013 (has links) This thesis introduces performance and power optimization techniques for caches. Our optimization techniques target both conventional caches, which are implemented using six-transistor (6T) cells, and highly reliable caches implemented using eight-transistor (8T) cells. In 6T cell caches, we reduce leakage power dissipation by adapting a previously proposed technique, Drowsy Cache, to the application behavior. We show that spatial locality in embedded applications is low and that Drowsy Cache misses significant leakage power saving opportunities. By taking a finer-granularity approach, we achieve a significant leakage power reduction with minimal performance overhead. Although 6T cell caches are commonly used, we show that they are not a proper choice for future designs due to poor stability. We investigate 8T cells as an alternative, reliable design for implementing caches. However, the column selection issue limits the efficiency of 8T cells during write operations. The previous solution, Read-Modify-Write (RMW), addressed the column selection issue by requiring a read operation before each write operation, imposing significant overhead on performance, cache traffic, and power. We observe that a significant share of cache accesses in RMW is either redundant or unnecessary and can consequently be avoided without compromising program execution consistency. Based on these observations, we propose two techniques that exploit a buffering mechanism to detect and filter out unnecessary and redundant cache accesses. Our simulation results show that our techniques effectively improve performance and reduce cache traffic in 8T cell caches. Furthermore, we propose a novel dual-threshold 8T cell that significantly reduces leakage power with negligible impact on performance.
Our proposed cell also improves stability and robustness to process variations compared to the conventional 8T cells. / Graduate / 0544 / farahani.mostafa@gmail.com dynamic power leakage power reliability 8T cells column selection issue cache
503	School District Reorganization and Consolidation in Cache County, Utah Bagley, Grant Richard 01 January 1964 (has links) A historical study of school organization and school district consolidation enables both educators and lay citizens to have a better understanding and appreciation of schools as they are today. By studying past developments of a given institution, one can better evaluate current requirements and effect future changes as the needs arise. The Cache County School System as presently constituted has evolved over the years from a cluster of small independent village schools with separate boards of education to a highly centralized system with one board of education and consolidated schools. The purpose of this study is to trace and analyze the development of this system. school district reorganization consolidation cache county
504	Semantics-oriented low power architecture Ballapuram, Chinnakrishnan S. 01 April 2008 (has links) Innovations in microarchitecture and prominent advances in semiconductor process technology enable sophisticated and powerful microprocessors. However, they also lead to increased power consumption. The main contribution of the thesis is the demonstration of Semantics-Oriented Low Power Architecture techniques that use the semantics of memory references and variables in an application program to reduce the power consumption in the memory sub-system of a microprocessor. The Semantic-Aware Multilateral Partitioning (SAM) technique reduces cache and TLB power consumption by decoupling the data TLB lookups and the data cache accesses into discrete reference sub-streams, namely stack, global static, and heap, based on the semantic regions defined by the programming languages and the software convention. To reduce the power consumed by snoops in chip multiprocessors, we propose a hardware technique called Selective Snoop Probe (SSP) and a compiler-based, hardware-supported technique called Essential Snoop Probe (ESP), both of which use the properties of the program variables. By sending snoop probes selectively, the SSP and ESP techniques relax the conservative nature of the cache coherence protocol and its implementation to reduce power and improve performance. Semantics Snoop Low-power TLB Cache Computer architecture Microcomputing Energy conservation Memory management (Computer science)
505	Microarchitectural techniques to reduce energy consumption in the memory hierarchy Ghosh, Mrinmoy 03 April 2009 (has links) This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this thesis provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another reduces leakage energy in L2 and higher-level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques for obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to perform way estimation in set-associative caches to reduce energy in cache lookups. Another technique uses CBFs to track addresses in a virtual cache and reduce false synonym lookups. Finally, this thesis presents a technique to reduce dynamic power consumption in level-one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms. Energy Cache Dram Microarchitecture Computer architecture
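The CBF-based L2 miss prediction summarized in entry 505 can be illustrated with a minimal Python sketch. This is not the thesis's design: the filter size, the number of hash functions, the multiplicative hash constants, and the toy fully associative FIFO L2 (`FilteredL2`) are all hypothetical choices made for illustration. The property it relies on is standard for counting Bloom filters: there are no false negatives, so a zero counter proves a block is absent and the L2 lookup can be skipped safely, while counters are decremented on eviction so the filter tracks the current cache contents.

```python
from collections import OrderedDict


class CountingBloomFilter:
    """Counting Bloom filter over cache-block addresses (illustrative sizes)."""

    # Hypothetical hash multipliers; real hardware would use cheap functions
    # of the address bits instead.
    _MULTIPLIERS = (0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F)

    def __init__(self, num_counters=1024, num_hashes=2):
        assert 1 <= num_hashes <= len(self._MULTIPLIERS)
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indices(self, block_addr):
        return [((block_addr * m) & 0xFFFFFFFF) % len(self.counters)
                for m in self._MULTIPLIERS[:self.num_hashes]]

    def insert(self, block_addr):
        for i in self._indices(block_addr):
            self.counters[i] += 1

    def remove(self, block_addr):
        for i in self._indices(block_addr):
            assert self.counters[i] > 0, "removing a block that was never inserted"
            self.counters[i] -= 1

    def may_contain(self, block_addr):
        # Any zero counter proves absence; all nonzero only means "maybe present".
        return all(self.counters[i] > 0 for i in self._indices(block_addr))


class FilteredL2:
    """Hypothetical toy fully associative L2 with FIFO replacement, fronted by a CBF."""

    def __init__(self, capacity_blocks=4, **cbf_args):
        self.blocks = OrderedDict()
        self.capacity = capacity_blocks
        self.cbf = CountingBloomFilter(**cbf_args)
        self.l2_lookups = 0       # accesses that probed the L2 arrays
        self.skipped_lookups = 0  # accesses where the CBF proved a miss

    def access(self, block_addr):
        """Return 'hit' or 'miss', skipping the L2 lookup on a guaranteed miss."""
        if not self.cbf.may_contain(block_addr):
            self.skipped_lookups += 1  # energy saved: L2 arrays not accessed
            self._fill(block_addr)
            return "miss"
        self.l2_lookups += 1
        if block_addr in self.blocks:
            return "hit"
        self._fill(block_addr)  # CBF false positive: still a real miss
        return "miss"

    def _fill(self, block_addr):
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # evict oldest (FIFO)
            self.cbf.remove(victim)
        self.blocks[block_addr] = True
        self.cbf.insert(block_addr)


l2 = FilteredL2(capacity_blocks=4)
trace = [1, 2, 3, 1, 2, 5, 6, 1]
results = [l2.access(b) for b in trace]
print(results)  # ['miss', 'miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'miss']
```

Because the filter never reports a false negative, skipping the lookup does not change which accesses hit or miss; it only changes how many L2 array accesses are spent. The number of skipped lookups depends on the hypothetical hash functions and filter size.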
506	Insights into access patterns of Internet media systems: measurements, analysis, and system design / Guo, Lei. January 2008 (has links) Thesis (Ph. D.)--Ohio State University, 2008. / Title from first page of PDF file. Includes bibliographical references (p. 200-208).
507	Parallel PDE solvers on cc-NUMA systems / Nordén, Markus, January 2004 (has links) Licentiate thesis, Uppsala University, 2004. / Includes 4 papers.
508	An efficient algorithm for caching online analytical processing objects in a distributed environment Kamath, Akash S. January 2002 (has links) Thesis (M.S.)--Ohio University, August, 2002. / Title from PDF t.p. Includes bibliographical references (leaves 51-54).
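509	Implementierung algorithmischer Optimierungen für Volume-Rendering in Hardware: Entwicklung und Simulation eines Multithreading-Pipeline-Prozessors zur Visualisierung dreidimensionaler Datensätze [Implementation of algorithmic optimizations for volume rendering in hardware: development and simulation of a multithreading pipeline processor for visualizing three-dimensional datasets] / Vettermann, Bernd, January 2006 (has links) Dissertation, University of Mannheim, 2006.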
510	POPCA : optimizing segment caching for peer-to-peer on-demand streaming / Tang, Ho-Shing. January 2008 (has links) Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008. / Includes bibliographical references (leaves 33-37). Also available in electronic version.
