501 |
Exploiting Parallelism in GPUs / Hechtman, Blake Alan, January 2014
Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications can benefit from a GPU's large computation and memory bandwidth. However, many of these applications are irregular and, as a result, require synchronization and scheduling that are commonly believed to perform poorly on GPUs. The basic building block of synchronization and scheduling is memory consistency, which is, therefore, the first place to look for improving performance on irregular applications. In this thesis, we approach the programmability of irregular applications on GPUs by thinking across traditional boundaries of the compute stack. We think about architecture, microarchitecture, and runtime systems from the programmer's perspective. To this end, we study architectural memory consistency on future GPUs with cache coherence. In addition, we design a GPU memory system microarchitecture that can support fine-grain and coarse-grain synchronization without sacrificing throughput. Finally, we develop a task runtime that embraces the GPU microarchitecture to perform well on the fork/join parallelism desired by many programmers. Overall, this thesis contributes non-intuitive solutions to improve the performance and programmability of irregular applications from the programmer's perspective. / Dissertation
|
502 |
Performance and Power Optimizations for Highly Reliable Caches / Azizabadifarahani, Seyedmostafa, 13 November 2013
This thesis introduces performance and power optimization techniques for caches. Our optimization techniques target both conventional caches, which are implemented using six-transistor (6T) cells, and highly reliable caches implemented using eight-transistor (8T) cells.
In 6T cell caches, we reduce leakage power dissipation by adapting a previously proposed technique, Drowsy Cache, according to application behavior. We show that spatial locality in embedded applications is low and that Drowsy Cache misses significant opportunities to save leakage power. By taking a finer-granularity approach, we achieve a significant leakage power reduction with minimal performance overhead.
Although 6T cell caches are commonly used, we show that they are not a proper choice for future designs due to poor stability. We investigate 8T cells as alternative reliable designs for implementing caches. However, the column selection issue limits the efficiency of 8T cells during write operations. A previous solution, Read-Modify-Write (RMW), addressed the column selection issue by requiring a read operation before each write operation, imposing significant overhead on performance, cache traffic, and power.
We observe that a significant share of cache accesses in RMW is either redundant or unnecessary and can therefore be avoided without compromising program execution consistency. Based on our observations, we propose two techniques which exploit a buffering mechanism to detect and filter out unnecessary and redundant cache accesses. Our simulation results show that our techniques improve performance and cache traffic effectively in 8T cell caches.
Furthermore, we propose a novel dual threshold 8T cell which reduces leakage power significantly with negligible impact on performance. Our proposed cell also improves stability and robustness to process variations compared to the conventional 8T cells. / Graduate / 0544 / farahani.mostafa@gmail.com
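The abstract does not describe the buffering mechanism in detail, so the following is only a minimal sketch of the general idea, assuming a hypothetical one-entry buffer that holds the most recently read row. Under RMW every write first reads its row; if the row is already buffered, that read is redundant and can be skipped. The function name `count_array_reads` and the one-entry buffer are illustrative assumptions, not the thesis's actual design.

```python
def count_array_reads(write_rows, use_buffer):
    """Count the row reads issued for a sequence of writes under read-modify-write.

    write_rows: iterable of row indexes, one per write operation.
    use_buffer: if True, model a hypothetical one-entry buffer that keeps
                the last row read, so back-to-back writes to the same row
                skip the pre-write read.
    """
    reads = 0
    buffered_row = None  # assumption: the buffer holds exactly one row
    for row in write_rows:
        if use_buffer and row == buffered_row:
            # Row contents are already buffered: the pre-write read is redundant.
            continue
        reads += 1
        buffered_row = row
    return reads


if __name__ == "__main__":
    writes = [3, 3, 3, 7, 7, 3]
    print("reads without buffer:", count_array_reads(writes, use_buffer=False))
    print("reads with buffer:   ", count_array_reads(writes, use_buffer=True))
```

In this toy trace the buffer removes the reads for repeated writes to the same row; a real design would also have to keep the buffered copy consistent with the array.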
|
503 |
School District Reorganization and Consolidation in Cache County, Utah / Bagley, Grant Richard, 01 January 1964
A historical study of school organization and school district consolidation enables both educators and lay citizens to have a better understanding and appreciation of schools as they are today. By studying past developments of a given institution, one can better evaluate current requirements and effect future changes as the needs arise. The Cache County School System as presently constituted has evolved over the years from a cluster of small independent village schools with separate boards of education to a highly centralized system with one board of education and consolidated schools. The purpose of this study is to trace and analyze the development of this system.
|
504 |
Semantics-oriented low power architecture / Ballapuram, Chinnakrishnan S., 01 April 2008
Innovations in microarchitecture and advances in semiconductor process technology enable sophisticated and powerful microprocessors. However, they also lead to increased power consumption. The main contribution of the thesis is the demonstration of Semantics-Oriented Low Power Architecture techniques that use the semantics of memory references and variables used in an application program to reduce the power consumption in the memory sub-system of a microprocessor. The Semantic-Aware Multilateral Partitioning (SAM) technique reduces the cache and TLB power consumption by decoupling the data TLB lookups and the data cache accesses, based on the semantic regions defined by programming languages and software conventions, into discrete reference sub-streams, namely, stack, global static, and heap. To reduce the power consumed by snoops in chip multiprocessors, we propose a hardware technique called Selective Snoop Probe (SSP) and a compiler-based, hardware-supported technique called Essential Snoop Probe (ESP) that use the properties of the program variables. By selectively sending the snoop probes, the SSP and ESP techniques relax the conservative nature of the cache coherency protocol and its implementation to reduce power and improve performance.
|
505 |
Microarchitectural techniques to reduce energy consumption in the memory hierarchy / Ghosh, Mrinmoy, 03 April 2009
This thesis states that dynamic profiling of the memory reference stream can improve energy
and performance in the memory hierarchy. The research presented in this thesis provides
multiple instances of using lightweight hardware structures to profile the memory
reference stream. The objective of this research is to develop microarchitectural techniques
to reduce energy consumption at different levels of the memory hierarchy. Several simple
and implementable techniques were developed as a part of this research. One of the
techniques identifies and eliminates redundant refresh operations in DRAM and reduces
DRAM refresh power. Another reduces leakage energy in L2 and higher level caches for
multiprocessor systems. The emphasis of this research has been to develop several techniques
for obtaining energy savings in caches using a simple hardware structure called the
counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain
energy savings by not accessing the L2 cache on a predicted miss. A simple extension of
this technique allows CBFs to do way-estimation of set associative caches to reduce energy
in cache lookups. Another technique using CBFs tracks addresses in a Virtual Cache and
reduces false synonym lookups. Finally, this thesis presents a technique to reduce dynamic
power consumption in level one caches using significance compression. The significant
energy and performance improvements demonstrated by the techniques presented in this
thesis suggest that this work will be of great value for designing memory hierarchies of
future computing platforms.
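The miss-prediction idea above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the hardware structure from the thesis: the class name `CountingBloomFilter`, the counter count, the number of hash functions, and the use of `hashlib.blake2b` for hashing are all illustrative choices. Counters are incremented when a line is filled into the cache and decremented when it is evicted; if any counter for an address is zero, the line cannot be in the cache, so the lookup (and its energy cost) can be skipped.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter tracking which line addresses are cached."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, address):
        # Derive num_hashes counter indexes from one address.
        # Software hashing is an assumption; hardware would use simple bit hashes.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{seed}:{address}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_counters

    def insert(self, address):
        # Called when a line is filled into the cache.
        for i in self._indexes(address):
            self.counters[i] += 1

    def remove(self, address):
        # Called when a line is evicted; only valid for previously inserted addresses.
        for i in self._indexes(address):
            self.counters[i] -= 1

    def definitely_absent(self, address):
        # Any zero counter proves the line is not cached: a guaranteed miss.
        return any(self.counters[i] == 0 for i in self._indexes(address))


if __name__ == "__main__":
    cbf = CountingBloomFilter()
    cbf.insert(0x1000)
    print("skip lookup for 0x1000?", cbf.definitely_absent(0x1000))
    cbf.remove(0x1000)
    print("skip lookup after eviction?", cbf.definitely_absent(0x1000))
```

A filter like this never reports a cached line as absent, so skipping the lookup on a predicted miss is safe; it may still report an uncached line as possibly present, in which case the cache is simply accessed as usual.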
|
506 |
Insights into access patterns of Internet media systems: measurements, analysis, and system design / Guo, Lei. January 2008
Thesis (Ph. D.)--Ohio State University, 2008. / Title from first page of PDF file. Includes bibliographical references (p. 200-208).
|
507 |
Parallel PDE solvers on cc-NUMA systems / Nordén, Markus, January 2004
Licentiate thesis. Uppsala: Univ., 2004. / Includes 4 papers.
|
508 |
An efficient algorithm for caching online analytical processing objects in a distributed environment / Kamath, Akash S. January 2002
Thesis (M.S.)--Ohio University, August, 2002. / Title from PDF t.p. Includes bibliographical references (leaves 51-54).
|
509 |
Implementierung algorithmischer Optimierungen für Volume-Rendering in Hardware: Entwicklung und Simulation eines Multithreading-Pipeline-Prozessors zur Visualisierung dreidimensionaler Datensätze / Vettermann, Bernd, January 2006
Mannheim, Univ., Diss., 2006.
|
510 |
POPCA: optimizing segment caching for peer-to-peer on-demand streaming / Tang, Ho-Shing. January 2008
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008. / Includes bibliographical references (leaves 33-37). Also available in electronic version.
|