Global ETD Search

261	Sharing and caching characteristics of Internet content / Wolman, Alastair, January 2002 (has links) Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (p. 137-147).
262	Cache design for low power and yield enhancement Mohammad, Baker Shehadah 13 September 2012 (has links) One of the major limiters to computer systems and systems on chip (SOC) designs is accessing the main memory, which is typically two orders of magnitude slower than the processor. To bridge this gap, modern processors already devote more than half of the on-chip transistors to the last-level cache. Caches have negative impact on area, power, and yield. This research goal is to design caches that operate at lower voltages while enhancing yield. Our strategy is to improve the static noise margin (SNM) and the writability of the conventional six-transistor SRAM cell by reducing the effect of parametric variations on the cell. This is done using a novel circuit that reduces the voltage swing on the word line during read operations and reduces the memory supply voltage during write operations. The proposed circuit increases the SRAM’s SNM and write margin using a single voltage supply that has minimal impacts on chip area, complexity, and timing. A test chip with an 8-kilobyte SRAM block manufactured in 45- nm technology is used to verify the practicality of the contribution and demonstrate the effectiveness of the new circuit’s implementation. Cache organization is one of the most important factors that affect cache design complexity, performance, area, and power. The main architectural choice for caches is whether to implement the tag array using a standard SRAM or using a content addressable memory (CAM). The choice made has far-reaching consequences on several aspects of the cache design, and in particular on power consumption. Our contribution in this area is an in-depth study of the complex tradeoffs of area, timing, power, and design complexity between an SRAM-based tag and a CAM-based one. Our results indicate that an SRAM-based tag design often provides a better overall design point and is superior with respect to energy, especially for interleaved multi-threading processors. Being able to test and screen chips is a key factor in achieving high yield. Most industry standard CAD tools used to analyze fault coverage and generate test vectors require gate level models. However, since caches are typically designed using a transistor-level flow, there is a need for an abstraction step to generate the gate models, which must be equivalent to the actual design (transistor level). The third contribution of the research is a framework to verify that the gate level representation of custom designs is equivalent to the transistor-level design. / text Cache memory--Design
263	A pervasive information framework based on semantic routing and cooperative caching Chen, Weisong, 陳偉松 January 2004 (has links) published_or_final_version / abstract / toc / Computer Science and Information Systems / Master / Master of Philosophy Routers (Computer networks) Ubiquitous computing. Computer network architectures. Cache memory.
264	On channel adaptive wireless cache invalidation and game theoretic power a ware wireless data access Yeung, Kai-ho, Mark., 楊啟豪. January 2004 (has links) published_or_final_version / abstract / toc / Electrical and Electronic Engineering / Master / Master of Philosophy Cache memory. Mobile computing - Energy consumption. Telecommunication - Traffic.
265	Reducing the Area and Energy of Coherence Directories in Multicore Processors Zebchuk, Jason 14 January 2014 (has links) A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Directory protocols offer a scalable, bandwidth-efficient solution to this problem, but unfortunately they incur significant area overheads. This dissertation proposes three novel coherence directory designs that address the challenge of maintaining coherence in multicore processors, while reducing the area and energy overheads of the directory structure. Firstly, I propose the Phantom directory that leverages the abundance of storage in large shared caches to reduce the area devoted to a dedicated coherence directory. This approach faces a significant challenge since an access to the shared cache typically requires more energy than for a smaller dedicated directory. Phantom attempts to overcome this challenge by exploiting the spatial locality common to most applications, and by utilizing a very small dedicated directory cache, but the costs of accessing the shared cache still outweigh Phantom's area savings. Building upon the simple observation that at any point in time, large, continuous chunks of memory are often accessed by only a single core, my second proposed design, the multi-grain directory (MGD), takes advantage of this common application behaviour to reduce the directory size by tracking coherence at multiple different granularities. I demonstrate that a practical dual-grain directory (DGD) provides a robust solution, reducing directory area by 41% while maintaining good performance across a variety of workloads. While MGD provides a practical approach to reducing directory area, my third proposed design, the Tagless directory, takes a more innovative approach to achieving true scalability. Tagless embraces imprecision by embedding sharing information in a number of space-efficient Bloom filters. Careful consideration produces an elegant design with robust performance comparable to an ideal coherence directory. For a sixteen core processor, Tagless reduces directory area by up to 70% while reducing cache and directory energy consumption. My analysis also indicates that Tagless continues to provide an area and energy efficient directory as processors scale to tens or even hundreds of cores. These three innovative designs advance the state-of-the-art by providing more area and energy efficient coherence directories to allow multicore processors to scale to tens or hundreds of cores. computer science multiprocessors computer architecture cache coherence 0984
266	AutoPilot: A Message-Passing Parallel Programming Library for the IMAPCAR2 Kelly, Benjamin 14 March 2013 (has links) The IMAPCAR2 from Renesas Electronics is an embedded realtime image processor, combining a single core with a 128-way SIMD array. At runtime, sections of the SIMD array can be retasked as additional CPU cores, interconnected via a message passing ring. Using these cores effectively, however, is made difficult by the low-level nature of the message passing API and the lack of cache coherency between processors. Developing and debugging software for this platform is a difficult task. The AutoPilot library addresses this by providing a high-level message-oriented parallel programming model for the IMAPCAR2. AutoPilot's API is closely based on that of Pilot, a wrapper around the Message Passing Interface (MPI) for cluster computing. By reimplementing the Pilot API for the IMAPCAR2, AutoPilot shows that its processes-and-channels architecture is a viable choice for parallel programming on cache-incoherent multicore architectures. At the same time, it provides a simpler API for programmers, with builtin safety checks that eliminate some common sources of errors. Pilot IMAPCAR2 parallel programming embedded multicore cache coherence message passing
267	Reducing the Area and Energy of Coherence Directories in Multicore Processors Zebchuk, Jason 14 January 2014 (has links) A key challenge in architecting a multicore processor is efficiently maintaining cache coherence. Directory protocols offer a scalable, bandwidth-efficient solution to this problem, but unfortunately they incur significant area overheads. This dissertation proposes three novel coherence directory designs that address the challenge of maintaining coherence in multicore processors, while reducing the area and energy overheads of the directory structure. Firstly, I propose the Phantom directory that leverages the abundance of storage in large shared caches to reduce the area devoted to a dedicated coherence directory. This approach faces a significant challenge since an access to the shared cache typically requires more energy than for a smaller dedicated directory. Phantom attempts to overcome this challenge by exploiting the spatial locality common to most applications, and by utilizing a very small dedicated directory cache, but the costs of accessing the shared cache still outweigh Phantom's area savings. Building upon the simple observation that at any point in time, large, continuous chunks of memory are often accessed by only a single core, my second proposed design, the multi-grain directory (MGD), takes advantage of this common application behaviour to reduce the directory size by tracking coherence at multiple different granularities. I demonstrate that a practical dual-grain directory (DGD) provides a robust solution, reducing directory area by 41% while maintaining good performance across a variety of workloads. While MGD provides a practical approach to reducing directory area, my third proposed design, the Tagless directory, takes a more innovative approach to achieving true scalability. Tagless embraces imprecision by embedding sharing information in a number of space-efficient Bloom filters. Careful consideration produces an elegant design with robust performance comparable to an ideal coherence directory. For a sixteen core processor, Tagless reduces directory area by up to 70% while reducing cache and directory energy consumption. My analysis also indicates that Tagless continues to provide an area and energy efficient directory as processors scale to tens or even hundreds of cores. These three innovative designs advance the state-of-the-art by providing more area and energy efficient coherence directories to allow multicore processors to scale to tens or hundreds of cores. computer science multiprocessors computer architecture cache coherence 0984
268	Scalably Verifiable Cache Coherence Zhang, Meng January 2013 (has links) <p>The correctness of a cache coherence protocol is crucial to the system since a subtle bug in the protocol may lead to disastrous consequences. However, the verification of a cache coherence protocol is never an easy task due to the complexity of the protocol. Moreover, as more and more cores are compressed into a single chip, there is an urge for the cache coherence protocol to have higher performance, lower power consumption, and less storage overhead. People perform various optimizations to meet these goals, which unfortunately, further exacerbate the verification problem. The current situation is that there are no efficient and universal methods for verifying a realistic cache coherence protocol for a many-core system. </p><p>We, as architects, believe that we can alleviate the verification problem by changing the traditional design paradigm. We suggest taking verifiability as a first-class design constraint, just as we do with other traditional metrics, such as performance, power consumption, and area overhead. To do this, we need to incorporate verification effort in the early design stage of a cache coherence protocol and make wise design decisions regarding the verifiability. Such a protocol will be amenable to verification and easier to be verified in a later stage. Specifically, we propose two methods in this thesis for designing scalably verifiable cache coherence protocols. </p><p>The first method is Fractal Coherence, targeting verifiable hierarchical protocols. Fractal Coherence leverages the fractal idea to design a cache coherence protocol. The self-similarity of the fractal enables the inductive verification of the protocol. Such a verification process is independent of the number of nodes and thus is scalable. We also design example protocols to show that Fractal Coherence protocols can attain comparable performance compared to a traditional snooping or directory protocol. </p><p>As a system scales hierarchically, Fractal Coherence can perfectly solve the verification problem of the implemented cache coherence protocol. However, Fractal Coherence cannot help if the system scales horizontally. Therefore, we propose the second method, PVCoherence, targeting verifiable flat protocols. PVCoherence is based on parametric verification, a widely used method for verifying the coherence of a flat protocol with infinite number of nodes. PVCoherence captures the fundamental requirements and limitations of parametric verification and proposes a set of guidelines for designing cache coherence protocols that are compatible with parametric verification. As long as designers follow these guidelines, their protocols can be easily verified. </p><p>We further show that Fractal Coherence and PVCoherence can also facilitate the verification of memory consistency, another extremely challenging problem. One piece of previous work proves that the verification of memory consistency can be decomposed into three steps. The most complex and non-scalable step is the verification of the cache coherence protocol. If we design the protocol following the design methodology of Fractal Coherence or PVCoherence, we can easily verify the cache coherence protocol and overcome the biggest obstacle in the verification of memory consistency. </p><p>As system expands and cache coherence protocols get more complex, the verification problem of the protocol becomes more prominent. We believe it is time to reconsider the traditional design flow in which verification is totally separated from the design stage. We show that by incorporating the verifiability in the early design stage and designing protocols to be scalably verifiable in the first place, we can greatly reduce the burden of verification. Meanwhile, we perform various experiments and show that we do not lose benefits in performance as well as in other metrics when we obtain the correctness guarantee.</p> / Dissertation Computer engineering Cache Coherence Many-core Scalability Verification
269	Extending caching for two applications : disseminating live data and accessing data from disks Vellanki, Vivekanand 12 1900 (has links) No description available. Cache memory Memory management (Computer science) World Wide Web
270	COMMERCIALIZATION AND OPTIMIZATION OF THE PIXEL ROUTER Dominick, Steven James 01 January 2010 (has links) The Pixel Router was developed at the University of Kentucky with the intent of supporting multi-projector displays by combining the scalability of commercial software solutions with the flexibility of commercial hardware solutions. This custom hardware solution uses a Look Up Table for an arbitrary input to output pixel mapping, but suffers from high memory latencies due to random SDRAM accesses. In order for this device to achieve marketability, the image interpolation method needed improvement as well. The previous design used the nearest neighbor interpolation method, which produces poor looking results but requires the least amount of memory accesses. A cache was implemented to support bilinear interpolation to simultaneously increase the output frame rate and image quality. A number of software simulations were conducted to test and refine the cache design, and these results were verified by testing the implementation on hardware. The frame rate was improved by a factor of 6 versus bilinear interpolation on the previous design, and by as much as 50% versus nearest neighbor on the previous design. The Pixel Router was also certified for FCC conducted and radiated emissions compliance, and potential commercial market areas were explored. Pixel Router Cache Bilinear Interpolation LUT Commercialization Electrical and Computer Engineering

Search results