51

Low-energy instruction cache architecture /

Ali, Kashif. January 2006 (has links)
Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 96-107). Also available on the Internet. Mode of access: via web browser at the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19752
52

Cache-oblivious query processing /

He, Bingsheng. January 2008 (has links)
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2008. / Includes bibliographical references (leaves 89-100). Also available in electronic version.
53

Data prefetching for high-performance processors /

Chen, Tien-Fu, January 1993 (has links)
Thesis (Ph. D.)--University of Washington, 1993. / Vita. Includes bibliographical references (leaves [121]-129).
54

Data caching in wireless mobile networks /

Xu, Ji. January 2004 (has links)
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004. / Includes bibliographical references (leaves 57-60). Also available in electronic version. Access restricted to campus users.
55

Array restructuring for cache locality /

Leung, Shun-Tak Albert. January 1996 (has links)
Thesis (Ph. D.)--University of Washington, 1996. / Vita. Includes bibliographical references (p. [177]-186).
56

Optimale Cache-Nutzung für Realzeitsoftware auf Multiprozessorsystemen [Optimal cache usage for real-time software on multiprocessor systems]

Bülow, Alexander von. Unknown Date (has links)
Dissertation--Technische Universität München, 2006.
57

Latency reduction techniques for remote memory access in ANEMONE

Lewandowski, Mark. Gopalan, Kartik. January 2006 (has links)
Thesis (M.S.)--Florida State University, 2006. / Advisor: Kartik Gopalan, Florida State University, College of Arts and Sciences, Dept. of Computer Science. Title and description from dissertation home page (viewed June 6, 2006). Document formatted into pages; contains ix, 43 pages. Includes bibliographical references.
58

Scalable primary cache memory architectures

Agarwal, Vikas. John, Lizy Kurian, Keckler, Stephen W., January 2004 (has links) (PDF)
Thesis (Ph. D.)--University of Texas at Austin, 2004. / Supervisors: Lizy K. John and Stephen W. Keckler. Vita. Includes bibliographical references.
59

Least-recently-used (LRU) circuit design for prioritized cache

Eaton, Ronald 01 December 2014 (has links)
In modern embedded systems, real-time applications often execute on multi-core systems that also run applications which are not real-time critical. It is well known that sharing a cache among cores, or among concurrent threads on a single CPU, can delay the execution of real-time applications, which makes their worst-case execution time (WCET) harder to predict. A promising approach to this problem is a prioritized cache. Currently, prioritized caches are implemented at the architectural level using cache controllers. This thesis instead presents two prioritized LRU (least-recently-used) replacement policy circuit designs implemented inside the cache circuit itself, which decreases cache latency. The circuits are implemented using the Synopsys 28nm EDK, and the area and power overheads associated with the prioritized cache are investigated based on the circuit implementation.
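As a rough illustration of the replacement policy this abstract describes, here is a minimal behavioral sketch of prioritized LRU victim selection. It is a software model only, not the thesis's circuit; names such as `CacheLine` and `select_victim`, and the exact priority semantics, are assumptions for illustration:

```python
# Behavioral sketch of a prioritized-LRU replacement policy (assumed
# semantics: evict from the lowest-priority lines first, breaking ties
# by recency). The thesis realizes this as a circuit; this model is
# only meant to show the policy's behavior.

from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    priority: int   # higher value = more important (e.g., real-time task)
    lru_rank: int   # 0 = most recently used; larger = older

def select_victim(set_lines: list[CacheLine]) -> int:
    """Return the way index of the line to evict.

    Among the lines holding the lowest priority present in the set,
    evict the least recently used one. High-priority (real-time) lines
    are evicted only when no lower-priority line exists, shielding
    real-time working sets from interference.
    """
    lowest = min(line.priority for line in set_lines)
    candidates = [i for i, line in enumerate(set_lines)
                  if line.priority == lowest]
    return max(candidates, key=lambda i: set_lines[i].lru_rank)

def touch(set_lines: list[CacheLine], way: int) -> None:
    """Software model of the LRU-rank update on an access to `way`."""
    old = set_lines[way].lru_rank
    for line in set_lines:
        if line.lru_rank < old:
            line.lru_rank += 1   # everything newer than `way` ages by one
    set_lines[way].lru_rank = 0  # accessed line becomes most recent
```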
60

Optimizing cache utilization in modern cache hierarchies

Huang, Cheng-Chieh January 2016 (has links)
The memory wall is one of the major performance bottlenecks in modern computer systems. SRAM caches have been used successfully to bridge the performance gap between the processor and memory. However, an SRAM cache's latency grows with its size, so simply enlarging the cache can hurt performance. Modern processors therefore employ multiple levels of caches of different sizes, forming the so-called memory hierarchy: the processor looks up data level by level, from the highest level (the L1 cache) down to the lowest (main memory). Such a design avoids the latency penalty of a single large cache. However, because SRAM has lower storage density than other volatile storage technologies, the size of an SRAM cache is restricted by the available on-chip area. With modern applications requiring ever more memory, researchers continue to look for techniques to increase effective cache capacity, approaching the problem from two angles: maximizing the utilization of existing SRAM caches, and exploiting new technology to support larger capacities in the cache hierarchy.

The first part of this thesis focuses on maximizing the utilization of the existing SRAM cache. We observe that not all words belonging to a cache block are accessed around the same time; in fact, a subset of words is consistently accessed sooner than the others. We call these the critical words, and in our study we find that they can be predicted from a block's access footprint. Based on this observation, we propose the critical-words-only cache (co-cache). Unlike a conventional cache, which stores every word of a block, the co-cache stores only the words predicted to be critical. In this work, we convert an L2 cache into a co-cache and use the L1's access-footprint information to predict critical words. Our experiments show that the co-cache outperforms a conventional L2 cache on workloads whose working-set sizes exceed the L2 cache size. For workloads whose working sets fit in the conventional L2, we propose the adaptive co-cache (a-co-cache), which allows the co-cache to be configured back into a conventional cache.

The second part of this thesis focuses on efficiently enabling a large-capacity on-chip cache. In the near future, 3D stacking technology will allow one or more DRAM chips to be stacked onto the processor, with a total size expected to be on the order of hundreds of megabytes or even a few gigabytes. Recent works have proposed using this space as an on-chip DRAM cache. However, the tags of such a DRAM cache pose a classic space/time trade-off. On the one hand, we would like tag-access latency to be small, since it contributes to both hit and miss latencies; accordingly, we would like to store the tags in a faster medium such as SRAM. On the other hand, with hundreds of megabytes of die-stacked DRAM cache, the space overhead of the tags is huge: a 256 MB DRAM cache with conventional 64 B blocks holds about 4 M blocks, so storing all their tags would cost around 12 MB of SRAM. That is clearly too large, considering that some current chip multiprocessors have an L3 cache that is smaller. Prior works have proposed storing the tags alongside the data in the stacked DRAM array (tags-in-DRAM), but this scheme increases the access latency of the DRAM cache.
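As a quick sanity check of the 12 MB figure quoted above, the arithmetic works out as follows. This is a sketch only: the ~3-byte-per-entry tag size is my assumption, chosen because it reproduces the quoted number, not a figure given in the abstract.

```python
# Back-of-the-envelope check of the DRAM-cache tag overhead quoted
# in the abstract (256 MB DRAM cache, conventional 64 B blocks).

dram_cache_bytes = 256 * 2**20   # 256 MB DRAM cache
block_bytes = 64                 # conventional 64 B blocks
tag_entry_bytes = 3              # ASSUMED ~24-bit tag/metadata entry

num_blocks = dram_cache_bytes // block_bytes      # 4,194,304 blocks
tag_sram_bytes = num_blocks * tag_entry_bytes     # 12,582,912 bytes
print(f"{num_blocks:,} blocks, "
      f"tag SRAM = {tag_sram_bytes / 2**20:.0f} MB")  # -> 12 MB
```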
To optimize access latency in the DRAM cache, we propose the aggressive tag cache (ATCache). Like a conventional cache, the ATCache caches recently accessed tags to exploit temporal locality; it exploits spatial locality by prefetching the tags of nearby cache sets. We also address the high miss latency and cache pollution caused by excessive prefetching: to reduce this overhead, we propose a cost-effective prefetching scheme, a combination of dynamic prefetching-granularity tuning and hit-prefetching, to throttle the number of sets prefetched. The proposed ATCache, which consumes only 0.4% of the overall tag size, satisfies over 60% of DRAM cache tag accesses on average.

The last work proposed in this thesis is a DRAM-cache-aware (DCA) DRAM controller, which addresses the challenge of scheduling requests in the DRAM cache. Many recent DRAM cache works build on a tags-in-DRAM scheme; storing the tags in the DRAM array, however, increases the complexity of a DRAM cache request. In contrast to a conventional request to DRAM main memory, a request to the DRAM cache now translates into multiple DRAM cache accesses (tag and data), and we address the challenge of scheduling these accesses. We start by exploring whether a conventional DRAM controller works well in this scenario, introducing two potential designs and studying their limitations. From this study, we derive a set of design principles that an ideal DRAM cache controller must satisfy, and we then propose a DRAM-cache-aware (DCA) DRAM controller based on these principles. Our experimental results show that DCA outperforms the baseline by over 14%.
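To make the ATCache idea concrete, here is a minimal software sketch of an SRAM tag cache with set-granularity prefetching sitting in front of a tags-in-DRAM array. Apart from the ATCache name itself, everything here is an illustrative assumption: the fixed prefetch granularity, the LRU organization, and the hypothetical `fetch_tags_from_dram` callback stand in for the dynamic granularity tuning and hit-prefetching the thesis actually proposes.

```python
# Minimal sketch of an ATCache-style SRAM tag cache in front of a
# tags-in-DRAM array. Structure and names are assumptions for
# illustration, not the thesis's design.

from collections import OrderedDict
from typing import Callable

PREFETCH_SETS = 4  # tags of this many neighbouring sets fetched together
                   # (fixed here; the thesis tunes this dynamically)

class ATCache:
    def __init__(self, capacity_sets: int):
        self.capacity = capacity_sets
        self.tags: OrderedDict[int, list[int]] = OrderedDict()  # LRU order

    def lookup(self, dram_set: int,
               fetch_tags_from_dram: Callable[[int], list[int]]) -> list[int]:
        """Return the tags of `dram_set`, touching the DRAM array only
        on an ATCache miss; on a miss, also prefetch the tags of the
        neighbouring sets to exploit spatial locality."""
        if dram_set in self.tags:             # tag-cache hit: fast SRAM path
            self.tags.move_to_end(dram_set)   # refresh LRU position
            return self.tags[dram_set]
        # Tag-cache miss: fetch this set and its aligned neighbours.
        base = dram_set - dram_set % PREFETCH_SETS
        for s in range(base, base + PREFETCH_SETS):
            if s not in self.tags:
                self.tags[s] = fetch_tags_from_dram(s)  # slow DRAM read
        self.tags.move_to_end(dram_set)        # requested set is newest
        while len(self.tags) > self.capacity:  # evict least recently used
            self.tags.popitem(last=False)
        return self.tags[dram_set]
```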
