About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

A Population Study of the South Cache Elk Herd

McCormack, Roger James 01 May 1951 (has links)
Elk, because of their wide-ranging habits and frequently inaccessible habitat, are among our least known big game animals. As a result, the management of elk herds has been handicapped by a lack of essential information. Population numbers remain unknown, ranges undefined, and decimating factors unevaluated, to cite but three conditions which hamper efficient management on a long-range basis.
142

Directory-based Cache Coherence in SMTp Machines without Memory Overhead using Sparse Directories

Kiriwas, Anton 01 January 2004 (has links)
As computing power has increased over the past few decades, science and engineering have found more and more uses for this newfound computing power. With the advent of multiprocessor machines, we are achieving MIPS and FLOPS ratings previously unthought-of. Distributed shared-memory (DSM) machines are quickly becoming a powerful tool for computing, and the ability to build them from commodity off-the-shelf parts would be a great benefit to computing in general. In the paper entitled "SMTp: An Architecture for Next-generation Scalable Multi-threading," Heinrich et al. present an architecture for a scalable DSM built from slightly modified machines capable of simultaneous multi-threading (SMT). In this architecture, SMT-based machines are connected via a high-speed network as DSMs with a directory-based cache coherence protocol. What is unique in SMTp is that the cache coherence protocol runs on the second thread in the SMT processors instead of running on an expensive, specialized memory controller. The results of this work show that SMTp can sometimes be even faster than dedicated hardware. In this thesis I present the work on SMTp and extend its capabilities by removing the need for memory-based directory backing, leveraging Wolf-Dietrich Weber's work on sparse directories. The removal of the directory backing store frees a large percentage of main memory for work in the system while having only a minor impact on the cache miss rate of applications and overall system throughput.
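
To make the sparse-directory idea more concrete, the sketch below models a small set-associative directory that tracks coherence state only for blocks currently cached somewhere; when a directory set overflows, one tracked block is evicted and its cached copies are invalidated so the entry can be reused. This is a minimal illustration under assumed names (SparseDirectory, record_sharer, invalidate_cb), not the SMTp protocol or Weber's exact design.

```python
# Minimal sketch of a sparse directory: a set-associative structure that
# tracks sharers only for blocks currently cached somewhere, instead of
# keeping one directory entry per memory block. Names are illustrative,
# not taken from the SMTp work.

class SparseDirectory:
    def __init__(self, num_sets, ways, invalidate_cb):
        self.num_sets = num_sets
        self.ways = ways
        self.invalidate_cb = invalidate_cb      # called to invalidate sharers on eviction
        # each set maps block address -> set of sharer node ids
        self.sets = [dict() for _ in range(num_sets)]

    def _set_index(self, block_addr):
        return block_addr % self.num_sets

    def record_sharer(self, block_addr, node_id):
        """Record that node_id now caches block_addr, evicting an entry if the set is full."""
        entries = self.sets[self._set_index(block_addr)]
        if block_addr not in entries and len(entries) >= self.ways:
            # Directory set is full: evict some tracked block and invalidate
            # its cached copies so the directory can safely forget about it.
            victim, sharers = next(iter(entries.items()))
            self.invalidate_cb(victim, sharers)
            del entries[victim]
        entries.setdefault(block_addr, set()).add(node_id)

    def sharers_of(self, block_addr):
        """Return the nodes believed to cache block_addr (empty set if untracked)."""
        return self.sets[self._set_index(block_addr)].get(block_addr, set())


# Example: a 2-way directory with 4 sets; evictions just print the invalidation.
directory = SparseDirectory(
    num_sets=4, ways=2,
    invalidate_cb=lambda blk, nodes: print(f"invalidate block {blk:#x} at nodes {nodes}"))
directory.record_sharer(0x1000, node_id=0)
directory.record_sharer(0x1000, node_id=1)
print(directory.sharers_of(0x1000))  # {0, 1}
```
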
143

On the Analysis and Management of Cache Networks

Rosensweig, Elisha 01 September 2012 (has links)
Over the past few years, Information-Centric Networking, a networking architecture in which host-to-content communication protocols are introduced, has been gaining much attention. A central component of such an architecture is a large-scale interconnected caching system. To date, the modeling of these cache networks, as well as the understanding of how they should be managed, are both in their infancy. This dissertation sets out to address both of these challenges. We consider approximate and bounding analysis of cache network performance, the convergence of such systems to steady state, and the manner in which content should be searched for in a cache network. Taken as a whole, the work presented here constitutes an array of fundamental tools for addressing the challenges posed by this new and exciting field.
144

CLIENT-SIDE CACHING: REDUCING SERVER LOAD AND LATENCY IN A NETWORK TRAFFIC ANALYSIS TOOL

Södermark, Oskar January 2023 (has links)
Caching is a fundamental technique widely used in the field of computing to reduce network traffic, server load, and latency. Storing frequently accessed data in a high-speed cache layer can make future requests process faster by involving fewer system components when generating and serving the response. Kalix is a software product that demands a caching solution, since it faces latency issues and frequently processes partially repeated queries. However, a cache does not guarantee improved performance, which is why the main problems of caching are determining what content to cache, when to insert or remove cache content, how to implement the caching logic, and where to store the cache efficiently. Therefore, this paper theoretically investigates where a cache solution should be implemented within the Kalix system architecture to decrease latency and server load, and evaluates the subsequent cache implementation experimentally. As a result, a client-side cache is implemented which decreases the latency of Kalix by up to 74%, while reducing the I/O load and memory utilization on the server by 98%. The reason for the decrease is that the cache in the client can directly serve the majority of the content, allowing the servers of Kalix to do substantially fewer computations. The evaluation acts as a recommendation for the company behind Kalix, Polystar, as to whether a cache is beneficial and where the cache can efficiently be deployed, and this paper gives valuable insights into the decision-making of cache placement. In conclusion, implementing the cache positively impacts the Kalix user experience.
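
As a rough sketch of the client-side pattern the thesis evaluates, the snippet below wraps a server query function with a small in-memory, time-bounded cache so repeated queries are answered locally instead of triggering a server round trip. The names (CachedClient, fetch_fn, ttl_seconds) are assumptions for illustration; this is not Kalix's implementation.

```python
import time

# Generic client-side cache sketch: answers repeated queries from a local
# store instead of re-contacting the server. Names are illustrative.

class CachedClient:
    def __init__(self, fetch_fn, ttl_seconds=60.0):
        self.fetch_fn = fetch_fn          # function that actually queries the server
        self.ttl = ttl_seconds            # how long a cached response stays valid
        self._store = {}                  # query -> (timestamp, response)

    def query(self, q):
        entry = self._store.get(q)
        if entry is not None:
            ts, response = entry
            if time.time() - ts < self.ttl:
                return response           # cache hit: no server round trip
        response = self.fetch_fn(q)       # cache miss or stale entry: go to the server
        self._store[q] = (time.time(), response)
        return response


# Usage: the second identical query is served from the client-side cache.
client = CachedClient(fetch_fn=lambda q: f"server result for {q!r}")
print(client.query("top talkers last hour"))
print(client.query("top talkers last hour"))
```
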
145

A Study for Reducing Conflict Misses in Data Cache

Ammari, Rami J 08 May 2004 (has links)
During the last two decades, CPU performance has improved much faster than memory performance. In order to reduce the performance gap between CPU and memory, cache memories have been used between the CPU and memory. In general, a cache memory is a small and fast buffer that reduces memory access time by saving data in advance, before the CPU uses it. There are two types of cache memory: instruction cache and data cache. In addition, there can be multiple levels (Level 1, 2, etc.) in the memory hierarchy (memory and cache memories); the Level 1 (on-chip) cache is the closest one to the CPU and it affects system performance directly. In this study, we evaluated two factors in designing an efficient Level 1 data cache. Those factors are: the distance between two data items in an array, and multiple XOR mapping functions within a bank. We designed a data cache called SLDC (Store/Load Dependent Cache, two-way) to implement the first factor. This cache uses the distance between the data addresses of data-transfer instructions (load and store). It groups close data into the same group and places them into the same bank. The other cache, designed for the second factor, is called Multi-XOR (MXOR). The MXOR splits the cache virtually into several zones (2 to 6 areas); a different XOR mapping function per area is used to index data (for better cache utilization). In this study, we used the SimpleScalar simulation program to implement the data cache with SPEC2000FP benchmark programs. Based on the experimental results, we recommend considering these factors in designing an efficient cache memory, since SLDC and MXOR show some improvement (5 to 10%) compared to a conventional cache memory (two-way set-associative).
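
A hedged sketch of the XOR-indexing idea behind MXOR: the set index is formed by XORing the conventional index bits with a slice of higher address bits, and each virtual zone uses a different slice, so addresses that conflict under one mapping can spread out under another. The bit widths and zone layout below are assumptions for illustration, not the SLDC/MXOR parameters.

```python
# Illustrative XOR set-index computation: the cache is split into zones, and
# each zone XORs the conventional index bits with a different slice of the
# upper address bits. All parameters here are assumptions.

BLOCK_OFFSET_BITS = 5        # 32-byte blocks
INDEX_BITS = 7               # 128 sets per zone
NUM_ZONES = 4

def xor_set_index(addr, zone):
    """Set index for addr within the given zone, using a zone-specific XOR mapping."""
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    # Each zone folds a different group of higher address bits into the index.
    tag_slice = (addr >> (BLOCK_OFFSET_BITS + INDEX_BITS + zone * INDEX_BITS)) \
                & ((1 << INDEX_BITS) - 1)
    return index ^ tag_slice

# Two addresses that collide under the plain index map to different sets in a
# zone whose XOR function folds in the address bits where they differ.
a = 0x0000_1040
b = 0x0008_1040
for zone in range(NUM_ZONES):
    print(zone, xor_set_index(a, zone), xor_set_index(b, zone))
```
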
146

USING RUNTIME INFORMATION TO IMPROVE MEMORY SYSTEM PERFORMANCE

MIN, RUI January 2005 (has links)
No description available.
147

Design and Analysis of Location Cache in a Network-on-Chip Based Multiprocessor System

Ramakrishnan, Divya 20 April 2009 (has links)
No description available.
148

Heterogeneous Cache Architecture in Network-on-Chips

Pattabiraman, Aishwariya January 2011 (has links)
No description available.
149

Hardware techniques to improve cache efficiency

Liu, Haiming 19 October 2009 (has links)
Modern microprocessors devote a large portion of their chip area to caches in order to bridge the speed and bandwidth gap between the core and main memory. One known problem with caches is that they are usually used with low efficiency; only a small fraction of the cache stores data that will be used before getting evicted. As the focus of microprocessor design shifts towards achieving higher performance-per-watt, cache efficiency is becoming increasingly important. This dissertation proposes techniques to improve both data cache efficiency in general and instruction cache efficiency for Explicit Data Graph Execution (EDGE) architectures.

To improve the efficiency of data caches and L2 caches, dead blocks (blocks that will not be referenced again before their eviction from the cache) should be identified and evicted early. Prior schemes predict the death of a block immediately after it is accessed, based on the individual reference history of the block. Such schemes result in lower prediction accuracy and coverage. We delay the prediction to achieve better prediction accuracy and coverage. For the L1 cache, we propose a new class of dead-block prediction schemes that predict dead blocks based on cache bursts. A cache burst begins when a block moves into the MRU position and ends when it moves out of the MRU position. Cache burst history is more predictable than individual reference history and results in better dead-block prediction accuracy and coverage. Experimental results show that predicting the death of a block at the end of a burst gives the best tradeoff between timeliness and prediction accuracy/coverage. We also propose mechanisms to improve counting-based dead-block predictors, which work best at the L2 cache. These mechanisms handle reference-count variations, which cause problems for existing counting-based dead-block predictors. The new schemes can identify the majority of the dead blocks with approximately 90% or higher accuracy. For a 64KB, two-way L1 D-cache, 96% of the dead blocks can be identified with 96% accuracy, halfway into a block's dead time. For a 64KB, four-way L1 cache, the prediction accuracy and coverage are 92% and 91% respectively. At any moment, the average fraction of the dead blocks that has been correctly detected for a two-way or four-way L1 cache is approximately 49% or 67% respectively. For a 1MB, 16-way set-associative L2 cache, 66% of the dead blocks can be identified with 89% accuracy, 1/16th of the way into a block's dead time. At any moment, 63% of the dead blocks in such an L2 cache, on average, have been correctly identified by the dead-block predictor.

The ability to accurately identify the majority of the dead blocks in the cache long before their eviction can lead not only to higher cache efficiency, but also to reduced power consumption or higher reliability. In this dissertation, we use the dead-block information to improve cache efficiency and performance through three techniques: replacement optimization, cache bypassing, and prefetching into dead blocks. Replacement optimization evicts blocks that become dead after several reuses, before they reach the LRU position. Cache bypassing identifies blocks that cause cache misses but will not be reused if they are written into the cache, and does not store these blocks in the cache. Prefetching into dead blocks replaces dead blocks with prefetched blocks that are likely to be referenced in the future.
Simulation results show that replacement optimization or bypassing improves performance by 5%, and prefetching into dead blocks improves performance by 12% over the baseline prefetching scheme for the L1 cache and by 13% over the baseline prefetching scheme for the L2 cache. Each of these three techniques can turn part of the identified dead blocks into live blocks. As new techniques that can better utilize the space of the dead blocks are found, the dead-block information is likely to become more valuable.

Compared to RISC architectures, the instruction cache in EDGE architectures faces challenges such as a higher miss rate, because of the increase in code size, and a longer miss penalty, because of the large block size and the distributed microarchitecture. To improve instruction cache efficiency in EDGE architectures, we decouple the next-block prediction from the instruction fetch so that the next-block prediction can run ahead of instruction fetch and the predicted blocks can be prefetched into the instruction cache before they cause any I-cache misses. In particular, we discuss how to decouple the next-block prediction from the instruction fetch and how to control the run-ahead distance of the next-block predictor in a fully distributed microarchitecture. The performance benefit of such a look-ahead instruction prefetching scheme is then evaluated and the run-ahead distance that gives the best performance improvement is identified. In addition to prefetching, we also estimate the performance benefit of storing variable-sized blocks in the instruction cache. Such schemes reduce the inefficiency caused by storing NOPs in the I-cache and enable the I-cache to store more blocks with the same capacity. Simulation results show that look-ahead instruction prefetching and storing variable-sized blocks can improve the performance of the benchmarks that have high I-cache miss rates by 17% and 18% respectively, out of an ideal 30% performance improvement only achievable by a perfect I-cache. Such techniques will close the gap in I-cache hit rates between EDGE architectures and RISC architectures, although the latter will still have higher I-cache hit rates because of the smaller code size.
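
A highly simplified sketch of the burst-based dead-block idea described above: a burst starts when a block becomes MRU in its set and ends when it leaves the MRU position, and a block is predicted dead once it has completed as many bursts as it did in its previous cache generation. Indexing the history by block address and the overall structure are assumptions made for brevity, not the dissertation's predictor.

```python
# Toy burst-counting dead-block predictor. We remember how many bursts a
# block saw before its last eviction and predict it dead once the live copy
# reaches that count and its current burst ends.

from collections import defaultdict

class BurstDeadBlockPredictor:
    def __init__(self):
        self.history = {}                    # block -> burst count observed before last eviction
        self.live_bursts = defaultdict(int)  # block -> bursts completed this generation

    def on_burst_end(self, block):
        """Called when `block` moves out of the MRU position of its set."""
        self.live_bursts[block] += 1
        learned = self.history.get(block)
        # Predict dead if the block has now used up as many bursts as last time.
        return learned is not None and self.live_bursts[block] >= learned

    def on_eviction(self, block):
        """Called when `block` is evicted: record its burst count for next time."""
        self.history[block] = self.live_bursts.pop(block, 0)


# Toy usage: block 0xA0 saw 2 bursts before its last eviction, so the
# predictor flags it dead at the end of its second burst this generation.
p = BurstDeadBlockPredictor()
p.on_burst_end(0xA0); p.on_burst_end(0xA0); p.on_eviction(0xA0)
print(p.on_burst_end(0xA0))   # False (1 burst so far)
print(p.on_burst_end(0xA0))   # True  (reached learned count of 2)
```
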
150

Split array and scalar data cache: A comprehensive study of data cache organization.

Naz, Afrin 08 1900 (has links)
Existing cache organizations suffer from the inability to distinguish different types of localities, and non-selectively cache all data rather than making any attempt to take special advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that will group memory accesses as scalar or array references according to their inherent locality and will subsequently map each group to a dedicated cache partition. In this system, because scalar and array references will no longer negatively affect each other, cache interference is diminished, delivering better performance. Further improvement is achieved by the introduction of a victim cache, prefetching, data flattening, and reconfigurability to tune the array and scalar caches for specific applications. The most significant contribution of my work is the introduction of a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture uses reconfigurability coupled with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show excellent reductions in both memory size and memory access times, translating into reduced power consumption. Since there was a large reduction in miss rates at L1 caches, further power reduction is achieved by partially or completely shutting down the L2 data or L2 instruction caches. The savings in cache sizes resulting from these designs can be used for other processor activities, including instruction and data prefetching and branch-prediction buffers. The potential benefits of such techniques for embedded applications have been evaluated in my work. I also explore how my cache organization performs for non-numeric data structures. I propose a novel idea called "data flattening," which is a profile-based memory allocation technique to compress sparsely scattered pointer data into regular contiguous memory locations, and explore the potential of my proposed split cache organization for data treated with the data flattening method.
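
As a hedged illustration of the split-cache routing idea, the sketch below keeps two independent LRU partitions and directs each reference to the scalar or array partition according to a classification flag, so a streaming array access pattern cannot evict the scalar working set. The classifier interface and partition sizes are assumptions, not the dissertation's design.

```python
from collections import OrderedDict

# Sketch of a split scalar/array data cache: two independent partitions so
# array streaming cannot evict scalar working-set blocks (and vice versa).
# The is_array_ref flag and partition sizes are illustrative assumptions.

class LRUCachePartition:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()          # block address -> None, kept in LRU order

    def access(self, block_addr):
        """Return True on hit; on miss, insert the block, evicting the LRU block if needed."""
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block_addr] = None
        return False

class SplitDataCache:
    def __init__(self, scalar_blocks=64, array_blocks=256):
        self.scalar = LRUCachePartition(scalar_blocks)
        self.array = LRUCachePartition(array_blocks)

    def access(self, block_addr, is_array_ref):
        # Route the reference to the partition that matches its locality type.
        part = self.array if is_array_ref else self.scalar
        return part.access(block_addr)


# Streaming array accesses fill only the array partition; the scalar block stays resident.
cache = SplitDataCache()
cache.access(0x10, is_array_ref=False)          # scalar block cached
for i in range(1024):
    cache.access(0x1000 + 64 * i, is_array_ref=True)
print(cache.access(0x10, is_array_ref=False))   # True: still a hit in the scalar partition
```
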
