  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Efficient and Hierarchical Architectures for WWW Cache Design

Yang, Chieh-Hsiang 26 July 2000
Over the past few years, WWW (World Wide Web) traffic on the Internet has grown tremendously. Ironically, the Web has become the "World Wide Wait" because of overloaded servers and seriously congested networks. Almost any computer system that suffers from latency or bandwidth problems can benefit from caching. Introducing caches to WWW servers reduces the waiting time at clients by relieving both server and network load. The purpose of this thesis is to design a hierarchical cache system that works efficiently. The cache servers in use today lack efficient collaboration because their management configurations differ. As a result, the highest level of a hierarchical cache system can easily become the bottleneck, which in turn slows down the entire cache system. In our cache design, we apply inclusive and exclusive relationships to modify the ICP (Internet Cache Protocol) and use a recursive concept to build a hierarchical architecture that avoids unnecessary information queries. This implies that the lowest level can fetch data from the destination server directly and perform the update recursively to its upper levels. This upside-down traffic flow, referred to as reverse traffic flow in this thesis, can substantially relieve the load on the upper levels of the hierarchy. For performance evaluation, we derive a general mathematical equation by analyzing the operating procedures step by step. The analytical results show that the load on the highest level of the hierarchy can be reduced by almost 50% under the worst-case assumption. Although the workload of the lower levels may increase slightly, the design significantly increases overall WWW efficiency and avoids potential bottlenecks by balancing the load among the levels.
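The reverse traffic flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's modified ICP: the class `CacheLevel`, the function `get`, the `fetch_origin` callback, and the `origin_fetches` counter are hypothetical names introduced for the example.

```python
class CacheLevel:
    """One level of a cache hierarchy (hypothetical model, not the thesis's protocol)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # next level up, or None for the top level
        self.store = {}             # url -> cached object
        self.origin_fetches = 0     # how often this level contacted the origin server

    def lookup(self, url):
        """Search this level, then its ancestors; return the object or None."""
        if url in self.store:
            return self.store[url]
        return self.parent.lookup(url) if self.parent else None

    def update_upward(self, url, obj):
        """Reverse traffic flow: store the object here, then recursively one level up."""
        self.store[url] = obj
        if self.parent:
            self.parent.update_upward(url, obj)


def get(leaf, url, fetch_origin):
    """Serve a request at the lowest level of the hierarchy.

    On a miss everywhere, the leaf itself contacts the origin server
    (instead of delegating the fetch to the top level) and then pushes
    the object upward.
    """
    obj = leaf.lookup(url)
    if obj is None:
        obj = fetch_origin(url)
        leaf.origin_fetches += 1
        leaf.update_upward(url, obj)
    return obj
```

In this sketch the top level never talks to the origin server; it only receives updates, which is one simple way to picture how load moves away from the top of the hierarchy.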

Performance Evaluation of An Allocatable Cache Design

Tsou, Hsiang-Hua 06 September 2000
In a single-chip multiprocessor, the ratio of off-chip communication time to on-chip processing time grows as VLSI technology advances. Hence, the number of off-chip memory accesses becomes a dominant factor in system performance. We have developed a hardware/software design of an on-chip allocatable cache. This design takes into account the pre-measured cache size requirement of the executed program. The operating system can then allocate a proper cache size to the corresponding processor by re-allocating cache submodules. Programs with different cache size requirements can thus have their cache sizes adjusted dynamically, which increases the overall hit ratio of the on-chip caches as well as system performance. To validate the achievable performance improvement, we designed simulators of the allocatable cache design, the dedicated cache design, and the fully shared cache design, together with a multiprocessor simulation environment. We extracted execution traces from a set of real programs and measured their cache hit ratios for different cache capacities. We performed the single-chip multiprocessor simulation with these data. During the multiprocessor simulation, we randomized the time periods at which cache characteristics change in order to replace the executed programs in each processor. The performance experiments reveal that the allocatable cache design obtains the best overall cache hit ratio and total program execution time. Although the fully shared cache design can perform nearly as well as the allocatable cache design, it has the drawback of a much larger interconnection cost.
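The idea of an operating system handing out cache submodules according to pre-measured requirements might look roughly like the sketch below. The abstract does not state the allocation policy, so the greedy rule used here (one submodule per processor first, then the largest unmet demand wins) and the function name `allocate_submodules` are assumptions made purely for illustration.

```python
def allocate_submodules(required, total):
    """Distribute cache submodules among processors.

    required: dict mapping processor name -> number of submodules its
              current program would like (a pre-measured requirement).
    total:    number of submodules available on the chip.
    The policy is an assumption for this example, not the thesis's algorithm.
    """
    if total < len(required):
        raise ValueError("need at least one submodule per processor")
    alloc = {p: 1 for p in required}  # every processor gets a minimum of one
    for _ in range(total - len(required)):
        # Give the next submodule to the processor with the largest unmet demand.
        p = max(required, key=lambda q: required[q] - alloc[q])
        alloc[p] += 1
    return alloc


demo = allocate_submodules({"P0": 4, "P1": 1, "P2": 2}, total=6)
```

Calling the function again whenever a processor switches to a program with a different requirement would model the dynamic re-allocation the abstract mentions.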

Processor memory traffic characteristics for on-chip cache

Ho, Yui Luen, Ho, Jeremy Yui Luen 16 April 1992
The motivation of this research is to study different designs for on-chip caches that improve processor performance and at the same time minimize the degradation of system performance caused by an increase in processor memory traffic. As VLSI technology advances, we can have bigger and more complex on-chip caches that would not have been possible a few years ago. Results and performance issues for on-chip caches are basically similar to those for off-chip caches. In this study, we concentrate on single-level on-chip caches, though there are many interesting issues relating system performance, memory traffic, and multi-level caches. / Graduation date: 1992

An experimental system for evaluating cache coherence protocols in shared memory multiprocessors / Peter John Ashenden.

Ashenden, Peter J. January 1997
Copy of author's previously published article inserted. / Bibliography: leaves 240-246. / xvi, 246 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / This thesis examines cache coherence protocols designed for use in bus connected shared memory multiprocessors. / Thesis (Ph.D.)--University of Adelaide, Dept. of Computer Science, 1997?

Improving Instruction Fetch Rate with Code Pattern Cache for Superscalar Architecture

Beg, Azam Muhammad 06 August 2005
In the past, instruction fetch speeds have been improved by using cache schemes that capture the actual program flow. In this proposal, we present the architecture of a new instruction cache named code pattern cache (CPC); the cache is used with superscalar processors. CPC's operation is based on two fundamental principles: common programs tend to repeat their execution patterns, and efficient storage of a program flow can enhance the performance of an instruction fetch mechanism. CPC saves basic blocks (sets of instructions separated by control instructions) and their boundary addresses while the code is running. Basic blocks and their addresses are stored in two separate structures, called basic block cache (BBC) and block pointer cache (BPC), respectively. Later, if the same basic block sequence is expected to execute, it is fetched from CPC instead of the instruction cache; this mechanism results in a higher likelihood of delivering a larger number of instructions in every clock cycle. We developed single- and multi-threaded simulators for TC, BC, and CPC, and used them with 10 SPECint2000 benchmarks. The simulation results demonstrated CPC's advantage over TC and BC in terms of trace miss rate and average trace length. Additionally, we used cache models to quantify the timing, area, and power for the three cache schemes. Using an aggregate performance index that combined the simulation and modeling results, CPC was shown to perform better than both TC and BC. During our research, each of the TC, BC, and CPC configurations took 4-6 hours to simulate, so performance comparison of these caches proved to be a very time-consuming process. Neural network models (NNMs) can be time-efficient alternatives to simulations, so we studied their feasibility for representing the cache behavior. We developed two NNMs, one to predict the trace miss rate and the other to predict the average trace length for the three caches.
The NNMs modeled the caches with reasonable accuracy and produced results in a fraction of a second.
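As a rough illustration of what "basic blocks and their boundary addresses" means, the sketch below cuts an executed instruction trace at control instructions and records the pieces in two structures. The opcode set `CONTROL_OPS`, the function names, and the dictionary/list layout of `bbc` and `bpc` are assumptions for the example; the abstract does not describe the actual hardware organization.

```python
# Illustrative set of control-transfer opcodes (an assumption, not from the thesis).
CONTROL_OPS = {"beq", "bne", "jmp", "call", "ret"}


def split_basic_blocks(trace):
    """Cut a trace of (address, opcode) pairs into basic blocks.

    Each block ends with a control instruction; a trailing run without
    one is kept as a final, unterminated block.
    """
    blocks, current = [], []
    for addr, op in trace:
        current.append((addr, op))
        if op in CONTROL_OPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def record(trace):
    """Store block contents and block boundaries separately."""
    bbc = {}  # start address -> opcodes of that basic block
    bpc = []  # (start, end) boundary addresses in execution order
    for block in split_basic_blocks(trace):
        start, end = block[0][0], block[-1][0]
        bbc.setdefault(start, [op for _, op in block])
        bpc.append((start, end))
    return bbc, bpc


trace = [(0, "add"), (4, "beq"), (16, "sub"), (20, "jmp"), (0, "add"), (4, "beq")]
bbc, bpc = record(trace)
```

In the example trace the block starting at address 0 executes twice, so its contents are stored once while its boundaries appear twice in the recorded sequence, which reflects the repeated execution patterns the abstract relies on.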


QI, BIN January 2007
No description available.


Gopalakrishnan, Lavanya 01 August 2011
With the advancement of technology, multi-core processors with shared caches have come into use in real-time applications. In such systems, some cores run real-time applications and others run non-critical applications that do not have strict deadlines. Because the cache is shared among the cores, problems have emerged in predicting the actual execution time of real-time applications. To address these problems, cache memory with a prioritized replacement policy has been proposed. Most of the existing work is carried out in high-level hardware designs and software-based application-level designs. To the best of my knowledge, no low-level hardware implementation of cache memory with a prioritized replacement circuit has been designed. My thesis focuses on designing an LRU replacement circuit that is prioritized based on the application the processor is running. Real-time applications acquire priority over other applications in using the cache memory, which enhances the seamless execution of the real-time application and hence supports execution time predictability, which in turn helps improve the potential of multi-core computing for real-time systems. The speed, size, and power overhead are analyzed by placing the N-way set-associative LRU circuit in a 128KB cache designed using 65nm CMOS technology.
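A software model can make the idea of a prioritized LRU policy concrete. The sketch below is an assumption-laden illustration, not the circuit from the thesis: the class name `PrioritizedLRUSet`, the rule that non-critical lines are evicted first, and the bypass behavior when a non-critical request finds only real-time lines are all choices made for this example.

```python
from collections import OrderedDict


class PrioritizedLRUSet:
    """One set of an N-way set-associative cache with priority-aware LRU (illustrative)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> is_real_time, least recently used first

    def access(self, tag, real_time):
        """Return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            victim = self._pick_victim(real_time)
            if victim is None:
                return False  # bypass: a non-critical request may not evict real-time data
            del self.lines[victim]
        self.lines[tag] = real_time
        return False

    def _pick_victim(self, requester_real_time):
        # Prefer the least recently used line owned by a non-critical application.
        for tag, is_rt in self.lines.items():
            if not is_rt:
                return tag
        # Only real-time lines remain: a real-time requester evicts the plain LRU line.
        if requester_real_time:
            return next(iter(self.lines))
        return None
```

With this rule, data belonging to the real-time application can only be displaced by the real-time application itself, which is one way to picture how prioritization supports execution time predictability.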

Improving Energy and Area Scalability of the Cache Hierarchy in CMPs

Valls Mompó, Joan Josep 07 April 2017
As the core counts increase in each chip multiprocessor generation, CMPs should improve scalability in performance, area, and energy consumption to meet the demands of larger core counts. Directory-based protocols constitute the most scalable alternative. A conventional directory, however, suffers from an inefficient use of storage and energy. First, the large, non-scalable sharer vectors consume unnecessary area and leakage energy, especially considering that most of the blocks tracked in a directory are cached by a single core. Second, although increasing directory size and associativity could boost system performance by reducing the coverage misses, it would come at the expense of area and energy consumption. This thesis focuses on and exploits the important differences in behavior between private and shared blocks from the directory point of view. These differences call for separate management of the two types of blocks at the directory. First, we propose the PS-Directory, a two-level directory cache that keeps the small number of frequently accessed shared entries in a small and fast first-level cache, namely the Shared Directory Cache, and uses a larger and slower second-level Private Directory Cache to track the large number of private blocks. Experimental results show that, compared to a conventional directory, the PS-Directory improves performance while also reducing silicon area and energy consumption. In this thesis we also show that the shared/private ratio of entries in the directory varies across applications and across different execution phases within the applications, which encourages us to propose the Dynamic Way Partitioning (DWP) Directory. The DWP-Directory reduces the number of ways with storage for shared blocks and allows this storage to be powered off or on at run-time according to the dynamic requirements of the applications, following a repartitioning algorithm.
Results show performance similar to that of a traditional directory with high associativity, and area requirements similar to those of recent state-of-the-art schemes. In addition, the DWP-Directory achieves notable static and dynamic power consumption savings. This dissertation also deals with the power scalability issues found in processor caches. A significant fraction of the total power budget is consumed by on-chip caches, which are usually deployed with a high degree of associativity (even L1 caches are being implemented with eight ways) to enhance system performance. On a cache access, each way in the corresponding set is accessed in parallel, which is costly in terms of energy. This thesis presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting performance. The PS-Cache takes advantage of the private-shared knowledge of the referenced block to reduce energy by accessing only those ways holding the kind of block looked up. Results show significant dynamic power consumption savings. Finally, we propose an energy-efficient architectural design that can be effectively applied to any kind of set-associative cache memory, not only to processor caches. The proposed approach, called the Tag Filter (TF) Architecture, filters the ways accessed in the target cache set, so that just a few ways are searched in the tag and data arrays. This allows the approach to reduce the dynamic energy consumption of caches without hurting their access time. For this purpose, the proposed architecture holds the X least significant bits of each tag in a small auxiliary X-bit-wide array. These bits are used to filter out the ways where the least significant bits of the tag do not match the bits in the X-bit array. Experimental results show that this filtering mechanism achieves energy consumption in set-associative caches similar to that of direct-mapped ones.
Experimental results show that the proposals presented in this thesis offer a good tradeoff among these three major design axes: performance, area, and energy consumption. / Valls Mompó, JJ. (2017). Improving Energy and Area Scalability of the Cache Hierarchy in CMPs [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/79551 / TESIS
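The Tag Filter idea in this abstract, comparing only the X least significant tag bits first and then searching just the surviving ways, can be sketched as below. The function names and the list-of-integers representation of a cache set are assumptions for illustration; in the described architecture the low bits live in a separate small array, whereas this sketch simply derives them from the full tags.

```python
def filter_ways(stored_tags, lookup_tag, x_bits):
    """Return the indices of ways whose X low-order tag bits match the lookup tag.

    stored_tags: full tag per way in one cache set (None for an empty way).
    """
    mask = (1 << x_bits) - 1
    wanted = lookup_tag & mask
    return [w for w, t in enumerate(stored_tags)
            if t is not None and (t & mask) == wanted]


def lookup(stored_tags, lookup_tag, x_bits):
    """Full tag comparison restricted to the ways that pass the filter.

    Returns (hit_way or None, number_of_ways_actually_searched).
    """
    candidates = filter_ways(stored_tags, lookup_tag, x_bits)
    for w in candidates:
        if stored_tags[w] == lookup_tag:
            return w, len(candidates)
    return None, len(candidates)


tags = [0b10100001, 0b11000010, 0b01100001, 0b11110011]
result = lookup(tags, 0b01100001, x_bits=2)
```

Here only two of the four ways share the low bits `01` with the requested tag, so the full comparison touches two ways instead of four. A larger X filters more aggressively at the cost of a wider auxiliary array.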

Kompiuterių hierarchinės atminties sistemos tyrimas / The study of computer hierarchical memory

Rimavičius, Vidmantas 23 May 2005
The operating speed of computers tends to increase significantly; however, this process is not simple, because operating speed depends both on how fast the individual computer components are and on how well they are balanced. Modern processors can perform operations within several cycles, whereas the access time of a large main memory reaches tens or hundreds of cycles. Although static memory that operates at a speed equal or close to the processor's exists, using it for main memory is expensive. The problem is solved by installing a small cache between the processor and main memory. This relatively small but very fast memory occupies a specific position in the memory system of a modern computer: the cache is the highest level of the hierarchical memory system. A cache simulator for exploring cache behaviour was developed. This master's thesis analyses the cache's influence on computer efficiency from both a theoretical and a practical point of view, the latter supported by simulation results. By comparing the theoretical and test results, the influence of different factors on the operation of the hierarchical memory system is evaluated. The cache simulation results show that the operation of a hierarchical memory system is affected by the functioning of the cache levels, the frequency of memory accesses, the hit rate (or miss rate), the cache organisation, the line replacement algorithm, the cache size, the cache line size, and the specific properties of the program executed.
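To show how factors such as cache size, line size, and hit rate interact, here is a very small simulator sketch. It is not the simulator developed in the thesis: the direct-mapped organisation, the function names, and the example cycle counts are assumptions chosen to keep the example short. The average access time uses the standard relation of hit time plus miss rate times miss penalty.

```python
def simulate_direct_mapped(addresses, num_lines, line_size):
    """Replay byte addresses through a direct-mapped cache and return the hit rate."""
    tags = [None] * num_lines  # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory block that holds this byte
        index = block % num_lines   # cache line the block maps to
        tag = block // num_lines    # identifies which block occupies the line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: the new block replaces the old one
    return hits / len(addresses) if addresses else 0.0


def average_access_time(hit_rate, hit_time, miss_penalty):
    """Average memory access time in cycles: hit time plus miss rate times miss penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


# Addresses 0 and 64 map to the same line here, so they evict each other.
rate = simulate_direct_mapped([0, 4, 8, 64, 0, 4], num_lines=4, line_size=16)
cycles = average_access_time(rate, hit_time=1, miss_penalty=100)
```

Changing `num_lines` or `line_size` and replaying the same address list is a quick way to see how the organisation parameters named in the abstract move the hit rate.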

Volume lease: a scalable cache consistency framework

Yin, Jian 28 August 2008
Not available / text
