61 |
Optimizing cache utilization in modern cache hierarchies
Huang, Cheng-Chieh January 2016 (has links)
The memory wall is one of the major performance bottlenecks in modern computer systems. SRAM caches have been used successfully to bridge the performance gap between the processor and the memory. However, an SRAM cache's latency grows with its size, so simply increasing cache size can hurt performance. To solve this problem, modern processors employ multiple levels of caches, each of a different size, forming the so-called memory hierarchy. On a memory access, the processor looks up the data starting from the highest level (the L1 cache) and, on a miss, proceeds toward the lowest level (main memory). Such a design effectively avoids the performance penalty of simply using one large cache. However, because SRAM has lower storage density than other volatile memories, the size of an SRAM cache is restricted by the available on-chip area. With modern applications requiring more and more memory, researchers continue to look for techniques that increase effective cache capacity. In general, researchers approach this problem from two angles: maximizing the utilization of current SRAM caches, or exploiting new technology to support larger capacity in the cache hierarchy. The first part of this thesis focuses on how to maximize the utilization of existing SRAM caches. In our first work, we observe that not all words belonging to a cache block are accessed around the same time; in fact, a subset of words is consistently accessed sooner than the others. We call this subset the critical words. In our study, we found that these critical words can be predicted using the access footprint. Based on this observation, we propose the critical-words-only cache (co-cache). Unlike a conventional cache, which stores all the words that belong to a block, the co-cache only stores the words that we predict to be critical. In this work, we convert an L2 cache into a co-cache and use the L1's access-footprint information to predict critical words. Our experiments show that the co-cache can outperform a conventional L2 cache on workloads whose working-set sizes are greater than the L2 cache size. To handle workloads whose working sets fit in the conventional L2, we propose the adaptive co-cache (acocache), which allows the co-cache to be configured back into a conventional cache. The second part of this thesis focuses on how to efficiently enable a large-capacity on-chip cache. In the near future, 3D stacking technology will allow one or more DRAM chips to be stacked onto the processor. The total size of these chips is expected to be on the order of hundreds of megabytes or even a few gigabytes. Recent works have proposed using this space as an on-chip DRAM cache. However, the tags of the DRAM cache create a classic space/time trade-off. On the one hand, we would like the latency of a tag access to be small, as it contributes to both hit and miss latencies; accordingly, we would like to store these tags in a faster medium such as SRAM. However, with hundreds of megabytes of die-stacked DRAM cache, the space overhead of the tags would be huge. For example, it would cost around 12 MB of SRAM to store all the tags of a 256 MB DRAM cache (with conventional 64 B blocks). Clearly this is too large, considering that some current chip multiprocessors have an L3 cache that is smaller. Prior works have proposed storing these tags along with the data in the stacked DRAM array (tags-in-DRAM). However, this scheme increases the access latency of the DRAM cache.
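As a rough check on the 12 MB figure above, the following sketch estimates the SRAM needed to hold one tag entry per 64 B block of a 256 MB DRAM cache. The 48-bit physical address, direct-mapped indexing, and two state bits per entry are illustrative assumptions, not values taken from the thesis.

```python
# Back-of-the-envelope estimate of the SRAM tag-array size for a die-stacked
# DRAM cache. Assumed parameters (48-bit physical addresses, direct-mapped
# indexing, 2 state bits per entry) are for illustration only.

def tag_array_bytes(cache_bytes, block_bytes, phys_addr_bits=48, state_bits=2):
    num_blocks = cache_bytes // block_bytes          # one tag entry per cache block
    offset_bits = (block_bytes - 1).bit_length()     # bits addressing bytes inside a block
    index_bits = (num_blocks - 1).bit_length()       # bits selecting the block (direct-mapped)
    tag_bits = phys_addr_bits - offset_bits - index_bits
    bits_per_entry = tag_bits + state_bits           # tag plus valid/dirty state
    return num_blocks * bits_per_entry / 8

overhead = tag_array_bytes(256 * 2**20, 64)
print(f"{overhead / 1e6:.1f} MB of SRAM for tags")   # about 11.5 MB, in line with the ~12 MB cited
```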
To optimize access latency in the DRAM cache, we propose the aggressive tag cache (ATCache). Similar to a conventional cache, the ATCache caches recently accessed tags to exploit temporal locality; it exploits spatial locality by prefetching tags from nearby cache sets. In addition, we address the high miss latency and the cache pollution caused by excessive prefetching. To reduce this overhead, we propose a cost-effective prefetching scheme, a combination of dynamic prefetching-granularity tuning and hit-prefetching, to throttle the number of sets prefetched. Our proposed ATCache (which consumes 0.4% of the overall tag size) can satisfy over 60% of DRAM cache tag accesses on average. The last work proposed in this thesis is a DRAM-Cache-Aware (DCA) DRAM controller. In this work, we first address the challenge of scheduling requests in the DRAM cache. Many recent DRAM cache works build on a tags-in-DRAM scheme; storing the tags in the DRAM array, however, increases the complexity of a DRAM cache request. In contrast to a conventional request to DRAM main memory, a request to the DRAM cache now translates into multiple DRAM cache accesses (tag and data). In this work, we address the challenge of scheduling these DRAM cache accesses. We start by exploring whether or not a conventional DRAM controller works well in this scenario. We introduce two potential designs and study their limitations. From this study, we derive a set of design principles that an ideal DRAM cache controller must satisfy. We then propose a DRAM-cache-aware (DCA) DRAM controller based on these design principles. Our experimental results show that DCA can outperform the baseline by over 14%.
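As a rough illustration of the tag-cache idea, the sketch below keeps recently used DRAM-cache tag entries in a small LRU structure and, on a miss, also fetches the tags of a few neighbouring sets. The capacity, LRU policy, and fixed prefetch granularity are assumptions for illustration, not the ATCache's actual mechanisms (which tune the prefetch granularity dynamically).

```python
# Minimal sketch of a tag cache for a DRAM cache: recently used tags are kept in
# SRAM, and a miss also pulls in the tags of a few adjacent sets to exploit
# spatial locality. Sizes and the fixed prefetch degree are illustrative.
from collections import OrderedDict

class TagCache:
    def __init__(self, capacity_sets=1024, prefetch_sets=4):
        self.entries = OrderedDict()     # DRAM-cache set index -> that set's tags (LRU order)
        self.capacity = capacity_sets
        self.prefetch = prefetch_sets

    def lookup(self, dram_set, read_tags_from_dram):
        """Return the tags of `dram_set`, touching DRAM only on a tag-cache miss."""
        if dram_set in self.entries:                  # hit: tags served from fast SRAM
            self.entries.move_to_end(dram_set)
            return self.entries[dram_set]
        for s in range(dram_set, dram_set + self.prefetch):   # miss: fetch this set and neighbours
            self.entries[s] = read_tags_from_dram(s)
            self.entries.move_to_end(s)
            if len(self.entries) > self.capacity:     # evict the least recently used set
                self.entries.popitem(last=False)
        return self.entries[dram_set]
```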
|
62 |
Collaboration dans une fédération de consommateurs de données liées / Collaboration in a Federation of Linked Data Consumers
Folz, Pauline 12 October 2017 (has links)
Following the Linked Data principles, data providers have published billions of RDF facts on the web. Anyone can retrieve relevant information from Linked Data by executing SPARQL queries. Such queries are useful in many domains, including health and data journalism. However, there is a trade-off between query performance and data availability when executing SPARQL queries. In this thesis, we investigate how the collaboration of data consumers opens new opportunities in this trade-off; more precisely, how it can improve performance without degrading availability, or improve availability without degrading performance. We consider that Linked Data allows anyone to run a compact mediator that executes SPARQL queries over data sources on the web. The main idea is to connect these mediators together to build a federation of Linked Data consumers, in which each mediator interacts with a subset of the network. On top of this federation, we have built: (i) a decentralized cache hosted by the mediators; this client-side cache handles a significant share of subqueries and improves data availability with little impact on performance; and (ii) a delegation algorithm that allows mediators to delegate their queries to other mediators; we demonstrate that delegation allows a set of queries to run faster when mediators collaborate, improving performance without degrading data availability.
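A minimal sketch of the client-side caching idea follows: a mediator memoizes the answers to the subqueries it sends out, so repeated subqueries can be served locally. The triple-pattern modelling and the `fetch_from_server` callable are illustrative assumptions, not the actual implementation described in the thesis.

```python
# Sketch of a mediator-side cache: subqueries (modelled here as triple patterns
# with None standing for variables) are sent to the server only once and then
# answered from the local cache. Names and structure are illustrative.

class MediatorCache:
    def __init__(self, fetch_from_server):
        self.fetch = fetch_from_server   # callable: triple pattern -> list of solutions
        self.store = {}                  # (s, p, o) pattern -> cached solutions

    def evaluate(self, s, p, o):
        key = (s, p, o)
        if key not in self.store:        # only contact the data source on a cache miss
            self.store[key] = self.fetch(key)
        return self.store[key]

# Example: two identical subqueries, but only one request leaves the mediator.
sent = []
cache = MediatorCache(lambda pattern: (sent.append(pattern) or []))
cache.evaluate(None, "http://xmlns.com/foaf/0.1/name", None)
cache.evaluate(None, "http://xmlns.com/foaf/0.1/name", None)
print(len(sent))   # 1
```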
|
63 |
Cache Creek group and contiguous rocks, near Cache Creek, B.C.
Shannon, Kenneth Robb January 1982 (has links)
The Cache Creek Group in the type area is characterized by oceanic rocks such as radiolarian chert, fusulinid limestone and pillow basalt. Three divisions are made in the Cache Creek Group in this study: 1) the structurally lowest melange unit (identified as a subduction complex); 2) an overlying greenstone unit; and 3) the Marble Canyon Formation. Emplacement of the Marble Canyon Formation and greenstone unit onto the underlying melange unit is believed to have occurred in the Early to Mid-Jurassic along a shallowly dipping thrust fault. This emplacement may have caused the soft-sediment deformation features in the Early to Mid-Jurassic Ashcroft Formation.
Felsic volcanic rocks and associated tuffs and volcaniclastic sediments are found mainly along the east side of the Cache Creek Group. These felsic rocks have been called the Nicola(?) Group and, based on lithological correlation, are of probable Late Triassic age. The Nicola(?) Group is correlated both with the western belt of the Nicola Group as described by Preto (1977) and with the Pavilion beds as described by Trettin (1961). Blocks of Nicola(?) Group tuffs have been found in the Cache Creek Group melange unit. This indicates that in Late Triassic time the Cache Creek Group and the Nicola(?) Group were adjacent to one another.
Paleoenvironmental and geochemical evidence indicates an ocean-island or platform depositional environment for the Cache Creek Group. Tropical shallow seas covered most of these islands. The lack of continental sediments indicates that the Cache
Creek Group was distant from any major land masses. / Science, Faculty of / Earth, Ocean and Atmospheric Sciences, Department of / Graduate
|
64 |
Historical Aspects of the Attempt to Meet Mental Health Needs in Cache Valley
Watkins, Patricia 01 January 1973 (has links)
No description available.
|
65 |
Analise e simulação de protocolos de coerencia de cache para sistemas multiprocessados [Analysis and simulation of cache coherence protocols for multiprocessor systems]
Atta, Antonio Carlos Fontes 03 February 1994 (has links)
Advisor: Celio Cardoso Guimarães / Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Ciencia da Computação
Abstract: To guarantee reasonable performance of shared-memory multiprocessors by reducing contention for memory and the interconnection network, cache memories have been used, as in uniprocessor systems, to keep locally the information most frequently required by the processors. The possible existence of several modifiable copies of the same data spread across the system's caches, however, gives rise to the cache coherence problem. In this dissertation, we conceptually evaluate some of the solutions proposed for this problem, exploring both solutions aimed at multiprocessors that use a bus as the interconnection network and solutions aimed at systems where processors and memories are connected by more general networks, such as multistage networks. Furthermore, this last class of solutions is studied in more depth for two basic techniques: full-map directories and limited directories. We propose an extension to the limited-directory technique that makes its performance as high as that obtained with full-map directories (which are more expensive in terms of space) while keeping the space efficiency of the original solution. To compare the three solutions, we developed a simulator based on the synthetic generation of memory references from published statistics of real parallel applications. / Master's degree in Computer Science
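To make the space trade-off concrete, the sketch below contrasts the two directory organizations: a full-map entry keeps one presence bit per processor, while a limited entry keeps only a few pointers and must invalidate a sharer when the pointers overflow. Field names and the overflow policy are illustrative assumptions, not the protocols evaluated in the dissertation.

```python
# Sketch of the two directory organizations compared above. Full-map storage
# grows with the number of processors; a limited directory stores a fixed number
# of sharer pointers and invalidates a copy on pointer overflow.

class FullMapEntry:
    def __init__(self, num_processors):
        self.sharers = [False] * num_processors   # one presence bit per processor

    def add_sharer(self, proc):
        self.sharers[proc] = True

class LimitedEntry:
    def __init__(self, max_pointers=4):
        self.pointers = []                        # at most max_pointers sharer ids
        self.max_pointers = max_pointers

    def add_sharer(self, proc, invalidate):
        if proc in self.pointers:
            return
        if len(self.pointers) == self.max_pointers:
            evicted = self.pointers.pop(0)        # overflow: pick a victim sharer...
            invalidate(evicted)                   # ...and invalidate its cached copy
        self.pointers.append(proc)
```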
|
66 |
Evaluation of Memory Prefetching Techniques for Modem Applications
Nyholm, Gustav January 2022 (has links)
Processor performance has increased far faster than memory performance, forcing processor designers to use caches to bridge the speed gap. This can increase performance significantly for programs that utilize the caches efficiently, but it results in significant performance penalties when data is not in the cache. One way to mitigate this problem is to make sure that data is cached before it is needed, using memory prefetching. This thesis focuses on different ways to perform prefetching in systems with strict area and energy requirements by evaluating a number of prefetch techniques based on performance in two programs as well as metrics such as coverage and accuracy. Both data and instruction prefetching are investigated. The studied techniques include a number of versions of next-line prefetching, prefetching based on stride identification and history, as well as post-increment-based prefetching. While the best increase in program performance is achieved using next-2-lines prefetching, it comes at a significant energy cost as well as drastically increased memory traffic, making it unsuitable for energy-constrained applications. RPT-based prefetching, on the other hand, gives a good balance between performance and cost, improving performance by 4% and 7% for the two programs while keeping the impact on both area and energy minimal.
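As a rough illustration of the RPT approach mentioned above, the sketch below keeps, per load instruction, the last address and the last stride, and issues a prefetch once the same stride is observed twice. The table organization and training rule are simplified assumptions, not the design evaluated in the thesis.

```python
# Minimal sketch of a Reference Prediction Table (RPT) stride prefetcher: each
# load PC gets an entry with its last address and last stride; a repeated,
# non-zero stride triggers a prefetch of the next address. Simplified for
# illustration (no confidence counters, unbounded table).

class RPTPrefetcher:
    def __init__(self):
        self.table = {}   # pc -> (last_address, last_stride)

    def access(self, pc, address):
        """Record a load and return an address to prefetch, or None while training."""
        last, stride = self.table.get(pc, (address, 0))
        new_stride = address - last
        self.table[pc] = (address, new_stride)
        if new_stride == stride and new_stride != 0:
            return address + new_stride            # stride seen twice in a row: prefetch
        return None

# A load walking an array in 64-byte steps starts prefetching on its third access.
pf = RPTPrefetcher()
for addr in (0x1000, 0x1040, 0x1080, 0x10c0):
    print(hex(addr), "->", pf.access(0x400, addr))
```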
|
67 |
Seepage Evaluations in Cache Valley Irrigation Canals
Molina, Katerine N. 01 May 2008 (links)
Estimation of seepage was done in 39 selected reaches of 11 irrigation canals in the Logan and Blacksmith Fork Irrigation Systems of Cache Valley, Utah. The measurements were performed from June to October 2008, which covers part of the irrigation season for these canals. The inflow-outflow method was used to measure seepage: cross-sectional area and velocities were measured under steady-flow conditions. Velocity measurements were made with an acoustic flow meter, and the mean velocity was determined using the reduced-point method (velocity measurements at 0.2, 0.6 and/or 0.8 of the depth from the water surface).
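A small sketch of the inflow-outflow calculation is given below: discharge is computed at the upstream and downstream ends of a reach as the sum of subsection area times mean velocity, and the difference is attributed to seepage. The two-point velocity averaging rule and all numbers are illustrative assumptions, not data from the study.

```python
# Sketch of the inflow-outflow seepage estimate: Q = sum(area * velocity) over
# the cross-section at each end of the reach; the loss is Q_in - Q_out under
# steady flow. Velocities and areas below are made-up example values.

def mean_velocity(v02=None, v06=None, v08=None):
    """Reduced-point mean velocity: average of the 0.2- and 0.8-depth readings if
    both exist, otherwise the 0.6-depth reading (the usual convention, assumed here)."""
    if v02 is not None and v08 is not None:
        return (v02 + v08) / 2.0
    return v06

def discharge(areas_m2, velocities_m_s):
    """Total discharge (m^3/s) across the section."""
    return sum(a * v for a, v in zip(areas_m2, velocities_m_s))

q_in = discharge([1.2, 1.5, 1.1], [mean_velocity(0.60, None, 0.50),
                                   mean_velocity(0.65, None, 0.55),
                                   mean_velocity(v06=0.45)])
q_out = discharge([1.1, 1.4, 1.0], [0.54, 0.58, 0.44])
print(f"seepage loss: {q_in - q_out:.2f} m^3/s over the reach")
```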
As a result, the reaches with the highest seepage losses were identified. Gaining streams, losing streams, and gaining-losing streams were also identified. Spatial variation was observed along each canal, with mean seepage loss tending to decrease in the downstream direction. Spatial variation was also found between canals: reaches located in the east part of Logan city showed higher seepage losses than reaches on the west side of the city. Temporal variation was identified by a monthly comparison of seepage losses within reaches, which indicated higher seepage losses during late July and August of 2008. Additionally, this report presents comments on the performance of the FlowTracker® ADV® in the present project.
|
68 |
CONTROLLING CACHE PARTITION SIZES TO INCREASE APPLICATION RELIABILITY
Suuronen, Janne, Nasiri, Jawid January 2018 (has links)
A problem with multi-core platforms is the competition for shared cache memory, which is also known as cache contention. Cache contention can negatively affect process reliability, since it can increase execution-time jitter. Cache contention may be caused by inter-process interference in a system. To minimize the negative effects of inter-process interference, cache memory can be partitioned, which isolates processes from each other. In this work, two questions related to cache-coloring-based cache partition sizes have been investigated. The first question is how knowledge of the execution characteristics of an algorithm can be used to create adequate partition sizes. The second question is whether sweet spots can be found to determine when cache-coloring-based cache partitioning is worth using. The two questions are investigated using two experiments. The first experiment focuses on how static partition sizes affect process reliability and isolation. The second experiment investigates both questions by using the L3 cache misses caused by a running process to determine partition sizes dynamically. Results from the first experiment show that static partition sizes increase process reliability and isolation compared to a non-isolated system. The outcomes of the second experiment show that dynamic partition sizes provide even better process reliability than the static approach. Collectively, the results were fairly similar across configurations, and therefore sweet spots could not be found. The contributions of our work are a cache-partitioning controller and metrics showing the effects of static and dynamic partition sizes.
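For reference, the sketch below shows the page-colouring idea that cache-colouring-based partitioning builds on: a page's colour is given by the cache set-index bits that lie above the page offset, so restricting a process to a subset of colours restricts it to a subset of the L3 sets. The cache geometry used is an assumed example, not the platform from the thesis.

```python
# Page-colouring sketch: compute how many colours an assumed L3 cache has and
# which colour a physical page maps to. Geometry (8 MiB, 16-way, 64 B lines,
# 4 KiB pages) is an illustrative assumption.

CACHE_BYTES = 8 * 2**20
LINE_BYTES = 64
WAYS = 16
PAGE_BYTES = 4096

num_sets = CACHE_BYTES // (LINE_BYTES * WAYS)      # 8192 sets
set_bits = num_sets.bit_length() - 1               # 13 set-index bits
offset_bits = LINE_BYTES.bit_length() - 1          # 6 line-offset bits
page_bits = PAGE_BYTES.bit_length() - 1            # 12 page-offset bits
colour_bits = offset_bits + set_bits - page_bits   # set-index bits above the page offset
num_colours = 1 << colour_bits                     # 128 colours in this example

def page_colour(phys_addr):
    """Colour of the physical page containing phys_addr."""
    return (phys_addr >> page_bits) & (num_colours - 1)

# A process given 32 of the 128 colours can occupy at most a quarter of the L3
# sets, which is how the partition size is controlled.
print(num_colours, page_colour(0x12345000))
```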
|
69 |
Emulating Variable Block Size Caches
Muthulaxmi, S 05 1900 (has links) (PDF)
No description available.
|
70 |
An Evaluation of Intel Cache Allocation Technology for Data-Intensive Applications / En utvärdering av Intel Cache Allocation Technology för dataintensiva applikationer
Ihre Sherif, Alan January 2021 (has links)
On certain CPUs in the Intel Xeon Scalable family, the level three (L3) cache is shared among the CPU cores residing on the same CPU socket. This has the benefit that a larger and more scalable cache space is available to the CPU cores. However, when the L3 cache is shared between CPU cores, and thereby between the applications running on them, the applications can affect each other's performance if some of them have high L3 cache usage. This can be particularly problematic if an application over-utilizes the L3 cache and effectively evicts the data of other, more prioritized applications from the L3 cache. Such applications are called L3 cache noisy neighbors. The experiments in this thesis study the effect L3 cache noisy neighbors have on other, more prioritized applications and whether Intel Cache Allocation Technology (CAT) can be used to limit the performance impact of the noisy neighbors. Intel CAT provides functionality to control the amount of L3 cache allocated to a CPU core. By allocating less L3 cache to a noisy neighbor, it no longer shares as much L3 cache with the prioritized applications, so the prioritized applications can again utilize more of the L3 cache and regain their performance. The research question of this thesis is to investigate in which cases Intel CAT provides advantages and in which cases using it is a disadvantage, by studying its use for three commonly used applications: bzip2, Redis, and Graph500. All three applications were significantly impacted when running simultaneously with a noisy neighbor: for the Redis application there was a 49.2% decrease in the number of 'GET' requests per second that the Redis server could handle and an 18.2% decrease for 'SET' requests, while the bzip2 and Graph500 applications saw a 14.7% and 28.1% increase in execution time, respectively. Intel CAT was successfully used to limit the impact of the noisy neighbor on the three applications. For the Redis application, the number of requests per second increased by 8.6% for the 'GET' operation and by 4.2% for the 'SET' operation, and for the bzip2 and Graph500 applications there was a 5.8% and 12.0% decrease in execution time, respectively. Moreover, the thesis studies the scenario in which only prioritized applications are running and whether their performance can be increased by isolating part of the L3 cache for each of them so that they cannot cause L3 cache evictions for each other. The use case for Intel CAT in such a scenario is not as clear as when mitigating the impact of a noisy neighbor, but some performance benefits can be observed when running multiple Redis instances on the same machine and isolating some of the L3 cache available to each of them. / On certain processors in the Intel Xeon Scalable family, the third-level cache (L3 cache) is shared between the CPU cores residing on the same CPU socket. This has the advantage that a larger and more scalable cache space becomes available to the CPU cores. That the L3 cache is shared between the cores, however, means that the applications running on them can affect each other's performance if any of them over-utilizes the L3 cache. When an application over-utilizes the L3 cache, data from other applications, which may be more prioritized, no longer fits in the cache. Such applications are called L3 cache noisy neighbors.
The experiments in this study examine the effects of L3 cache noisy neighbors on more prioritized applications and whether Intel Cache Allocation Technology (CAT) can be used to limit the impact that L3 cache noisy neighbors have. Intel CAT provides functionality for controlling the amount of L3 cache allocated to a CPU core; by allocating less L3 cache to a noisy neighbor, it no longer shares as much L3 cache with the prioritized applications, and the prioritized applications can thereby regain their performance. The research question of this study is to investigate in which use cases Intel CAT offers advantages, and when using it is a disadvantage, by studying its use for three widely used applications: bzip2, Redis, and Graph500. The performance of all three applications was clearly affected when they ran simultaneously with a noisy neighbor, and Intel CAT could be used to reduce that impact. For Redis, the number of requests handled increased by 8.6% for GET operations and 4.2% for SET operations. For bzip2 and Graph500, decreases in execution time of 5.8% and 12.0%, respectively, were observed. This thesis also examines the scenario where only prioritized applications are running and whether their performance can be increased by isolating L3 cache for each of them so that they do not take space from one another in the L3 cache. When Intel CAT is used in such a scenario, the benefits are not as clear as when limiting the impact of a noisy neighbor, but some performance improvement can be observed when several Redis servers run on the same machine and part of the L3 cache is isolated for each of them.
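One concrete way to apply such an allocation on Linux is the resctrl interface to Intel CAT, sketched below. This is only one possible mechanism and not necessarily the one used in the thesis; the group names, way masks, and cache-domain id are illustrative assumptions, the masks must normally be contiguous, and the script needs root on CAT-capable hardware.

```python
# Hedged sketch: reserve L3 ways for a prioritized process and confine a noisy
# neighbour using the Linux resctrl filesystem (mounted with
# "mount -t resctrl resctrl /sys/fs/resctrl"). Group names, bitmasks, and the
# cache-domain id are assumed example values.
import os

RESCTRL = "/sys/fs/resctrl"

def assign_l3_ways(group, cbm_hex, pid, domain=0):
    """Create a resctrl group, set its L3 way mask for one cache domain, and move `pid` into it."""
    path = os.path.join(RESCTRL, group)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "schemata"), "w") as f:
        f.write(f"L3:{domain}={cbm_hex}\n")   # bitmask of L3 ways this group may use
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))                     # tasks in the group are confined to those ways

# Example (assumed PIDs): give Redis 8 ways and the noisy neighbour the other 8.
# assign_l3_ways("prioritized", "ff00", redis_pid)
# assign_l3_ways("noisy", "00ff", noisy_pid)
```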
|