381

Návrh a implementace prostředků pro zvýšení výkonu procesoru / Design and Implementation of Mechanisms for Enhancing Performance of CPU

Zlatohlávková, Lucie January 2007
This master's thesis focuses on processor architecture. Its core is the design of a simple processor, extended with modern architectural components such as pipelining, cache memory, and branch prediction. The processor was described in the VHDL hardware description language and simulated with the ModelSim simulation tool.
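For readers unfamiliar with the last of those components, the following minimal C sketch shows a textbook 2-bit saturating-counter branch predictor. It is offered purely for illustration, not as the thesis's VHDL design, and the table size is an arbitrary choice.

    /* Textbook 2-bit saturating-counter branch predictor (illustrative). */
    #include <stdio.h>

    #define TABLE_SIZE 1024  /* number of 2-bit counters; arbitrary */

    static unsigned char table[TABLE_SIZE];  /* 0..3; >= 2 predicts taken */

    int predict(unsigned int pc) {
        return table[pc % TABLE_SIZE] >= 2;
    }

    void update(unsigned int pc, int taken) {
        unsigned char *c = &table[pc % TABLE_SIZE];
        if (taken && *c < 3) (*c)++;    /* saturate at strongly taken */
        if (!taken && *c > 0) (*c)--;   /* saturate at strongly not-taken */
    }

    int main(void) {
        /* A loop branch taken 9 times, then not taken: mispredictions
         * occur only during warm-up and on the final loop exit. */
        int miss = 0;
        for (int i = 0; i < 10; i++) {
            int taken = (i < 9);
            if (predict(0x400) != taken) miss++;
            update(0x400, taken);
        }
        printf("mispredictions: %d/10\n", miss);
        return 0;
    }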
382

A Dual-Port Data Cache with Pseudo-Direct Mapping Function

Gade, Arul Sandeep 07 May 2005
Conventional on-chip (L1) data caches such as Direct-Mapped (DM) caches and 2-way Set-Associative Caches (SAC) have been widely used in high-performance uni- and multi-processors. Unfortunately, these schemes suffer from high conflict misses, since more than one address maps onto the same cache line. To reduce conflict misses, much research has gone into alternative cache architectures such as the 2-way Skewed-Associative cache (Skew cache). The 2-way Skew cache has hardware complexity equivalent to that of a 2-way SAC and a miss rate approaching that of a 4-way SAC. However, the miss-rate reduction achievable with a Skew cache is limited by the confined space available to disperse the conflicting accesses over small memory banks. This research proposes a dual-port data cache called the Pseudo-Direct Cache (PDC) to minimize conflict misses by dispersing addresses effectively over a single memory bank. Our simulation results show that the PDC reduces those misses significantly compared to conventional L1 caches and achieves 10-15% lower miss rates than a 2-way Skew cache. The SimpleScalar simulator was used for these simulations with the SPEC95FP benchmark programs; similar results were observed with the SPEC2000FP benchmarks. Simulations with CACTI 3.0 were performed to evaluate the hardware implications of the PDC relative to the Skew cache; they show that the PDC has hardware complexity similar to a 2-way SAC and a 4-15% better average memory access time (AMAT) than a 2-way Skew cache. The PDC also reduces execution cycles significantly.
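The abstract does not give the PDC's actual mapping function, but the way skewed designs disperse conflicts can be sketched: each bank indexes the same address with a different function, so two addresses that collide in one bank usually fall apart in the other. Below is a minimal C illustration of a generic XOR-based skewing function in the style of a 2-way Skew cache; the set count, line size, and tag fold are assumptions, not the thesis's design.

    /* Generic XOR-based skewed indexing (illustrative, not the PDC). */
    #include <stdio.h>
    #include <stdint.h>

    #define SETS      256   /* sets per bank; arbitrary */
    #define LINE_BITS 6     /* 64-byte cache lines */

    static uint32_t index_bank0(uint32_t addr) {
        return (addr >> LINE_BITS) % SETS;       /* conventional index */
    }

    static uint32_t index_bank1(uint32_t addr) {
        uint32_t set = (addr >> LINE_BITS) % SETS;
        uint32_t tag = addr >> (LINE_BITS + 8);  /* bits above the index */
        return (set ^ tag) % SETS;               /* skewed index */
    }

    int main(void) {
        /* Two addresses that collide in bank 0 land in different sets
         * of bank 1, so at most one bank suffers the conflict. */
        uint32_t a = 0x00014040, b = 0x00054040;
        printf("bank0: %u vs %u\n", index_bank0(a), index_bank0(b));
        printf("bank1: %u vs %u\n", index_bank1(a), index_bank1(b));
        return 0;
    }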
383

The Multi-tiered Future of Storage: Understanding Cost and Performance Trade-offs in Modern Storage Systems

Iqbal, Muhammad Safdar 19 September 2017
In the last decade, the landscape of storage hardware and software has changed considerably. Storage hardware has diversified from hard disk drives and solid state drives to include persistent memory (PMEM) devices such as phase-change memory (PCM) and Flash-backed DRAM. On the software side, the increasing adoption of cloud services for building and deploying consumer and enterprise applications is driving the use of cloud storage services. Cloud providers have responded with a plethora of storage services, each of which has unique performance characteristics and pricing. We argue that this variety represents an opportunity for modern storage systems and can be leveraged to improve their operational costs. We propose that storage tiering is an effective technique for balancing operational or deployment costs and performance in such modern storage systems. We demonstrate this via three key techniques. First, THMCache leverages tiering to conserve the lifetime of PMEM devices, hence saving hardware upgrade costs. Second, CAST leverages tiering between multiple types of cloud storage to deliver higher utility (i.e., performance per unit of cost) for cloud tenants. Third, we propose a dynamic pricing scheme for cloud storage services, which leverages tiering to increase the cloud provider's profit or offset their management costs. / Master of Science
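As a rough illustration of the utility notion above (performance per unit of cost), the C sketch below assigns an object to the tier that maximizes that ratio, with delivered performance capped by the object's own demand; the tier names, prices, and IOPS figures are hypothetical, not CAST's actual model.

    /* Toy utility-based tier selection (hypothetical numbers). */
    #include <stdio.h>

    struct tier {
        const char *name;
        double iops;            /* performance the tier can deliver */
        double dollars_per_gb;  /* monthly storage price */
    };

    static const struct tier tiers[] = {
        { "pmem/ssd",    10000.0, 0.17  },
        { "hdd-backed",    500.0, 0.045 },
        { "cold-object",    50.0, 0.01  },
    };

    /* Utility = delivered performance / cost, where delivered
     * performance is capped by the object's own access rate. */
    static int best_tier(double obj_iops) {
        int best = 0;
        double best_u = -1.0;
        for (int i = 0; i < 3; i++) {
            double perf = obj_iops < tiers[i].iops ? obj_iops : tiers[i].iops;
            double u = perf / tiers[i].dollars_per_gb;
            if (u > best_u) { best_u = u; best = i; }
        }
        return best;
    }

    int main(void) {
        printf("hot object  -> %s\n", tiers[best_tier(5000.0)].name);
        printf("cold object -> %s\n", tiers[best_tier(40.0)].name);
        return 0;
    }

Capping delivered performance by demand is what pushes cold data onto the cheap tier in this toy model; without the cap, the fastest tier would always win.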
384

Entropy: algoritmo de substituição de linhas de cache inspirado na entropia da informação. / Entropy: cache line replacement algorithm inspired in information entropy.

Kobayashi, Jorge Mamoru 07 June 2010
Este trabalho apresenta um estudo sobre o problema de substituição de linhas de cache em microprocessadores. Inspirado no conceito de Entropia da Informação proposto em 1948 por Claude E. Shannon, este trabalho propõe uma nova heurística de substituição de linhas de cache. Seu objetivo é capturar e explorar melhor a localidade de referência dos programas e diminuir a taxa de miss rate durante a execução dos programas. O algoritmo proposto, Entropy, utiliza a heurística de entropia da informação para estimar as chances de uma linha ou bloco de cache ser referenciado após ter sido carregado na cache. Uma nova função de decaimento de entropia foi introduzida no algoritmo, otimizando seu funcionamento. Dentre os resultados obtidos, o Entropy conseguiu reduzir em até 50,41% o miss rate em relação ao algoritmo LRU. O trabalho propõe, ainda, uma implementação em hardware com complexidade e custo computacional comparáveis aos do algoritmo LRU. Para uma memória cache de segundo nível com 2-Mbytes e 8-way associative, a área adicional requerida é da ordem de 0,61% de bits adicionais. O algoritmo proposto foi simulado no SimpleScalar e comparado com o algoritmo LRU utilizando-se os benchmarks SPEC CPU2000. / This work presents a study of the cache line replacement problem in microprocessors. Inspired by the Information Entropy concept stated by Claude E. Shannon in 1948, it proposes a novel heuristic for replacing cache lines. The major goal is to better capture and exploit the locality of reference of programs and to reduce the cache miss rate during program execution. The proposed algorithm, Entropy, uses the entropy heuristic to estimate the chances of a cache line being referenced again after it has been loaded into the cache. A novel entropy decay function was introduced to optimize its operation. Results show that Entropy reduced the miss rate by up to 50.41% compared to LRU. This work also proposes a hardware implementation whose complexity and computational cost are comparable to those of LRU, the most widely employed algorithm. For a 2-MByte, 8-way set-associative second-level cache, the additional storage required is about 0.61% of the cache size. The Entropy algorithm was simulated on the SimpleScalar simulator and compared against LRU using the SPEC CPU2000 benchmark programs.
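The abstract does not state Entropy's formulas, so the C sketch below only illustrates the general shape of such a policy: each line carries a score that grows on reuse and decays with idle time, and the victim is the lowest-scoring line. The decay rate and reuse reward are invented constants, not the thesis's decay function.

    /* Score-with-decay replacement, in the general spirit of Entropy
     * (constants and formula are illustrative assumptions). */
    #include <stdio.h>

    #define WAYS 8

    struct line { double score; unsigned long last_access; };

    static struct line set[WAYS];
    static unsigned long now;

    void touch(int way) {
        now++;
        unsigned long idle = now - set[way].last_access;
        /* decay with idle time, then reward the reuse */
        set[way].score = set[way].score / (1.0 + 0.1 * idle) + 1.0;
        set[way].last_access = now;
    }

    int victim(void) {
        int v = 0;
        for (int w = 1; w < WAYS; w++)
            if (set[w].score < set[v].score) v = w;
        return v;
    }

    int main(void) {
        for (int i = 0; i < 20; i++) touch(i % 3);  /* ways 0-2 are hot */
        printf("evict way %d\n", victim());         /* a cold way (3..7) */
        return 0;
    }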
385

Stratégies de Cache basées sur la popularité pour Content Centric Networking / Popularity-Based Caching Strategies for Content Centric Networking

Bernardini, César 05 May 2015
Content Centric Networking (CCN) est une architecture pour l'Internet du futur. CCN inclut des fonctionnalités de cache dans tous les noeuds du réseau. Son efficacité dépend largement de la performance de ses stratégies de cache. C'est pour cela que plusieurs études proposent de nouvelles stratégies de cache pour améliorer la performance d'un réseau CCN. Cependant, parmi toutes ces stratégies, il n'est pas évident de décider laquelle fonctionne le mieux. Il manque un environnement commun pour comparer ces stratégies. De plus, il n'est pas certain que ces approches soient les meilleures alternatives pour améliorer la performance du réseau. Dans cette thèse, on vise le problème de choisir les meilleures stratégies de cache pour CCN et les contributions sont les suivantes. On construit un environnement commun d'évaluation dans lequel on compare via simulation les stratégies de cache disponibles: Leave Copy Everywhere (LCE), Leave Copy Down (LCD), ProbCache, Cache "Less For More" et MAGIC. On analyse la performance de toutes ces stratégies et on décide de la meilleure stratégie de cache pour chaque scénario. Ensuite, on propose deux stratégies de cache basées sur la popularité pour CCN. On commence avec une étude de la popularité du contenu et on présente la stratégie Most Popular Caching (MPC). MPC privilégie la distribution de contenu populaire dans les caches et surpasse ainsi les autres stratégies de cache. Dans une deuxième étape, on présente une stratégie de cache basée sur l'information des réseaux sociaux: Socially-Aware Caching Strategy (SACS). SACS privilégie la distribution de contenu publié par les utilisateurs les plus influents. / Content Centric Networking (CCN) is a new architecture for a future Internet. CCN includes in-network caching capabilities at every node. Its efficiency depends drastically on the performance of its caching strategies. Many studies propose new caching strategies to improve the performance of CCN. However, among all these strategies, it is still unclear which one performs best, as there is a lack of a common environment in which to compare them. In this thesis, we address the challenge of selecting the best caching strategies for CCN. The contributions of this thesis are the following. We build a common evaluation scenario and compare via simulation the state-of-the-art caching strategies: Leave Copy Everywhere (LCE), Leave Copy Down (LCD), ProbCache, Cache "Less For More" and MAGIC. We analyze the performance of all the strategies in terms of cache hit ratio, stretch, diversity, and complexity, and determine the caching strategy that fits best in every scenario. Later on, we propose two novel popularity-based caching strategies for CCN. First, we study the popularity of content and present the Most Popular Caching (MPC) strategy. MPC privileges the distribution of popular content into the caches and thus outperforms other caching strategies. Second, we present an alternative caching strategy based on social networks: the Socially-Aware Caching Strategy (SACS). SACS privileges the distribution of content published by influential users into the network. Both caching strategies outperform state-of-the-art mechanisms and, to the best of our knowledge, we are the first to use social information to build caching strategies.
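Two of the strategies compared above, LCE and LCD, are standard placement rules with precise definitions: LCE leaves a copy at every node on the delivery path, while LCD leaves a copy only at the node one level below where the hit occurred. A minimal C sketch over our own toy path representation:

    /* LCE vs LCD placement along a delivery path (toy representation). */
    #include <stdio.h>

    #define PATH_LEN 4   /* node 0 = where the hit occurred, node 3 = client */

    /* LCE: leave a copy at every node on the delivery path. */
    void lce(int cache_copy[PATH_LEN]) {
        for (int i = 1; i < PATH_LEN; i++) cache_copy[i] = 1;
    }

    /* LCD: leave a copy only one level below the node holding the hit. */
    void lcd(int cache_copy[PATH_LEN]) {
        cache_copy[1] = 1;
    }

    int main(void) {
        int a[PATH_LEN] = {0}, b[PATH_LEN] = {0};
        lce(a); lcd(b);
        for (int i = 0; i < PATH_LEN; i++)
            printf("node %d: LCE=%d LCD=%d\n", i, a[i], b[i]);
        return 0;
    }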
386

Deterministic Object Management in Large Distributed Systems

Mikhailov, Mikhail 05 March 2003
Caching is a widely used technique to improve the scalability of distributed systems. A central issue with caching is maintaining object replicas consistent with their master copies. Large distributed systems, such as the Web, typically deploy heuristic-based consistency mechanisms, which increase delay and place extra load on the servers, while not providing guarantees that cached copies served to clients are up-to-date. Server-driven invalidation has been proposed as an approach to strong cache consistency, but it requires servers to keep track of which objects are cached by which clients. We propose an alternative approach to strong cache consistency, called MONARCH, which does not require servers to maintain per-client state. Our approach builds on a few key observations. Large and popular sites, which attract the majority of the traffic, construct their pages from distinct components with various characteristics. Components may have different content types, change characteristics, and semantics. These components are merged together to produce a monolithic page, and the information about their uniqueness is lost. In our view, pages should serve as containers holding distinct objects with heterogeneous type and change characteristics while preserving the boundaries between these objects. Servers compile object characteristics and information about relationships between containers and embedded objects into explicit object management commands. Servers piggyback these commands onto existing request/response traffic so that client caches can use these commands to make object management decisions. The use of explicit content control commands is a deterministic, rather than heuristic, object management mechanism that gives content providers more control over their content. The deterministic object management with strong cache consistency offered by MONARCH allows content providers to make more of their content cacheable. Furthermore, MONARCH enables content providers to expose internal structure of their pages to clients. We evaluated MONARCH using simulations with content collected from real Web sites. The results show that MONARCH provides strong cache consistency for all objects, even for unpredictably changing ones, and incurs smaller byte and message overhead than heuristic policies. The results also show that as the request arrival rate or the number of clients increases, the amount of server state maintained by MONARCH remains the same while the amount of server state incurred by server invalidation mechanisms grows.
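The abstract does not specify MONARCH's command encoding; purely to illustrate the idea of piggybacking explicit object-management commands on existing responses, a hypothetical C structure might look as follows (the object names and change categories are invented).

    /* Hypothetical piggybacked object-management commands. */
    #include <stdio.h>

    enum change_kind { STATIC_OBJ, PERIODIC_OBJ, UNPREDICTABLE_OBJ };

    struct object_command {
        const char      *object_id;   /* embedded object in the container */
        enum change_kind kind;        /* its change characteristics */
        int              cacheable;   /* explicit content-control decision */
    };

    struct response {
        const char            *container;   /* the page acting as container */
        struct object_command  commands[3]; /* piggybacked on the response */
    };

    int main(void) {
        struct response r = {
            "/index.html",
            { { "/logo.png",    STATIC_OBJ,        1 },
              { "/headline",    PERIODIC_OBJ,      1 },
              { "/stock-quote", UNPREDICTABLE_OBJ, 0 } }
        };
        for (int i = 0; i < 3; i++)
            printf("%s: cacheable=%d\n", r.commands[i].object_id,
                   r.commands[i].cacheable);
        return 0;
    }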
387

On models for performance evaluation and cache resources placement in multi-cache networks / Sur des modèles pour l'évaluation de performance et le placement des ressources de cache dans les réseaux multi-cache

Ben Ammar, Hamza 19 March 2019
Au cours des dernières années, les fournisseurs de contenu ont connu une forte augmentation des demandes de contenus vidéo et de services riches en média. Compte tenu des limites de la mise à l'échelle du réseau et au-delà des réseaux de diffusion de contenu, les fournisseurs de services Internet développent leurs propres systèmes de mise en cache afin d'améliorer la performance du réseau. Ces facteurs expliquent l'enthousiasme à l'égard du concept de réseau centré sur le contenu et de sa fonction de mise en cache en réseau. La quantification analytique de la performance de la mise en cache n'est toutefois pas suffisamment explorée dans la littérature. De plus, la mise en place d'un système de caching efficace au sein d'une infrastructure réseau est très complexe et demeure une problématique ouverte. Pour traiter ces questions, nous présentons d'abord dans cette thèse un modèle générique et précis de cache nommé MACS (Markov chain-based Approximation of Caching Systems) qui peut être adapté très facilement pour représenter différents schémas de mise en cache et qui peut être utilisé pour calculer différentes mesures de performance des réseaux multi-cache. Nous avons ensuite abordé le problème de l'allocation des ressources de cache dans les réseaux avec capacité de caching. Moyennant notre outil analytique MACS, nous présentons une approche permettant de résoudre le compromis entre différentes mesures de performance en utilisant l'optimisation multi-objectif et nous proposons une adaptation de la métaheuristique GRASP pour résoudre le problème d'optimisation. / In the last few years, Content Providers (CPs) have experienced a sharp increase in requests for video content and rich media services. In view of the network's scaling limitations, and beyond Content Delivery Networks (CDNs), Internet Service Providers (ISPs) are developing their own caching systems in order to improve network performance. These factors explain the enthusiasm around the Content-Centric Networking (CCN) concept and its in-network caching feature. The analytical quantification of caching performance is, however, not sufficiently explored in the literature. Moreover, setting up an efficient caching system within a network infrastructure is very complex and remains an open problem. To address these issues, we first provide in this thesis a fairly generic and accurate model of caching nodes, named MACS (Markov chain-based Approximation of Caching Systems), which can be adapted very easily to represent different caching schemes and used to compute different performance metrics of multi-cache networks. We then tackle the problem of cache resource allocation in cache-enabled networks. By means of our analytical tool MACS, we present an approach that resolves the trade-off between different performance metrics using multi-objective optimization, and we propose an adaptation of the GRASP metaheuristic to solve the optimization problem.
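GRASP itself is a well-known metaheuristic: a greedy randomized construction followed by local search, restarted several times. The C skeleton below shows that structure on a toy stand-in objective (spending a fixed cache budget on the highest-gain nodes); it is not MACS or the thesis's actual formulation.

    /* Generic GRASP skeleton on a toy cache-placement objective. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NODES  6
    #define BUDGET 3
    #define STARTS 20

    static const double gain[NODES] = { 0.9, 0.4, 0.7, 0.2, 0.8, 0.5 };

    static double value(const int sol[NODES]) {
        double v = 0;
        for (int i = 0; i < NODES; i++) if (sol[i]) v += gain[i];
        return v;
    }

    /* Construction: pick BUDGET nodes at random from the restricted
     * candidate list (gain within 75% of the best remaining one). */
    static void construct(int sol[NODES]) {
        for (int i = 0; i < NODES; i++) sol[i] = 0;
        for (int k = 0; k < BUDGET; k++) {
            double best = 0;
            for (int i = 0; i < NODES; i++)
                if (!sol[i] && gain[i] > best) best = gain[i];
            int pick;
            do { pick = rand() % NODES; }
            while (sol[pick] || gain[pick] < 0.75 * best);
            sol[pick] = 1;
        }
    }

    /* Local search: swap a chosen node for an unchosen better one
     * until no improving swap remains. */
    static void local_search(int sol[NODES]) {
        for (int improved = 1; improved; ) {
            improved = 0;
            for (int i = 0; i < NODES; i++)
                for (int j = 0; j < NODES; j++)
                    if (sol[i] && !sol[j] && gain[j] > gain[i]) {
                        sol[i] = 0; sol[j] = 1; improved = 1;
                    }
        }
    }

    int main(void) {
        int sol[NODES], best_sol[NODES] = {0};
        double best = -1;
        for (int s = 0; s < STARTS; s++) {
            construct(sol);
            local_search(sol);
            if (value(sol) > best) {
                best = value(sol);
                for (int i = 0; i < NODES; i++) best_sol[i] = sol[i];
            }
        }
        printf("best value %.2f, nodes:", best);
        for (int i = 0; i < NODES; i++) if (best_sol[i]) printf(" %d", i);
        printf("\n");
        return 0;
    }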
388

Managing the memory hierarchy in GPUs

Dublish, Saumay Kumar January 2018
Pervasive use of GPUs across multiple disciplines is a result of continuous adaptation of the GPU architectures to address the needs of upcoming application domains. One such vital improvement is the introduction of the on-chip cache hierarchy, used primarily to filter the high bandwidth demand to the off-chip memory. However, in contrast to traditional CPUs, the cache hierarchy in GPUs is presented with significantly different challenges such as cache thrashing and bandwidth bottlenecks, arising due to small caches and high levels of memory traffic. These challenges lead to severe congestion across the memory hierarchy, resulting in high memory access latencies. In memory-intensive applications, such high memory access latencies often get exposed and can no longer be hidden through multithreading, and therefore adversely impact system performance. In this thesis, we address the inefficiencies across the memory hierarchy in GPUs that lead to such high levels of congestion. We identify three major factors contributing to poor memory system performance: first, disproportionate and insufficient bandwidth resources in the cache hierarchy; second, poor cache management policies; and third, high levels of multithreading. In order to revitalize the memory hierarchy by addressing the above limitations, we propose a three-pronged approach. First, we characterize the bandwidth bottlenecks present across the memory hierarchy in GPUs and identify the architectural parameters that are most critical in alleviating congestion. Subsequently, we explore the architectural design space to mitigate the bandwidth bottlenecks in a cost-effective manner. Second, we identify significant inter-core reuse in GPUs, presenting an opportunity to reuse data among the L1s. We exploit this reuse by connecting the L1 caches with a lightweight ring network to facilitate inter-core communication of shared data. We show that this technique reduces traffic to the L2 cache, freeing up the bandwidth for other accesses. Third, we present Poise, a machine learning approach to mitigate cache thrashing and bandwidth bottlenecks by altering the levels of multithreading. Poise comprises a supervised learning model that is trained offline on a set of profiled kernels to make good warp scheduling decisions. Subsequently, a hardware inference engine is used to predict good warp scheduling decisions at runtime using the model learned during training. In summary, we address the problem of bandwidth bottlenecks across the memory hierarchy in GPUs by exploring how to best scale, supplement, and utilize the existing bandwidth resources. These techniques provide an effective and comprehensive methodology to mitigate the bandwidth bottlenecks in the GPU memory hierarchy.
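The abstract does not describe Poise's features or model form, so the C sketch below is only a generic illustration of the inference step it implies: scorers trained offline rank candidate multithreading levels from kernel features, and the best-scoring level is chosen at runtime. All features and weights here are invented.

    /* Hypothetical runtime inference over offline-trained linear scorers. */
    #include <stdio.h>

    #define LEVELS   4   /* candidate multithreading levels */
    #define FEATURES 3

    /* weights[l][f]: learned offline, one linear scorer per level */
    static const double weights[LEVELS][FEATURES] = {
        { 0.2, -0.5,  0.1 },
        { 0.6, -0.2,  0.2 },
        { 0.8,  0.4, -0.3 },
        { 0.3,  0.9, -0.6 },
    };

    int choose_level(const double feat[FEATURES]) {
        int best = 0;
        double best_score = -1e9;
        for (int l = 0; l < LEVELS; l++) {
            double s = 0;
            for (int f = 0; f < FEATURES; f++) s += weights[l][f] * feat[f];
            if (s > best_score) { best_score = s; best = l; }
        }
        return best;
    }

    int main(void) {
        /* e.g. {compute intensity, cache sensitivity, divergence} */
        double feat[FEATURES] = { 0.7, 0.9, 0.1 };
        printf("chosen multithreading level: %d\n", choose_level(feat));
        return 0;
    }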
389

Hierarquia de memória configurável para redução energética no codificador de vídeo HEVC / Configurable memory hierarchy for energy reduction in HEVC video encoder

Martins, Anderson da Silva 29 September 2017
Dados recentes mostram que há uma demanda crescente de aplicações de vídeo em dispositivos móveis, sendo este um grande desafio para pesquisas em arquiteturas de codificadores de vídeo de alto desempenho como o padrão HEVC. Em um sistema embarcado o consumo de energia e o desempenho estão diretamente ligados ao sistema de memória. No codificador de vídeo não é diferente, e no HEVC a etapa de estimação de movimento (ME) é conhecida por ser responsável pela maior parte do tempo de processamento e acesso à memória. Portanto, este trabalho apresenta uma exploração do espaço de projeto para definir configurações de memória cache eficientes em energia para o processo da ME e propor uma hierarquia de memória cache configurável, considerando diferentes sequências de vídeo e configurações do codificador HEVC. A avaliação considerou o algoritmo TZ Search, amplamente utilizado, 23 sequências de vídeo com resoluções distintas e quatro Parâmetros de Quantização (QPs) sob 32 configurações de cache diferentes. Um simulador de cache foi desenvolvido e a ferramenta CACTI foi utilizada para obter parâmetros de tempo e energia. Assim, foi possível identificar configurações de cache ótimas para cada cenário, visto que não existe uma única configuração de memória cache que satisfaça todos os cenários ao mesmo tempo quando o objetivo é redução de energia. Considerando a configuração ótima de cache para cada cenário, o uso de cache pode levar a uma economia de largura de banda da memória externa de até 97,37%, que corresponde a uma redução de 25,48GB/s para 548,53MB/s em um caso. A redução de energia chega a 93,95%, o que corresponde a uma redução de 5,02mJ para 0,30mJ, ao comparar diferentes configurações de cache. Estes resultados possibilitaram propor uma hierarquia de memória cache configurável para o processo de estimação de movimento que é capaz de atender eficientemente todos os cenários testados. Para a arquitetura configurável proposta foram encontradas economias de energia de até 78,09% quando as configurações ótimas são comparadas com o pior caso dentro da cache configurável (16KB-8). Já quando comparada com Level-C, foram alcançadas economias de energia de até 86,91%. Além disso, a economia de largura de banda alcançada ficou entre 90,21% e 96,84%, com uma média de 94,97%. / Recent data show a growing demand for video applications on mobile devices, which is a major challenge for research on high-performance video encoder architectures such as the HEVC standard. In an embedded system, power consumption and performance are directly tied to the memory system. The video encoder is no different, and in HEVC the motion estimation (ME) step is known to be responsible for most of the processing time and memory accesses. Therefore, this work presents a design-space exploration to define energy-efficient cache configurations for the ME process and proposes a configurable cache memory hierarchy, considering different video sequences and HEVC encoder configurations. The evaluation considered the widely used TZ Search algorithm, 23 video sequences with distinct resolutions, and four Quantization Parameters (QPs) under 32 different cache configurations. A cache simulator was developed, and the CACTI tool was used to obtain timing and energy parameters. It was thus possible to identify optimal cache configurations for each scenario, since no single cache configuration satisfies all scenarios at once when the goal is energy reduction. Considering the optimal cache configuration for each scenario, cache usage can lead to external memory bandwidth savings of up to 97.37%, corresponding in one case to a reduction from 25.48 GB/s to 548.53 MB/s. The energy reduction reaches 93.95%, corresponding to a drop from 5.02 mJ to 0.30 mJ when comparing different cache configurations. These results made it possible to propose a configurable cache memory hierarchy for the motion estimation process that efficiently serves all tested scenarios. For the proposed configurable architecture, energy savings of up to 78.09% were found when the optimal configurations are compared to the worst case within the configurable cache (16KB-8), and savings of up to 86.91% when compared to Level-C. In addition, the external memory bandwidth savings achieved were between 90.21% and 96.84%, with an average of 94.97%.
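The accounting behind numbers like these can be sketched with back-of-envelope arithmetic: total energy follows from access counts and per-access energies of the kind CACTI reports. All values below are hypothetical placeholders, not the thesis's measurements.

    /* Toy cache-energy accounting with CACTI-style per-access energies. */
    #include <stdio.h>

    int main(void) {
        unsigned long accesses   = 10000000UL;
        double        miss_rate  = 0.05;
        double        e_cache_nj = 0.2;  /* per cache access (hypothetical) */
        double        e_dram_nj  = 8.0;  /* per external-memory access */

        double misses = accesses * miss_rate;

        /* every access probes the cache; misses additionally go off-chip */
        double with_cache_mj = (accesses * e_cache_nj + misses * e_dram_nj) * 1e-6;
        double no_cache_mj   = accesses * e_dram_nj * 1e-6;

        printf("with cache:    %.2f mJ\n", with_cache_mj);
        printf("without cache: %.2f mJ\n", no_cache_mj);
        printf("savings:       %.1f%%\n",
               100.0 * (1.0 - with_cache_mj / no_cache_mj));
        return 0;
    }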
