381 |
Instruction Timing Analysis for Linux/x86-based Embedded and Desktop Systems. John, Tobias, 19 October 2005.
Real-time aspects are becoming more important in standard desktop PC environments, and x86-based processors are being used in embedded systems more often. Although these processors were not designed for hard real-time systems, they are fast and inexpensive and can be used if the worst-case execution time can be determined. Information on the CPU caches (L1, L2) and the branch prediction architecture is needed to simulate best and worst cases in execution timing, but it is often not detailed enough and sometimes not published at all. This document describes how the underlying hardware can be analysed to obtain this information.
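A common way to recover such cache parameters experimentally is a pointer-chasing microbenchmark: time dependent loads over working sets of increasing size and look for jumps in the average latency near the L1 and L2 capacities. The C sketch below illustrates the principle under assumed parameters (buffer range, iteration count); it is not the author's analysis tool, and a careful measurement would additionally serialize RDTSC and pin the thread to one core.

```c
/* Sketch: estimate cache sizes by timing dependent (pointer-chasing)
 * loads over buffers of growing size.  Latency steps near the L1 and
 * L2 capacities reveal the cache hierarchy.  Illustrative only; the
 * buffer range and iteration count are assumptions. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void) {
    for (size_t size = 4 * 1024; size <= 8 * 1024 * 1024; size *= 2) {
        size_t n = size / sizeof(void *);
        void **buf = malloc(n * sizeof(void *));
        size_t *order = malloc(n * sizeof(size_t));

        /* Build one random cycle through the buffer so the hardware
         * prefetchers cannot predict the next load address. */
        for (size_t i = 0; i < n; i++) order[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i + 1 < n; i++) buf[order[i]] = &buf[order[i + 1]];
        buf[order[n - 1]] = &buf[order[0]];

        void **p = buf;
        uint64_t start = rdtsc();
        for (long i = 0; i < 1000000; i++)
            p = (void **)*p;                   /* dependent load chain */
        uint64_t cycles = rdtsc() - start;

        printf("%8zu bytes: ~%.1f cycles/load (end=%p)\n",
               size, cycles / 1e6, (void *)p);
        free(order);
        free(buf);
    }
    return 0;
}
```

Printing the final pointer keeps the compiler from eliminating the dependent load chain; the reported cycles per load should step up as the working set outgrows each cache level.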
|
382 |
Návrh a implementace prostředků pro zvýšení výkonu procesoru / Design and Implementation of Mechanisms for Enhancing Performance of CPU. Zlatohlávková, Lucie, January 2007.
This master's thesis focuses on processor architecture. The core of the project is the design of a simple processor, enriched with modern architectural features such as pipelining, cache memory, and branch prediction. The processor was implemented in the VHDL hardware description language and simulated with the ModelSim simulation tool.
|
383 |
A Dual-Port Data Cache with Pseudo-Direct Mapping Function. Gade, Arul Sandeep, 07 May 2005.
Conventional on-chip (L1) data caches such as Direct-Mapped (DM) and 2-way Set-Associative Caches (SAC) have been widely used in high-performance uniprocessors and multiprocessors. Unfortunately, these schemes suffer from high conflict misses, since more than one address maps onto the same cache line. To reduce conflict misses, much research has gone into alternative cache architectures such as the 2-way Skewed-Associative cache (Skew cache). The 2-way Skew cache has hardware complexity equivalent to that of a 2-way SAC and a miss rate approaching that of a 4-way SAC. However, the reduction in miss rate achievable with a Skew cache is limited by the confined space available to disperse conflicting accesses over small memory banks. This research proposes a dual-port data cache called the Pseudo-Direct Cache (PDC) that minimizes conflict misses by dispersing addresses effectively over a single memory bank. Our simulation results show that the PDC reduces those misses significantly compared to conventional L1 caches and achieves 10-15% lower miss rates than a 2-way Skew cache. The SimpleScalar simulator was used for these simulations with the SPEC95FP benchmark programs; similar results were observed for the SPEC2000FP benchmarks. Simulations with CACTI 3.0 were performed to evaluate the hardware implications of the PDC relative to the Skew cache. The results show that the PDC has simple hardware complexity, similar to a 2-way SAC, and a 4-15% better average memory access time (AMAT) than a 2-way Skew cache. The PDC also reduces execution cycles significantly.
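To make the mapping terminology concrete, the sketch below contrasts a conventional set index, which is identical for every way and therefore lets addresses that share those bits evict each other, with an XOR-based skewed index of the kind used in skewed-associative caches. The line and set sizes are assumed values, and the thesis's pseudo-direct mapping function is not specified in the abstract, so this is a generic illustration rather than the PDC design.

```c
/* Illustrative index functions for a cache with 64-byte lines and 512
 * sets.  The XOR-based skew shown for the second way is a commonly
 * used skewing function; the thesis's pseudo-direct mapping function
 * is not given in the abstract, so this is a generic sketch only. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6                      /* 64-byte cache lines */
#define SET_BITS  9                      /* 512 sets            */
#define SET_MASK  ((1u << SET_BITS) - 1)

/* Conventional DM/SAC index: one fixed slice of the address, so all
 * addresses sharing these bits compete for the same set. */
static uint32_t index_conventional(uint32_t addr) {
    return (addr >> LINE_BITS) & SET_MASK;
}

/* Skewed index for a second way: XOR in higher address bits so that
 * two addresses conflicting in way 0 usually land in different sets
 * in way 1, dispersing conflict misses. */
static uint32_t index_skewed(uint32_t addr) {
    uint32_t low  = (addr >> LINE_BITS) & SET_MASK;
    uint32_t high = (addr >> (LINE_BITS + SET_BITS)) & SET_MASK;
    return low ^ high;
}

int main(void) {
    uint32_t a = 0x00012340, b = 0x00052340;   /* same index bits, different tags */
    printf("conventional: %u vs %u (conflict)\n",
           index_conventional(a), index_conventional(b));
    printf("skewed:       %u vs %u (dispersed)\n",
           index_skewed(a), index_skewed(b));
    return 0;
}
```

The two addresses collide under the conventional index but map to different sets under the skewed one, which is how skewing disperses conflicting accesses.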
|
384 |
The Multi-tiered Future of Storage: Understanding Cost and Performance Trade-offs in Modern Storage Systems. Iqbal, Muhammad Safdar, 19 September 2017.
In the last decade, the landscape of storage hardware and software has changed considerably. Storage hardware has diversified beyond hard disk drives and solid-state drives to include persistent memory (PMEM) devices such as phase-change memory (PCM) and flash-backed DRAM. On the software side, the increasing adoption of cloud services for building and deploying consumer and enterprise applications is driving the use of cloud storage services. Cloud providers have responded by offering a plethora of storage services, each of which has unique performance characteristics and pricing. We argue that this variety represents an opportunity for modern storage systems and can be leveraged to improve their operational costs.
We propose that storage tiering is an effective technique for balancing operational or deployment costs and performance in such modern storage systems. We demonstrate this via three key techniques. First, THMCache, which leverages tiering to conserve the lifetime of PMEM devices, hence saving hardware upgrade costs. Second, CAST, which leverages tiering between multiple types of cloud storage to deliver higher utility (i.e., performance per unit of cost) for cloud tenants. Third, we propose a dynamic pricing scheme for cloud storage services, which leverages tiering to increase the cloud provider's profit or offset their management costs. / Master of Science / Storage and retrieval of data is one of the key functions of any computer system. Improvements in hardware and software related to data storage can help computer users (a) store data faster, which makes for faster overall performance, and (b) increase storage capacity, which helps hold the growing amount of data generated by modern computer users. Typically, computers are equipped with either a hard disk drive (HDD) or the newer and faster solid-state drive (SSD) for data storage. In the last decade, however, the landscape of data storage hardware and software has advanced considerably. On the hardware side, several hardware makers are introducing persistent memory (PMEM) devices, which provide very high-speed, high-capacity storage at reasonable price points. On the software side, the increasing adoption of cloud services by software developers building and operating consumer and enterprise applications is driving the use of cloud storage services. These services allow developers to store large amounts of data without having to manage any physical hardware, paying for the service under usage-based pricing. However, not every application has the same speed and capacity needs; hence, cloud service providers have responded by offering a plethora of storage services, each of which has unique performance characteristics and pricing. We argue that this variety represents an opportunity for modern storage systems and can be leveraged to improve their operating costs.
Storage tiering is a classical technique that partitions the stored data and places each partition on a different storage device. This lets applications use multiple devices at once, taking advantage of each one's strengths and mitigating its weaknesses. We propose that storage tiering is a relevant and effective technique for balancing operational or deployment costs and performance in modern storage systems such as PMEM devices and cloud storage services. We demonstrate this via three key techniques. First, THMCache, which leverages tiering between multiple types of storage hardware to conserve the lifetime of PMEM devices, hence saving hardware upgrade costs. Second, CAST, which leverages tiering between multiple types of cloud storage services to deliver higher utility (i.e., performance per unit of cost) for software developers using these services. Third, we propose a dynamic pricing scheme for cloud storage services, which leverages tiering between multiple cloud storage services to increase the cloud service provider's profit or offset their management costs.
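As a toy illustration of the tiering idea behind these systems, the sketch below greedily places the most frequently accessed partitions on a small, fast tier and everything else on a cheaper capacity tier. The partition names, sizes, access rates, and the greedy policy itself are illustrative assumptions, not the placement algorithms used by THMCache or CAST.

```c
/* Hedged sketch of basic tiering: put the hottest data on the fast
 * (expensive) tier up to its capacity, the rest on the cheap tier.
 * All values and the greedy policy are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    double size_gb;          /* partition size          */
    double accesses_per_hr;  /* observed workload heat  */
} Partition;

static int by_heat_desc(const void *a, const void *b) {
    double ha = ((const Partition *)a)->accesses_per_hr;
    double hb = ((const Partition *)b)->accesses_per_hr;
    return (hb > ha) - (hb < ha);
}

int main(void) {
    Partition parts[] = {
        {"logs",     200, 10},
        {"indexes",   50, 900},
        {"hot-keys",  20, 5000},
        {"archive",  800, 1},
    };
    size_t n = sizeof parts / sizeof parts[0];
    double fast_capacity_gb = 100;            /* assumed SSD/PMEM budget */

    qsort(parts, n, sizeof parts[0], by_heat_desc);
    for (size_t i = 0; i < n; i++) {
        int fits = parts[i].size_gb <= fast_capacity_gb;
        if (fits) fast_capacity_gb -= parts[i].size_gb;
        printf("%-9s -> %s tier\n", parts[i].name, fits ? "fast" : "capacity");
    }
    return 0;
}
```

Real tiering policies additionally weigh per-tier prices and migration costs, which is exactly the utility (performance per unit of cost) trade-off the thesis studies.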
|
385 |
A Study of Mitigation Methods for Speculative Cache Side Channel Attacks. Mosquera Ferrandiz, Fernando, 05 1900.
Side channels give attackers the opportunity to learn private information without accessing it directly. This study presents several novel approaches to mitigating cache side-channel attacks, including the Spectre attack and its variants, resulting in several contributions. CHASM quantifies the information leakage of several new cache mapping schemes, showing that different cache address mappings can provide more or less protection against cache side-channel attacks. GuardCache creates a noisy cache side channel, making it more difficult for the attacker to determine whether an access is a hit or a miss (the basis for most side-channel attacks). SecurityCloak is a framework that combines GuardCache with SafeLoadOnMiss, whereby cache load misses during speculative execution are delayed until the speculation is resolved, thus preventing attacks that rely on accessing data during (mis)speculated execution. To balance security and performance, it is recommended not to keep protections such as SecurityCloak enabled at all times, but to activate them only while executing critical sections of code, or on demand when an attack is detected (or suspected). Our experimental results show a high degree of obfuscation (and prevention of side channels) with minimal impact on performance.
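For context, the sketch below shows the well-known Spectre v1 bounds-check-bypass pattern that speculative cache side-channel attacks exploit, together with a common software countermeasure (index masking). It is background only: GuardCache, SafeLoadOnMiss, and SecurityCloak are hardware-level mitigations and are not represented by this code.

```c
/* Background sketch: the Spectre v1 bounds-check-bypass pattern that
 * speculative cache side-channel attacks rely on, and index masking,
 * a common software countermeasure.  The thesis's hardware mitigations
 * are not shown here. */
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16
static uint8_t array1[ARRAY1_SIZE];
static uint8_t array2[256 * 4096];   /* probe array: one page per byte value */

/* Vulnerable: under a mispredicted branch, both dependent loads can
 * execute speculatively with x out of bounds, leaving a secret-
 * dependent footprint in the cache that timing can later recover. */
uint8_t victim_vulnerable(size_t x) {
    if (x < ARRAY1_SIZE)
        return array2[array1[x] * 4096];
    return 0;
}

/* Mitigated: clamp the index (power-of-two mask here) so that even
 * speculatively executed loads stay inside array1. */
uint8_t victim_masked(size_t x) {
    if (x < ARRAY1_SIZE) {
        x &= ARRAY1_SIZE - 1;
        return array2[array1[x] * 4096];
    }
    return 0;
}

int main(void) { return victim_masked(0); }
```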
|
386 |
Entropy: algoritmo de substituição de linhas de cache inspirado na entropia da informação / Entropy: cache line replacement algorithm inspired in information entropy. Kobayashi, Jorge Mamoru, 07 June 2010.
This work presents a study of the cache line replacement problem in microprocessors. Inspired by the concept of information entropy proposed by Claude E. Shannon in 1948, it proposes a new cache line replacement heuristic. The goal is to better capture and exploit the reference locality of programs and to reduce the miss rate of cache accesses during program execution. The proposed algorithm, Entropy, uses the information-entropy heuristic to estimate the chances that a cache line or block will be referenced again after it has been loaded into the cache. A new entropy decay function was introduced into the algorithm to optimize its operation. Among the results obtained, Entropy reduced the miss rate by up to 50.41% relative to the LRU algorithm. The work also proposes a hardware implementation with complexity and computational cost comparable to those of LRU. For a 2-Mbyte, 8-way set-associative second-level cache, the additional storage required is about 0.61% of the cache size. The proposed algorithm was simulated in SimpleScalar and compared with LRU using the SPEC CPU2000 benchmark programs.
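The abstract does not give the Entropy heuristic's formulas, but the general idea of scoring lines by how unlikely they are to be referenced again, with a decay over time, can be sketched as follows. The probability update, the decay factor, and the use of -log2 as a "surprise" measure are illustrative assumptions rather than the thesis's actual algorithm.

```c
/* Schematic sketch of an entropy-style replacement decision for one
 * cache set: each line keeps a decaying estimate of its re-reference
 * probability, and the victim is the line with the highest surprise
 * (-log2 p), i.e. the one least likely to be referenced again.
 * Formulas and constants are illustrative assumptions only. */
#include <math.h>
#include <stdint.h>

#define WAYS  8
#define DECAY 0.95                  /* assumed per-access decay factor */

typedef struct {
    uint64_t tag;
    int      valid;
    double   p_reuse;               /* estimated re-reference probability */
} Line;

/* On a hit, age every line and reinforce the line that was accessed. */
void on_access(Line set[WAYS], int hit_way) {
    for (int w = 0; w < WAYS; w++)
        set[w].p_reuse *= DECAY;
    set[hit_way].p_reuse += 1.0 - DECAY;
}

/* On a miss, evict the line whose estimated information content is
 * highest, preferring an invalid way if one exists. */
int pick_victim(const Line set[WAYS]) {
    int victim = 0;
    double worst = -1.0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) return w;
        double surprise = -log2(set[w].p_reuse + 1e-9);
        if (surprise > worst) { worst = surprise; victim = w; }
    }
    return victim;
}
```

A hardware version would presumably replace the floating-point score with a small saturating counter per line, which is consistent with the abstract's claim of LRU-comparable cost.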
|
387 |
Stratégies de Cache basées sur la popularité pour Content Centric Networking / Popularity-Based Caching Strategies for Content Centric Networking. Bernardini, César, 05 May 2015.
Content Centric Networking (CCN) is an architecture for the future Internet. CCN includes in-network caching capabilities at every node, and its efficiency depends heavily on the performance of its caching strategies. Many studies propose new caching strategies to improve the performance of a CCN network; however, it is not obvious which of these strategies performs best, as a common environment for comparing them has been lacking, and it is not even certain that these approaches are the best way to improve network performance. In this thesis, we address the challenge of selecting the best caching strategies for CCN, with the following contributions. We build a common evaluation environment in which we compare, via simulation, the available caching strategies: Leave Copy Everywhere (LCE), Leave Copy Down (LCD), ProbCache, Cache "Less For More", and MAGIC. We analyze the performance of all these strategies in terms of cache hit, stretch, diversity, and complexity, and determine the caching strategy that best fits each scenario. We then propose two popularity-based caching strategies for CCN. First, we study content popularity and present the Most Popular Caching (MPC) strategy, which privileges the distribution of popular content into the caches and thereby outperforms the other caching strategies. Second, we present a caching strategy based on social-network information, the Socially-Aware Caching Strategy (SACS), which privileges the distribution of content published by the most influential users. Both caching strategies outperform state-of-the-art mechanisms and, to the best of our knowledge, we are the first to use social information to build caching strategies.
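A minimal sketch of the popularity-driven admission idea behind MPC is given below: a node counts local requests per content name and only admits a passing Data packet into its cache once the name has proven popular. The hash table, counter width, and threshold are assumptions for illustration; they are not MPC's exact design, and SACS would additionally weight decisions by the publisher's social influence.

```c
/* Sketch of a popularity-based admission decision at one CCN node:
 * count local requests per content name and cache a passing Data
 * packet only once its name has proven popular.  The hash table,
 * counter width and threshold are illustrative assumptions. */
#include <stdint.h>

#define TABLE_SIZE 4096
#define POPULARITY_THRESHOLD 3        /* assumed admission threshold */

static uint32_t counters[TABLE_SIZE];

static uint32_t hash_name(const char *name) {
    uint32_t h = 2166136261u;                      /* FNV-1a */
    for (; *name; name++) { h ^= (uint8_t)*name; h *= 16777619u; }
    return h % TABLE_SIZE;
}

/* Called when an Interest for `name` arrives at the node. */
void on_interest(const char *name) {
    counters[hash_name(name)]++;
}

/* Called when the matching Data passes through on its way back. */
int should_cache(const char *name) {
    return counters[hash_name(name)] >= POPULARITY_THRESHOLD;
}

int main(void) {
    on_interest("/videos/clip1");
    on_interest("/videos/clip1");
    on_interest("/videos/clip1");
    return should_cache("/videos/clip1") ? 0 : 1;  /* popular after 3 requests */
}
```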
|
388 |
Deterministic Object Management in Large Distributed Systems. Mikhailov, Mikhail, 05 March 2003.
Caching is a widely used technique to improve the scalability of distributed systems. A central issue with caching is maintaining object replicas consistent with their master copies. Large distributed systems, such as the Web, typically deploy heuristic-based consistency mechanisms, which increase delay and place extra load on the servers, while not providing guarantees that cached copies served to clients are up-to-date. Server-driven invalidation has been proposed as an approach to strong cache consistency, but it requires servers to keep track of which objects are cached by which clients.
We propose an alternative approach to strong cache consistency, called MONARCH, which does not require servers to maintain per-client state. Our approach builds on a few key observations. Large and popular sites, which attract the majority of the traffic, construct their pages from distinct components with varying characteristics. Components may differ in content type, change characteristics, and semantics. These components are merged together to produce a monolithic page, and the information about their distinctness is lost. In our view, pages should serve as containers holding distinct objects with heterogeneous type and change characteristics, while preserving the boundaries between these objects. Servers compile object characteristics, and information about the relationships between containers and embedded objects, into explicit object management commands, and piggyback these commands onto existing request/response traffic so that client caches can use them to make object management decisions.
The use of explicit content control commands is a deterministic, rather than heuristic, object management mechanism that gives content providers more control over their content. The deterministic object management with strong cache consistency offered by MONARCH allows content providers to make more of their content cacheable. Furthermore, MONARCH enables content providers to expose internal structure of their pages to clients.
We evaluated MONARCH using simulations with content collected from real Web sites. The results show that MONARCH provides strong cache consistency for all objects, even for unpredictably changing ones, and incurs smaller byte and message overhead than heuristic policies. The results also show that as the request arrival rate or the number of clients increases, the amount of server state maintained by MONARCH remains the same while the amount of server state incurred by server invalidation mechanisms grows.
|
389 |
On models for performance evaluation and cache resources placement in multi-cache networks / Sur des modèles pour l'évaluation de performance et le placement des ressources de cache dans les réseaux multi-cache. Ben Ammar, Hamza, 19 March 2019.
In recent years, content providers (CPs) have experienced a sharp increase in requests for video content and rich media services. In view of the network's scaling limitations, and beyond content delivery networks (CDNs), Internet service providers (ISPs) are developing their own caching systems in order to improve network performance. These factors explain the enthusiasm around the Content-Centric Networking (CCN) concept and its in-network caching feature. The analytical quantification of caching performance is, however, not sufficiently explored in the literature. Moreover, setting up an efficient caching system within a network infrastructure is very complex and remains an open problem. To address these issues, this thesis first provides a fairly generic and accurate model of caching nodes, named MACS (Markov chain-based Approximation of Caching Systems), which can be adapted very easily to represent different caching schemes and can be used to compute different performance metrics of multi-cache networks. We then tackle the problem of cache resource allocation in cache-enabled networks. Using our analytical tool MACS, we present an approach that resolves the trade-off between different performance metrics through multi-objective optimization, and we propose an adaptation of the GRASP metaheuristic to solve the optimization problem.
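The abstract does not reproduce MACS's equations, so as a point of reference the sketch below implements the classical Che approximation for a single LRU cache under the independent reference model: the characteristic time T solves sum_i (1 - exp(-lambda_i * T)) = C, and item i's hit probability is 1 - exp(-lambda_i * T). The Zipf catalogue parameters are illustrative assumptions; MACS itself is a different, Markov-chain-based model that also covers other caching schemes.

```c
/* Reference sketch: the classical Che approximation for one LRU cache
 * under the independent reference model, shown because the abstract
 * does not give MACS's equations.  Catalogue size, cache size and the
 * Zipf exponent are illustrative assumptions.  Compile with -lm. */
#include <math.h>
#include <stdio.h>

#define CATALOG     10000
#define CACHE_SLOTS 100
#define ZIPF_ALPHA  0.8

static double lambda[CATALOG];          /* per-item request probabilities */

static double occupancy(double t) {     /* expected number of cached items */
    double sum = 0.0;
    for (int i = 0; i < CATALOG; i++)
        sum += 1.0 - exp(-lambda[i] * t);
    return sum;
}

int main(void) {
    double norm = 0.0;
    for (int i = 0; i < CATALOG; i++) norm += pow(i + 1, -ZIPF_ALPHA);
    for (int i = 0; i < CATALOG; i++) lambda[i] = pow(i + 1, -ZIPF_ALPHA) / norm;

    /* Bisection on the characteristic time T so that occupancy(T) = C. */
    double lo = 0.0, hi = 1e7;
    for (int it = 0; it < 100; it++) {
        double mid = 0.5 * (lo + hi);
        if (occupancy(mid) < CACHE_SLOTS) lo = mid; else hi = mid;
    }
    double T = 0.5 * (lo + hi), hit = 0.0;
    for (int i = 0; i < CATALOG; i++)
        hit += lambda[i] * (1.0 - exp(-lambda[i] * T));
    printf("characteristic time T ~ %.1f requests, aggregate hit ratio ~ %.3f\n",
           T, hit);
    return 0;
}
```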
|
390 |
Managing the memory hierarchy in GPUs. Dublish, Saumay Kumar, January 2018.
Pervasive use of GPUs across multiple disciplines is a result of continuous adaptation of GPU architectures to the needs of upcoming application domains. One such vital improvement is the introduction of the on-chip cache hierarchy, used primarily to filter the high bandwidth demand to off-chip memory. However, in contrast to traditional CPUs, the cache hierarchy in GPUs faces significantly different challenges, such as cache thrashing and bandwidth bottlenecks, arising from small caches and high levels of memory traffic. These challenges lead to severe congestion across the memory hierarchy, resulting in high memory access latencies. In memory-intensive applications, such high memory access latencies often become exposed and can no longer be hidden through multithreading, and therefore adversely impact system performance. In this thesis, we address the inefficiencies across the memory hierarchy in GPUs that lead to such high levels of congestion. We identify three major factors contributing to poor memory system performance: first, disproportionate and insufficient bandwidth resources in the cache hierarchy; second, poor cache management policies; and third, high levels of multithreading. In order to revitalize the memory hierarchy by addressing these limitations, we propose a three-pronged approach. First, we characterize the bandwidth bottlenecks present across the memory hierarchy in GPUs and identify the architectural parameters that are most critical in alleviating congestion. Subsequently, we explore the architectural design space to mitigate the bandwidth bottlenecks in a cost-effective manner. Second, we identify significant inter-core reuse in GPUs, presenting an opportunity to reuse data among the L1 caches. We exploit this reuse by connecting the L1 caches with a lightweight ring network to facilitate inter-core communication of shared data. We show that this technique reduces traffic to the L2 cache, freeing up bandwidth for other accesses. Third, we present Poise, a machine learning approach that mitigates cache thrashing and bandwidth bottlenecks by altering the level of multithreading. Poise comprises a supervised learning model, trained offline on a set of profiled kernels, that learns to make good warp scheduling decisions; at runtime, a hardware inference engine uses the trained model to predict good warp scheduling decisions. In summary, we address the problem of bandwidth bottlenecks across the GPU memory hierarchy by exploring how best to scale, supplement, and utilize the existing bandwidth resources. These techniques provide an effective and comprehensive methodology for mitigating the bandwidth bottlenecks in the GPU memory hierarchy.
|