Global ETD Search

201	Sur des modèles pour l’évaluation de performance des caches dans un réseau cœur et de la consommation d’énergie dans un réseau d’accès sans-fil / On models for performance analysis of a core cache network and power save of a wireless access network Choungmo Fofack, Nicaise Éric 21 February 2014 The Internet is a real ecosystem. It grows, evolves and adapts to the needs of users in terms of communication, connectivity and ubiquity. In the last decade, the communication paradigm has shifted from traditional host-to-host interactions to the recent host-to-content model, while various wireless and networking technologies (such as smartphones and 3G/4G networks, online media streaming, social networks, clouds, Big Data, information-centric networks) have emerged to enhance content distribution. This development has highlighted scalability and energy efficiency issues, which can be formulated as follows. How can we design or optimize such large-scale distributed systems in order to achieve and maintain high-speed access to content while (i) reducing congestion and energy consumption in the network and (ii) adapting to the temporal locality of user demand in a continuous connectivity paradigm? In this thesis we focus on two solutions proposed to answer this question: in-network caching for scalability and power save protocols for energy efficiency. Precisely, we propose analytic models for designing core cache networks and for modeling energy consumption in wireless access networks. Our studies show that the performance of general core cache networks in real application cases can be predicted with absolute relative errors on the order of 1%–5%; meanwhile, dramatic energy savings can be achieved by mobile devices and base stations, e.g., as much as 70%–90% of the energy cost in cells with realistic traffic load and the considered parameter settings. Cache networks Cache architecture Performance evaluation Energy saving Replacement policies Time-To-Live (TTL) LRU FIFO RND
202	Performance Modeling of Multi-core Systems : Caches and Locks Pan, Xiaoyue January 2016 Performance is an important aspect of computer systems since it directly affects user experience. One way to analyze and predict performance is via performance modeling. In recent years, multi-core systems have made processors more powerful while keeping power consumption relatively low. However, the complicated design of these systems makes it difficult to analyze performance. This thesis presents performance modeling techniques for cache performance and synchronization cost on multi-core systems. A cache can be designed in many ways with different configuration parameters including cache size, associativity and replacement policy. Understanding cache performance under different configurations is useful for exploring the design choices. We propose a general modeling framework for estimating the cache miss ratio under different cache configurations, based on the reuse distance distribution. On multi-core systems, each core usually has a private cache. Keeping shared data in private caches coherent has an extra cost. We propose three models to estimate this cost, based on information that can be gathered when running the program on a single core. Locks are widely used as a synchronization primitive in multi-threaded programs on multi-core systems. While they are often necessary for protecting shared data, they also introduce lock contention, which causes performance issues. We present a model to predict how much contention a lock has on multi-core systems, based on information obtainable from profiling a run on a single core. If lock contention is shown to be a performance bottleneck, one way to mitigate it is to use another lock implementation. However, it is costly to investigate whether adopting another lock implementation would reduce lock contention, since doing so requires reimplementation and measurement. 
We present a model for forecasting lock contention with another lock implementation without replacing the current lock implementation. performance modeling performance analysis multi-core cache lock
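The link between reuse distance and miss ratio mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest textbook case, a fully associative LRU cache, where an access hits exactly when fewer distinct lines than the cache capacity were touched since the previous access to the same line. The function names `reuse_distances` and `lru_miss_ratio` are hypothetical, and this is not the thesis's framework, which covers other associativities and replacement policies.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Number of distinct addresses touched since the last access to addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)  # cold (compulsory) miss
            stack[addr] = True
    return distances


def lru_miss_ratio(trace, cache_lines):
    """Miss ratio of a fully associative LRU cache with `cache_lines` lines."""
    dists = reuse_distances(trace)
    misses = sum(1 for d in dists if d is None or d >= cache_lines)
    return misses / len(dists)
```

Because the distances are computed once per trace, the miss ratio for any cache size can be read off the same distribution without re-simulating the cache, which is the appeal of the approach.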
203	The doubly-linked list protocol family for distributed shared memory multiprocessor systems 劉宗國, Lau, Chung-kwok, Albert. January 1996 Electrical and Electronic Engineering / Master / Master of Philosophy Cache memory. Multiprocessors.
204	Effective use of partitioned cache memories Page, Daniel Stephen January 2001 No description available. 621.39
205	On the Prevention of Cache-Based Side-Channel Attacks in a Cloud Environment Godfrey, Michael 26 September 2013 As Cloud services become more commonplace, recent works have uncovered vulnerabilities unique to such systems. Specifically, the paradigm promotes a risk of information leakage across virtual machine isolation via side-channels. Unlike conventional computing, the infrastructure supporting a Cloud environment allows mutually distrusting clients simultaneous access to the underlying hardware, a seldom met requirement for a side-channel attack. This thesis investigates the current state of side-channel vulnerabilities involving the CPU cache, and identifies the shortcomings of traditional defenses in a Cloud environment. It explores why solutions to non-Cloud cache-based side-channels cease to work in Cloud environments, and describes new mitigation techniques applicable for Cloud security. Specifically, it separates canonical cache-based side-channel attacks into two categories, Sequential and Parallel attacks, based on their implementation and devises a unique mitigation technique for each. Applying these solutions to a canonical Cloud environment, this thesis demonstrates the validity of these Cloud-specific, cache-based side-channel mitigation techniques. Furthermore, it shows that they can be implemented, together, as a server-side approach to improve security without inconveniencing the client. Finally, it conducts a comparison of our solutions to the current state-of-the-art. / Thesis (Master, Computing) -- Queen's University, 2013-09-25 18:03:47.737 CPU Cache Server Side Defense Cloud Computing Security Side Channel
206	A REUSED DISTANCE BASED ANALYSIS AND OPTIMIZATION FOR GPU CACHE Wang, Dongwei 01 January 2016 As a throughput-oriented device, the Graphics Processing Unit (GPU) already integrates caches similar to those of CPU cores. However, applications in GPGPU computing exhibit distinct memory access patterns. Typically, the cache in GPU cores suffers from thread contention and resource over-utilization, yet few detailed studies investigate the root of this phenomenon. In this work, we analyze the memory accesses of twenty benchmarks based on reuse distance theory and quantify their patterns. Additionally, we discuss optimization suggestions and implement a Bypassing Aware (BA) Cache, which can intelligently bypass thrashing-prone candidates. The BA cache is a cost-efficient cache design with two extra bits in each line; they are flags used to make the bypassing decision and to find the victim cache line. Experimental results show that the BA cache can improve system performance by around 20% and reduce the cache miss rate by around 11% compared with the traditional design. Reuse distance GPU cache performance evaluation Computer and Systems Architecture
207	Improving Energy-Efficiency of Multicores using First-Order Modeling Spiliopoulos, Vasileios January 2016 In recent decades, power consumption has evolved into one of the most critical resources in a computer system. In the form of electricity bills in data centers, battery life in mobile devices, or thermal constraints in desktops and laptops, power consumption imposes several limitations on today’s processors, and improving power and energy efficiency is one of the most urgent research topics in Computer Architecture. Dynamic Voltage and Frequency Scaling (DVFS) and Cache Resizing are among the most popular energy saving techniques. Previous work, however, has focused on developing heuristics and trial-and-error methods that yield acceptable savings, but fail to provide insight and understanding of how these techniques affect the power and performance of a computer system. In contrast, this thesis proposes the use of first-order modeling to improve the energy efficiency of computer systems. A first-order model needs to be (i) accurate enough to efficiently drive DVFS and Cache Resizing decisions, and (ii) simple enough to eliminate the overhead of collecting the required inputs to the model. We show that such models can be constructed and successfully applied in modern systems. For DVFS, we propose to scale frequency down to exploit applications’ memory slack, i.e., periods that the processor spends waiting for data to be fetched from main memory. In such cases, the processor frequency can be scaled down to save energy without an inordinate performance penalty. Our DVFS models can detect slack and predict the impact of DVFS on both power and performance with great accuracy. Cache Resizing, on the other hand, relies on the fact that many applications do not benefit from the vast amount of cache that modern processors are equipped with. In such cases, the cache can be resized to save static energy at limited performance cost. 
Since both techniques are related to the memory behavior of applications, we propose a unified model to manage the two techniques in tandem and maximize energy efficiency through synergistic DVFS and Cache Resizing. Finally, our experience with DVFS in real systems motivated us to contribute to the integration of DVFS into the gem5 simulator. Unlike other simulators that ignore the role of the OS in DVFS, we extend the gem5 simulator by developing the hardware and software components that allow the existing Linux DVFS infrastructure to be seamlessly integrated in the simulator. Computer Architecture DVFS Cache Resizing Interval modeling Power modeling
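The memory-slack argument in this abstract can be made concrete with a minimal sketch. It assumes a generic first-order model, not the thesis's actual one: core-bound time scales inversely with frequency, memory stall time is independent of core frequency, dynamic power is proportional to V²·f, and static power is ignored. All function and parameter names are hypothetical.

```python
def predict_runtime(core_time_s, mem_stall_s, f_base_hz, f_new_hz):
    """Predict runtime at f_new_hz from a measurement taken at f_base_hz.

    Assumption: core-bound time scales with 1/f, while memory stall time
    does not depend on the core frequency.
    """
    return core_time_s * (f_base_hz / f_new_hz) + mem_stall_s


def predict_dynamic_energy(core_time_s, mem_stall_s, f_base_hz, f_new_hz,
                           p_base_w, v_base, v_new):
    """Predict dynamic energy in joules at the new operating point.

    Assumption: dynamic power is proportional to V^2 * f; static power
    is ignored in this sketch.
    """
    p_new_w = p_base_w * (v_new / v_base) ** 2 * (f_new_hz / f_base_hz)
    return p_new_w * predict_runtime(core_time_s, mem_stall_s,
                                     f_base_hz, f_new_hz)
```

For a hypothetical program that spends 1 s computing and 1 s stalled on memory at 2 GHz, `predict_runtime(1.0, 1.0, 2e9, 1e9)` gives 3 s: halving the frequency lengthens the run by 1.5x rather than 2x, because the stalled half does not slow down.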
208	Exploring Hybrid SPM-Cache Architectures to Improve Performance and Energy Efficiency for Real-time Computing Wu, Lan 04 December 2013 Real-time computing is not just fast computing but time-predictable computing. Many tasks in safety-critical embedded real-time systems have hard real-time characteristics. Failure to meet deadlines may result in loss of life or major damage. Knowing the Worst Case Execution Time (WCET) is important for the reliability and correct functional behavior of the system. As multi-core processors are increasingly adopted in industry, it has become a great challenge to accurately bound the WCET for real-time systems running on multi-core chips. This is particularly true because of the inter-thread interferences in accessing shared resources on multi-cores, such as shared L2 caches, which can significantly affect performance but are very difficult to estimate statically. We propose an approach to analyzing WCET for multi-core processors with shared L2 instruction caches by using a model checking based method. Our experiments indicate that, compared to the static analysis technique based on extended ILP (Integer Linear Programming), our approach improves the tightness of WCET estimation by more than 31.1% for the benchmarks we studied. However, due to the inherent complexity of multi-core timing analysis and the state explosion problem, the model checking based approach currently works only with small real-time kernels for dual-core processors. At the same time, improving average-case performance and energy efficiency has also been important for real-time systems. Recently, Hybrid SPM-Cache (HSC) architectures, which combine caches and Scratch-Pad Memories (SPMs), have been increasingly used in commercial processors and research prototypes. 
Our research explores HSC architectures for real-time systems to reconcile time predictability, performance, and energy consumption. We study the energy dissipation of a number of hybrid on-chip memory architectures that combine caches and SPMs without increasing the total on-chip memory size. Our experimental results indicate that, for the same total on-chip memory size, several hybrid SPM-Cache architectures are more energy-efficient than either pure software-controlled SPMs or pure hardware-controlled caches. In particular, using the hybrid SPM-cache to store both instructions and data achieves the best energy efficiency. However, the SPM allocation for the HSC architecture must be aware of the cache to harness the full potential of the HSC architecture. First, we propose and evaluate four SPM allocation strategies of differing complexity to reduce the WCET for hybrid SPM-Caches. These algorithms differ in whether they cooperate with the cache and whether they are aware of the WCET. Our evaluation shows that the cache-aware and WCET-oriented SPM allocation can maximally reduce the WCET with minimal or even positive impact on the average-case execution time (ACET). Moreover, we explore four SPM allocation algorithms to maximize performance on the HSC architecture, including three heuristic-based algorithms and an optimal algorithm based on model checking. Our experiments indicate that the Greedy Stack Distance based Allocation (GSDA) runs efficiently while achieving performance either the same as or close to the optimal results obtained by the Optimal Stack Distance based Allocation (OSDA). Last but not least, we extend the two stack distance based allocation algorithms to GSDA-E and OSDA-E to minimize the energy consumption of the HSC architecture. 
Our experimental results show that GSDA-E achieves energy consumption either the same as or close to the optimal results attained by OSDA-E, while achieving performance close to that of OSDA and GSDA. Hybrid SPM-Cache Performance Energy Real-time Engineering
209	On the simulation and design of manycore CMPs Thompson, Christopher Callum January 2015 The progression of Moore’s Law has resulted in both embedded and performance computing systems which use an ever increasing number of processing cores integrated in a single chip. Commercial systems are now available which provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular as it is easier to guarantee hard-realtime constraints using individual cores dedicated to tasks than to use traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial-and-error approaches to investigate the design space. This thesis tackles the problems encountered in the design of these large-scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially addressing embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, partitioning the NoC and using packet counting and cycle skipping reduces the amount of computation required to accurately model the NoC interconnect. In combination, this enables simulation speeds significantly higher than the state of the art, while maintaining less error, when compared to real hardware, than any similar simulator. Simulation speeds reach up to 370 MIPS (Million (target) Instructions Per Second), or 110 MHz, which is better than typical FPGA prototypes and approaches final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. 
The thesis continues by scaling the simulator past large embedded systems up to 64-1024 core processors, adding support for coherent architectures using the same packet counting techniques along with low overhead context switching to enable the simulation of such large systems with stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation to further improve simulation speeds in a manner which did not sacrifice any accuracy. These innovations were leveraged to investigate significant novel energy saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction, with the name wait-on-address, the energy spent during spin-wait style synchronisation events can be significantly reduced. This functions by putting the core into a low-power idle state while the cache line of the indicated address is monitored for coherency action. Upon an update or invalidation (or traditional timer or external interrupts) the core will resume execution, but the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, such as would be the case with a system running many tasks with a small degree of parallelism in each. The research concludes by using the extremely fast simulation speeds developed to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications on a huge diverse range of potential MPSoC designs. 
This data was used to train a series of machine learning based models, which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space, using only two sample simulations, with promising results from some of the machine learning techniques. The models were then used to produce a ranking of predicted performance across the design space, and on average Random Forest was able to predict the best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum result. In summary, this thesis improves upon the state of the art for cycle-accurate multicore simulation, introduces novel energy saving changes to the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine learning techniques to significantly accelerate the design space exploration required to bring a new manycore design to market. 004
210	Memcached och Redis cachning på lokalt system i en dator / Memcached and Redis caching on a local system in a computer Hallberg, Jonathan January 2019 This study conducts an experiment with an empirical measurement of the response times of Redis and Memcached as caching techniques with a MySQL database. A web application fetches data from the database, and the data is stored on a cache server. The experiment is carried out on a laptop that handles all parts of the system. The time it takes for the web application to obtain the requested information from the cache or the database is called the response time. The hypothesis is that Redis will produce a better result than Memcached. A website was created as an artifact and used as the test environment in which the experiment is performed. Memcached was replaced with Memcache because of compatibility problems between Memcached and Windows. Memcache is an older version of Memcached that contains the same functions. The results of the measurement series are presented in line charts and bar charts showing the mean response time and the standard deviation. The analysis shows that Redis performs better caching than Memcache. In all three measurement series, Redis shows a lower mean response time than Memcache. Cache MySQL Redis Memcache Computer and Information Sciences
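The kind of measurement this abstract describes, repeated response-time samples summarized by mean and standard deviation, could be sketched as follows. This is a minimal illustration and not the study's test harness: `fetch` stands for any hypothetical callable that retrieves a value (from a cache client or a database), and no Redis or Memcache client API is assumed.

```python
import statistics
import time


def measure_response_times(fetch, key, repetitions):
    """Call fetch(key) repeatedly and return each response time in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fetch(key)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) of the response times."""
    return statistics.mean(samples), statistics.stdev(samples)
```

Running the same `measure_response_times` loop against each backend and comparing the two summaries mirrors the comparison reported above; at least two samples are needed for the standard deviation.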
