551

Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors

Esteve García, Albert 01 September 2017
Most of the data referenced by sequential and parallel applications running on current chip multiprocessors is referenced by a single thread, i.e., private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of private data detected. However, the mechanisms proposed so far either fail to consider thread migration and the private use of data within different application phases, or entail high overhead. As a result, a considerable amount of private data goes undetected. In order to increase the detection of private data, this thesis proposes a TLB-based mechanism that accounts for both thread migration and private application phases with low overhead. Classification status in the proposed TLB-based classification mechanisms is determined by the presence of the page translation in other cores' TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other cores' TLBs upon every TLB miss. In particular, the proposed approach is based on the exchange and counting of tokens. Token counting on TLBs is a natural and efficient way to classify memory pages. It does not require complex and undesirable persistent requests or arbitration: when two or more TLBs race to access a page, the tokens are distributed among them, classifying the page as shared. However, the TLB-based ability to classify private pages depends strongly on TLB size, as it relies on the presence of a page translation in the system's TLBs. To overcome this, different TLB usage predictors (UP) are proposed, which allow a page classification unaffected by TLB size.
Specifically, this thesis introduces a predictor that obtains system-wide page usage information by either employing a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP). / Esteve García, A. (2017). Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86136 / TESIS
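The token-exchange idea described in this abstract can be sketched as a toy model. This is illustrative only: the class, method names, and the one-token-steal policy are assumptions, not the thesis's actual protocol. A page has one token per core; it counts as private while a single TLB holds tokens for it, and a second TLB filling the translation forces the tokens to be split, reclassifying the page as shared.

```python
class PageTokens:
    """Toy model of token-based page classification (a sketch, not the
    thesis's protocol). A page has one token per core; it is classified
    private while a single TLB holds tokens for it."""

    def __init__(self, num_cores):
        self.free = num_cores      # tokens not held by any TLB
        self.holders = {}          # core id -> tokens held in its TLB

    def tlb_fill(self, core):
        """Called when `core` inserts the page translation on a TLB miss."""
        if self.free:
            take, self.free = self.free, 0
        else:
            # Another TLB already holds tokens: take one from it, so the
            # tokens end up distributed and the page becomes shared. No
            # persistent requests or arbitration are needed.
            donor = next(iter(self.holders))
            self.holders[donor] -= 1
            take = 1
        self.holders[core] = self.holders.get(core, 0) + take

    def tlb_evict(self, core):
        """On eviction the tokens return, enabling re-privatization."""
        self.free += self.holders.pop(core, 0)

    def classification(self):
        # Meaningful only while at least one TLB caches the translation.
        return "private" if len(self.holders) == 1 else "shared"
```

Note how eviction enables the re-privatization the abstract emphasizes: once core 0's entry is evicted (e.g. after a thread migration), a page now touched only by core 1 reverts to private.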
552

The Performance Cost of Security

Bowen, Lucy R 01 June 2019
Historically, performance has been the most important consideration when optimizing computer hardware. Modern processors are so highly optimized that every cycle of computation time matters. However, this practice of optimizing for performance at all costs has been called into question by new microarchitectural attacks, e.g., Meltdown and Spectre. Microarchitectural attacks exploit the effects of microarchitectural components or optimizations in order to leak data to an attacker. These attacks have caused processor manufacturers to introduce performance-impacting mitigations in both software and silicon. To investigate the performance impact of the various mitigations, a test suite of forty-seven different tests was created. This suite was run on a series of virtual machines covering both Ubuntu 16 and Ubuntu 18. These tests investigated the performance change across version updates and the performance impact of CPU core count versus the default microarchitectural mitigations. The testing showed that the performance impact of the microarchitectural mitigations is non-trivial: the percent difference in performance can be as high as 200%.
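The percent-difference metric reported above can be made concrete as follows. This is one common definition (slowdown of mitigated runs relative to a baseline); the thesis's exact formula is not stated in the abstract, so treat this as an assumption.

```python
from statistics import mean

def percent_difference(baseline_samples, mitigated_samples):
    """Percent slowdown of the mitigated runs relative to the baseline:
    100 * (mean(mitigated) - mean(baseline)) / mean(baseline)."""
    b, m = mean(baseline_samples), mean(mitigated_samples)
    return (m - b) / b * 100.0
```

Under this definition, a benchmark that takes three times as long with mitigations enabled shows a 200% difference, matching the worst case reported above.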
553

Podpora pro vyrovnávací paměť pro systém GVFS / A Support of GVFS Caching

Holý, Ondřej January 2014
This master's thesis deals with support for caching in GVfs (GNOME Virtual Filesystem). It describes the basics of caches, cache invalidation, and cache replacement algorithms, followed by a description of the GIO filesystem abstraction and the communication of modules within GVfs. The disadvantages of individual GVfs modules and, where present, their internal caches are discussed. The thesis proposes three types of cache: the first stores file metadata, the second directory listings, and the third file content. These caches have been implemented in a prototype and verified with respect to functionality and performance. The main benefits of the proposed solution are faster work with virtual filesystems and the provision of functionality missing from the lower-level virtual filesystems to the GIO abstraction (for instance, the seek operation).
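The first cache type (file metadata) can be sketched with a standard LRU structure. The class name, capacity, and invalidate-on-write policy below are illustrative assumptions, not GVfs internals:

```python
from collections import OrderedDict

class MetadataCache:
    """Minimal LRU cache sketch for file metadata keyed by path."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()     # path -> metadata dict

    def get(self, path):
        if path in self._entries:
            self._entries.move_to_end(path)   # mark most recently used
            return self._entries[path]
        return None                           # miss: caller must fetch

    def put(self, path, metadata):
        self._entries[path] = metadata
        self._entries.move_to_end(path)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False) # evict least recently used

    def invalidate(self, path):
        # Drop the entry on writes/renames so stale metadata is not served.
        self._entries.pop(path, None)
```

The directory-listing and file-content caches would follow the same pattern, differing in what is stored per key and in when invalidation fires.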
554

Aplikace pro podporu geocachingu / Information System for Geocaching

Kuchta, Michal January 2012
The work presents the game of Geocaching, mainly from the viewpoint of an information system. It contains a study of existing applications with their advantages and disadvantages. The second part of the work specifies a new application that should address the major disadvantages of existing products. A design of an application meeting these requirements follows. The implementation of the application is described in the subsequent chapters, with the aim of targeting the specific problems that were solved during the implementation. A summary and future plans are contained in the closing chapter.
555

Návrh pokročilé architektury procesoru v jazyce VHDL / VHDL Design of Advanced CPU

Slavík, Daniel January 2010
The goal of this project was to study pipelined processor architectures along with instruction and data caches. A chosen pipelined architecture was to be designed and implemented in VHDL. I first implemented a subscalar architecture and then three versions of a scalar architecture. These architectures were synthesized into an FPGA and their performance was compared on a chosen algorithm. In the next part of this thesis I designed and implemented instruction and data cache logic for both architectures; however, I was not able to synthesize these caches. The last chapter of the thesis deals with the superscalar architecture, which is the prevailing architecture today.
556

Limites fondamentales de stockage pour les réseaux de diffusion de liens partagés et les réseaux de combinaison / Fundamental Limits of Cache-aided Shared-link Broadcast Networks and Combination Networks

Wan, Kai 29 June 2018
In this thesis, we investigated the coded caching problem by building the connection between coded caching with uncoded placement and index coding, and by leveraging index coding results to characterize the fundamental limits of the coded caching problem. We mainly analysed the caching problem in the shared-link broadcast model and in combination networks. In the first part of this thesis, for cache-aided shared-link broadcast networks, we considered the constraint that content is placed uncoded within the caches. When the cache contents are uncoded and the user demands are revealed, the caching problem can be connected to an index coding problem. We derived fundamental limits for the caching problem by using tools for the index coding problem. A novel index coding achievable scheme was first derived based on distributed source coding. This inner bound was proved to be strictly better than the widely used "composite (index) coding" inner bound by leveraging the ignored correlation among composites and non-unique decoding. For the centralized caching problem, an outer bound under the constraint of uncoded cache placement is proposed based on the "acyclic index coding outer bound". This outer bound is proved to be achieved by the cMAN scheme when the number of files is not less than the number of users, and by the proposed novel index coding achievable scheme otherwise. For the decentralized caching problem, this thesis proposes an outer bound under the constraint that each user stores bits uniformly and independently at random. This outer bound is achieved by the dMAN scheme when the number of files is not less than the number of users, and by our proposed novel index coding inner bound otherwise. In the second part of this thesis, we considered the centralized caching problem in two-hop relay networks, where the server communicates with cache-aided users through intermediate relays. Because of the hardness of analysis on general networks, we mainly considered a well-known class of symmetric relay networks, combination networks, comprising H relays and C(H, r) users, where each user is connected to a different r-subset of relays. We aimed to minimize the maximum link load in the worst case. We derived outer and inner bounds in this thesis. For the outer bound, the straightforward approach is that each time we consider a cut of x relays, the total load transmitted to these x relays can be outer bounded by the outer bound for the shared-link model with C(x, r) users. We used this strategy to extend the outer bounds for the shared-link model and the acyclic index coding outer bound to combination networks. In this thesis, we also tightened the extended acyclic index coding outer bound in combination networks by further leveraging the network topology and the joint entropy of the various random variables. For the achievable schemes, there are two approaches, separation and non-separation. In the separation approach, we use the cMAN cache placement and multicast message generation independently of the network topology, and then deliver the cMAN multicast messages based on the network topology. In the non-separation approach, we design the placement and/or the multicast messages based on the network topology. We proposed four delivery schemes in the separation approach. In the non-separation approach, for any uncoded cache placement, we proposed a delivery scheme that generates multicast messages based on the network topology. Moreover, we extended our results to more general models, such as combination networks with cache-aided relays and users, and caching systems in more general relay networks. Optimality results were given under some constraints, and numerical evaluations showed that our proposed schemes outperform the state of the art.
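For concreteness, the shared-link memory-load tradeoff achieved by the MAN-type placements mentioned above (cMAN/dMAN) can be evaluated numerically. This is the standard Maddah-Ali–Niesen load expression at integer memory points, not a result specific to the combination-network schemes:

```python
from math import comb

def man_load(K, t):
    """Worst-case shared-link load of the MAN scheme with K users at the
    integer memory point t = K*M/N (assuming N >= K files):
    R = C(K, t+1) / C(K, t) = (K - t) / (t + 1)."""
    return comb(K, t + 1) / comb(K, t)
```

Under uncoded placement and at least as many files as users, this load matches the acyclic-index-coding outer bound discussed above; e.g. with K = 10 users, t = 0 (no caching) gives load 10, while t = 2 gives load 8/3.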
557

Side-channel Threats on Modern Platforms: Attacks and Countermeasures

Zhang, Xiaokuan January 2021
No description available.
558

Micro-architectural Attacks and Countermeasures

Lu, Shiting January 2011
Micro-architectural analysis (MA) is a fast-evolving area of side-channel cryptanalysis. This new area focuses on the effects of common processor components and their functionality on the security of software cryptosystems. The main characteristic of micro-architectural attacks, which sets them apart from classical side-channel attacks, is the simple fact that they exploit the micro-architectural behavior of modern computer systems. Attackers can obtain runtime information through malicious software and then extract sensitive information through off-line analysis. This kind of attack has the following features: 1) side-channel information is acquired through software measurement on the target machine, with no need for sophisticated devices; 2) a non-privileged process can obtain the runtime information of a privileged process; 3) both remote and local attacks can be mounted. This thesis mainly focuses on one kind of these attacks, data-cache-based timing attacks (CBTA). First, the main principle of CBTA is introduced, several CBTA techniques are discussed, and a theoretical model is given for some attacks. Second, various countermeasures are described and their advantages and disadvantages are pointed out. Based on these discussions, the author proposes two anti-attack measures using hardware modification. Against access-driven attacks, an XOR address remapping technique is proposed, which obfuscates the mapping between cache lines and memory blocks. Against timing-driven attacks, the IPMG mechanism is proposed, which generates cache misses dynamically by observing the historic miss rate. These two mechanisms were realized on a MIPS processor and their effectiveness was verified on an FPGA board. Finally, the performance penalty and hardware cost are evaluated. The results show that the proposed solution is effective, with very low performance penalty and area cost.
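The XOR address remapping idea can be sketched in a few lines. The cache geometry and key handling below are illustrative assumptions (the thesis's hardware design may differ): the conventional set index is XORed with a secret key, scrambling the line-to-set mapping that an access-driven attacker relies on.

```python
CACHE_LINE_BITS = 6   # 64-byte lines (illustrative parameters)
INDEX_BITS = 7        # 128 cache sets

def set_index_plain(addr):
    """Conventional set index: the bits just above the line offset."""
    return (addr >> CACHE_LINE_BITS) & ((1 << INDEX_BITS) - 1)

def set_index_xor(addr, key):
    """XOR-remapped set index: XORing with a secret (e.g. per-boot) key
    permutes the line-to-set mapping, so an attacker can no longer tell
    which memory block maps to which cache set."""
    return set_index_plain(addr) ^ (key & ((1 << INDEX_BITS) - 1))
```

Because XOR with a fixed key is a permutation of the index space, addresses that conflicted before still conflict, so the hit rate is unchanged; only the attacker's address-to-set model is broken unless the key leaks.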
559

A Proxy for Distributed Hash Table based Machine-to-Machine Networks

Li, Daoyuan January 2011
Wireless sensor networks (WSNs) have been of increasing interest to both researchers and entrepreneurs. As WSN technologies gradually mature and more and more uses are reported, we find that most current WSNs are still designed only for specific purposes. For example, one WSN may be used to gather information from a field, and the collected data is not shared with other parties. We propose a distributed hash table (DHT) based machine-to-machine (M2M) system for connecting different WSNs together in order to fully utilize the information collected from currently available WSNs. This thesis specifically looks at how to design and implement a proxy for such a system. We discuss why such a proxy can be useful for DHT-based M2M systems, what the proxy should consist of, and what kind of architecture is suitable. We also look into different communication protocols that can be used in these systems and discuss which ones best suit our purposes. The design of the proxy focuses on network management and service discovery for WSNs, as well as security considerations and caching mechanisms to improve performance. A prototype is implemented based on our design and evaluated. We find it feasible to implement such a DHT-based M2M system, and a proxy in the system can be necessary and useful. Finally, we draw conclusions and discuss what future work remains to be done.
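A proxy-side cache for DHT lookups, as motivated above, might be sketched as follows. The Chord-style SHA-1 keys and the TTL policy are assumptions for illustration, not the thesis's implementation:

```python
import hashlib
import time

class DhtProxyCache:
    """Sketch of a proxy-side cache for DHT lookups: responses are cached
    by key with a TTL, so repeated M2M queries avoid full DHT lookups."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._cache = {}     # key -> (value, expiry time)

    @staticmethod
    def dht_key(name):
        # 160-bit identifier, as in Chord-style DHTs.
        return hashlib.sha1(name.encode()).hexdigest()

    def lookup(self, name, dht_fetch):
        key = self.dht_key(name)
        hit = self._cache.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]                      # served from the cache
        value = dht_fetch(key)                 # fall back to the DHT
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value
```

A short TTL keeps sensor readings reasonably fresh while shielding the DHT from bursts of identical queries; the right TTL depends on how quickly the underlying WSN data changes.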
560

Real-time systems on multicore platforms: managing hardware resources for predictable execution

Ye, Ying 22 February 2018
Shared hardware resources in commodity multicore processors are subject to contention from co-running threads. The resulting interference can lead to highly variable performance for individual applications. This is particularly problematic for real-time applications, which require predictable timing guarantees, and it leads to pessimistic estimates of the Worst-Case Execution Time (WCET) for every real-time application. More CPU time needs to be reserved, so fewer applications can enter the system. As the average execution time is usually far less than the WCET, a significant amount of the reserved CPU resource is wasted. Previous works have attempted to partition the shared resources, among either CPUs or processes, to improve performance isolation, but they have not proven to be both efficient and effective. In this thesis, we propose several mechanisms and frameworks that manage the shared caches and memory buses on multicore platforms. First, we introduce a multicore real-time scheduling framework with a foreground/background scheduling model. By combining real-time load balancing with background scheduling, CPU utilization is greatly improved. In addition, a memory bus management mechanism is implemented on top of the background scheduling, ensuring that bus contention remains under control while unused CPU cycles are utilized. Cache partitioning is also studied thoroughly in this thesis, with a cache-aware load balancing algorithm and a dynamic cache partitioning framework proposed. Lastly, we describe a system architecture that integrates all of the above solutions. It tackles one of the toughest problems in OS innovation, legacy support, by converting existing OSes into libraries in a virtualized environment. Thus, within a single multicore platform, we benefit from the fine-grained resource control of a real-time OS and the rich functionality of a general-purpose OS.
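One classical software mechanism for the cache partitioning discussed above is page coloring. The sketch below is illustrative (not necessarily the thesis's framework): it computes how many colors a physically indexed set-associative cache offers and which color a physical page falls into; giving disjoint color sets to different applications keeps their cache footprints from conflicting.

```python
def num_cache_colors(cache_bytes, ways, line_bytes, page_bytes=4096):
    """Number of page colors in a physically indexed set-associative
    cache: the per-way footprint (sets * line size) divided by the
    page size."""
    sets = cache_bytes // (ways * line_bytes)
    return (sets * line_bytes) // page_bytes

def page_color(phys_page_number, num_colors):
    # Consecutive physical pages cycle through the colors; pages of
    # different colors can never map to the same cache sets.
    return phys_page_number % num_colors
```

For example, a 2 MiB 16-way cache with 64-byte lines has 2048 sets and therefore 32 page colors with 4 KiB pages, so up to 32 disjoint cache partitions can be carved out by constraining page allocation.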
