Global ETD Search

1	Lightweight speculative support for aggressive auto-parallelisation tools Powell, Daniel Christopher January 2015 (has links) With the recent move to multi-core architectures it has become important to create the means to exploit the performance made available to us by these architectures. Unfortunately parallel programming is often a difficult and time-intensive process, even to expert programmers. Auto-parallelisation tools have aimed to fill the performance gap this has created, but static analysis commonly employed by such tools are unable to provide the performance improvements required due to lack of information at compile-time. More recent aggressive parallelisation tools use profiled-execution to discover new parallel opportunities, but these tools are inherently unsafe. They require either manual confirmation that their changes are safe, completely ruling out auto-parallelisation, or they rely upon speculative execution such as software thread-level speculation (SW-TLS) to confirm safe execution at runtime. SW-TLS schemes are currently very heavyweight and often fail to provide speedups for a program. Performance gains are dependent upon suitable parallel opportunities, correct selection and configuration, and appropriate execution platforms. Little research has been completed into the automated implemention of SW-TLS programs. This thesis presents an automated, machine-learning based technique to select and configure suitable speculation schemes when appropriate. This is performed by extracting metrics from potential parallel opportunities and using them to determine if a loop is suitable for speculative execution and if so, which speculation policy should be used. An extensive evaluation of this technique is presented, verifying that SW-TLS configuration can indeed be automated and provide reliable performance gains. This work has shown that on an 8-core machine, up to 7.75X and a geometric mean of 1.64X speedups can be obtained through automatic configuration, providing on average 74% of the speedup obtainable through manual configuration. Beyond automated configuration, this thesis explores the idea that many SW-TLS schemes focus too heavily on recovery from detecting a dependence violation. Doing so often results in worse than sequential performance for many real-world applications, therefore this work hypothesises that for many highly-likely parallel candidates, discovered through aggressive parallelisation techniques, would benefit from a simple dependence check without the ability to roll back. Dependence violations become extremely expensive in this scenario, however this would be incredibly rare. With a thorough evaluation of the technique this thesis confirms the hypothesis whilst achieving speedups of up to 22.53X, and a geometric mean of 2.16X on a 32-core machine. In a competitive scheduling scenario performance loss can be restricted to at least sequential speeds, even when a dependence has been detected. As a means to lower costs further this thesis explores other platforms to aid in the execution of speculative error checking. Introduced is the use of a GPU to offload some of the costs to during execution that confirms that using an auxiliary device is a legitimate means to obtain further speedup. Evaluation demonstrates that doing so can achieve up to 14.74X and a geometric mean of 1.99X speedup on a 12-core hyperthreaded machine. Compared to standard CPU-only techniques this performs slightly slower with a geometric mean of 0.96X speedup, however this is likely to improve with upcoming GPU designs. With the knowledge that GPU’s can be used to reduce speculation costs, this thesis also investigates their use to speculatively improve execution times also. Presented is a novel SW-TLS scheme that targets GPU-based execution for use with aggressive auto-parallelisers. This scheme is executed using a competitive scheduling model, ensuring performance is no lower than sequential execution, whilst being able to provide speedups of up to 99X and on average 3.2X over sequential. On average this technique outperformed static analysis alone by a factor of 7X and achieved approximately 99% of the speedup obtained from manual parallel implementations and outperformed the state-of-the-art in GPU SW-TLS by a factor of 1.45. 005.2 parallelism ; speculative execution
2	Accelerating Markov chain Monte Carlo via parallel predictive prefetching Angelino, Elaine Lee 21 October 2014 (has links) We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. This dissertation demonstrates that MCMC inference can be accelerated in a model of parallel computation that uses speculation to predict and complete computational work ahead of when it is known to be useful. By exploiting fast, iterative approximations to the target density, we can speculatively evaluate many potential future steps of the chain in parallel. In Bayesian inference problems, this approach can accelerate sampling from the target distribution, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores. / Engineering and Applied Sciences Computer science Bayesian inference Markov chain Monte Carlo MCMC parallel prefetching speculative execution
3	Speculative Interference: A Modern Spectre Attack / Spekulativ Interferens: En Modern Spectre-attack Borg, Isak January 2021 (has links) Since the Spectre family of attacks were made public knowledge in January of 2018, researchers, manufacturers and interested individuals have experimented a lot with creating defences against it. But there have also been a lot of research aimed at circumventing these defences and finding alternative side-channels and mechanisms for performing Spectre-type attacks. This thesis implements and demonstrates a proof of concept of one of these newfound attacks known as a Speculative interference attack. This is done in a simulated environment, which to our knowledge has not been done before at the time of writing this report. After the 'basic' version of a Spectre attack has been explained, the thesis will explain how the more advanced interference attack works and how it is implemented in the simulated environment. In the end the results gained with the attack will be presented, which should convince the reader of the relevance and possibilities of the attack. / Efter att säkerhetsattackerna kända som Spectre offentliggjordes i Januari 2018 har bådeforskare, utvecklare och intresserade individer experimenterat med att ta fram försvar mot dem. Det har också spenderats mycket resurser och tid på att finna sätt att kringgå dessa försvar och att hitta alternativa sido-kanaler och mekanismer som kan utnyttjas för att genomföra en Spectre-attack. Den här uppsatsen demonstrerar en fungerande implementation av en av dessa nyfunna attacker, känd som en ’Speculative interference attack’. Detta görs i en simulerad miljö, vilken enligt vår kännedom inte tidigare har gjorts vid genomförandet av detta arbete. Efter att en mer grundläggande version av en Spectre-attack har förklarats kommer uppsatsen att gå igenom hur den mer avancerade ’interference’ attacken fungerar och hur den är implementerad. I slutändan kommer de resultat attacken tagit fram att redogöras, vilket bör övertyga läsaren om attackens relevans och möjligheter. Spectre spectre attack speculative interference speculative execution invisible speculation gem5 Computer and Information Sciences Data- och informationsvetenskap
4	Exploitable Hardware Features and Vulnerabilities Enhanced Side-Channel Attacks on Intel SGX and Their Countermeasures Chen, Guoxing 29 August 2019 (has links) No description available. Computer Engineering Computer Science Intel SGX computer engineering hyper-threading speculative execution
5	Analysis and Enforcement of Properties in Software Systems Wu, Meng 02 July 2019 (has links) Due to the lack of effective techniques for detecting and mitigating property violations, existing approaches to ensure the safety and security of software systems are often labor intensive and error prone. Furthermore, they focus primarily on functional correctness of the software code while ignoring micro-architectural details of the underlying processor, such as cache and speculative execution, which may undermine their soundness guarantees. To fill the gap, I propose a set of new methods and tools for ensuring the safety and security of software systems. Broadly speaking, these methods and tools fall into three categories. The first category is concerned with static program analysis. Specifically, I develop a novel abstract interpretation framework that considers both speculative execution and a cache model, and guarantees to be sound for estimating the execution time of a program and detecting side-channel information leaks. The second category is concerned with static program transformation. The goal is to eliminate side channels by equalizing the number of CPU cycles and the number of cache misses along all program paths for all sensitive variables. The third category is concerned with runtime safety enforcement. Given a property that may be violated by a reactive system, the goal is to synthesize an enforcer, called the shield, to correct the erroneous behaviors of the system instantaneously, so that the property is always satisfied by the combined system. I develop techniques to make the shield practical by handling both burst error and real-valued signals. The proposed techniques have been implemented and evaluated on realistic applications to demonstrate their effectiveness and efficiency. / Doctor of Philosophy / It is important for everything around us to follow some rules to work correctly. That is the same for our software systems to follow the security and safety properties. Especially, softwares may leak information via unexpected ways, e.g. the program timing, which makes it more difficult to be detected or mitigated. For instance, if the execution time of a program is related to the sensitive value, the attacker may obtain information about the sensitive value. On the other side, due to the complexity of software, it is nearly impossible to fully test or verify them. However, the correctness of software systems at runtime is crucial for critical applications. While existing approaches to find or resolve properties violation problem are often labor intensive and error prone, in this dissertation, I first propose an automated tool for detecting and mitigating the security vulnerability through program timing. Programs processed by the tool are guaranteed to be time constant under any sensitive values. I have also taken the influence of speculative execution, which is the cause behind recent Spectre and Meltdown attack, into consideration for the first time. To enforce the correctness of programs at runtime, I introduce an extra component that can be attached to the original system to correct any violation if it happens, thus the entire system will still be correct. All proposed methods have been evaluated on a variety of real world applications. The results show that these methods are effective and efficient in practice. Shield Synthesis Program Analysis Timing Side Channel Cache Timing Leak Speculative Execution Abstract Interpretation
6	Improving Input Prediction in Online Fighting Games Ehlert, Anton January 2021 (has links) Many online fighting games use rollback netcode in order to compensate for network delay. Rollback netcode allows players to experience the game as having reduced delay. A drawback of this is that players will sometimes see the game quickly ”jump” to a different state to adjust for the the remote player’s actions. Rollback netcode implementations require a method for predicting the remote player’s next button inputs. Current implementations use a naive repeatlastframe policy for such prediction. There is a possibility that alternative methods may lead to improved user experience. This project examines the problem of improving input prediction in fighting games. It details the development of a new prediction model based on recurrent neural networks. The model was trained and evaluated using a dataset of several thousand recorded player input sequences. The results show that the new model slightly outperforms the naive method in prediction accuracy, with the difference being greater for longer predictions. However, it has far higher requirements both in terms of memory and computation cost. It seems unlikely that the model would significantly improve on current rollback netcode implementations. However, there may be ways to improve predictions further, and the effects on user experience remains unknown. / Många online fightingspel använder rollback netcode för att kompensera för nätverksfördröjning. Rollback netcode låter spelare uppleva spelet med mindre fördröjning. En nackdel av detta är att spelare ibland ser spelet snabbt ”hoppa” till ett annat tillstånd för att justera för motspelarens handlingar. Rollback netcode implementationer behöver en policy för att förutsäga motspelarens nästa knapptryckningar. Nuvarande implementationer använder en naiv repetera-senaste-frame policy för förutsägelser. Det finns en möjlighet att alternativa metoder kan leda till förbättrad användarupplevelse. Det här projektet undersöker problemet att förbättra förutsägelser av knapptryckningar i fightingspel. Det beskriver utvecklingen av en ny förutsägelsemodell baserad på rekursiva neuronnät. Modellen tränades och evaluerades med ett dataset av flera tusen inspelade knappsekvenser. Resultaten visar att den nya modellen överträffar den naiva metoden i noggrannhet, med större skillnad för längre förutsägelser. Dock har den mycket högre krav i både minne och beräkningskostad. Det verkar osannolikt att modellen skulle avsevärt förbättra nuvarande rollback netcode implementationer. Men det kan finnas sätt att förbättra förutsägelser ytterligare, och påverkan på användarupplevelsen förblir okänd. Fighting games sequence prediction machine learning speculative execution Fightingspel sekvensförutsägelse maskininlärning spekulativ exekvering Computer and Information Sciences Data- och informationsvetenskap
7	Practical Mitigations Against Memory Corruption and Transient Execution Attacks Ismail, Mohannad Adel Abdelmoniem Ahmed 31 May 2024 (has links) Memory corruption attacks have existed in C and C++ for more than 30 years, and over the years many defenses have been proposed. In addition to that, a new class of attacks, Spectre, has emerged that abuse speculative execution to leak secrets and sensitive data through micro-architectural side channels. Many defenses have been proposed to mitigate Spectre as well. However, with every new defense a new attack emerges, and then a new defense is proposed. This is an ongoing cycle between attackers and defenders. There exists many defenses for many different attack avenues. However, many suffer from either practicality or effectiveness issues, and security researchers need to balance out their compromises. Recently, many hardware vendors, such as Intel and ARM, have realized the extent of the issue of memory corruption attacks and have developed hardware security mechanisms that can be utilized to defend against these attacks. ARM, in particular, has released a mechanism called Pointer Authentication in which its main intended use is to protect the integrity of pointers by generating a Pointer Authentication Code (PAC) using a cryptographic hash function, as a Message Authentication Code (MAC), and placing it on the top unused bits of a 64-bit pointer. Placing the PAC on the top unused bits of the pointer changes its semantics and the pointer cannot be used unless it is properly authenticated. Hardware security features such as PAC are merely mechanisms not full fledged defences, and their effectiveness and practicality depends on how they are being utililzed. Naive use of these defenses doesn't alleviate the issues that exist in many state-of-the-art software defenses. The design of the defense that utilizes these hardware security features needs to have practicality and effectiveness in mind. Having both practicality and effectiveness is now a possible reality with these new hardware security features. This dissertation describes utilizing hardware security features, namely ARM PAC, to build effective and practical defense mechanisms. This dissertation first describes my past work called PACTight, a PAC based defense mechanism that defends against control-flow hijack- ing attacks. PACTight defines three security properties of a pointer such that, if achieved, prevent pointers from being tampered with. They are: 1) unforgeability: A pointer p should always point to its legitimate object; 2) non-copyability: A pointer p can only be used when it is at its specific legitimate location; 3) non-dangling: A pointer p cannot be used after it has been freed. PACTight tightly seals pointers and guarantees that a sealed pointer cannot be forged, copied, or dangling. PACTight protects all sensitive pointers, which are code pointers and pointers that point to code pointers. This completely prevents control-flow hijacking attacks, all while having low performance overhead. In addition to that, this dissertation proposes Scope-Type Integrity (STI), a new defense policy that enforces pointers to conform to the programmer's intended manner, by utilizing scope, type, and permission information. STI collects information offline about the type, scope, and permission (read/write) of every pointer in the program. This information can then be used at runtime to ensure that pointers comply with their intended purpose. This allows STI to defeat advanced pointer attacks since these attacks typically violate either the scope, type, or permission. We present Runtime Scope-Type Integrity (RSTI). RSTI leverages ARM Pointer Authentication (PA) to generate Pointer Authentication Codes (PACs), based on the information from STI, and place these PACs at the top bits of the pointer. At runtime, the PACs are then checked to ensure pointer usage complies with STI. RSTI overcomes two drawbacks that were present in PACTight: 1) PACTight relied on a large external metadata for protection, whereas RSTI uses very little metadata. 2) PACTight only protected a subset of pointers, whereas RSTI protects all pointers in a program. RSTI has large coverage with relatively low overhead. Also, this dissertation proposes sPACtre, a new and novel defense mechanism that aims to prevent Spectre control-flow attacks on existing hardware. sPACtre is an ARM-based defense mechanism that prevents Spectre control-flow attacks by relying on ARM's Pointer Authentication hardware security feature, annotations added to the program on the secrets that need to be protected from leakage and a dynamic tag-based bounds checking mechanism for arrays. We show that sPACtre can defend against these attacks. We evaluate sPACtre on a variety of cryptographic libraries with several cryptographic algorithms, as well as a synthetic benchmark, and show that it is eﬀicient and has low performance overhead Finally, this dissertation explains a new direction for utilizing hardware security features to protect energy harvesting devices from checkpoint-recovery errors and malicious attackers. / Doctor of Philosophy / In recent years, cyber-threats against computer systems have become more and more preva- lent. In spite of many recent advancements in defenses, these attacks are becoming more threatening. However, many of these defenses are not implemented in the real-world. This is due to their high performance overhead. This limited eﬀiciency is not acceptable in the real-world. In addition to that, many of these defenses have limited coverage and do not cover a wide variety of attacks. This makes the performance tradeoff even less convincing. Thus, there is a need for effective and practical defenses that can cover a wide variety of attacks. This dissertation first provides a comprehensive overview of the current state-of-the-art and most dangerous attacks. More specifically, three types of attacks are examined. First, control-flow hijacking attacks, which are attacks that divert the proper execution of a pro- gram to a malicious execution. Second, data oriented attacks. These are attacks that leak sensitive data in a program. Third, Spectre attacks, which are attacks that rely on sup- posedly hidden processor features to leak sensitive data. These "hidden" features are not entirely hidden. This dissertation explains these attacks in detail and the corresponding state-of-the-art defenses that have been proposed by the security research community to mitigate them. This dissertation then discusses effective and practical defense mechanisms that can mitigate these attacks. The dissertation discusses past work, PACTight, as well as its contributions, RSTI and sPACtre, presenting the full design, threat model, implementation, security eval- uation and performance evaluation of each one of these mechanisms. The dissertation relies on insights derived from the nature of the attack and compiler techniques. A compiler is a tool that transforms human-written code into machine code that is understandable by the computer. The compiler can be modified and used to make programs more secure with compiler techniques. The past work, PACTight, is a defense mechanism that defends against the first type of attacks, control-flow hijacking attacks, by preventing an attacker from abusing specific code in the program to divert the program to a malicious execution. Then, this dissertation presents RSTI, a new defense mechanism that overcomes the limitations of PACTight and extends it to cover data oriented attacks and prevent attackers from leaking sensitive data from the program. In addition to that, this dissertation presents sPACtre, a novel defesnse mechanism that defends against Spectre attacks, and prevents an attacker from abusing a processor's hidden features. Finally, this dissertation briefly discusses a possible future direction to protect a different class of devices, referred to as energy-harvesting devices, from attackers. Hijacking Data Oriented Attacks Hardware Security Features Speculative Execution Compiler-enabled Defenses Practical Design System Software Control-flow Spectre
8	Extracting Parallelism from Legacy Sequential Code Using Transactional Memory Saad Ibrahim, Mohamed Mohamed 26 July 2016 (has links) Increasing the number of processors has become the mainstream for the modern chip design approaches. However, most applications are designed or written for single core processors; so they do not benefit from the numerous underlying computation resources. Moreover, there exists a large base of legacy software which requires an immense effort and cost of rewriting and re-engineering to be made parallel. In the past decades, there has been a growing interest in automatic parallelization. This is to relieve programmers from the painful and error-prone manual parallelization process, and to cope with new architecture trend of multi-core and many-core CPUs. Automatic parallelization techniques vary in properties such as: the level of paraellism (e.g., instructions, loops, traces, tasks); the need for custom hardware support; using optimistic execution or relying on conservative decisions; online, offline or both; and the level of source code exposure. Transactional Memory (TM) has emerged as a powerful concurrency control abstraction. TM simplifies parallel programming to the level of coarse-grained locking while achieving fine-grained locking performance. This dissertation exploits TM as an optimistic execution approach for transforming a sequential application into parallel. The design and the implementation of two frameworks that support automatic parallelization: Lerna and HydraVM, are proposed, along with a number of algorithmic optimizations to make the parallelization effective. HydraVM is a virtual machine that automatically extracts parallelism from legacy sequential code (at the bytecode level) through a set of techniques including code profiling, data dependency analysis, and execution analysis. HydraVM is built by extending the Jikes RVM and modifying its baseline compiler. Correctness of the program is preserved through exploiting Software Transactional Memory (STM) to manage concurrent and out-of-order memory accesses. Our experiments show that HydraVM achieves speedup between 2×-5× on a set of benchmark applications. Lerna is a compiler framework that automatically and transparently detects and extracts parallelism from sequential code through a set of techniques including code profiling, instrumentation, and adaptive execution. Lerna is cross-platform and independent of the programming language. The parallel execution exploits memory transactions to manage concurrent and out-of-order memory accesses. This scheme makes Lerna very effective for sequential applications with data sharing. This thesis introduces the general conditions for embedding any transactional memory algorithm into Lerna. In addition, the ordered version of four state-of-art algorithms have been integrated and evaluated using multiple benchmarks including RSTM micro benchmarks, STAMP and PARSEC. Lerna showed great results with average 2.7× (and up to 18×) speedup over the original (sequential) code. While prior research shows that transactions must commit in order to preserve program semantics, placing the ordering enforces scalability constraints at large number of cores. In this dissertation, we eliminates the need for commit transactions sequentially without affecting program consistency. This is achieved by building a cooperation mechanism in which transactions can forward some changes safely. This approach eliminates some of the false conflicts and increases the concurrency level of the parallel application. This thesis proposes a set of commit order algorithms that follow the aforementioned approach. Interestingly, using the proposed commit-order algorithms the peak gain over the sequential non-instrumented execution in RSTM micro benchmarks is 10× and 16.5× in STAMP. Another main contribution is to enhance the concurrency and the performance of TM in general, and its usage for parallelization in particular, by extending TM primitives. The extended TM primitives extracts the embedded low level application semantics without affecting TM abstraction. Furthermore, as the proposed extensions capture common code patterns, it is possible to be handled automatically through the compilation process. In this work, that was done through modifying the GCC compiler to support our TM extensions. Results showed speedups of up to 4× on different applications including micro benchmarks and STAMP. Our final contribution is supporting the commit-order through Hardware Transactional Memory (HTM). HTM contention manager cannot be modified because it is implemented inside the hardware. Given such constraint, we exploit HTM to reduce the transactional execution overhead by proposing two novel commit order algorithms, and a hybrid reduced hardware algorithm. The use of HTM improves the performance by up to 20% speedup. / Ph. D. Transaction Memory Automatic Parallelization Low-Level Virtual Machine Optimistic Concurrency Speculative Execution Legacy Systems Age Commitment Order Low-Level TM Semantics TM Friendly Semantics
9	A Study of Mitigation Methods for Speculative Cache Side Channel Attacks Mosquera Ferrandiz, Fernando 05 1900 (has links) Side channels give attackers the opportunity to reveal private information without accessing it directly. In this study, several novel approaches are presented to mitigate cache side channel attacks including Spectre attack and its variants, resulting in several contributions. CHASM shows the information leakage in several new cache mapping schemes, where different cache address mappings may provide higher or lower protection against cache side channel attacks. GuardCache creates a noisy cache side-channel, making it more difficult for the attacker to determine if an access is a hit or miss (which is the basis for most side channel attacks). SecurityCloak is a framework that encompasses GuardCache with SafeLoadOnMiss whereby cache load misses during speculative execution are delayed until the speculation is resolved, thus preventing attacks that rely on accessing data in during (mis) speculated executions. To search for a compromise between security and performance, it is recommended not always to use protections such as SecurityCloak protections, but also to activate the protection only while executing critical sections of code or on-demand when an attack is detected (or suspected). Our experimental results show a high degree of obfuscation (and prevention of side channels) with a minimal impact on the performance. Computer Science
10	Increasing the performance of superscalar processors through value prediction / La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires Perais, Arthur 24 September 2015 (has links) Bien que les processeurs actuels possèdent plus de 10 cœurs, de nombreux programmes restent purement séquentiels. Cela peut être dû à l'algorithme que le programme met en œuvre, au programme étant vieux et ayant été écrit durant l'ère des uni-processeurs, ou simplement à des contraintes temporelles, car écrire du code parallèle est notoirement long et difficile. De plus, même pour les programmes parallèles, la performance de la partie séquentielle de ces programmes devient rapidement le facteur limitant l'augmentation de la performance apportée par l'augmentation du nombre de cœurs disponibles, ce qui est exprimé par la loi d'Amdahl. Conséquemment, augmenter la performance séquentielle reste une approche valide même à l'ère des multi-cœurs.Malheureusement, la façon conventionnelle d'améliorer la performance (augmenter la taille de la fenêtre d'instructions) contribue à l'augmentation de la complexité et de la consommation du processeur. Dans ces travaux, nous revisitons une technique visant à améliorer la performance de façon orthogonale : La prédiction de valeurs. Au lieu d'augmenter les capacités du moteur d'exécution, la prédiction de valeurs améliore l'utilisation des ressources existantes en augmentant le parallélisme d'instructions disponible.En particulier, nous nous attaquons aux trois problèmes majeurs empêchant la prédiction de valeurs d'être mise en œuvre dans les processeurs modernes. Premièrement, nous proposons de déplacer la validation des prédictions depuis le moteur d'exécution vers l'étage de retirement des instructions. Deuxièmement, nous proposons un nouveau modèle d'exécution qui exécute certaines instructions dans l'ordre soit avant soit après le moteur d'exécution dans le désordre. Cela réduit la pression exercée sur ledit moteur et permet de réduire ses capacités. De cette manière, le nombre de ports requis sur le fichier de registre et la complexité générale diminuent. Troisièmement, nous présentons un mécanisme de prédiction imitant le mécanisme de récupération des instructions : La prédiction par blocs. Cela permet de prédire plusieurs instructions par cycle tout en effectuant une unique lecture dans le prédicteur. Ces trois propositions forment une mise en œuvre possible de la prédiction de valeurs qui est réaliste mais néanmoins performante. / Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can either be due to an inherent property of the algorithm used by the program, to the program being old and written during the uni-processor era, or simply to time to market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the execution engine aggressiveness, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine, and do it in-order at Commit. Second, we propose a new execution model that executes some instructions in-order either before or after the out-of-order engine. This reduces pressure on said engine and allows to reduce its aggressiveness. As a result, port requirement on the Physical Register File and overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows predicting several instructions per cycle with a single read, hence a single port on the predictor array. This three propositions form a possible implementation of Value Prediction that is both realistic and efficient. Architecture des processeurs Processeurs à hautes performances Exécution spéculative Prédiction de valeurs Processeurs superscalaires Exécution dans le désordre Processor architecture High performance processors Speculative execution Value prediction Superscalar processors Out-Of-Order execution

Search results