• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 5
  • 1
  • Tagged with
  • 10
  • 10
  • 6
  • 3
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Lightweight speculative support for aggressive auto-parallelisation tools

Powell, Daniel Christopher January 2015 (has links)
With the recent move to multi-core architectures it has become important to create the means to exploit the performance made available to us by these architectures. Unfortunately parallel programming is often a difficult and time-intensive process, even to expert programmers. Auto-parallelisation tools have aimed to fill the performance gap this has created, but static analysis commonly employed by such tools are unable to provide the performance improvements required due to lack of information at compile-time. More recent aggressive parallelisation tools use profiled-execution to discover new parallel opportunities, but these tools are inherently unsafe. They require either manual confirmation that their changes are safe, completely ruling out auto-parallelisation, or they rely upon speculative execution such as software thread-level speculation (SW-TLS) to confirm safe execution at runtime. SW-TLS schemes are currently very heavyweight and often fail to provide speedups for a program. Performance gains are dependent upon suitable parallel opportunities, correct selection and configuration, and appropriate execution platforms. Little research has been completed into the automated implemention of SW-TLS programs. This thesis presents an automated, machine-learning based technique to select and configure suitable speculation schemes when appropriate. This is performed by extracting metrics from potential parallel opportunities and using them to determine if a loop is suitable for speculative execution and if so, which speculation policy should be used. An extensive evaluation of this technique is presented, verifying that SW-TLS configuration can indeed be automated and provide reliable performance gains. This work has shown that on an 8-core machine, up to 7.75X and a geometric mean of 1.64X speedups can be obtained through automatic configuration, providing on average 74% of the speedup obtainable through manual configuration. Beyond automated configuration, this thesis explores the idea that many SW-TLS schemes focus too heavily on recovery from detecting a dependence violation. Doing so often results in worse than sequential performance for many real-world applications, therefore this work hypothesises that for many highly-likely parallel candidates, discovered through aggressive parallelisation techniques, would benefit from a simple dependence check without the ability to roll back. Dependence violations become extremely expensive in this scenario, however this would be incredibly rare. With a thorough evaluation of the technique this thesis confirms the hypothesis whilst achieving speedups of up to 22.53X, and a geometric mean of 2.16X on a 32-core machine. In a competitive scheduling scenario performance loss can be restricted to at least sequential speeds, even when a dependence has been detected. As a means to lower costs further this thesis explores other platforms to aid in the execution of speculative error checking. Introduced is the use of a GPU to offload some of the costs to during execution that confirms that using an auxiliary device is a legitimate means to obtain further speedup. Evaluation demonstrates that doing so can achieve up to 14.74X and a geometric mean of 1.99X speedup on a 12-core hyperthreaded machine. Compared to standard CPU-only techniques this performs slightly slower with a geometric mean of 0.96X speedup, however this is likely to improve with upcoming GPU designs. With the knowledge that GPU’s can be used to reduce speculation costs, this thesis also investigates their use to speculatively improve execution times also. Presented is a novel SW-TLS scheme that targets GPU-based execution for use with aggressive auto-parallelisers. This scheme is executed using a competitive scheduling model, ensuring performance is no lower than sequential execution, whilst being able to provide speedups of up to 99X and on average 3.2X over sequential. On average this technique outperformed static analysis alone by a factor of 7X and achieved approximately 99% of the speedup obtained from manual parallel implementations and outperformed the state-of-the-art in GPU SW-TLS by a factor of 1.45.
2

Accelerating Markov chain Monte Carlo via parallel predictive prefetching

Angelino, Elaine Lee 21 October 2014 (has links)
We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. This dissertation demonstrates that MCMC inference can be accelerated in a model of parallel computation that uses speculation to predict and complete computational work ahead of when it is known to be useful. By exploiting fast, iterative approximations to the target density, we can speculatively evaluate many potential future steps of the chain in parallel. In Bayesian inference problems, this approach can accelerate sampling from the target distribution, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores. / Engineering and Applied Sciences
3

Speculative Interference: A Modern Spectre Attack / Spekulativ Interferens: En Modern Spectre-attack

Borg, Isak January 2021 (has links)
Since the Spectre family of attacks were made public knowledge in January of 2018, researchers, manufacturers and interested individuals have experimented a lot with creating defences against it. But there have also been a lot of research aimed at circumventing these defences and finding alternative side-channels and mechanisms for performing Spectre-type attacks. This thesis implements and demonstrates a proof of concept of one of these newfound attacks known as a Speculative interference attack. This is done in a simulated environment, which to our knowledge has not been done before at the time of writing this report. After the 'basic' version of a Spectre attack has been explained, the thesis will explain how the more advanced interference attack works and how it is implemented in the simulated environment. In the end the results gained with the attack will be presented, which should convince the reader of the relevance and possibilities of the attack. / Efter att säkerhetsattackerna kända som Spectre offentliggjordes i Januari 2018 har bådeforskare, utvecklare och intresserade individer experimenterat med att ta fram försvar mot dem. Det har också spenderats mycket resurser och tid på att finna sätt att kringgå dessa försvar och att hitta alternativa sido-kanaler och mekanismer som kan utnyttjas för att genomföra en Spectre-attack. Den här uppsatsen demonstrerar en fungerande implementation av en av dessa nyfunna attacker, känd som en ’Speculative interference attack’. Detta görs i en simulerad miljö, vilken enligt vår kännedom inte tidigare har gjorts vid genomförandet av detta arbete. Efter att en mer grundläggande version av en Spectre-attack har förklarats kommer uppsatsen att gå igenom hur den mer avancerade ’interference’ attacken fungerar och hur den är implementerad. I slutändan kommer de resultat attacken tagit fram att redogöras, vilket bör övertyga läsaren om attackens relevans och möjligheter.
4

Analysis and Enforcement of Properties in Software Systems

Wu, Meng 02 July 2019 (has links)
Due to the lack of effective techniques for detecting and mitigating property violations, existing approaches to ensure the safety and security of software systems are often labor intensive and error prone. Furthermore, they focus primarily on functional correctness of the software code while ignoring micro-architectural details of the underlying processor, such as cache and speculative execution, which may undermine their soundness guarantees. To fill the gap, I propose a set of new methods and tools for ensuring the safety and security of software systems. Broadly speaking, these methods and tools fall into three categories. The first category is concerned with static program analysis. Specifically, I develop a novel abstract interpretation framework that considers both speculative execution and a cache model, and guarantees to be sound for estimating the execution time of a program and detecting side-channel information leaks. The second category is concerned with static program transformation. The goal is to eliminate side channels by equalizing the number of CPU cycles and the number of cache misses along all program paths for all sensitive variables. The third category is concerned with runtime safety enforcement. Given a property that may be violated by a reactive system, the goal is to synthesize an enforcer, called the shield, to correct the erroneous behaviors of the system instantaneously, so that the property is always satisfied by the combined system. I develop techniques to make the shield practical by handling both burst error and real-valued signals. The proposed techniques have been implemented and evaluated on realistic applications to demonstrate their effectiveness and efficiency. / Doctor of Philosophy / It is important for everything around us to follow some rules to work correctly. That is the same for our software systems to follow the security and safety properties. Especially, softwares may leak information via unexpected ways, e.g. the program timing, which makes it more difficult to be detected or mitigated. For instance, if the execution time of a program is related to the sensitive value, the attacker may obtain information about the sensitive value. On the other side, due to the complexity of software, it is nearly impossible to fully test or verify them. However, the correctness of software systems at runtime is crucial for critical applications. While existing approaches to find or resolve properties violation problem are often labor intensive and error prone, in this dissertation, I first propose an automated tool for detecting and mitigating the security vulnerability through program timing. Programs processed by the tool are guaranteed to be time constant under any sensitive values. I have also taken the influence of speculative execution, which is the cause behind recent Spectre and Meltdown attack, into consideration for the first time. To enforce the correctness of programs at runtime, I introduce an extra component that can be attached to the original system to correct any violation if it happens, thus the entire system will still be correct. All proposed methods have been evaluated on a variety of real world applications. The results show that these methods are effective and efficient in practice.
5

Exploitable Hardware Features and Vulnerabilities Enhanced Side-Channel Attacks on Intel SGX and Their Countermeasures

Chen, Guoxing 29 August 2019 (has links)
No description available.
6

Improving Input Prediction in Online Fighting Games

Ehlert, Anton January 2021 (has links)
Many online fighting games use rollback netcode in order to compensate for network delay. Rollback netcode allows players to experience the game as having reduced delay. A drawback of this is that players will sometimes see the game quickly ”jump” to a different state to adjust for the the remote player’s actions. Rollback netcode implementations require a method for predicting the remote player’s next button inputs. Current implementations use a naive repeatlastframe policy for such prediction. There is a possibility that alternative methods may lead to improved user experience. This project examines the problem of improving input prediction in fighting games. It details the development of a new prediction model based on recurrent neural networks. The model was trained and evaluated using a dataset of several thousand recorded player input sequences. The results show that the new model slightly outperforms the naive method in prediction accuracy, with the difference being greater for longer predictions. However, it has far higher requirements both in terms of memory and computation cost. It seems unlikely that the model would significantly improve on current rollback netcode implementations. However, there may be ways to improve predictions further, and the effects on user experience remains unknown. / Många online fightingspel använder rollback netcode för att kompensera för nätverksfördröjning. Rollback netcode låter spelare uppleva spelet med mindre fördröjning. En nackdel av detta är att spelare ibland ser spelet snabbt ”hoppa” till ett annat tillstånd för att justera för motspelarens handlingar. Rollback netcode implementationer behöver en policy för att förutsäga motspelarens nästa knapptryckningar. Nuvarande implementationer använder en naiv repetera-senaste-frame policy för förutsägelser. Det finns en möjlighet att alternativa metoder kan leda till förbättrad användarupplevelse. Det här projektet undersöker problemet att förbättra förutsägelser av knapptryckningar i fightingspel. Det beskriver utvecklingen av en ny förutsägelsemodell baserad på rekursiva neuronnät. Modellen tränades och evaluerades med ett dataset av flera tusen inspelade knappsekvenser. Resultaten visar att den nya modellen överträffar den naiva metoden i noggrannhet, med större skillnad för längre förutsägelser. Dock har den mycket högre krav i både minne och beräkningskostad. Det verkar osannolikt att modellen skulle avsevärt förbättra nuvarande rollback netcode implementationer. Men det kan finnas sätt att förbättra förutsägelser ytterligare, och påverkan på användarupplevelsen förblir okänd.
7

Extracting Parallelism from Legacy Sequential Code Using Transactional Memory

Saad Ibrahim, Mohamed Mohamed 26 July 2016 (has links)
Increasing the number of processors has become the mainstream for the modern chip design approaches. However, most applications are designed or written for single core processors; so they do not benefit from the numerous underlying computation resources. Moreover, there exists a large base of legacy software which requires an immense effort and cost of rewriting and re-engineering to be made parallel. In the past decades, there has been a growing interest in automatic parallelization. This is to relieve programmers from the painful and error-prone manual parallelization process, and to cope with new architecture trend of multi-core and many-core CPUs. Automatic parallelization techniques vary in properties such as: the level of paraellism (e.g., instructions, loops, traces, tasks); the need for custom hardware support; using optimistic execution or relying on conservative decisions; online, offline or both; and the level of source code exposure. Transactional Memory (TM) has emerged as a powerful concurrency control abstraction. TM simplifies parallel programming to the level of coarse-grained locking while achieving fine-grained locking performance. This dissertation exploits TM as an optimistic execution approach for transforming a sequential application into parallel. The design and the implementation of two frameworks that support automatic parallelization: Lerna and HydraVM, are proposed, along with a number of algorithmic optimizations to make the parallelization effective. HydraVM is a virtual machine that automatically extracts parallelism from legacy sequential code (at the bytecode level) through a set of techniques including code profiling, data dependency analysis, and execution analysis. HydraVM is built by extending the Jikes RVM and modifying its baseline compiler. Correctness of the program is preserved through exploiting Software Transactional Memory (STM) to manage concurrent and out-of-order memory accesses. Our experiments show that HydraVM achieves speedup between 2×-5× on a set of benchmark applications. Lerna is a compiler framework that automatically and transparently detects and extracts parallelism from sequential code through a set of techniques including code profiling, instrumentation, and adaptive execution. Lerna is cross-platform and independent of the programming language. The parallel execution exploits memory transactions to manage concurrent and out-of-order memory accesses. This scheme makes Lerna very effective for sequential applications with data sharing. This thesis introduces the general conditions for embedding any transactional memory algorithm into Lerna. In addition, the ordered version of four state-of-art algorithms have been integrated and evaluated using multiple benchmarks including RSTM micro benchmarks, STAMP and PARSEC. Lerna showed great results with average 2.7× (and up to 18×) speedup over the original (sequential) code. While prior research shows that transactions must commit in order to preserve program semantics, placing the ordering enforces scalability constraints at large number of cores. In this dissertation, we eliminates the need for commit transactions sequentially without affecting program consistency. This is achieved by building a cooperation mechanism in which transactions can forward some changes safely. This approach eliminates some of the false conflicts and increases the concurrency level of the parallel application. This thesis proposes a set of commit order algorithms that follow the aforementioned approach. Interestingly, using the proposed commit-order algorithms the peak gain over the sequential non-instrumented execution in RSTM micro benchmarks is 10× and 16.5× in STAMP. Another main contribution is to enhance the concurrency and the performance of TM in general, and its usage for parallelization in particular, by extending TM primitives. The extended TM primitives extracts the embedded low level application semantics without affecting TM abstraction. Furthermore, as the proposed extensions capture common code patterns, it is possible to be handled automatically through the compilation process. In this work, that was done through modifying the GCC compiler to support our TM extensions. Results showed speedups of up to 4× on different applications including micro benchmarks and STAMP. Our final contribution is supporting the commit-order through Hardware Transactional Memory (HTM). HTM contention manager cannot be modified because it is implemented inside the hardware. Given such constraint, we exploit HTM to reduce the transactional execution overhead by proposing two novel commit order algorithms, and a hybrid reduced hardware algorithm. The use of HTM improves the performance by up to 20% speedup. / Ph. D.
8

Increasing the performance of superscalar processors through value prediction / La prédiction de valeurs comme moyen d'augmenter la performance des processeurs superscalaires

Perais, Arthur 24 September 2015 (has links)
Bien que les processeurs actuels possèdent plus de 10 cœurs, de nombreux programmes restent purement séquentiels. Cela peut être dû à l'algorithme que le programme met en œuvre, au programme étant vieux et ayant été écrit durant l'ère des uni-processeurs, ou simplement à des contraintes temporelles, car écrire du code parallèle est notoirement long et difficile. De plus, même pour les programmes parallèles, la performance de la partie séquentielle de ces programmes devient rapidement le facteur limitant l'augmentation de la performance apportée par l'augmentation du nombre de cœurs disponibles, ce qui est exprimé par la loi d'Amdahl. Conséquemment, augmenter la performance séquentielle reste une approche valide même à l'ère des multi-cœurs.Malheureusement, la façon conventionnelle d'améliorer la performance (augmenter la taille de la fenêtre d'instructions) contribue à l'augmentation de la complexité et de la consommation du processeur. Dans ces travaux, nous revisitons une technique visant à améliorer la performance de façon orthogonale : La prédiction de valeurs. Au lieu d'augmenter les capacités du moteur d'exécution, la prédiction de valeurs améliore l'utilisation des ressources existantes en augmentant le parallélisme d'instructions disponible.En particulier, nous nous attaquons aux trois problèmes majeurs empêchant la prédiction de valeurs d'être mise en œuvre dans les processeurs modernes. Premièrement, nous proposons de déplacer la validation des prédictions depuis le moteur d'exécution vers l'étage de retirement des instructions. Deuxièmement, nous proposons un nouveau modèle d'exécution qui exécute certaines instructions dans l'ordre soit avant soit après le moteur d'exécution dans le désordre. Cela réduit la pression exercée sur ledit moteur et permet de réduire ses capacités. De cette manière, le nombre de ports requis sur le fichier de registre et la complexité générale diminuent. Troisièmement, nous présentons un mécanisme de prédiction imitant le mécanisme de récupération des instructions : La prédiction par blocs. Cela permet de prédire plusieurs instructions par cycle tout en effectuant une unique lecture dans le prédicteur. Ces trois propositions forment une mise en œuvre possible de la prédiction de valeurs qui est réaliste mais néanmoins performante. / Although currently available general purpose microprocessors feature more than 10 cores, many programs remain mostly sequential. This can either be due to an inherent property of the algorithm used by the program, to the program being old and written during the uni-processor era, or simply to time to market constraints, as writing and validating parallel code is known to be hard. Moreover, even for parallel programs, the performance of the sequential part quickly becomes the limiting improvement factor as more cores are made available to the application, as expressed by Amdahl's Law. Consequently, increasing sequential performance remains a valid approach in the multi-core era. Unfortunately, conventional means to do so - increasing the out-of-order window size and issue width - are major contributors to the complexity and power consumption of the chip. In this thesis, we revisit a previously proposed technique that aimed to improve performance in an orthogonal fashion: Value Prediction (VP). Instead of increasing the execution engine aggressiveness, VP improves the utilization of existing resources by increasing the available Instruction Level Parallelism. In particular, we address the three main issues preventing VP from being implemented. First, we propose to remove validation and recovery from the execution engine, and do it in-order at Commit. Second, we propose a new execution model that executes some instructions in-order either before or after the out-of-order engine. This reduces pressure on said engine and allows to reduce its aggressiveness. As a result, port requirement on the Physical Register File and overall complexity decrease. Third, we propose a prediction scheme that mimics the instruction fetch scheme: Block Based Prediction. This allows predicting several instructions per cycle with a single read, hence a single port on the predictor array. This three propositions form a possible implementation of Value Prediction that is both realistic and efficient.
9

Maintaining Security in the Era of Microarchitectural Attacks

Oleksenko, Oleksii 16 November 2021 (has links)
Shared microarchitectural state is a target for side-channel attacks that leverage timing measurements to leak information across security domains. These attacks are further enhanced by speculative execution, which transiently distorts the control and data flow of applications, and by untrusted environments, where the attacker may have complete control over the victim program. Under these conditions, microarchitectural attacks can bypass software isolation mechanisms, and hence they threaten the security of virtually any application running in a shared environment. Numerous approaches have been proposed to defend against microarchitectural attacks, but we lack the means to test them and ensure their effectiveness. The users cannot test them manually because the effects of the defences are not visible to software. Testing the defences by attempting attacks is also suboptimal because the attacks are inherently unstable, and a failed attack is not always an indicator of a successful defence. Moreover, some classes of defences can be disabled at runtime. Hence, we need automated tools that would check the effectiveness of defences, both at design time and at runtime. Yet, as it is common in security, the existing solutions lag behind the developments in attacks. In this thesis, we propose three techniques that check the effectiveness of defences against modern microarchitectural attacks. Revizor is an approach to automatically detect microarchitectural information leakage in commercial black-box CPUs. SpecFuzz is a technique for dynamic testing of applications to find instances of speculative vulnerabilities. Varys is an approach to runtime monitoring of system defences against microarchitectural attacks. We show that with these techniques, we can successfully detect microarchitectural vulnerabilities in hardware and flaws in defences against them; find unpatched instances of speculative vulnerabilities in software; and detect attempts to invalidate system defences.
10

High-performant, Replicated, Queue-oriented Transaction Processing Systems on Modern Computing Infrastructures

Thamir Qadah (11132985) 27 July 2021 (has links)
With the shifting landscape of computing hardware architectures and the emergence of new computing environments (e.g., large main-memory systems, hundreds of CPUs, distributed and virtualized cloud-based resources), state-of-the-art designs of transaction processing systems that rely on conventional wisdom suffer from lost performance optimization opportunities. This dissertation challenges conventional wisdom to rethink the design and implementation of transaction processing systems for modern computing environments.<div><br></div><div>We start by tackling the vertical hardware scaling challenge, and propose a deterministic approach to transaction processing on emerging multi-sockets, many-core, shared memory architecture to harness its unprecedented available parallelism. Our proposed priority-based queue-oriented transaction processing architecture eliminates the transaction contention footprint and uses speculative execution to improve the throughput of centralized deterministic transaction processing systems. We build QueCC and demonstrate up to two orders of magnitude better performance over the state-of-the-art.<br></div><div><br></div><div>We further tackle the horizontal scaling challenge and propose a distributed queue-oriented transaction processing engine that relies on queue-oriented communication to eliminate the traditional overhead of commitment protocols for multi-partition transactions. We build Q-Store, and demonstrate up to 22x improvement in system throughput over the state-of-the-art deterministic transaction processing systems.<br></div><div><br></div><div>Finally, we propose a generalized framework for designing distributed and replicated deterministic transaction processing systems. We introduce the concept of speculative replication to hide the latency overhead of replication. We prototype the speculative replication protocol in QR-Store and perform an extensive experimental evaluation using standard benchmarks. We show that QR-Store can achieve a throughput of 1.9 million replicated transactions per second in under 200 milliseconds and a replication overhead of 8%-25%compared to non-replicated configurations.<br></div>

Page generated in 0.5414 seconds