31 |
A Tool For Run Time Soft Error Fault Injection Into FPGA CircuitsZuzarte, Marvin 06 1900 (has links)
Safety and mission critical systems are currently deployed in many different fields where there is a greater presence of high energy particles (e.g.\ aerospace). The use of field programmable gate arrays (FPGAs) within safety critical systems is becoming more prevalent due to the design and cost benefits their use provides. The effects of externally caused faults on these safety critical systems cannot be neglected. In particular, high energy particle striking a circuit can cause a voltage change in the circuit known as a soft error. The effects these soft errors will have on the circuit needs to be understood in order to ensure that the systems will function properly in the event soft errors do occur.
In this thesis a tool is designed to facilitate the run-time injection of soft errors into a hardware circuit running on a FPGA. The tool allows for the control over the number of injections that can be performed and control over the rate that the injections will occur at. Additionally the tool records time stamps of when injections occur and time stamps of when errors are detected. This recorded data allows for the analysis of designs in conditions prone to soft errors.
The implemented tool allows for design time parametrization and run time configuration, allowing a multitude of tests to be run for a single compiled design. The tool also eliminates the need for a host computer after configuration by generating the injection locations and times on the FPGA. Eliminating the host computer allows for faster testing when compared to other methods as data transfer times are greatly reduced.
The implemented tool was run on classical examples of redundant structures, such as duplication with comparison and triple modular redundancy as well as a non-redundant structure to establish a baseline. The results of multiple tests run on each structure are analyzed to illustrate the uses of the tool and how the tool may be used to test other designs. / Thesis / Master of Applied Science (MASc)
|
32 |
Runtime of WebAssembly : A study into WebAssembly runtimeEriksson, Adam January 2023 (has links)
WebAssembly is Assembly-like code that is created by compiling other languages into Wasm. The Wasm file can then be run on the web at near native speed. The objective of this study is to find how WebAssemblys runtime compares to JavaScript and native. The study will also see if different browsers impact WebAssembly runtime. To get the information two different methods were used. Firstly literature and articles were used to gather data on JavaScript and native runtime compared to WebAssembly. Secondly an empirical study was conducted to compare four different browsers WebAssembly runtime. When comparing WebAssembly and JavaScript it was found that WebAssembly isn't always the fastest alternative due to many reasons but some major ones were how they were compiled and optimised. When looking at WebAssembly compared to native we could clearly see that WebAssembly was slower. These slowdowns came primarily from the increase in code size but the virtual environment and security checks also contributed to this. After the empirical study we could see some differences between browsers both in compilation speed and execution time. Between the chromium browsers the difference in execution time was very small and Firefox was always faster. But when looking at compilation time Chrome was faster with the other browsers having varying results. The research could conclude that WebAssembly can provide a useful boost to runtime on websites when used correctly. It is not something that is going to replace JavaScript but can be used together with it. We could also conclude that the user's choice of browser has a small impact on WebAssembly and can cause differences in runtime.
|
33 |
Hardening Software Against Memory Errors and AttacksNovark, Albert Eugene 01 February 2011 (has links)
Programs written in C and C++ are susceptible to a number of memory errors, including buffer overflows and dangling pointers. At best, these errors cause crashes or performance degradation. At worst, they enable security vulnerabilities, allowing denial-of-service or remote code execution. Existing runtime systems provide little protection against these errors. They allow minor errors to cause crashes and allow attackers to consistently exploit vulnerabilities. In this thesis, we introduce a series of runtime systems that protect deployed applications from memory errors. To guide the design of our systems, we analyze how errors interact with memory allocators to allow consistent exploitation of vulnerabilities. Our systems improve software in two ways: first, they tolerate memory errors, allowing programs to continue proper execution. Second, they decrease the probability of successfully exploiting security vulnerabilities caused by memory errors. Our first system, Archipelago, protects exceptionally sensitive server applications against severe errors using an object-per-page randomized allocator. It provides near-100% protection against most buffer overflows. Our second system, DieHarder, combines ideas from Archipelago, DieHard, and other systems to enable maximal protection against attacks while incurring minimal runtime and memory overhead. Our final system, Exterminator, automatically corrects heap-based buffer overflows and dangling pointers without requiring programmer intervention. Exterminator relies on both a low-overhead randomized allocator and statistical inference techniques to automatically isolate and correct errors in deployed applications.
|
34 |
Decentralized Crash-Resilient Runtime VerificationKazemlou, Shokoufeh January 2017 (has links)
This is the final revision of my M.Sc. Thesis. / Runtime Verification is a technique to extract information from a running system in order to detect executions violating a given correctness specification. In this thesis, we study distributed synchronous/asynchronous runtime verification of systems. In our setting, there is a set of distributed monitors that have only partial views of a large system and are subject to failures. In this context, it is unavoidable that monitors may have different views of the underlying system, and therefore may have different valuations of the correctness property. In this thesis, we propose an automata-based synchronous monitoring algorithm that copes with f crash failures in a distrbuted setting. The algorithm solves the synchronous monitoring problem in f + 1 rounds of communication, and significantly reduces the message size overhead. We also propose an algorithm for distributed crash-resilient asynchronous monitoring that consistently monitors the system under inspection without any communication between monitors. Each local monitor emits a verdict set solely based on its own partial observation, and the intersection of the verdict sets will be the same as the verdict computed by a centralized monitor that has full view of the system. / Thesis / Master of Science (MSc)
|
35 |
An Evaluation of the Linux Virtual Memory Manager to Determine Suitability for Runtime Variation of MemoryMuthukumaraswamy Sivakumar, Vijay 01 June 2007 (has links)
Systems that support virtual memory virtualize the available physical memory such that the applications running on them operate under the assumption that these systems have a larger amount of memory available than is actually present. The memory managers of these systems manage the virtual and the physical address spaces and are responsible for converting the virtual addresses used by the applications to the physical addresses used by the hardware. The memory managers assume that the amount of physical memory is constant and does not change during their period of operation. Some operating scenarios however, such as the power conservation mechanisms and virtual machine monitors, require the ability to vary the physical memory available at runtime, thereby making invalid the assumptions made by these memory managers.
In this work we evaluate the suitability of the Linux Memory Manager, which assumes that the available physical memory is constant, for the purposes of varying the memory at run time. We have implemented an infrastructure over the Linux 2.6.11 kernel that enables the user to vary the physical memory available to the system. The available physical memory is logically divided into banks and each bank can be turned on or off independent of the others, using the new system calls we have added to the kernel. Apart from adding support for the new system calls, other changes had to be made to the Linux memory manager to support the runtime variation of memory. To evaluate the suitability for varying memory we have performed experiments with varying memory sizes on both the modified and the unmodified kernels. We have observed that the design of the existing memory manager is not well suited to support the runtime variation of memory; we provide suggestions to make it better suited for such purposes.
Even though applications running on systems that support virtual memory do not use the physical memory directly and are not aware of the physical addresses they use, the amount of physical memory available for use affects the performance of the applications. The results of our experiments have helped us study the influence the amount of physical memory available for use has on the performance of various types of applications. These results can be used in scenarios requiring the ability to vary the memory at runtime to do so with least degradation in the application performance. / Master of Science
|
36 |
Automatic Scheduling of Compute Kernels Across Heterogeneous ArchitecturesLyerly, Robert Frantz 24 June 2014 (has links)
The world of high-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors due to the power, memory and instruction-level parallelism walls. All trends point towards increased processor heterogeneity as a means for increasing application performance, from smartphones to servers. These various architectures are designed for different types of applications — traditional "big" CPUs (like the Intel Xeon) are optimized for low latency while other architectures (such as the NVidia Tesla K20x) are optimized for high-throughput. These architectures have different tradeoffs and different performance profiles, meaning fantastic performance gains for the right types of applications. However applications that are ill-suited for a given architecture may experience significant slowdown; therefore, it is imperative that applications are scheduled onto the correct processor.
In order to perform this scheduling, applications must be analyzed to determine their execution characteristics. Traditionally this application-to-hardware mapping was determined statically by the programmer. However, this requires intimate knowledge of the application and underlying architecture, and precludes load-balancing by the system. We demonstrate and empirically evaluate a system for automatically scheduling compute kernels by extracting program characteristics and applying machine learning techniques. We develop a machine learning process that is system-agnostic, and works for a variety of contexts (e.g. embedded, desktop/workstation, server). Finally, we perform scheduling in a workload-aware and workload-adaptive manner for these compute kernels. / Master of Science
|
37 |
Impact of Increased Cache Misses on Runtime Performance of MPX-enabled ProgramsSharma, Niti 10 June 2019 (has links)
Low level languages like C and C++ provide high performance and direct control over memory management. But these languages are prone to memory safety violations. Intel introduced a new ISA extension-Memory Protection Extension(MPX), a hardware-assisted full-stack solution, to protect against the memory safety violations. While MPX efficiently prevents memory errors like buffer overflows and out of bound memory accesses, it comes at the cost of high performance overheads. Also, the cache locality worsens in MPX protected applications.
In our research, we analyze if there is a correlation between increase in cache misses and runtime degradation in programs compiled with MPX support. We analyze 15 SPEC CPU benchmark programs for different input sizes on Windows platform, compiled with Intel's ICC compiler. We find that for input sizes train(medium) and ref(large), the average performance overheads are 140% and 144% respectively. We find that 5 out of 15 benchmarks do not have any runtime overheads and also, do not have any change in cache misses at any level. However for rest of the 10 benchmarks, we find a strong correlation between runtime overheads and cache misses overheads, with the correlation coefficients ranging from 0.8 to 0.36 for different input sizes. Based on our findings, we conclude that there is a direct correlation between runtime overheads and increase in cache misses. We also find that instructions overheads and runtime overheads have a positive correlation, with the coefficient values ranging from 0.7 to 0.33 for different input sizes. / Master of Science / Low level programming languages like C and C++ are primary choices to write low-level systems software such as operating systems, virtual machines, embedded software, and performance-critical applications. But these languages are considered as unsafe and prone to memory safety errors. Intel introduced a new technique- Memory Protection Extensions (MPX) to protect against these memory errors. But prior research found that applications supported with MPX have increased runtimes (slowdowns).
In our research, we analyze these slowdowns for different input sizes(medium and large) in 15 benchmark applications. Based on the input sizes, the average slowdowns range from 140% to 144%. We then examine if there is a correlation between increase in cache misses under MPX and the slowdowns. A hardware cache is a component that stores data so that future requests for that data can be served faster. Hence, cache miss is a state where the data requested for processing by a component or application is not found in the cache. Whenever a cache miss happen, the processor waits for the data to be fetched from the next cache level or from main memory before it can continue to execute. This wait influences the runtime performance of the application. Our evaluations find that 10 out of 15 applications which have increased runtimes, also have increase in cache misses. This shows a positive correlation between these two parameters. Along with that, we also found that increase in instruction size in MPX protected applications also has a direct correlation with the runtime degradation. We also quantify these relationships with a statistical measure called correlation coefficient.
|
38 |
Replication of Concurrent Applications in a Shared Memory MultikernelWen, Yuzhong 19 July 2016 (has links)
State Machine Replication (SMR) has become the de-facto methodology of building a replication based fault-tolerance system. Current SMR systems usually have multiple machines involved, each of the machines in the SMR system acts as the replica of others. However having multiple machines leads to more cost to the infrastructure, in both hardware cost and power consumption. For tolerating non-critical CPU and memory failure that will not crash the entire machine, there is no need to have extra machines to do the job.
As a result, intra-machine replication is a good fit for this scenario. However, current intra-machine replication approaches do not provide strong isolation among the replicas, which allows the faults to be propagated from one replica to another.
In order to provide an intra-machine replication technique with strong isolation, in this thesis we present a SMR system on a multi-kernel OS. We implemented a replication system that is capable of replicating concurrent applications on different kernel instances of a multi-kernel OS. Modern concurrent application can be deployed on our system with minimal code modification. Additionally, our system provides two different replication modes that allows the user to switch freely according to the application type.
With the evaluation of multiple real world applications, we show that those applications can be easily deployed on our system with 0 to 60 lines of code changes to the source code. From the performance perspective, our system only introduces 0.23\% to 63.39\% overhead compared to non-replicated execution. / Master of Science
|
39 |
A Compiler Framework to Support and Exploit Heterogeneous Overlapping-ISA Multiprocessor PlatformsJelesnianski, Christopher Stanisław 15 December 2015 (has links)
As the demand for ever increasingly powerful machines continues, new architectures are sought to be the next route of breaking past the brick wall that currently stagnates the performance growth of modern multi-core CPUs. Due to physical limitations, scaling single-core performance any further is no longer possible, giving rise to modern multi-cores. However, the brick wall is now limiting the scaling of general-purpose multi-cores. Heterogeneous-core CPUs have the potential to continue scaling by reducing power consumption through exploitation of specialized and simple cores within the same chip.
Heterogeneous-core CPUs join fundamentally different processors each which their own peculiar features, i.e., fast execution time, improved power efficiency, etc; enabling the building of versatile computing systems. To make heterogeneous platforms permeate the computer market, the next hurdle to overcome is the ability to provide a familiar programming model and environment such that developers do not have to focus on platform details. Nevertheless, heterogeneous platforms integrate processors with diverse characteristics and potentially a different Instruction Set Architecture (ISA), which exacerbate the complexity of the software. A brave few have begun to tread down the heterogeneous-ISA path, hoping to prove that this avenue will yield the next generation of super computers. However, many unforeseen obstacles have yet to be discovered.
With this new challenge comes the clear need for efficient, developer-friendly, adaptable system software to support the efforts of making heterogeneous-ISA the golden standard for future high-performance and general-purpose computing. To foster rapid development of this technology, it is imperative to put the proper tools into the hands of developers, such as application and architecture profiling engines, in order to realize the best heterogeneous-ISA platform possible with available technology. In addition, it would be in the best interest to create tools to be as "timeless" as possible to expose fundamental concepts industry could benefit from and adopt in future designs.
We demonstrate the feasibility of a compiler framework and runtime for an existing heterogeneous-ISA operating system (Popcorn Linux) for automatically scheduling compute blocks within an application on a given heterogeneous-ISA high-performance platform (in our case a platform built with Intel Xeon - Xeon Phi). With the introduced Profiler, Partitioner, and Runtime support, we prove to be able to automatically exploit the heterogeneity in an overlapping-ISA platform, being faster than native execution and other parallelism programming models.
Empirically evaluating our compiler framework, we show that application execution on Popcorn Linux can be up to 52% faster than the most performant native execution for Xeon or Xeon Phi. Using our compiler framework relieves the developer from manual scheduling and porting of applications, requiring only a single profiling run per application. / Master of Science
|
40 |
Measuring, modeling, and optimizing counterintuitive performance phenomena in power-scalable, parallel systemsChang, Hung-Ching 09 April 2015 (has links)
The demands of exascale computing systems and applications have pushed for a rapid, continual design paradigm coupled with increasing design complexities from the interaction between the application, the middleware, and the underlying system hardware, which forms a breeding ground for inefficiency. This work seeks to improve system efficiency by exposing the root causes of unexpected performance slowdowns (e.g., lower performance at higher processor speeds) that occur more frequently in power-scalable systems where raw processor speed varies. More precisely, we perform an exhaustive empirical study that conclusively shows that increasing processor speed often reduces performance and wastes energy. Our experimental work shows that the frequency of occurrence and magnitude of slowdowns grow with clock frequency and parallelism, indicating that such slowdowns will increasingly be observed with trends in processor and system design.
Performance speedups at lower frequencies (or slowdowns at higher frequencies) have been anecdotally observed in the prevailing literature since 2004, but no research has explained nor exploited this phenomenon. This work conclusively demonstrates that performance slowdowns during processor speedup phases can exceed 47% in common I/O workloads. Our hypothesis challenges (and ultimately debunks) a fundamental assumption in computer systems: faster processor speeds result in the same or better performance.
In this work, with the use of code and kernel instrumentation, exhaustive experiments, and deep insight into the inner workings of the Linux I/O subsystem, I overcome the aforementioned challenges of variance, complexity, and nondeterminism and identify the I/O resource contention as the root cause of the slowdowns during processor speedup. Specifically, such contention comes from the Linux kernel when the journaling block device (JBD) interacts with the ext3/4 file system that introduces file write delays and file synchronization delays. To fully explain how such I/O contention causes performance anomaly, I propose analytical models of resource contention among I/O threads to describe the root cause of the observed I/O slowdowns when processors speed up. To this end, I introduce LUC, a runtime system to limit the unintended consequences of power scaling and demonstrate the effectiveness of the LUC system for two critical parallel transaction-oriented workloads, including a mail server (varMail) and online transaction processing (oltp). / Ph. D.
|
Page generated in 0.0318 seconds