<p dir="ltr">Reliability against errors, such as soft errors—transient bit flips in transistors caused by energetic particle strikes—and crash inconsistency arising from power failure, is as crucial as performance and power efficiency for a wide range of computing devices, from embedded systems to datacenters. If not properly addressed, these errors can lead to massive financial losses and even endanger human lives. Furthermore, the dynamic nature of modern computing workloads complicates the implementation of reliable systems as the likelihood and impact of these errors increase. Consequently, system designers often face a dilemma: sacrificing reliability for performance and cost-effectiveness or incurring high manufacturing and/or run-time costs to maintain high system dependability. This trade-off can result in reduced availability and increased vulnerability to errors when reliability is not prioritized or escalated costs when it is.</p><p dir="ltr">In light of this, this dissertation, for the first time, demonstrates that with a synergistic compiler and architecture co-design, it is possible to achieve reliability while maintaining high performance and low hardware cost. We begin by illustrating how compiler/architecture co-design achieves near-zero-overhead soft error resilience for embedded cores (Chapter 2). Next, we introduce ReplayCache (Chapter 3), a software-only approach that ensures crash consistency for energy harvesting systems (backed with embedded cores) and outperforms the state-of-the-art by 9x. Apart from embedded cores, reliability for server-class cores is more vital due to their widespread adoption in performance-critical environments. With that in mind, we then propose VeriPipe (Chapter 4), which showcases how a straightforward microarchitectural technique can achieve near-zero-overhead soft error resilience for server-class cores with a storage overhead of just three registers and one countdown timer. Finally, we present two approaches to achieving performant crash consistency for server-class cores by leveraging existing dynamic register renaming in out-of-order cores (Chapter 5) and repurposing Intel’s non-temporal path (Chapter 6), respectively. Through these innovations, this dissertation paves the way for more reliable and efficient computing systems, ensuring that reliability does not come at the cost of performance degradation or hardware complexity.</p>
Identifer | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26356696 |
Date | 24 July 2024 |
Creators | Jianping Zeng (19199323) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/_b_Compiler_and_Architecture_Co-design_for_Reliable_Computing_b_/26356696 |
Page generated in 0.0022 seconds