Global ETD Search

1	Retina: Cross-Layered Key-Value Store using Computational Storage Bikonda, Naga Sanjana 10 March 2022 (has links) Modern SSDs are getting faster and smarter with near-data computing capabilities. Due to their design choices, traditional key-value stores do not fully leverage these new storage devices. These key-value stores become CPU-bound even before fully utilizing the IO bandwidth. LSM or B+ tree-based key-value stores involve complex garbage collection and store sorted keys and complicated synchronization mechanisms. In this work, we propose a cross-layered key-value store named Retina that decouples the design to delegate control path manipulations to host CPU and data path manipulations to computational SSD to maximize performance and reduce compute bottlenecks. We employ many design choices not explored in other persistent key-value stores to achieve this goal. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache, support for variable key-value pairs, and a novel version-based crash consistency model. By enabling all the design features, we equip Retina to reduce compute hotspots on the host CPU, take advantage of the on-storage accelerators to leverage the data locality on the computational storage, improve overall bandwidth and reduce the bandwidth net- work latencies. Thus when evaluated using YCSB, we observe the CPU utilization reduced by 4x and throughput performance improvement of 20.5% against the state-of-the-art for read-intensive workloads. / Master of Science / Modern secondary storage systems are providing an exponential increase in memory access speeds. In addition, new generation storage systems attach compute resources near data to offload computation to storage. Traditional datastore systems are lacking in performance when used with the new generation SSDs (Solid State Drive). The key reason is the SSDs are underutilized due to CPU bottlenecks. Due to design choices, conventional datastores incur expensive CPU tasks that cause the CPU to bottleneck even before the storage speeds are fully utilized. Thus, when attached to a modern SSD, conventional datastores will underutilize the storage resources. In this work, we propose a cross-layered key-value store named Retina that decouples the design to delegate control path manipulations to host CPU and data path manipulations to computational SSD to maximize performance and reduce compute bottlenecks. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache and a novel version-based crash consistency model. By enabling all the design features, we equip Retina to reduce compute hotspots on the host CPU, take advantage of the on-storage accelerators to leverage the data locality on the computational storage and improve overall access speed. To evaluate Retina, we use throughput and CPU utilization as the comparison metric. We test our implementation with Yahoo Cloud Serving Benchmark, a popular datastore benchmark. We evaluate against RocksDB(the most widely adopted datastore) to enable fair performance comparison. In conclusion, we show that Retina key-value store improves the throughput performance by offloading logic to computational storage to reduce the CPU bottlenecks. Computational storage Key-Value store Crash consistency
2	Detecting Persistence Bugs from Non-volatile Memory Programs by Inferring Likely-correctness Conditions Fu, Xinwei 10 March 2022 (has links) Non-volatile main memory (NVM) technologies are revolutionizing the entire computing stack thanks to their storage-and-memory-like characteristics. The ability to persist data in memory provides a new opportunity to build crash-consistent software without paying a storage stack I/O overhead. A crash-consistent NVM program can recover back to a consistent state from a persistent NVM in the event of a software crash or a sudden power loss. In the presence of a volatile cache, data held in a volatile cache is lost after a crash. So NVM programming requires users to manually control the durability and the persistence ordering of NVM writes. To avoid performance overhead, developers have devised customized persistence mechanisms to enforce proper persistence ordering and atomicity guarantees, rendering NVM programs error-prone. The problem statement of this dissertation is how one can effectively detect persistence bugs from NVM programs. However, detecting persistence bugs in NVM programs is challenging because of the huge test space and the manual consistency validation required. The thesis of this dissertation is that we can detect persistence bugs from NVM programs in a scalable and automatic manner by inferring likely-correctness conditions from programs. A likely-correctness condition is a possible correctness condition, which is a condition a program must maintain to make the program crash-consistent. This dissertation proposes to infer two forms of likely-correctness conditions from NVM programs to detect persistence bugs. The first proposed solution is to infer likely-ordering and likely-atomicity conditions by analyzing program dependencies among NVM accesses. The second proposed solution is to infer likely-linearization points to understand a program's operation-level behavior. Using these two forms of likely-correctness conditions, we test only those NVM states and thread interleavings that violate the likely-correctness conditions. This significantly re- duces the test space required to examine. We then leverage the durable linearizability model to validate consistency automatically without manual consistency validation. In this way, we can detect persistence bugs from NVM programs in a scalable and automatic manner. In total, we detect 47 (36 new) persistence correctness bugs and 158 (113 new) persistence performance bugs from 20 single-threaded NVM programs. Additionally, we detect 27 (15 new) persistence correctness bugs from 12 multi-threaded NVM data structures. / Doctor of Philosophy / Non-volatile main memory (NVM) technologies provide a new opportunity to build crash-consistent software without incurring a storage stack I/O overhead. A crash-consistent NVM program can recover back to a consistent state from a persistent NVM in the event of a software crash or a sudden power loss. NVM has been and will further be used in various computing services integral to our daily life, ranging from data centers to high-performance computing, machine learning, and banking. Building correct and efficient crash-consistent NVM software is therefore crucial. However, developing a correct and efficient crash-consistent NVM program is challenging as developers are now responsible for manually controlling cacheline evictions in NVM programming. Controlling cacheline evictions makes NVM programming error-prone, and detecting persistence bugs that lead to inconsistent NVM states in NVM programs is an arduous task. The thesis of this dissertation is that we can detect persistence bugs from NVM programs in a scalable and automatic manner by inferring likely-correctness conditions from programs. This dissertation proposes to infer two forms of likely-correctness conditions from NVM programs to detect persistence bugs, i.e., likely-ordering/atomicity conditions and likely-linearization points. In total, we detect 47 (36 new) persistence correctness bugs and 158 (113 new) persistence performance bugs from 20 single-threaded NVM programs. Additionally, we detect 27 (15 new) persistence correctness bugs from 12 multi-threaded NVM data structures. Non-Volatile Memory Crash Consistency Durable Linearizability Testing Debugging
3	<b>Compiler and Architecture Co-design for Reliable Computing</b> Jianping Zeng (19199323) 24 July 2024 (has links) <p dir="ltr">Reliability against errors, such as soft errors—transient bit flips in transistors caused by energetic particle strikes—and crash inconsistency arising from power failure, is as crucial as performance and power efficiency for a wide range of computing devices, from embedded systems to datacenters. If not properly addressed, these errors can lead to massive financial losses and even endanger human lives. Furthermore, the dynamic nature of modern computing workloads complicates the implementation of reliable systems as the likelihood and impact of these errors increase. Consequently, system designers often face a dilemma: sacrificing reliability for performance and cost-effectiveness or incurring high manufacturing and/or run-time costs to maintain high system dependability. This trade-off can result in reduced availability and increased vulnerability to errors when reliability is not prioritized or escalated costs when it is.</p><p dir="ltr">In light of this, this dissertation, for the first time, demonstrates that with a synergistic compiler and architecture co-design, it is possible to achieve reliability while maintaining high performance and low hardware cost. We begin by illustrating how compiler/architecture co-design achieves near-zero-overhead soft error resilience for embedded cores (Chapter 2). Next, we introduce ReplayCache (Chapter 3), a software-only approach that ensures crash consistency for energy harvesting systems (backed with embedded cores) and outperforms the state-of-the-art by 9x. Apart from embedded cores, reliability for server-class cores is more vital due to their widespread adoption in performance-critical environments. With that in mind, we then propose VeriPipe (Chapter 4), which showcases how a straightforward microarchitectural technique can achieve near-zero-overhead soft error resilience for server-class cores with a storage overhead of just three registers and one countdown timer. Finally, we present two approaches to achieving performant crash consistency for server-class cores by leveraging existing dynamic register renaming in out-of-order cores (Chapter 5) and repurposing Intel’s non-temporal path (Chapter 6), respectively. Through these innovations, this dissertation paves the way for more reliable and efficient computing systems, ensuring that reliability does not come at the cost of performance degradation or hardware complexity.</p> Compiler optimizations Computer architecture. Reliability and Crash Consistency

1

Page generated in 0.0847 seconds