1 |
Retina: Cross-Layered Key-Value Store using Computational Storage
Bikonda, Naga Sanjana, 10 March 2022
Modern SSDs are getting faster and smarter with near-data computing capabilities. Because of their design choices, traditional key-value stores do not fully leverage these new storage devices: they become CPU-bound before fully utilizing the available IO bandwidth. LSM-tree and B+ tree-based key-value stores keep keys sorted and rely on complex garbage collection and complicated synchronization mechanisms. In this work, we propose a cross-layered key-value store named Retina that decouples the design, delegating control-path manipulations to the host CPU and data-path manipulations to the computational SSD, to maximize performance and reduce compute bottlenecks. We employ many design choices not explored in other persistent key-value stores to achieve this goal. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache, support for variable-sized key-value pairs, and a novel version-based crash consistency model. With all of these design features enabled, Retina reduces compute hotspots on the host CPU, uses the on-storage accelerators to exploit data locality on the computational storage, improves overall bandwidth, and reduces network latencies. When evaluated using YCSB, Retina reduces CPU utilization by 4x and improves throughput by 20.5% over the state of the art for read-intensive workloads. / Master of Science / Modern secondary storage systems are providing an exponential increase in memory access speeds. In addition, new-generation storage systems attach compute resources near the data so that computation can be offloaded to storage. Traditional datastore systems underperform when used with new-generation SSDs (Solid State Drives). The key reason is that the SSDs are underutilized due to CPU bottlenecks.
Due to their design choices, conventional datastores incur expensive CPU tasks that cause the CPU to become a bottleneck before the storage speeds are fully utilized. Thus, when attached to a modern SSD, conventional datastores underutilize the storage resources. In this work, we propose a cross-layered key-value store named Retina that decouples the design, delegating control-path manipulations to the host CPU and data-path manipulations to the computational SSD, to maximize performance and reduce compute bottlenecks. In addition to the cross-layered design paradigm, Retina introduces a new caching mechanism called Mirror cache and a novel version-based crash consistency model. With all of these design features enabled, Retina reduces compute hotspots on the host CPU, uses the on-storage accelerators to exploit data locality on the computational storage, and improves overall access speed. To evaluate Retina, we use throughput and CPU utilization as the comparison metrics. We test our implementation with the Yahoo Cloud Serving Benchmark (YCSB), a popular datastore benchmark. We evaluate against RocksDB (the most widely adopted datastore) to enable a fair performance comparison. In conclusion, we show that the Retina key-value store improves throughput by offloading logic to computational storage to reduce CPU bottlenecks.
|
2 |
Offloading the sampling stage of GNN training to smart storage
Kritharakis, Emmanouil, 16 February 2024
Graph Neural Networks (GNNs) have emerged as a robust model for machine learning on complex graph-structured data, in contrast to traditional deep learning techniques, which are primarily used for image and text data. However, scaling GNNs to large graphs with billions of nodes and trillions of edges remains a challenge. Existing approaches either partition the graph across distributed systems or use a single machine with GPU caching techniques during the sampling phase. The former suffers from maintenance costs and increased latency, while the latter faces bottlenecks in data movement, resulting in inefficient resource utilization and suboptimal training. To address the limitations of single-machine techniques, we direct our attention to the sampling stage and introduce a novel approach that uses the Samsung SmartSSD computational storage device. This approach significantly reduces unnecessary data movement overhead and minimizes overall training time. Computational storage devices enable computations to be offloaded to their own computational units. In our method, we calculate the required sampling subset on the field-programmable gate array (FPGA) of the SmartSSD and transfer it to the host DRAM. Our experiments show that the proposed solution, compared to the baseline MMAP sampling method, achieves a speedup of up to 9 times in sampling time and an improvement of up to 5 times in host DRAM utilization.
|
3 |
Exploring the Boundaries of Operating System in the Era of Ultra-fast Storage Technologies
Ramanathan, Madhava Krishnan, 24 May 2023
Storage hardware is evolving at a rapid pace to keep up with the exponential rise in data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. However, the OS storage stack has not evolved fast enough to keep up with this new ultra-fast storage hardware. Hence, the latency due to user-kernel context switches caused by system calls and hardware interrupts is no longer negligible, as it was presumed to be in the era of slower, high-latency hard disks. Further, the OS storage stack was not designed with multi-core scalability in mind; with CPU core counts continuously increasing, the OS storage stack, particularly the Virtual Filesystem (VFS) and filesystem layers, is increasingly becoming a scalability bottleneck.
Applications bypass the kernel completely (a kernel-bypass storage stack) to keep the storage stack from becoming a performance and scalability bottleneck. But this comes at the cost of programmability, isolation, safety, and reliability. Moreover, scalability bottlenecks in the filesystem cannot be addressed by simply moving the filesystem to userspace. Overall, while designing a kernel-bypass storage stack looks obvious and promising, there are several critical challenges in programmability, performance, scalability, safety, and reliability that need to be addressed in order to bypass the traditional OS storage stack.
This thesis proposes a series of kernel-bypass storage techniques designed particularly for fast memory-centric storage. First, this thesis proposes a scalable persistent transactional memory (PTM) programming model to address the programmability and multi-core scalability challenges. Next, this thesis proposes techniques to make the PTM memory-safe and fault-tolerant. Further, this thesis proposes a kernel-bypass programming framework to port legacy DRAM-based in-memory database applications to run on persistent memory-centric storage. Finally, this thesis explores an application-driven approach to address the CPU-side and storage-side bottlenecks in deep learning model training by proposing a kernel-bypass programming framework to move compute closer to the storage. Overall, the techniques proposed in this thesis will be a strong foundation for applications to adopt and exploit the emerging ultra-fast storage technologies without being bottlenecked by the traditional OS storage stack. / Doctor of Philosophy / Storage hardware is evolving at a rapid pace to keep up with the exponential rise in data consumption. Recently, ultra-fast storage technologies such as nanosecond-scale byte-addressable Non-Volatile Memory (NVM) and microsecond-scale SSDs have been commercialized. The Operating System (OS) has been the gateway for applications to access and manage the storage hardware. Unfortunately, the OS storage stack, which was designed for slower storage technologies (e.g., hard disk drives), becomes a performance, scalability, and programmability bottleneck for the emerging ultra-fast storage technologies. This has created a large gap between storage hardware advancements and the system software support for such emerging storage technologies. Consequently, applications are constrained by the limitations of the OS storage stack when they intend to explore these emerging storage technologies.
In this thesis, we propose a series of novel kernel-bypass storage stack designs to address the performance, scalability, and programmability limitations of the conventional OS storage stack. The kernel-bypass storage stack proposed in this thesis is carefully designed with ultra-fast modern storage hardware in mind. Application developers can leverage the kernel-bypass techniques proposed in this thesis to develop new applications or port the legacy applications to use the emerging ultra-fast storage technologies without being constrained by the limitations of the conventional OS storage stack.
|