We explore the use of deduplication to eliminate the storage of redundant data in RAID from a file-system design perspective. We propose ScaleDFS, a deduplication file system that seeks to achieve scalable read/write throughput in RAID. ScaleDFS is built on three novel design features. First, we improve write throughput by exploiting multiple CPU cores to parallelize the computation of the cryptographic fingerprints used to identify redundant data. Second, we improve read throughput by caching in memory the recently read blocks that have been deduplicated. Third, we reduce memory usage by enhancing the data structures used for fingerprint lookups. ScaleDFS is implemented as a POSIX-compliant, kernel-space driver module that can be deployed on commodity hardware. We conduct microbenchmark experiments using synthetic workloads, and macrobenchmark experiments using a dataset of 42 VM images of different Linux distributions. We show that ScaleDFS achieves higher read/write throughput than existing open-source deduplication file systems in RAID.
Detailed summary in vernacular field only. / Ma, Mingcao. / "October 2012." / Thesis (M.Phil.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 39-42). / Abstracts also in Chinese.
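The abstract describes the write path (parallel fingerprinting) and the read path (caching deduplicated blocks) only at a high level. The following is a minimal user-space Python sketch of those two ideas, not the ScaleDFS kernel module. It assumes SHA-256 fingerprints, fixed 4 KiB blocks, a plain dict as the fingerprint index, and an LRU cache that admits only blocks with more than one reference. `ToyDedupStore` and its methods are hypothetical names; the third feature (a memory-efficient lookup structure) and file overwrites are not modeled.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed fixed block size; the abstract does not state one


def fingerprint(block: bytes) -> bytes:
    # SHA-256 is an assumption. CPython's hashlib releases the GIL when
    # hashing large buffers, so worker threads can hash blocks concurrently.
    return hashlib.sha256(block).digest()


class ToyDedupStore:
    """Hypothetical in-memory model of a deduplicating block store."""

    def __init__(self, workers: int = 4, cache_blocks: int = 128):
        self.index = {}             # fingerprint -> physical block number
        self.blocks = []            # physical blocks, each stored once
        self.refcount = []          # number of references per physical block
        self.files = {}             # file name -> list of physical block numbers
        self.cache = OrderedDict()  # LRU cache of deduplicated blocks
        self.cache_blocks = cache_blocks
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def write(self, name: str, data: bytes) -> None:
        # Toy simplification: rewriting an existing name does not release old refs.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        # Step 1: compute fingerprints in parallel; map() preserves input order.
        fps = list(self.pool.map(fingerprint, chunks))
        # Step 2: look up fingerprints sequentially and store only unseen blocks.
        layout = []
        for chunk, fp in zip(chunks, fps):
            pbn = self.index.get(fp)
            if pbn is None:
                pbn = len(self.blocks)
                self.blocks.append(chunk)
                self.refcount.append(0)
                self.index[fp] = pbn
            self.refcount[pbn] += 1
            layout.append(pbn)
        self.files[name] = layout

    def _read_block(self, pbn: int) -> bytes:
        if pbn in self.cache:
            self.cache.move_to_end(pbn)  # mark as most recently used
            return self.cache[pbn]
        block = self.blocks[pbn]  # stands in for a disk read
        if self.refcount[pbn] > 1:  # cache only blocks that were deduplicated
            self.cache[pbn] = block
            if len(self.cache) > self.cache_blocks:
                self.cache.popitem(last=False)  # evict least recently used
        return block

    def read(self, name: str) -> bytes:
        return b"".join(self._read_block(p) for p in self.files[name])

    def close(self) -> None:
        self.pool.shutdown()
```

Writing the same 4-block image under two names stores only two unique physical blocks (one all-zero block, one `b"x"` block). Hashing is parallelized while index updates stay sequential, which avoids locking the index in this sketch.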
Chapter 1 --- Introduction
Chapter 2 --- Literature Review
Chapter 2.1 --- Backup systems
Chapter 2.2 --- Use of special hardware
Chapter 2.3 --- Scalable storage
Chapter 2.4 --- Inline DFSs
Chapter 2.5 --- VM image storage with deduplication
Chapter 3 --- ScaleDFS Background
Chapter 3.1 --- Spatial Locality of Fingerprint Placement
Chapter 3.2 --- Prefetching of Fingerprint Stores
Chapter 3.3 --- Journaling
Chapter 4 --- ScaleDFS Design
Chapter 4.1 --- Parallelizing Deduplication
Chapter 4.2 --- Caching Read Blocks
Chapter 4.3 --- Reducing Memory Usage
Chapter 5 --- Implementation
Chapter 5.1 --- Choice of Hash Function
Chapter 5.2 --- OpenStack Deployment
Chapter 6 --- Experiments
Chapter 6.1 --- Microbenchmarks
Chapter 6.2 --- OpenStack Deployment
Chapter 6.3 --- VM Image Operations in a RAID Setup
Chapter 7 --- Conclusions and Future Work
Bibliography
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328748 |
Date | January 2013 |
Contributors | Ma, Mingcao., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, remote, 1 online resource ([1], viii, 42 leaves) : ill. |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |