1 |
Flashing up the storage hierarchy
Koltsidas, Ioannis January 2010
The focus of this thesis is on systems that employ both flash and magnetic disks as storage media. Considering the widely disparate I/O costs of flash disks currently on the market, our approach is a cost-aware one: we explore techniques that exploit the I/O costs of the underlying storage devices to improve I/O performance. We also study the asymmetric I/O properties of magnetic and flash disks and propose algorithms that take advantage of this asymmetry. Our work is geared towards database systems; however, most of the ideas presented in this thesis can be generalised to any data-intensive application. For the case of low-end, inexpensive flash devices with large capacities, we propose using them at the same level of the memory hierarchy as magnetic disks. In such setups, we study the problem of data placement, that is, on which type of storage medium each data page should be stored. We present a family of online algorithms that can be used to dynamically decide the optimal placement of each page. Our algorithms adapt to changing workloads for maximum I/O efficiency. We found that substantial performance benefits can be gained with such a design, especially for queries touching large sets of pages with read-intensive workloads. Moving one level higher in the storage hierarchy, we study the problem of buffer allocation in databases that store data across multiple storage devices. We present our novel approach to per-device memory allocation, under which both the I/O costs of the storage devices and the cache behaviour of the data stored on each medium determine the size of the main memory buffers that will be allocated to each device. Towards informed decisions, we found that the ability to predict the cache behaviour of devices under various cache sizes is of paramount importance. In light of this, we study the problem of efficiently tracking the hit ratio curve for each device and introduce a low-overhead technique that provides high accuracy.
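To make the notion of a hit ratio curve concrete, the following minimal sketch computes one for an LRU-managed buffer from a page reference trace, using the classical stack-distance idea. It is an illustration only: the abstract does not describe the thesis's own low-overhead technique, and the LRU policy, the function name `lru_hit_ratio_curve` and its parameters are assumptions made here.

```python
from collections import OrderedDict


def lru_hit_ratio_curve(trace, max_size):
    """Return hit ratios of LRU caches of size 1..max_size for one trace.

    Hypothetical helper for illustration; not the thesis's technique.
    """
    stack = OrderedDict()                    # most recently used page is last
    hits_at_distance = [0] * (max_size + 1)  # index = stack distance
    for page in trace:
        if page in stack:
            # Stack distance: position of the page counted from the MRU end.
            keys = list(stack.keys())
            distance = len(keys) - keys.index(page)
            if distance <= max_size:
                hits_at_distance[distance] += 1
            stack.move_to_end(page)
        else:
            stack[page] = True
    total = len(trace) or 1                  # avoid division by zero
    curve, running = [], 0
    for size in range(1, max_size + 1):
        # A cache of this size hits every access whose distance is <= size.
        running += hits_at_distance[size]
        curve.append(running / total)
    return curve
```

One pass over the trace yields the hit ratio for every cache size up to `max_size`, which is the kind of information a per-device buffer allocator would need. The linear scan per access keeps the sketch short but is not low-overhead.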
The price and performance characteristics of high-end flash disks make them perfectly suitable for use as caches between the main memory and the magnetic disk(s) of a storage system. In this context, we primarily focus on the problem of deciding which data should be placed in the flash cache of a system: how the data flows from one level of the memory hierarchy to the others is crucial for the performance of such a system. Considering such decisions, we found that the I/O costs of the flash cache play a major role. We also study several implementation issues such as the optimal size of flash pages and the properties of the page directory of a flash cache. Finally, we explore sorting in external memory using external merge-sort, as the latter employs access patterns that can take full advantage of the I/O characteristics of flash memory. We study the problem of sorting hierarchical data, as such is necessary for a wide variety of applications including archiving scientific data and dealing with large XML datasets. The proposed algorithm efficiently exploits the hierarchical structure in order to minimize the number of disk accesses and optimise the utilization of available memory. Our proposals are not specific to sorting over flash memory: the presented techniques are highly efficient over magnetic disks as well.
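For readers unfamiliar with external merge-sort, the sketch below shows the basic two-phase pattern for flat integer data: write sorted runs that fit in memory, then merge them with a k-way merge. It does not reproduce the thesis's algorithm for hierarchical data; the function name `external_merge_sort`, the `run_size` parameter and the use of temporary text files are assumptions made for illustration.

```python
import heapq
import tempfile


def external_merge_sort(values, run_size):
    """Sort integers via sorted on-disk runs and a k-way merge (sketch)."""
    runs = []
    buffer = []

    def spill():
        # Phase 1: sort what fits in memory and write it out as one run.
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in sorted(buffer))
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for value in values:
        buffer.append(value)
        if len(buffer) == run_size:
            spill()
    if buffer:
        spill()

    # Phase 2: stream all runs through a k-way merge.
    streams = [(int(line) for line in run) for run in runs]
    merged = list(heapq.merge(*streams))
    for run in runs:
        run.close()
    return merged
```

Run creation writes sequentially and the merge phase reads each run sequentially, which is the access pattern the abstract refers to.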
|
2 |
Reliable Writeback for Client-side Flash Caches
Qin, Dai 04 July 2014
Modern data centers are increasingly using shared storage solutions for ease of
management. Data is cached on the client side on inexpensive and high-capacity
flash devices, helping improve performance and reduce contention on the storage
side. Currently, write-through caching is used because it ensures consistency
and durability under client failures, but it offers poor performance for
write-heavy workloads.
In this work, we propose two write-back based caching policies, called
write-back flush and write-back persist, that provide strong reliability
guarantees, under two different client failure models. These policies rely on
storage applications such as file systems and databases issuing write barriers
to persist their data, because these barriers are the only reliable method for
storing data durably on storage media. Our evaluation shows that these policies
achieve performance close to write-back caching, while providing stronger
guarantees than vanilla write-through caching.
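The role of write barriers can be illustrated with a minimal sketch of a client-side cache that buffers writes locally and pushes them to shared storage only when the application issues a barrier. The abstract does not spell out how write-back flush or write-back persist work, so this behaviour, the class name `WriteBackFlushCache` and the dictionary standing in for shared storage are all assumptions.

```python
class WriteBackFlushCache:
    """Hypothetical client-side cache that flushes dirty blocks on a barrier."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for shared storage
        self.cache = {}                     # block number -> data on client flash
        self.dirty = set()                  # blocks not yet written back

    def write(self, block, data):
        # Writes complete against the local cache only.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.backing_store[block]
        return self.cache[block]

    def barrier(self):
        # Assumed semantics: the barrier returns only after every dirty
        # block has reached the backing store.
        flushed = len(self.dirty)
        for block in sorted(self.dirty):
            self.backing_store[block] = self.cache[block]
        self.dirty.clear()
        return flushed
```

Between barriers the storage side sees no traffic for writes, which is where the performance gain over write-through caching would come from in this simplified model.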
|
3 |
Towards Improving Endurance and Performance in Flash Storage Clusters
Salman, Mohammed 22 June 2017
NAND flash-based Solid State Devices (SSDs) provide high performance and energy efficiency and at the same time their capacity continues to grow at an unprecedented rate. As a result, SSDs are increasingly being used in high end computing systems such as supercomputing clusters. However, one of the biggest impediments to large scale deployments is the limited erase cycles in flash devices. The natural skewness in I/O workloads can result in wear imbalance, which has a significant impact on the reliability, performance as well as lifetime of the cluster. Current load balancers for storage systems are designed with a critical goal to optimize performance. Data migration techniques are used to handle wear balancing but they suffer from a huge metadata overhead and extra erasures. To overcome these problems, we propose an endurance-aware write off-loading technique (EWO) for balancing the wear across different flash-based servers with minimal extra cost. Extant wear leveling algorithms are designed for a single flash device. With the use of flash devices in enterprise server storage, the wear leveling algorithms need to take into account the variance of the wear at the cluster level. EWO exploits the out-of-place update feature of flash memory by off-loading the writes across flash servers instead of moving data across flash servers to mitigate extra-wear cost. To evenly distribute erasures to flash servers, EWO off-loads writes from the flash servers with high erase cycles to the ones with low erase cycles by first quantitatively calculating the amount of writes based on the frequency of garbage collection. To reduce metadata overhead caused by write off-loading, EWO employs a hot-slice off-loading policy to explore the trade-offs between extra-wear cost and metadata overhead.
Evaluation on a 50 to 200 node SSD cluster shows that EWO outperforms data migration based wear balancing techniques, reducing up to 70% aggregate extra erase cycles while improving the write performance by up to 20% compared to data migration. / Master of Science / Exponential increase of Internet traffic mainly from emerging applications like streaming video, social networking and cloud computing has created the need for more powerful data centers. Datacenters are composed of three main components: compute, network and storage. While there have been rapid advancements in the field of compute and networking, storage technologies have not advanced as much in comparison. Traditionally, storage consists of magnetic disks with magnetic parts which are slow and consume more power. However, Solid State Disks (SSDs) offer both better performance and lower energy. With the price of these SSDs being comparable to magnetic disks, they are increasingly being used in storage clusters. However, one of the biggest drawbacks of SSDs is the limited number of program/erase (P/E) cycles. There is a need to ensure the uniform wearing of blocks in an SSD. While solutions for this do exist for a single SSD device, usage of these devices in a cluster poses new problems.
This work introduces EWO, a wear balancing algorithm for flash storage clusters. It carries out load balancing in a flash storage cluster while incorporating the wear characteristics as a cost function. EWO carries out lazy data migration, also referred to as write off-loading. To alleviate the metadata overhead, the migration is performed at the slice level.
To evaluate EWO, a distributed key value store emulator was built to simulate the behavior of an actual flash storage cluster.
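The core idea of redirecting writes away from heavily worn servers can be sketched as follows. This is a simplified illustration, not EWO itself: the abstract says EWO sizes the off-loaded writes from garbage collection frequency and works at slice granularity, neither of which is modelled here. The function name `choose_write_target`, the `threshold` parameter and the use of the cluster mean as the reference point are assumptions.

```python
def choose_write_target(home, erase_counts, threshold=0.1):
    """Pick the server that should absorb a write destined for `home`.

    erase_counts maps server name -> erase cycles consumed so far.
    Hypothetical policy: off-load only when the home server is more than
    `threshold` above the cluster mean.
    """
    mean = sum(erase_counts.values()) / len(erase_counts)
    if erase_counts[home] <= mean * (1 + threshold):
        return home  # wear is close enough to average: write in place
    # Off-load to the least-worn server; ties are broken by name.
    return min(erase_counts, key=lambda server: (erase_counts[server], server))
```

Because only new writes are redirected, no existing data has to be copied between servers, which is the contrast with data migration that the abstract draws.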
|
4 |
Telemetry Recorders and Disruptive Technologies
Kortick, David 10 1900
ITC/USA 2009 Conference Proceedings / The Forty-Fifth Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2009 / Riviera Hotel & Convention Center, Las Vegas, Nevada / Telemetry data recorders are not immune to the effects that a number of disruptive technologies have had on the telemetry industry. Data recorder designs today make use of data buses, storage types and graphical user interfaces that are constantly evolving based on the advances of personal computer and consumer electronics technologies. Many of these recorders use embedded designs that integrate disruptive technologies such as PCI Express for real-time data and signal processing, SATA interfaces for data storage and touchscreen technologies to provide an intuitive operator interface. Solid state drives also play a larger role in the latest recorder designs. This paper will explore the effects of these technologies on the latest telemetry recorders in terms of the benefits to the users, cost of implementation, obsolescence management, and integration considerations. The implications of early adoption of disruptive technologies will also be reviewed.
|
5 |
Data Processing Techniques on Modern Hardware Architectures
Tsirogiannis, Dimitrios 31 August 2011
The last decade has been characterized by radical changes in the computing landscape. We have witnessed the advent of multi-core processors, flash-based storage systems and the proliferation of scale-out architectures, such as map-reduce-based systems and massively parallel databases. Although data management systems have embraced modern hardware technologies to some extent, they have not realized their full potential.
The goal of this thesis is two-fold. Primarily, it demonstrates the staggering potential for performance improvement offered by modern hardware architectures and then proposes how data management systems must change in order to realize this potential. Additionally, this thesis demonstrates that utilizing modern hardware architectures is important both for performance and energy efficiency. Towards this goal, we propose query processing and indexing techniques for chip multiprocessors and we analyze the trade-offs of executing complex database queries on modern processor technologies. Subsequently, we propose query processing methods tailored to flash-based storage systems. Finally, we analyze the power consumption of database systems and we reveal opportunities for improving their energy efficiency.
|
7 |
Informed storage management for mobile platforms
Kim, Hyojun 22 August 2012
Storage devices are rapidly changing, and we need to adapt the OS storage software stack to keep up with the changes. Such a re-evaluation of the storage software stack is especially required for mobile platforms because they rely on inexpensive flash storage devices whose performance characteristics are very different from those of the familiar hard disk.
In this thesis work, we first show the importance of storage in mobile platforms; contrary to conventional wisdom, we find evidence that storage is a significant contributor to application performance on mobile devices. Then, we explore the solution space for flash storage: we study a user-level library for selective logging, a host-side write buffering layer, and an OS buffer replacement scheme for flash storage.
Finally, we build an integrated solution for smartphone storage, named Fjord. In the Fjord study, we re-design logging and RAM buffering solutions for smartphones, and also propose fine-grained reliability control mechanisms. We prove that non-volatile logging can improve storage performance remarkably. Understanding the characteristics of cloud-backed applications and controlling the reliability constraint for chosen cloud-backed applications can achieve additional significant performance gain.
We implement and evaluate our solution on a real Android smartphone, and demonstrate significant performance gains for everyday apps on such platforms.
|
8 |
Gestion efficace et partage sécurisé des traces de mobilité / Efficient management and secure sharing of mobility traces
Ton That, Dai Hai 29 January 2016
Today, advances in mobile devices and embedded sensors have enabled an unprecedented growth in user services. At the same time, most mobile devices continuously generate, record and communicate large amounts of personal data. Secure management of personal data on mobile devices remains a challenge, both because of the constraints inherent to these devices and with respect to safe and secure access to and sharing of this information. This thesis addresses these challenges and focuses on location traces. In particular, building on a relational data server embedded in secure mobile devices, this thesis extends that server to manage spatio-temporal data (types and operators). Above all, it proposes an efficient spatio-temporal indexing method (TRIFL) adapted to the flash memory storage model. In addition, to protect the user's personal location traces, a distributed architecture and a participatory collection protocol that protects location data are proposed in PAMPAS. This architecture relies on highly secure devices for the distributed computation of spatio-temporal aggregates over the collected private data. / Nowadays, advances in the development of mobile devices and embedded sensors have enabled an unprecedented number of services for the user. At the same time, most mobile devices generate, store and communicate a large amount of personal information continuously. While managing personal information on mobile devices is still a big challenge, sharing and accessing this information in a safe and secure way remains an open and hot topic. Personal mobile devices may have various form factors such as mobile phones, smart devices, stick computers, secure tokens, etc. They can be used to record, sense and store data about the user's context or surrounding environment. The most common contextual information is the user's location. Personal data generated and stored on these devices is valuable for many applications and services, but it is sensitive and needs to be protected in order to ensure individual privacy. In particular, most mobile applications have access to accurate and real-time location information, raising serious privacy concerns for their users.
In this dissertation, we dedicate two parts to managing location traces, i.e. spatio-temporal data on mobile devices. In particular, we offer an extension of spatio-temporal data types and operators for embedded environments. These data types reconcile the features of spatio-temporal data with the embedded requirements by offering an optimal data representation called Spatio-temporal object (STOB) dedicated to embedded devices. More importantly, in order to optimize query processing, we also propose an efficient indexing technique for spatio-temporal data called TRIFL, designed for flash storage. TRIFL stands for TRajectory Index for Flash memory. It exploits unique properties of trajectory insertion, and optimizes the data structure for the behavior of flash and the buffer cache. These ideas allow TRIFL to achieve much better performance on both flash and magnetic storage compared to its competitors.
Additionally, in the remaining part of this thesis we investigate the protection of the user's sensitive information by offering a privacy-aware protocol for participatory sensing applications called PAMPAS. PAMPAS relies on secure hardware solutions and proposes a user-centric privacy-aware protocol that fully protects personal data while taking advantage of distributed computing. To this end, we also propose a partitioning algorithm and an aggregation algorithm in PAMPAS. This combination drastically reduces the overall costs, making it possible to run the protocol in near real-time with a large number of participants, without any personal information leakage.
|
9 |
Efficient Usage Of Flash Memories In High Performance Scenarios
Srimugunthan, * 10 1900
New PCI-e flash cards and SSDs supporting over 100,000 IOPS are now available, with several use cases in the design of high performance storage systems. By using an array of flash chips arranged in multiple banks, large capacities are achieved. Such multi-banked architectures allow parallel read, write and erase operations. In a raw PCI-e flash card, such parallelism is directly available to the software layer. In addition, the devices have restrictions; for example, pages within a block can only be written sequentially. The devices also have larger minimum write sizes (>4KB). Current flash translation layers (FTLs) in Linux are not well suited for such devices due to the high device speeds, architectural restrictions as well as other factors such as high lock contention. We present an FTL for Linux that takes into account the hardware restrictions and exploits the parallelism to achieve high speeds. We also consider leveraging the parallelism for garbage collection by scheduling the garbage collection activities on idle banks. We propose and evaluate an adaptive method to vary the amount of garbage collection according to the current I/O load on the device.
For large scale distributed storage systems, flash memories are an excellent choice because they consume less power, take less floor space for a target throughput and provide faster access to data. In a traditional distributed filesystem, even distribution is required to ensure load-balancing, balanced space utilisation and failure tolerance. In the presence of flash memories, we should in addition ensure that the number of writes to the different flash storage nodes is evenly distributed, to ensure even wear of flash storage nodes, so that unpredictable failures of storage nodes are avoided. This requires that we distribute updates and do garbage collection across the flash storage nodes. We have motivated the distributed wear-levelling problem considering the replica placement algorithm for HDFS. Viewing wear-levelling across flash storage nodes as a distributed co-ordination problem, we present an alternate design to reduce the message communication cost across participating nodes. We demonstrate the effectiveness of our design through simulation.
|