  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Optimizing Hierarchical Storage Management For Database System

Liu, Xin 22 May 2014 (has links)
Caching is a classical and effective way to improve system performance, and servers such as database servers and storage servers therefore contain significant amounts of memory that act as a fast cache. Meanwhile, as new storage devices such as flash-based solid-state drives (SSDs) are added to storage systems, the memory cache is no longer the only way to improve performance. In this thesis, we address two problems: how to manage the cache of a storage server, and how to utilize the SSD in a hybrid storage system. Traditional caching policies are known to perform poorly for storage server caches. One promising approach is to use hints from the storage clients to manage the storage server cache. Previous hinting approaches are ad hoc, in that a predefined reaction to specific types of hints is hard-coded into the caching policy. With ad hoc approaches, it is difficult to ensure that the best hints are being used, and difficult to accommodate multiple types of hints and multiple client applications. We propose CLient-Informed Caching (CLIC), a generic hint-based technique for managing storage server caches. CLIC automatically interprets hints generated by storage clients and translates them into a server caching policy, without explicit knowledge of the application-specific hint semantics. Using trace-based simulation of database workloads, we demonstrate that CLIC outperforms hint-oblivious and state-of-the-art hint-aware caching policies, and that the space required to track and interpret hints is small. As SSDs become part of the storage system, adding an SSD raises not only the question of how to manage it, but also whether current buffer pool algorithms still work effectively.
We are interested in hybrid storage systems, consisting of SSDs and hard disk drives (HDDs), for database management. We present cost-aware replacement algorithms for both the DBMS buffer pool and the SSD; these algorithms are aware of the different I/O performance of HDDs and SSDs. In such a hybrid storage system, the physical access pattern to the SSD depends on the management of the DBMS buffer pool, so we study the impact of buffer pool caching policies on SSD access patterns and, based on these studies, design a caching policy to manage the SSD effectively. We implemented these algorithms in MySQL's InnoDB storage engine and used the TPC-C workload to demonstrate that they outperform previous algorithms.
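The cost-aware replacement idea can be sketched in a few lines: instead of evicting purely by recency, weigh each page by how expensive it would be to re-read from its backing device. This is a minimal illustration only, not the thesis's actual InnoDB algorithms; the class, cost values and recency weighting here are hypothetical.

```python
# Hypothetical per-device read costs (ms); the thesis's real parameters differ.
READ_COST = {"ssd": 0.1, "hdd": 10.0}

class CostAwareBufferPool:
    """Evict the page that is cheapest to re-fetch, weighted by recency."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}   # page_id -> (backing device, last-access time)
        self.clock = 0

    def access(self, page_id, device):
        self.clock += 1
        if page_id not in self.pages and len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page_id] = (device, self.clock)

    def _evict(self):
        # Score = re-read cost * recency; evicting a cold page that lives on
        # the fast SSD is cheaper than evicting a hot page backed by the HDD.
        victim = min(
            self.pages,
            key=lambda p: READ_COST[self.pages[p][0]] * self.pages[p][1],
        )
        del self.pages[victim]
```

With a capacity-2 pool, accessing an HDD-backed page, then an SSD-backed page, then a third page evicts the SSD-backed one first, since it is cheap to bring back.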
2

Novel Methods for Improving Performance and Reliability of Flash-Based Solid State Storage System

Guo, Jiayang 29 May 2018 (has links)
No description available.
3

On Performance Optimization and System Design of Flash Memory based Solid State Drives in the Storage Hierarchy

Chen, Feng 28 September 2010 (has links)
No description available.
4

High-Efficiency, Reduced-Size Converter for a Hybrid Power/Energy Battery in Electric Vehicles: Controlled Current Source Principle

Allali, Nicolas 12 December 2016 (has links)
This thesis presents a power conversion solution for coupling two similar voltage sources with reduced converter sizing. Applied to a hybrid power/energy battery storage system for electric vehicles, the proposed solution offers a better trade-off between production cost, mass and energy efficiency than classical choppers. In response to this problem, we propose the use of a controlled current source placed in series with the two voltage sources to realize their coupling. A converter structure enabling this coupling is studied, compared to a classical structure, and finally realized as a reduced-voltage demonstrator to highlight its advantages and drawbacks.
5

Energy savings and performance improvements with SSDs in the Hadoop Distributed File System

Polato, Ivanilton 29 August 2016 (has links)
Energy issues have gathered strong attention over the past decade, reaching IT data-processing infrastructures, which must now adjust existing platforms to reach acceptable performance while reducing energy consumption. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault-tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption, while users increasingly demand that high-performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS that uses a combination of hard disks and solid-state disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings to the middleware the best of HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing energy consumption under most hybrid configurations evaluated, and that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedups.
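The zone-based placement idea can be sketched as follows. HDFSH is implemented as an HDFS block placement policy (in Java, inside Hadoop); this is only a hypothetical illustration of the zoning logic, and `ssd_fraction` and the node lists are assumed parameters, not the thesis's configuration interface.

```python
import random

def place_blocks(num_blocks, ssd_fraction, ssd_nodes, hdd_nodes):
    """Assign each block of a file to a storage zone: a configurable
    fraction of the blocks goes to the SSD zone, the rest to the HDD zone."""
    ssd_quota = int(num_blocks * ssd_fraction)
    placement = []
    for i in range(num_blocks):
        # Early blocks land in the SSD zone until the quota is exhausted.
        zone = ssd_nodes if i < ssd_quota else hdd_nodes
        placement.append((i, random.choice(zone)))
    return placement
```

For a 10-block file with `ssd_fraction=0.3`, the first three blocks go to SSD-zone nodes and the remaining seven to HDD-zone nodes, mirroring the observation that storing only part of the data on SSDs already yields savings.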
6

Autonomic virtual machine placement on a hybrid storage system in an IaaS cloud

Ouarnoughi, Hamza 03 July 2017 (has links)
IaaS (Infrastructure as a Service) cloud providers offer virtualized resources (CPU, storage, and network) as virtual machines (VMs). The growth and highly competitive nature of this market has compelled them to optimize the use of their data centers in order to offer attractive services at lower cost. In addition to infrastructure and operating costs, energy consumption is a major and constantly increasing expenditure (2% of world consumption), and controlling it is a valuable lever for these operators. Technically, energy control relies mainly on consolidation approaches, most of which take only the CPU use of physical machines (PMs) into account for VM placement. Yet recent studies have shown that storage systems and disk I/O represent a significant share of data center energy consumption (between 14% and 40%). In this thesis we propose a new autonomic model for VM placement optimization based on MAPE-K (Monitor, Analyze, Plan, Execute, Knowledge) that considers VM I/O and the associated storage systems in addition to CPU. Our first contribution is a multi-level VM I/O tracer that overcomes the limitations of existing I/O monitoring tools. In the Analyze step, the collected traces feed an extended cost model that takes into account the VM I/O profile, the storage system characteristics, and the economic constraints of the cloud environment. We also analyze the complementarity of the two main storage classes, leading to a hybrid storage model that exploits the advantages of each: hard disk drives (HDDs) are energy-intensive and slow compared to compute units, but their low cost per gigabyte and long lifetime count in their favor; flash-based solid-state drives (SSDs) are faster and consume less power, but their high cost per gigabyte and short lifetime (compared to HDDs) are major constraints. The Plan step first resulted in an extension of the CloudSim simulator to account for VM I/O, the hybrid nature of the storage system, and the cost model proposed in the Analyze step; we then proposed several placement heuristics based on that cost model, integrated and evaluated in CloudSim. Finally, we show that our approach improves the VM placement cost obtained by existing approaches by a factor of three.
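The Plan-phase decision can be sketched as a greedy cost comparison: for each candidate physical machine, estimate the combined CPU and I/O cost of hosting the VM, and pick the cheapest. This is an illustrative simplification under assumed weights; the thesis's cost model also covers energy, SSD wear and cloud pricing, and the field names below are hypothetical.

```python
def placement_cost(pm, vm):
    """Toy cost of placing `vm` on physical machine `pm` (lower is better)."""
    # Heavy penalty for CPU overcommit.
    cpu_cost = max(0.0, pm["cpu_used"] + vm["cpu"] - pm["cpu_cap"]) * 100.0
    # I/O served by the SSD is cheap; overflow to the HDD is expensive.
    ssd_free = pm["ssd_iops_cap"] - pm["ssd_iops_used"]
    ssd_io = min(vm["iops"], max(0, ssd_free))
    hdd_io = vm["iops"] - ssd_io
    return cpu_cost + 0.01 * ssd_io + 1.0 * hdd_io

def best_pm(pms, vm):
    """Greedy Plan step: pick the physical machine with the lowest cost."""
    return min(pms, key=lambda pm: placement_cost(pm, vm))
```

Between two otherwise identical machines, the one with spare SSD throughput wins for an I/O-heavy VM, which a CPU-only consolidation policy would treat as a tie.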
7

Workload- and Data-based Automated Design for a Hybrid Row-Column Storage Model and Bloom Filter-Based Query Processing for Large-Scale DICOM Data Management

Nguyen, Cong-Danh 04 May 2018 (has links)
In the health-care industry, ever-increasing medical image data, the development of imaging technologies, the long-term retention of medical data and increasing image resolution are causing tremendous growth in data volume. In addition, the variety of acquisition devices and the differing preferences of physicians and other health-care professionals have led to high variety in the data. Although the DICOM (Digital Imaging and Communications in Medicine) standard is today widely adopted for storing and transferring medical data, DICOM data still has the 3V characteristics of Big Data: high volume, high variety and high velocity. There is also a variety of workloads, including Online Transaction Processing (OLTP), Online Analytical Processing (OLAP) and mixed workloads, and existing systems have limitations in dealing with these characteristics of data and workloads. In this thesis, we propose new efficient methods for storing and querying DICOM data. We propose a hybrid storage model of row and column stores, called HYTORMO, together with data storage and query processing strategies. First, HYTORMO is designed and implemented for deployment in large-scale environments, making it possible to manage big medical data. Second, the data storage strategy combines vertical partitioning with a hybrid store to create data storage configurations that reduce storage space demand and increase workload performance. To achieve such a configuration, one of two design approaches can be applied: (1) expert-based design, in which experts manually create configurations by grouping attributes of the DICOM data and selecting a suitable data layout for each column group; or (2) automated design, for which we propose a hybrid automated design framework called HADF.
HADF relies on similarity measures between attributes that take the combined impact of workload- and data-specific information into consideration to generate data storage configurations: Hybrid Similarity (a weighted combination of Attribute Access and Density Similarity measures) is used to group attributes into column groups; Inter-Cluster Access Similarity determines whether two column groups are merged (to reduce the number of joins); and Intra-Cluster Access Similarity decides whether a column group is stored in a row store or a column store. Finally, we propose a suitable and efficient query processing strategy built on top of HYTORMO. It uses both inner joins and left-outer joins to prevent the data loss that would result from using only inner joins between vertically partitioned tables, and it applies an intersection Bloom filter to remove irrelevant data from the input tables of join operations, reducing network I/O cost. We validate the benefits of the proposed methods experimentally on real DICOM datasets. The results show that the mixed use of row and column stores outperforms a pure row store and a pure column store, that combining workload- and data-specific information helps HADF produce good data storage configurations, and that the query processing strategy with the intersection Bloom filter improves the execution time of an experimental query by up to 50% compared to the case where no filter is applied.
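The intersection Bloom filter idea can be sketched as follows: build a Bloom filter over the join keys of each input table, AND the bit arrays together, and probe each row against the combined filter before shipping it to the join. This is a minimal single-process illustration (the thesis targets a distributed setting); the bit-array size and hash count are arbitrary choices, not HYTORMO's parameters.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter over an integer bit field."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def might_contain(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

    def intersect(self, other):
        # A key surviving the AND was (probably) inserted into BOTH filters.
        out = BloomFilter(self.m, self.k)
        out.bits = self.bits & other.bits
        return out

def prefilter(rows, key_fn, ibf):
    """Drop rows whose join key cannot appear on the other side."""
    return [r for r in rows if ibf.might_contain(key_fn(r))]
```

Because Bloom filters have no false negatives, every true join match survives the pre-filter; only non-matching rows (plus rare false positives) are pruned before any network transfer.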
