81

Scalable data-management systems for Big Data / Sur le passage à l'échelle des systèmes de gestion des grandes masses de données

Tran, Viet-Trung 21 January 2013 (has links)
Big Data can be characterized by three V's: Big Volume refers to the unprecedented growth in the amount of data; Big Velocity refers to the growth in the speed at which data is moved in and out of management systems; Big Variety refers to the growth in the number of different data formats. Managing Big Data requires fundamental changes in the architecture of data management systems: storage systems must keep evolving to accommodate data growth, scaling out while maintaining high-performance data access. This thesis focuses on building scalable data management systems for Big Data. Our first and second contributions address the challenge of efficiently supporting Big Volume in data-intensive high-performance computing (HPC) environments. In particular, we address the shortcomings of existing approaches in handling atomic, non-contiguous I/O operations in a scalable fashion. We propose and implement a versioning-based mechanism that offers isolation for non-contiguous I/O without expensive synchronization. In the context of parallel array processing in HPC, we introduce Pyramid, a large-scale, array-oriented storage system that revisits the physical organization of data in distributed storage systems for scalable performance. Pyramid favors multidimensional-aware data chunking that closely matches the access patterns generated by applications, together with distributed metadata management and versioning-based concurrency control that eliminate synchronization under concurrent access. Our third contribution addresses Big Volume at the scale of geographically distributed environments. We consider BlobSeer, a distributed versioning-oriented data management service, and propose BlobSeer-WAN, an extension optimized for such environments: it takes the latency hierarchy into account by favoring local metadata accesses, and features asynchronous metadata replication with vector-clock-based collision resolution. To cope with the Big Velocity characteristic of Big Data, our last contribution features DStore, an in-memory document-oriented store that scales vertically by exploiting the large memory of multicore machines. DStore demonstrates fast, atomic processing of complex write transactions while maintaining high-throughput read access: it follows a single-threaded execution model that applies update transactions sequentially, relying on versioning-based concurrency control to serve a large number of simultaneous readers.
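The versioning-based isolation that recurs across these contributions can be illustrated with a minimal Python sketch (a toy in-memory blob under our own assumptions, not the thesis code): each write publishes a new immutable version of its non-contiguous segments, and readers pin a snapshot, so neither side ever blocks the other.

    class VersionedBlob:
        def __init__(self, size):
            self.size = size
            self.versions = [{}]              # version 0: empty blob

        def write(self, segments):
            # Atomically publish non-contiguous (offset, bytes) segments
            # as one new immutable version; readers are never blocked.
            self.versions.append(dict(segments))
            return len(self.versions) - 1     # snapshot id of the new version

        def read(self, offset, length, snapshot):
            # Read against a pinned snapshot: newer versions stay invisible.
            out = bytearray(length)
            for v in range(snapshot + 1):     # overlay oldest to newest
                for off, data in self.versions[v].items():
                    lo = max(off, offset)
                    hi = min(off + len(data), offset + length)
                    if lo < hi:
                        out[lo - offset:hi - offset] = data[lo - off:hi - off]
            return bytes(out)

    blob = VersionedBlob(16)
    s1 = blob.write([(0, b"aaaa"), (8, b"bbbb")])   # atomic non-contiguous write
    s2 = blob.write([(2, b"XX")])
    assert blob.read(0, 4, s1) == b"aaaa"           # old snapshot stays stable
    assert blob.read(0, 4, s2) == b"aaXX"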
82

När informationssystemen leder till dubbelarbete : En fallstudie av ärendehanteringen inom Huddinge kommun / When information systems lead to duplicated work: A case study of case management in Huddinge municipality

Barton, Julia, Flygare, Sandra January 2017 (has links)
Today, organizational administrative work is largely done with the help of computer technology. However, there are inefficiencies in how this work is performed across various computer systems, which has led to higher costs for less benefit to customers; in the public sector, these customers are the residents. This study examines how the municipality of Huddinge works with case management across various forms of media, what influences case management within and between these media forms, and the problems that arise in the case process as a result. The study addresses how sociotechnics and human-computer interaction interplay within information systems and data management in relation to users and case management. Observations and interviews were conducted at three municipal administration units and the Service Center in Huddinge. The results show that case management within the municipality depends on several forms of media, where the usability and functionality of the information systems affect how safely, consistently, and reliably case handling takes place. Recurring problems were identified regarding duplicated work, unfinished cases, and inconsistent collection and registration of data linked to cases. The results can be used by the municipality of Huddinge when defining requirements for and purchasing information systems intended for case management. The study is limited in the number of administration units represented; other units within the municipality may use other forms of media and have other experiences of using them in case management.
83

Kartläggning av olika testdatahanteringsverktyg : Jämförelse och utvärdering av olika testdatahanteringsverktyg / A survey of test data management tools: Comparison and evaluation of different test data management tools

Viking, Jakob January 2019 (has links)
Due to the new GDPR regulation, an entire industry had to change the way it handles data: the test data management industry, whose products are built around managing PII (Personally Identifiable Information). This raises the demands on how data is stored, which in turn leads to different solutions and to several companies trying to establish themselves in this market. The overall purpose of this study is to extract the strengths and weaknesses of five different test data management tools. In addition to collecting facts, tests were performed to gain hands-on experience with each program, and both were then summarized. The result consists of the outcomes of the test cases and of the comparison matrix, which together form the grade of each test data management tool. The conclusion that can be drawn from this survey is that the most flexible programs have a greater chance of success, but there are also simple programs showing that simplicity is at least as important.
84

Make Your Data Work for You: True Stories of People and Technology

Riehs, Daniel January 2006 (has links)
Thesis advisor: Alan Lawson / Technology should enhance the human experience. Instead, it often alienates people from aspects of life that are considered most important. Artists are separated from their works, friends are separated from each other, and human ingenuity is filtered through computers before it can impact the world. These five short stories focus mainly on alienations inherent in communications and media technology, but also touch on database management and copyright concerns. Some take place in the present day; others present views of the future. All five stories use fiction to explore the truth of humanity's absurd relationship to technology. / Thesis (BA) — Boston College, 2006. / Submitted to: Boston College. College of Arts and Sciences. / Discipline: College Honors Program.
85

Especificação de um framework para rastreabilidade da cadeia produtiva de grãos / Specification of a framework for traceability of the grain production chain

Vaz, Monica Cristine Scherer 21 February 2014 (has links)
Traceability makes it possible to identify the origin and route of a product at any point in its production chain, a fundamental requirement in standardization processes, certifications, and quality management systems. For food products, where contamination incidents put health at risk, traceability allows the affected lots to be detected and removed from the market quickly and safely, minimizing losses. Beyond the legal requirements surrounding traceability, final consumers increasingly want access to information about the food they eat, motivating the development of technological solutions in this area. The goal of this dissertation is to present the specification of the RastroGrão Framework for traceability of the grain production process. The Framework is based on the regulations and quality standards applied to traceability and on existing grain traceability systems, supporting the records inherent to production processes. These records can be changed according to the needs of each agent participating in the production chain. The data to be traced are defined by the users themselves, so no system maintenance is needed each time a new requirement to be traced appears. The Framework was developed in Java for the web, with a PostgreSQL database. Its main contributions are the benefits it offers: i) visibility of information, which can be accessed over the internet by all agents in the chain; ii) integration of all links of the production chain; and iii) availability of the information to the final consumer through a QR-Code, which can be accessed on the internet or printed on packaging.
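A minimal sketch of the schema-flexible idea the abstract describes: traced attributes stored as data rather than as fixed columns, so a new requirement needs no system maintenance. The field names here are hypothetical, and the real Framework is a Java/PostgreSQL web system, not this Python toy.

    trace_events = []   # in a real system: rows in a key-value or JSONB table

    def record_event(lot_id, stage, **attributes):
        # Any chain agent appends an event; attributes are user-defined,
        # so adding a new traced field requires no schema migration.
        trace_events.append({"lot": lot_id, "stage": stage, **attributes})

    record_event("LOT-42", "harvest", moisture_pct=13.5, field="F3")
    record_event("LOT-42", "storage", silo="S1", fumigated=True)  # new field, no code change

    def trace(lot_id):
        # Everything known about a lot, e.g. for a consumer-facing QR-Code page.
        return [e for e in trace_events if e["lot"] == lot_id]

    print(trace("LOT-42"))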
86

Privacy-preserving queries on encrypted databases

Meng, Xianrui 07 December 2016 (has links)
In today's Internet, with the advent of cloud computing, there is a natural desire for enterprises, organizations, and end users to outsource increasingly large amounts of data to a cloud provider. Ensuring security and privacy is therefore becoming a significant challenge for cloud computing, especially for users with sensitive and valuable data. Recently, many efficient and scalable methods for processing queries over encrypted data have been proposed. Despite this, numerous challenges remain, owing to the high complexity of many important queries on encrypted large-scale datasets. This thesis studies the problem of privacy-preserving database query processing on structured data (e.g., relational and graph databases). In particular, it proposes several practical and provably secure structured encryption schemes that allow the data owner to encrypt data without losing the ability to have it queried and retrieved efficiently by authorized clients. The thesis has two parts. The first investigates graph encryption schemes and proposes a graph encryption scheme for approximate shortest-distance queries, which allows the client to query the shortest distance between two nodes in an encrypted graph securely and efficiently; it also explores how these techniques can be applied to other graph queries. The second part proposes secure top-k query processing schemes on encrypted relational databases, and further develops a scheme for top-k join queries over multiple encrypted relations. Finally, the thesis demonstrates the practicality of the proposed encryption schemes by prototyping them and running queries on real-world encrypted datasets.
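As a rough illustration of the structured-encryption setting (a toy HMAC-label plus symmetric-encryption design under our own assumptions, not the schemes proposed in the thesis): the client outsources a precomputed distance index so the server can answer lookups without learning node names or distances.

    import hmac, hashlib
    from cryptography.fernet import Fernet  # third-party 'cryptography' package

    mac_key = b"client-secret-mac-key"
    enc = Fernet(Fernet.generate_key())
    distances = {("a", "b"): 2, ("a", "c"): 5}     # precomputed by the client

    def label(u, v):
        # Pseudorandom label: reveals nothing about the node pair.
        return hmac.new(mac_key, f"{u}|{v}".encode(), hashlib.sha256).hexdigest()

    # Setup: the server stores only labels and ciphertexts.
    server_index = {label(u, v): enc.encrypt(str(d).encode())
                    for (u, v), d in distances.items()}

    # Query: the client sends a token; the server does a blind lookup.
    token = label("a", "b")
    ct = server_index.get(token)
    print(int(enc.decrypt(ct)))                     # 2, recovered client-side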
87

Impact d’une politique proactive de surveillance et de gestion des risques infectieux dans un centre hospitalo-universitaire parisien sur la diffusion des Bactéries Multi-Résistantes aux antibiotiques / Impact of a proactive surveillance and infectious risk management policy in a Parisian teaching hospital on the spread of multidrug-resistant bacteria

Grohs, Patrick 20 December 2017 (has links)
This work describes 15 years of fighting the spread of multidrug-resistant bacteria (MDRB) in a Parisian tertiary-care university hospital, and includes several published studies focused on monitoring systems, cross-transmission control, and antibiotics policy. Data processing plays a crucial role in monitoring MDRB: a dedicated database concentrating all resistance data must be configured with accurate, standardized, and reproducible epidemiological query criteria. Significant variations in the methicillin-resistant Staphylococcus aureus (MRSA) rate were observed depending on the method used to remove duplicates (from 27.6% to 33.8%), and extrapolating annual results from data collected over shorter time units misestimates the annual MRSA incidence (from -42% to +30%). These variations may affect the interpretation of official MRSA indicators. The electronic MDRB alert, which warns the staff of a ward when a previously known carrier is readmitted, increased compliance with complementary contact-precaution measures (15% before vs 90.2% after) and contributed to reducing MRSA incidence (1.07 in 2002 vs 0.37 in 2012). It also highlighted the strong colonization pressure exerted by readmitted MRSA carriers: 46% of carriers were readmitted at least once, two-thirds of them less than three months after discharge. In intensive care units, one in five patients carries extended-spectrum beta-lactamase-producing Enterobacteriaceae (ESBL) and/or high-level cephalosporinase hyperproducers (HL-CASE), with carriage rates at admission of 79.2% and 48.1%, respectively, among all carriers. No screening strategy identifies 100% of carriers, since carriage is discontinuous during the hospital stay for the majority of patients. Replacing ceftriaxone with cefotaxime slowed the growth of HL-CASE incidence. Finally, optimizing ESBL screening, by evaluating and then deploying robots and dedicated culture media, made it possible to manage the growing number of screening specimens.
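A small sketch (with made-up isolates, not the hospital's data) of why the duplicate-removal rule changes the measured rate: counting one isolate per patient and counting every isolate give different numerators and denominators for the same period.

    # Each isolate is (patient_id, day, resistant).
    isolates = [("p1", 1, True), ("p1", 30, True), ("p2", 2, False),
                ("p2", 40, False), ("p3", 5, True)]

    def rate_one_per_patient(data):
        # Keep only the first isolate per patient over the whole period.
        first = {}
        for pid, day, res in sorted(data, key=lambda x: x[1]):
            first.setdefault(pid, res)
        return sum(first.values()) / len(first)

    def rate_all_isolates(data):
        # No deduplication: every isolate counts.
        return sum(res for _, _, res in data) / len(data)

    print(rate_one_per_patient(isolates))  # ~0.67 (2 of 3 patients)
    print(rate_all_isolates(isolates))     # 0.60 (3 of 5 isolates)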
88

Efficient Query Processing Over Web-Scale RDF Data

Amgad M. Madkour (5930015) 17 January 2019 (has links)
The Semantic Web, or the Web of Data, promotes common data formats for representing structured data and their links over the web. RDF is the de facto standard for semantic data, providing a flexible semi-structured model for describing concepts and relationships. RDF datasets consist of entries (i.e., triples) ranging from thousands to billions. This astronomical growth of RDF data calls for scalable RDF management and query processing strategies, and this dissertation addresses efficient query processing over web-scale RDF data. The first contribution is WORQ, an online, workload-driven RDF query processing technique. Based on the query workload, reduced sets of intermediate results (or reductions, for short) that are common to specific join patterns are computed in an online fashion; we also introduce an efficient solution for RDF queries with unbound properties. The second contribution is SPARTI, a scalable technique for computing the reductions offline. SPARTI utilizes a partitioning schema, termed SemVP, that enables efficient management of the reductions, and uses a budgeting mechanism with a cost model to determine the worthiness of partitioning. The third contribution is KC, an efficient RDF data management system for the cloud. KC uses generalized filtering, encompassing both exact and approximate set-membership structures, to filter out irrelevant data; it defines a set of common operations and introduces an efficient method for managing and constructing filters. The final contribution is semantic filtering, where data can be reduced based on the spatial, temporal, or ontological aspects of a query. We present a set of encoding techniques and demonstrate how to use semantic filters to reduce irrelevant data in a distributed setting.
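The approximate set-membership structures that generalized filtering encompasses can be sketched with a minimal Bloom filter (a generic illustration, not code from the dissertation): it never reports a false negative, so filtering with it only drops data that provably cannot match.

    import hashlib

    class BloomFilter:
        def __init__(self, m=1024, k=3):
            self.m, self.k, self.bits = m, k, 0
        def _hashes(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m
        def add(self, item):
            for pos in self._hashes(item):
                self.bits |= 1 << pos
        def might_contain(self, item):
            # No false negatives; a small rate of false positives.
            return all(self.bits >> pos & 1 for pos in self._hashes(item))

    # Build a filter over join keys from one side; prune the other side.
    left_subjects = BloomFilter()
    for s in ["alice", "bob"]:
        left_subjects.add(s)
    candidates = ["bob", "carol"]
    print([s for s in candidates if left_subjects.might_contain(s)])  # ['bob'] (likely)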
89

RepliC: Replicação Elástica de Banco de Dados Multi-Inquilino em Nuvem com Qualidade de Serviço / RepliC: Elastic Multi-Tenant Database Replication in the Cloud with Quality of Service

Flávio Rubens de Carvalho Sousa 15 January 2013 (has links)
nÃo hà / Fatores econÃmicos estÃo levando ao aumento das infraestruturas e instalaÃÃes de fornecimento de computaÃÃo como um serviÃo, conhecido como Cloud Computing ou ComputaÃÃo em Nuvem, onde empresas e indivÃduos podem alugar capacidade de computaÃÃo e armazenamento, em vez de fazerem grandes investimentos de capital necessÃrios para a construÃÃo e instalaÃÃo de equipamentos de computaÃÃo em larga escala. Na nuvem, o usuÃrio do serviÃo tem algumas garantias, tais como desempenho e disponibilidade. Essas garantias de qualidade de serviÃo (QoS) sÃo definidas entre o provedor do serviÃo e o usuÃrio e expressas por meio de um acordo de nÃvel de serviÃo (SLA). Este acordo consiste de contratos que especificam um nÃvel de qualidade que deve ser atendido e penalidades em caso de falha. Muitas empresas dependem de um SLA e estas esperam que os provedores de nuvem forneÃam SLAs baseados em caracterÃsticas de desempenho. Contudo, em geral, os provedores baseiam seus SLAs apenas na disponibilidade dos serviÃos oferecidos. Sistemas de gerenciamento de banco de dados (SGBDs) para computaÃÃo em nuvem devem tratar uma grande quantidade de aplicaÃÃes, tenants ou inquilinos. Abordagens multi-inquilino tÃm sido utilizadas para hospedar vÃrios inquilinos dentro de um Ãnico SGBD, favorecendo o compartilhamento eficaz de recursos, alÃm de gerenciar uma grande quantidade de inquilinos com padrÃes de carga de trabalho irregulares. Por outro lado, os provedores em nuvem devem reduzir os custos operacionais garantindo a qualidade. Neste contexto, uma caracterÃstica chave à a replicaÃÃo de banco de dados, que melhora a disponibilidade, desempenho e, consequentemente, a qualidade do serviÃo. TÃcnicas de replicaÃÃo de dados tÃm sido usadas para melhorar a disponibilidade, o desempenho e a escalabilidade em diversos ambientes. Contudo, a maior parte das estratÃgias de replicaÃÃo de banco de dados tÃm se concentrado em aspectos de escalabilidade e consistÃncia do sistema com um nÃmero estÃtico de rÃplicas. Aspectos relacionados à elasticidade para banco de dados multi-inquilino tÃm recebido pouca atenÃÃo. Estas questÃes sÃo importantes em ambientes em nuvem, pois os provedores precisam adicionar rÃplicas de acordo com a carga de trabalho para evitar violaÃÃo do SLA e eles precisam remover rÃplicas quando a carga de trabalho diminui, alÃm de consolidar os inquilinos. Visando solucionar este problema, este trabalho apresenta RepliC, uma abordagem para a replicaÃÃo de banco de dados em nuvem com foco na qualidade do serviÃo, elasticidade e utilizaÃÃo eficiente dos recursos por meio de tÃcnicas multi-inquilino. RepliC utiliza informaÃÃes dos SGBDs e do provedor para provisionar recursos de forma dinÃmica. Com o objetivo de avaliar RepliC, experimentos que medem a qualidade de serviÃo e elasticidade sÃo apresentados. Os resultados destes experimentos confirmam que RepliC garante a qualidade com uma pequena quantidade de violaÃÃo do SLA enquanto utiliza os recursos de forma eficiente. / Fatores econÃmicos estÃo levando ao aumento das infraestruturas e instalaÃÃes de fornecimento de computaÃÃo como um serviÃo, conhecido como Cloud Computing ou ComputaÃÃo em Nuvem, onde empresas e indivÃduos podem alugar capacidade de computaÃÃo e armazenamento, em vez de fazerem grandes investimentos de capital necessÃrios para a construÃÃo e instalaÃÃo de equipamentos de computaÃÃo em larga escala. Na nuvem, o usuÃrio do serviÃo tem algumas garantias, tais como desempenho e disponibilidade. 
Essas garantias de qualidade de serviÃo (QoS) sÃo definidas entre o provedor do serviÃo e o usuÃrio e expressas por meio de um acordo de nÃvel de serviÃo (SLA). Este acordo consiste de contratos que especificam um nÃvel de qualidade que deve ser atendido e penalidades em caso de falha. Muitas empresas dependem de um SLA e estas esperam que os provedores de nuvem forneÃam SLAs baseados em caracterÃsticas de desempenho. Contudo, em geral, os provedores baseiam seus SLAs apenas na disponibilidade dos serviÃos oferecidos. Sistemas de gerenciamento de banco de dados (SGBDs) para computaÃÃo em nuvem devem tratar uma grande quantidade de aplicaÃÃes, tenants ou inquilinos. Abordagens multi-inquilino tÃm sido utilizadas para hospedar vÃrios inquilinos dentro de um Ãnico SGBD, favorecendo o compartilhamento eficaz de recursos, alÃm de gerenciar uma grande quantidade de inquilinos com padrÃes de carga de trabalho irregulares. Por outro lado, os provedores em nuvem devem reduzir os custos operacionais garantindo a qualidade. Neste contexto, uma caracterÃstica chave à a replicaÃÃo de banco de dados, que melhora a disponibilidade, desempenho e, consequentemente, a qualidade do serviÃo. TÃcnicas de replicaÃÃo de dados tÃm sido usadas para melhorar a disponibilidade, o desempenho e a escalabilidade em diversos ambientes. Contudo, a maior parte das estratÃgias de replicaÃÃo de banco de dados tÃm se concentrado em aspectos de escalabilidade e consistÃncia do sistema com um nÃmero estÃtico de rÃplicas. Aspectos relacionados à elasticidade para banco de dados multi-inquilino tÃm recebido pouca atenÃÃo. Estas questÃes sÃo importantes em ambientes em nuvem, pois os provedores precisam adicionar rÃplicas de acordo com a carga de trabalho para evitar violaÃÃo do SLA e eles precisam remover rÃplicas quando a carga de trabalho diminui, alÃm de consolidar os inquilinos. Visando solucionar este problema, este trabalho apresenta RepliC, uma abordagem para a replicaÃÃo de banco de dados em nuvem com foco na qualidade do serviÃo, elasticidade e utilizaÃÃo eficiente dos recursos por meio de tÃcnicas multi-inquilino. RepliC utiliza informaÃÃes dos SGBDs e do provedor para provisionar recursos de forma dinÃmica. Com o objetivo de avaliar RepliC, experimentos que medem a qualidade de serviÃo e elasticidade sÃo apresentados. Os resultados destes experimentos confirmam que RepliC garante a qualidade com uma pequena quantidade de violaÃÃo do SLA enquanto utiliza os recursos de forma eficiente.
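The elasticity behavior the abstract describes can be sketched as a simple control rule (the thresholds and function are assumptions for illustration, not RepliC's actual algorithm): add a replica when the measured response time approaches the SLA limit, and remove one when the workload leaves ample headroom.

    def plan_replicas(current, response_ms, sla_ms, min_replicas=1, max_replicas=10):
        # Return the new replica count for one tenant's database.
        if response_ms > 0.9 * sla_ms:              # close to an SLA violation
            return min(current + 1, max_replicas)   # scale out
        if response_ms < 0.5 * sla_ms and current > min_replicas:
            return current - 1                      # scale in, free resources
        return current

    assert plan_replicas(2, 95, sla_ms=100) == 3    # near violation: add a replica
    assert plan_replicas(3, 30, sla_ms=100) == 2    # ample headroom: consolidate
    assert plan_replicas(2, 70, sla_ms=100) == 2    # steady state: no change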
90

JavaRMS : um sistema de gerência de dados para grades baseado num modelo par-a-par / JavaRMS: a grid data management system based on a peer-to-peer model

Gomes, Diego da Silva January 2008 (has links)
Large-scale execution environments such as Computational Grids emerged to meet high-performance computing demands. As on other execution platforms, their users need to obtain input data for their applications and often need to store the results those applications generate. Although the term Grid comes from a metaphor in which computing resources are as easily accessible as those of the electric power grid, the available tools for managing data and storage resources fall far short of making this idea a reality. The immaturity of these services becomes critical for scientific applications that need to process large volumes of data; in such cases, only high-performance resources are used, and data reliability, availability, and security are ensured through human presence. This work presents JavaRMS, a data management system for Grids. By employing a peer-to-peer model, it aggregates the less capable resources available in the Grid environment, thereby reducing the cost of the solution. The system uses the virtual-node technique to cope with the strong heterogeneity of resources, distributing data according to the storage space each peer provides. Fragmentation makes the use of less capable resources feasible and improves the performance of file transfer operations, while replication provides data persistence and improves availability. JavaRMS also copes with the dynamicity and instability of resources through a state model, reducing the impact of maintenance operations. The architecture further includes user management services and protects resources against abuse through a quota system, and all operations were designed to be secure. Finally, it provides the infrastructure needed for search services and user-interaction tools to be supplied in the future. Experiments with the JavaRMS prototype show that using a peer-to-peer model to organize resources and locate data results in good scalability, and that the virtual-node technique efficiently distributes data among machines in proportion to the storage capacity they offer. Tests with the main file transfer operation show that the model can significantly improve the performance of applications that need to process large volumes of data.
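The virtual-node technique credited with this balanced, capacity-proportional distribution can be sketched with generic consistent hashing (an illustration, not JavaRMS code): a peer receives virtual positions on a hash ring in proportion to the storage it offers, so larger peers attract proportionally more fragments.

    import bisect, hashlib

    def h(s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    peers = {"peerA": 100, "peerB": 400}            # offered storage (GB)
    # One virtual node per 50 GB offered: peerA gets 2, peerB gets 8.
    ring = sorted((h(f"{p}#{i}"), p)
                  for p, cap in peers.items() for i in range(cap // 50))

    def owner(key):
        # Clockwise successor on the ring owns the key.
        positions = [pos for pos, _ in ring]
        idx = bisect.bisect(positions, h(key)) % len(ring)
        return ring[idx][1]

    placed = [owner(f"fragment-{i}") for i in range(1000)]
    print({p: placed.count(p) for p in peers})      # roughly 1:4, matching capacity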
