
Cultural GrAnt: um protocolo de roteamento baseado em inteligência coletiva para redes tolerantes a atrasos / Cultural GrAnt: a routing protocol based on collective intelligence for delay tolerant networks

Vendramin, Ana Cristina Barreiras Kochem, 06 June 2012
This thesis presents a new routing protocol for complex and dynamic Delay Tolerant Networks (DTN). The proposed protocol is called Cultural GrAnt (Cultural Greedy Ant), as it uses a hybrid system composed of a Cultural Algorithm (CA) and a greedy version of the Ant Colony Optimization (ACO) metaheuristic. In Cultural GrAnt, ACO represents the population space of the cultural algorithm and uses a greedy transition rule to either exploit previously found good paths or explore new paths by selecting, among a set of candidates, the most promising message forwarders. The main motivation for using ACO is to take advantage of its population-based search and adaptive learning framework. The CA, in turn, gathers information during the evolutionary process and uses it to guide the population, thereby accelerating learning while providing more efficient solutions. Considering information from heuristic functions, pheromone concentration, and knowledge stored in the CA belief space, the Cultural GrAnt protocol includes three modules: routing, scheduling, and buffer management.
To the best of our knowledge, this is the first routing protocol that employs both ACO and CA to infer the best message forwarders using opportunistic information about social connectivity between nodes, determine the best paths a message must follow to eventually reach its destination while limiting message replications and drops, and perform message transmission scheduling and buffer space management. Cultural GrAnt is compared to the Epidemic and PROPHET protocols in two different mobility scenarios: an activity-based movement model, which simulates the daily lives of people in their work, leisure, and rest activities; and a community-based movement model. Simulation results obtained with the ONE simulator show that, in both scenarios, Cultural GrAnt achieves a higher delivery ratio, lower message replication, and fewer dropped messages than Epidemic and PROPHET.
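As an illustration of the greedy transition rule described in this abstract, the following minimal Python sketch shows how a node might rank candidate forwarders by combining a pheromone value with a heuristic score. The function name, parameter values, and weighting scheme are assumptions made for the example and are not taken from the thesis.

    import random

    def select_forwarder(candidates, pheromone, heuristic, q0=0.9, alpha=1.0, beta=2.0):
        """Greedy ACO-style transition rule: with probability q0 pick the
        best-scoring candidate (exploitation); otherwise sample proportionally
        to the scores (biased exploration)."""
        scores = {c: (pheromone[c] ** alpha) * (heuristic[c] ** beta) for c in candidates}
        if random.random() < q0:
            return max(scores, key=scores.get)          # exploit the best known forwarder
        total = sum(scores.values())
        r, acc = random.uniform(0, total), 0.0
        for c, s in scores.items():
            acc += s
            if r <= acc:
                return c
        return candidates[-1]

    # Hypothetical neighbour nodes with illustrative pheromone/heuristic values.
    neighbours = ["n1", "n2", "n3"]
    tau = {"n1": 0.6, "n2": 0.2, "n3": 0.9}    # pheromone concentration per candidate
    eta = {"n1": 0.5, "n2": 0.8, "n3": 0.7}    # heuristic score (e.g., social connectivity)
    print(select_forwarder(neighbours, tau, eta))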

PTTA: protocolo para distribuição de conteúdo em redes tolerantes ao atraso e desconexões / PTTA: a protocol for content distribution in delay and disruption tolerant networks

Albini, Fábio Luiz Pessoa, 30 October 2013
This work proposes a new transport protocol for delay and disruption tolerant networks (DTN), called DTTP - Delay Tolerant Transport Protocol (in Portuguese, PTTA - Protocolo de Transporte Tolerante a Atrasos). The protocol aims to provide statistical reliability in the delivery of information over such networks, using fountain codes as the error correction technique. The results show the advantages of using DTTP. This work also proposes an adaptive source control mechanism for DTTP that limits the amount of data generated by the source. The proposed scheme aims to increase the diversity of the encoded information without increasing the load on the network; to achieve this, the message generation interval and the TTL (Time To Live) are adjusted based on network metrics. To validate the efficiency of the proposed mechanism, different scenarios were tested using the main DTN routing protocols. The performance results take into account the buffer size, the message TTL, and the amount of redundant information generated on the network. The simulation results, obtained with the ONE simulator, show that in the evaluated scenarios DTTP achieves a higher information delivery rate in a shorter time than a transport protocol without acknowledgments, thus providing a gain in network performance.
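For context on the fountain-code idea mentioned above, here is a minimal Python sketch of an LT-style rateless encoder that XORs a random subset of source blocks to produce each encoded symbol. The block sizes, the uniform degree choice, and the helper name are illustrative assumptions, not the thesis's actual coding scheme; a real receiver would recover the message once enough symbols arrive, e.g. by peeling or Gaussian elimination.

    import os, random

    def fountain_symbol(blocks):
        """Produce one rateless encoded symbol: the XOR of a random non-empty
        subset of the source blocks, tagged with the indices that were used."""
        degree = random.randint(1, len(blocks))
        idx = random.sample(range(len(blocks)), degree)
        sym = bytearray(len(blocks[0]))
        for i in idx:
            for j, b in enumerate(blocks[i]):
                sym[j] ^= b
        return idx, bytes(sym)

    # Hypothetical message split into 4 equal-size source blocks.
    blocks = [os.urandom(16) for _ in range(4)]
    indices, symbol = fountain_symbol(blocks)
    print(indices, symbol.hex())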

Adaptive Fault Tolerance Strategies for Large Scale Systems

George, Cijo, January 2012
Exascale systems of the future are predicted to have a mean time between node failures (MTBF) of less than one hour. At such a low MTBF, the number of processors available for execution of a long running application can vary widely throughout the execution of the application. Employing traditional fault tolerance strategies like periodic checkpointing in these highly dynamic environments may not be effective because of the high number of application failures, resulting in a large amount of work lost due to rollbacks, apart from the increased recovery overheads. In this context, it is highly necessary to have fault tolerance strategies that can adapt to the changing node availability and also help avoid a significant number of application failures. In this thesis, we present two adaptive fault tolerance strategies that make use of node failure prediction mechanisms to provide proactive fault tolerance for long running parallel applications on large scale systems. The first part of the thesis deals with an adaptive fault tolerance strategy for malleable applications. We present ADFT, an adaptive fault tolerance framework for long running malleable applications to maximize application performance in the presence of failures. We first develop cost models that consider different factors like accuracy of node failure predictions and application scalability, for evaluating the benefits of various fault tolerance actions including checkpointing, live migration and rescheduling. Our adaptive framework then uses the cost models to make runtime decisions for dynamically selecting the fault tolerance actions at different points of application execution to minimize application failures and maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in work done by the application in the presence of failures, and is effective even for petascale and exascale systems. In the second part of the thesis, we present a fault tolerance strategy using adaptive process replication that can provide fault tolerance for applications using partial replication of a set of application processes. This fault tolerance framework adaptively changes the set of replicated processes (replicated set) periodically based on node failure predictions to avoid application failures. We have developed an MPI prototype implementation, PAREP-MPI, that allows dynamically changing the replicated set of processes for MPI applications. Experiments with real scientific applications on real systems have shown that the overhead of PAREP-MPI is minimal. We have shown using simulations with real and synthetic failure traces that our strategy involving adaptive process replication significantly outperforms existing mechanisms, providing up to 20% improvement in application efficiency even for exascale systems. Significant observations are also made which can drive future research efforts in fault tolerance for large and very large scale systems.
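A toy sketch of the kind of runtime decision a cost-model-driven framework could make, choosing among checkpointing, live migration, and rescheduling when a node failure is predicted. The cost formula, the accuracy handling, and the numbers below are illustrative assumptions, not the actual ADFT cost models.

    def choose_action(work_at_risk_s, ckpt_cost_s, migrate_cost_s, resched_cost_s,
                      prediction_accuracy=0.7):
        """Toy decision rule: checkpointing always pays its overhead but caps
        the rollback loss, while the proactive actions avoid loss only when the
        node-failure prediction turns out to be correct."""
        miss = 1.0 - prediction_accuracy
        expected_cost = {
            "checkpoint":   ckpt_cost_s,                             # loss bounded by the fresh checkpoint
            "live-migrate": migrate_cost_s + miss * work_at_risk_s,  # on a missed prediction the work is lost
            "reschedule":   resched_cost_s + miss * work_at_risk_s,
        }
        return min(expected_cost, key=expected_cost.get)

    # Hypothetical numbers (seconds): 30 minutes of work at risk on the flagged node.
    print(choose_action(work_at_risk_s=1800, ckpt_cost_s=120,
                        migrate_cost_s=90, resched_cost_s=300))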

Teste de robustez de uma infraestrutura confiável para arquiteturas baseadas em serviços Web / Robustness testing of a reliable infrastructure for web service-based architectures

Maja, Willian Yabusame, 1986-, 19 August 2018
Advisor: Eliane Martins / Master's dissertation (Mestre em Ciência da Computação) - Universidade Estadual de Campinas, Instituto de Computação, 2011 / Abstract: Web service-based systems are subject to different types of faults, among them faults caused by the environment in which the services operate, the Internet: message delays, connection loss, invalid messages, and others. To prevent these faults from becoming a bigger problem for the clients interacting with a Web service, solutions such as Archmeds provide a reliable infrastructure that improves the reliability and availability of Web service-based systems. But for Archmeds to be a trustworthy solution it must itself be tested, since it is also a system subject to defects. This work therefore proposes an approach for robustness testing of Archmeds, supported by the development of a fault injection tool called WSInject, which injects communication faults and invalid input data into the parameters of service calls. These faults aim to emulate the faults of the real operational environment of Web services and thereby reveal failures of the system under test. The work also takes into account that Archmeds is a Web service composition, and accordingly proposes an approach for testing service compositions. Based on the results of this case study, the robustness testing approach is expected to be reusable for other Web service-based systems.
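The following Python sketch illustrates the general idea of injecting communication faults and invalid inputs into service calls, in the spirit of what the abstract attributes to WSInject. The interception point, the probabilities, and the payload mutation are assumptions made for the example, not the tool's actual design.

    import random, time

    def inject_faults(payload: str, drop_prob=0.1, corrupt_prob=0.2, max_delay_s=2.0):
        """Illustrative interceptor: before a service call is forwarded, the
        injector may drop it, corrupt a parameter value, or delay it, emulating
        communication faults and invalid input data."""
        if random.random() < drop_prob:
            return None                                   # emulate message loss
        if random.random() < corrupt_prob:
            payload = payload.replace("<id>42</id>", "<id>not-a-number</id>")
        time.sleep(random.uniform(0, max_delay_s))        # emulate network delay
        return payload

    print(inject_faults("<request><id>42</id></request>"))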

Low Overhead Soft Error Mitigation Methodologies

Prasanth, V, January 2012
CMOS technology scaling is bringing new challenges to designers in the form of new failure modes. The challenges include long term reliability failures and particle strike induced random failures. Studies have shown that, increasingly, the largest contributor to device reliability failures will be soft errors. Due to reliability concerns, the adoption of soft error mitigation techniques is on the increase. As soft error mitigation techniques are increasingly adopted, the area and performance overhead incurred in their implementation also becomes pertinent. This thesis addresses the problem of providing low cost soft error mitigation. The main contributions of this thesis include: (i) the proposal of a new delayed capture methodology for low overhead soft error detection, (ii) adopting Error Control Coding (ECC) in the delayed capture methodology for correction of single event upsets, (iii) analyzing the impact of different derating factors to reduce the hardware overhead incurred by the above implementations, and (iv) a proposal for hardware-software co-design for reliability based upon critical component identification determined by the application executing on the hardware (as against standalone hardware analysis). This thesis first surveys existing soft error mitigation techniques and their associated limitations. It proposes a new delayed capture methodology as a low overhead soft error detection technique. Delayed capture methodology is an enhancement of the Razor flip-flop methodology. In the delayed capture methodology, the parity for a set of flip-flops is calculated at their inputs and outputs. The input parity is latched on a second clock, which is delayed with respect to the functional clock by more than the soft error pulse width. It requires an extra flip-flop for each set of flip-flops. On the other hand, in the Razor flip-flop methodology an additional flip-flop is required for every functional flip-flop. Due to the skew in the clocks, either the parity flip-flop or the functional flip-flop will capture the effect of the transient, and hence by comparing the output parity and the latched input parity an error can be detected. Fault injection experiments are performed to evaluate the benefits and limitations of the proposed approach. The limitations include soft error detection escapes and lack of error correction capability. Different cases of soft error detection escapes are analyzed. They are attributed mainly to a Single Event Upset (SEU) causing multiple flip-flops within a group to be in error. The error space due to SEUs is analyzed and an intelligent flip-flop grouping method using graph theoretic formulations is proposed such that no SEU can cause multiple flip-flops within a group to be in error. Once an error occurs, leaving the correction aspects to the application may not be desirable. The proposed delayed capture methodology is extended to replace parity codes with codes having higher redundancy to enable correction. The hardware overhead due to the proposed methodology is analyzed and area savings of about 15% are obtained when compared to an existing soft error mitigation methodology with equivalent coverage. The impact of different derating factors in determining the hardware overhead due to the soft error mitigation methodology is then analyzed. We have considered electrical derating and timing derating information for the evaluation purpose.
The area overhead of the circuit with the delayed capture methodology implemented, considering different derating factors standalone and in combination, is then analyzed. Results indicate that in different circuits either a combination of these derating factors yields optimal results, or each of them considered standalone does. This is due to the dependency of the solution on the heuristic nature of the algorithms used. About 23% area savings are obtained by employing these derating factors for a more optimal grouping of flip-flops. A new paradigm of hardware-software co-design for reliability is finally proposed. This is based on application derating, in which the application/firmware code is profiled to identify the critical components which must be guarded from soft errors. This identification is based on the ability of the application software to tolerate certain errors in hardware. An algorithm to identify critical components in the control logic based on fault injection is developed. Experimental results indicated that for a safety critical automotive application, only 12% of the sequential logic elements were found to be critical. This approach provides a framework for investigating how software methods can complement hardware methods, to provide a reduced hardware solution for soft error mitigation.
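A small Python model of the detection step described above may help: the parity of a flip-flop group's inputs is latched on the delayed clock and compared against the parity of the functional outputs, so a single event upset within the group produces a mismatch. The bit vectors and function names are illustrative; the real technique operates in hardware.

    from functools import reduce

    def parity(bits):
        """Even parity over a group of flip-flop values (0/1)."""
        return reduce(lambda a, b: a ^ b, bits, 0)

    def delayed_capture_check(inputs, outputs_after_strike):
        """Software model of the detection step: the input parity of the group
        is latched on a delayed clock and compared with the parity of the
        functional flip-flop outputs. A single upset flips exactly one bit in
        the group, so the parities disagree and the error is flagged."""
        latched_input_parity = parity(inputs)
        return latched_input_parity != parity(outputs_after_strike)

    group_inputs = [1, 0, 1, 1]
    struck_outputs = [1, 0, 0, 1]        # one flip-flop upset by a particle strike
    print(delayed_capture_check(group_inputs, struck_outputs))  # True -> error detected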

Grid Fault management techniques: the case of a Grid environment with malicious entities

Akimana, Rachel, 01 October 2008
Fault tolerance and fault management in data and computing grids are of critical importance. As in any other distributed system, the components of a grid may fail at any time, but the risk of failure grows with the size of the system and is therefore even more pronounced in a grid. Moreover, while trying to exploit the resources offered by the grid, the applications running on it are increasingly complex (e.g., they involve complex interactions and take days to execute), which makes them more vulnerable to faults. The hardest part of fault management in a grid is that it is difficult to know whether a fault occurring on a grid entity was induced maliciously or accidentally. In this thesis, we use the term fault, in a broad sense, to refer to any unexpected state arising on any component of the grid. Some of these states cause behaviours that are equally unexpected and observable at the grid level, while others go unnoticed. Moreover, some of these faults are the result of a malicious action, while others occur accidentally or spontaneously. This thesis addresses the case of faults that are induced maliciously and that generally go unnoticed. We consider in particular the problem of the confidentiality and integrity of data stored long-term on the grid. The study of data confidentiality was carried out in two parts, the first of which concerns the confidentiality of active data. In this part, we considered an application that searches for similarities to a given DNA sequence in a database of DNA sequences stored on the grid. For this, we proposed a method that performs the comparison on a remote component while keeping the sequence being compared confidential. Concerning passive data, we proposed a method for sharing confidential, encrypted data on the grid. With regard to data integrity, we considered the case of anonymous data in the context of passive data integrity. For active data, we considered the problem of corruption of jobs executed on the grid. For each of these cases, we proposed mechanisms to verify the authenticity of the data used or produced by these applications. / Doctorat en Sciences

Studies In Automatic Management Of Storage Systems

Pipada, Pankaj, 06 1900
Autonomic management is important in storage systems, and the space of autonomics in storage systems is vast. Such autonomic management systems can employ a variety of techniques depending upon the specific problem. In this thesis, we first take an algorithmic approach towards reliability enhancement and then use learning along with a reactive framework to facilitate storage optimization for applications. We study how the reliability of non-repairable systems can be improved through automatic reconfiguration of their XOR-coded structure. To this end we propose to increase the fault tolerance of non-repairable systems by reorganizing the system, after a failure is detected, to a new XOR code with better fault tolerance. As errors can manifest during reorganization due to whole reads of multiple submodules, our framework takes them into account and models such errors based on access intensity (i.e., BER, bit error rate). We present and evaluate the reliability of an example storage system with and without reorganization. Motivated by the critical need for automating various aspects of data management in virtualized data centers, we study the specific problem of automatically implementing Virtual Machine (VM) migration in a dynamic environment according to some pre-set policies. This is a problem that requires automated identification of various workloads and their execution environments running inside virtual machines in a non-intrusive manner. To this end we propose AuM (for Autonomous Manager), which has the capability to learn workloads by aggregating a variety of information obtained from network traces of storage protocols. We use state-of-the-art machine learning tools, namely Multiple Kernel Learning, to aggregate information, and show that AuM is indeed very accurate in identifying workloads and their execution environments, and is also successful in following user-set policies very closely for the VM migration tasks. Storage infrastructure in large-scale cloud data center environments must support applications with diverse, time-varying data access patterns while observing the quality of service. To meet service level requirements in such heterogeneous application phases, storage management needs to be phase-aware and adaptive, i.e., identify specific storage access patterns of applications as they occur and customize their handling accordingly. We build LoadIQ, an online application phase detector for networked (file and block) storage systems. In a live deployment, LoadIQ analyzes traces and emits phase labels learnt online. Such labels could be used to generate alerts or to trigger phase-specific system tuning.
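To make the XOR-code reorganization idea concrete, here is a minimal Python sketch that rebuilds a failed block from a single-parity group and then re-encodes the group with two parity blocks over disjoint halves. The group size and the target layout are illustrative assumptions, not the thesis's actual codes.

    import os

    def xor_blocks(blocks):
        """XOR a list of equal-size byte blocks together."""
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, b in enumerate(blk):
                out[i] ^= b
        return bytes(out)

    # Hypothetical group of 4 data blocks protected by one XOR parity block.
    data = [os.urandom(8) for _ in range(4)]
    parity = xor_blocks(data)

    # A sub-module fails: its block is rebuilt from the survivors plus the parity.
    lost = 2
    rebuilt = xor_blocks([d for i, d in enumerate(data) if i != lost] + [parity])
    assert rebuilt == data[lost]

    # Reorganization (illustrative): re-encode with two parity blocks over
    # disjoint halves, so the new layout survives one loss in each half.
    parity_a, parity_b = xor_blocks(data[:2]), xor_blocks(data[2:])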

Protocolo de difusão síncrona totalmente ordenada para aglomerados de alto desempenho / Synchronous total order broadcast protocol for high performance clusters

Cason, Daniel, 1987-, 22 August 2018
Advisor: Luiz Eduardo Buzato / Master's dissertation (Mestre em Ciência da Computação) - Universidade Estadual de Campinas, Instituto de Computação, 2013 / Abstract: Total order broadcast algorithms are at the core of several toolkits for the construction of fault-tolerant applications. The importance and the difficulty of finding efficient total order broadcast (TOB) algorithms are attested by the long period during which such algorithms have been the object of intense research and by the large number of algorithms already proposed. This work presents a new total order broadcast algorithm whose design takes advantage of the inherent reliability and timeliness of high performance clusters. Experimental results show that the performance of this very simple TOB is on a par with the performance of TOBs designed for asynchronous computing models.
The proposed protocol has been designed for the timed asynchronous computing model, enhanced with a simple pulse-based mechanism that is used to synchronize the processes' execution. The assumption behind the pulse-based synchronization is that modern clusters, given some workload conditioning, can maintain reasonably long failure-free execution periods in which they behave very much like synchronous systems. This assumption allows the processes that engage in total order broadcasts to build a global view of their joint computation, and this global view, in turn, allows them to solve total order broadcast in a straightforward way. The protocol tolerates an unbounded number of timing failures, which can prevent its progress but have no impact on its safety; it also remains safe in the presence of asynchrony and process failures. The protocol has been implemented in Java and tested on an Ethernet-based cluster. A comparison of the results obtained in the experiments with results published for other well-known TOBs allows us to conclude that our solution represents an interesting trade-off between performance and simplicity of design and implementation for total order broadcast protocols. Beyond performance, this research also indicates that there is still room for the practical exploration of the interplay between synchrony and asynchrony in the engineering of distributed protocols.
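The pulse-based idea can be illustrated with a small Python sketch: assuming every process ends a pulse with the same set of messages (the synchrony assumption discussed above), a deterministic sort yields the same delivery order everywhere. The message format and the sort key are assumptions made for illustration, not the protocol's actual wire format.

    def deliver_order(pulse_messages):
        """Given the set of messages every process collected during one pulse
        (the same set everywhere under the synchrony assumption), deliver them
        in a deterministic order so that all processes agree: by pulse, then
        sender, then per-sender sequence number."""
        return sorted(pulse_messages, key=lambda m: (m["pulse"], m["sender"], m["seq"]))

    # Hypothetical messages gathered by every process during pulse 7.
    msgs = [
        {"pulse": 7, "sender": 2, "seq": 0, "payload": "b"},
        {"pulse": 7, "sender": 1, "seq": 1, "payload": "a2"},
        {"pulse": 7, "sender": 1, "seq": 0, "payload": "a1"},
    ]
    for m in deliver_order(msgs):
        print(m["sender"], m["seq"], m["payload"])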
