Global ETD Search

1	Global Data Computation in a Dedicated Chordal Ring Wang, Xianbing, Teo, Yong Meng 01 1900 Existing Global Data Computation (GDC) protocols for asynchronous systems are designed for fully connected networks. In this paper, we discuss GDC in a dedicated asynchronous chordal ring, a type of network that is not fully connected. The virtual links approach, which constructs t+1 (t<n) process-disjoint paths for each pair of processes without a direct connection in order to tolerate failures (where t is the maximum number of processes that may crash and n is the total number of processes), can be applied to solve the GDC problem in the chordal ring, but it incurs high message complexity. To reduce this communication cost, we propose a non-round-based GDC protocol for the asynchronous chordal ring with perfect failure detectors. The main advantages of our approach are that there is no notion of round, processes send messages only via direct connections, and the implementation of failure detectors does not require process-disjoint paths. Analysis and comparison with the virtual links approach show that our protocol reduces the message complexity significantly. / Singapore-MIT Alliance (SMA) data computation, chordal rings, perfect failure detector
2	The Weakest Failure Detector for Solving Wait-Free, Eventually Bounded-Fair Dining Philosophers Song, Yantao 14 January 2010 This dissertation explores the necessary and sufficient conditions to solve a variant of the dining philosophers problem. This dining variant is defined by three properties: wait-freedom, eventual weak exclusion, and eventual bounded fairness. Wait-freedom guarantees that every correct hungry process eventually enters its critical section, regardless of process crashes. Eventual weak exclusion guarantees that every execution has an infinite suffix during which no two live neighbors execute overlapping critical sections. Eventual bounded fairness guarantees that there exists a fairness bound k such that every execution has an infinite suffix during which no correct hungry process is overtaken more than k times by any neighbor. This dining variant (WF-EBF dining for short) is important for synchronization tasks where eventual safety (i.e., eventual weak exclusion) is sufficient for correctness (e.g., duty-cycle scheduling, self-stabilizing daemons, and contention managers). Unfortunately, it is known that wait-free dining is unsolvable in asynchronous message-passing systems subject to crash faults. To circumvent this impossibility result, it is necessary to assume the existence of bounds on timing properties, such as relative process speeds and message delivery time. As such, it is of interest to characterize the necessary and sufficient timing assumptions to solve WF-EBF dining. We focus on implicit timing assumptions, which can be encapsulated by failure detectors. Failure detectors can be viewed as distributed oracles that can be queried for potentially unreliable information about crash faults. A detector D is the weakest failure detector for WF-EBF dining if D is both necessary and sufficient. Necessity means that every failure detector that solves WF-EBF dining is at least as strong as D. 
Sufficiency means that there exists at least one algorithm that solves WF-EBF dining using D. As such, our research goal is to characterize the weakest failure detector to solve WF-EBF dining. We prove that the eventually perfect failure detector ◇P is the weakest failure detector for solving WF-EBF dining. ◇P eventually suspects crashed processes permanently, but may make mistakes by wrongfully suspecting correct processes finitely many times during any execution. As such, ◇P eventually stops suspecting correct processes.
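The behavior described for the eventually perfect failure detector (permanent suspicion of crashed processes, finitely many wrongful suspicions of correct ones) is commonly obtained with heartbeats and timeouts that grow after each mistake. The following is a minimal sketch under that assumption, not the dissertation's construction; the class name, method names and the doubling policy are hypothetical choices made for illustration.

```python
class EventuallyPerfectDetector:
    """Heartbeat/timeout suspicion list with adaptive timeouts (illustrative only)."""

    def __init__(self, peers, initial_timeout=1.0):
        # Per-peer timeout, time of last heartbeat, and current suspicion set.
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; retract a wrongful suspicion if there was one."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # The suspicion was a mistake: trust the peer again and wait
            # longer next time (assumed policy: double the timeout).
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

If message delays are eventually bounded, each correct process can be wrongly suspected only until its timeout exceeds that bound, while a crashed process stops sending heartbeats and stays suspected.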
3	Quality of service of crash-recovery failure detectors Ma, Tiejun January 2007 This thesis presents the results of an investigation into the failure detection problem. We consider the specific case of the Quality of Service (QoS) of crash failure detection. In contrast to previous work, we address the crash failure detection problem when the monitored target is resilient and recovers after failure. To the best of our knowledge, this is the first work to provide an analysis of crash-recovery failure detection from the QoS perspective. We develop a probabilistic model of the behavior of a crash-recovery target, i.e. one which has the ability to recover from the crash state. We show that the fail-free run and the crash-stop run are special cases of the crash-recovery run with mean time to failure (MTTF) approaching infinity and mean time to recovery (MTTR) approaching infinity, respectively. We extend the previously published QoS metrics to allow the measurement of the recovery speed, and the definition of the completeness property of a failure detector. Then, the impact of the dependability of the crash-recovery target on the QoS bounds for such a crash-recovery failure detector is analyzed using general dependability metrics, such as MTTF and MTTR, based on an approximate probabilistic model of the two-process failure detection system. Then, according to our approximate model, we show analytically how to estimate the failure detector’s parameters to achieve a required QoS, based on Chen et al.’s NFD-S algorithm, and how to execute the configuration procedure of this crash-recovery failure detector. In order to make the failure detector adaptive to the target’s crash-recovery behavior and enable the autonomy of the monitoring procedure, we propose two types of recovery detection protocols. One is a reliable recovery detection protocol, which guarantees detection of every failure and recovery by using persistent storage. 
The other is a lightweight recovery detection protocol, which does not guarantee to detect every failure and recovery but which reduces the system overhead. Both of these recovery detection protocols improve the completeness without reducing the other QoS aspects of a failure detector. In addition, we also demonstrate how to estimate the inputs, such as the dependability metrics, using the failure detector itself. In order to evaluate our analytical work, we simulate the following failure detection algorithms: the simple heartbeat timeout algorithm, the NFD-S algorithm and the NFD-S algorithm with the lightweight recovery detection protocol, for various values of MTTF and MTTR. The simulation results show that the dependability of a recoverable monitored target can have a significant impact on the QoS of such a failure detector. This conforms well to our models and analysis. We show that, for a reasonably long MTTF, the NFD-S algorithm with the lightweight recovery detection protocol exhibits better QoS than the NFD-S algorithm for the completeness of a crash-recovery failure detector, and similarly for other QoS metrics. 620.110287
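For readers unfamiliar with the NFD-S algorithm that the abstract configures, the sketch below shows its core rule as commonly described for Chen et al.'s detector with synchronized clocks: the monitored process sends heartbeat i at time i·η, and the monitor trusts it during the interval starting at the freshness point i·η + δ only if a heartbeat numbered i or higher has arrived. This is a simplified illustration, not the thesis's crash-recovery extension; the class and method names are hypothetical, and `eta` and `delta` stand for the two parameters that would be tuned to meet a required QoS.

```python
import math


class NfdS:
    """Simplified freshness-point failure detector (synchronized clocks assumed)."""

    def __init__(self, eta, delta):
        self.eta = eta        # interval between heartbeats (heartbeat i sent at i * eta)
        self.delta = delta    # shift from send time to freshness point
        self.max_seq = 0      # highest heartbeat number received so far (0 = none)

    def on_heartbeat(self, seq):
        """Record heartbeat number `seq` (heartbeats are numbered from 1)."""
        self.max_seq = max(self.max_seq, seq)

    def trusts(self, now):
        """Return True if the monitored process is trusted at time `now`."""
        # Index i of the freshness interval [i*eta + delta, (i+1)*eta + delta)
        # that contains `now`; before the first freshness point, i is 0.
        i = max(0, math.floor((now - self.delta) / self.eta))
        # Trust only if some heartbeat numbered at least i (and at least 1) arrived.
        return self.max_seq >= max(i, 1)
```

A smaller `delta` shortens detection time after a crash but makes wrongful suspicions more likely when heartbeats are delayed, which is the trade-off a QoS-driven configuration procedure has to balance.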
4	A P2p Based Failure Detection Model For Distributed Systems Celal, Kavuklu 01 August 2006 A comprehensive failure detection model is proposed to detect service failures in asynchronous distributed systems. The proposed model takes advantage of P2P technology to provide the required functionality. Compared with similar studies in failure detection, the presented model is more autonomous in resolving service dependencies, is more flexible in providing different failure detection functions (such as unreliable failure detectors and membership services) and offers more security. A failure detection library is developed using the JXTA P2P framework to demonstrate a realization of such a model. QA Computer Software 76.75-76.765
5	ATENTO: UM DETECTOR DE DEFEITOS PARA MANETS BASEADO NA POTÊNCIA DO SINAL / ATENTO: A SIGNAL POWER BASED FAILURE DETECTOR FOR MANETS Baggio, Miguel Angelo 30 December 2010 A failure detector is an essential component in building reliable distributed systems, and its design depends heavily on the distributed system model; node mobility in particular has demanded several solutions. This dissertation presents a gossip-based unreliable failure detector for mobile ad hoc networks (MANETs) that uses the intensity of the received signal to distinguish faulty nodes from mobile nodes. The differentiation is done by maintaining, at the nodes of the system, a small history of regions built from measurements of received signal intensity. This work also presents a simulator for mobile wireless networks in which message-transmission simulations can be configured with predetermined node movements. The simulator provides the signal strength received with a message and allows various aspects of the simulation to be configured, such as the frequency and transmission power of a message. The failure detector presented in this work provides a new method that uses more than one rule for deciding to suspect a node. Evaluations show improvements in the quality of service of the detector compared with the traditional gossip algorithm. / Detecting failures is an essential task in building reliable distributed systems, and its design depends heavily on the distributed system model. Several solutions have been developed to handle node movement in distributed systems. This work presents a new asynchronous unreliable failure detector for mobile ad hoc networks (MANETs). Faulty nodes are distinguished from mobile nodes by maintaining information about the measured received signal power at the nodes of the system in a small history of regions. 
This work also developed a simulator for mobile wireless networks in which message-transmission simulations can be configured with predetermined node movements. The simulator makes available the power of the signal received in a message and allows several aspects of the simulation to be configured, such as the frequency and transmission power of a message. The failure detector proposed in this work uses one method to suspect a node that is within its transmission range and another method for a node that is moving out of its transmission range. Evaluations presented in this work show that a reduction in failure detection time and a lower number of false suspicions are possible when compared with detectors that use only gossip. Failure detector, Wireless networks, Mobility, Signal power, Mobility and intensity
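The idea of using received signal power to tell a failed node from one that merely moved out of range can be illustrated with a small per-neighbor history. The sketch below is a guess at the general shape of such a rule, not the dissertation's algorithm: the class name, the history size, the −85 dBm threshold and the "strictly fading" test are all assumptions introduced here.

```python
from collections import deque


class SignalHistory:
    """Small history of received signal strength for one neighbor (illustrative)."""

    def __init__(self, size=3, weak_dbm=-85.0):
        self.readings = deque(maxlen=size)  # most recent readings, oldest first
        self.weak_dbm = weak_dbm            # assumed edge-of-range threshold

    def record(self, rssi_dbm):
        """Store the signal strength measured on a received message."""
        self.readings.append(rssi_dbm)

    def classify_silence(self):
        """Classify a neighbor that has stopped sending messages."""
        r = list(self.readings)
        # Signal got strictly weaker with every recent message.
        fading = len(r) >= 2 and all(b < a for a, b in zip(r, r[1:]))
        if fading and r[-1] <= self.weak_dbm:
            # Likely left transmission range rather than crashed.
            return "moving_away"
        return "suspect_failed"
```

A neighbor whose last messages arrived with a steadily weakening signal is treated as having moved away, while one that falls silent at full signal strength is suspected of having failed.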
6	Impact FD : an unreliable failure detector based on process relevance and confidence in the system / Impact FD : um detector de falhas baseado na relevância dos processos e confiança no sistema Rossetto, Anubis Graciela de Moraes January 2016 Traditional unreliable failure detectors are oracles locally available to the processes of a distributed system that provide a list of processes suspected of having failed. This work proposes a new and flexible unreliable failure detector, called Impact FD, that outputs a trust level value, which is the degree of confidence in the system. By expressing the relevance of each process through an impact factor value, as well as through a margin of acceptable failures of the system, the Impact FD allows the user to adjust the failure detector configuration according to the requirements of the application: in certain scenarios, the failure of a low-impact or redundant process does not compromise confidence in the system, while the failure of a process with a high impact factor can seriously affect it. Thus, a more or less strict monitoring strategy can be adopted. In particular, we define some flexibility properties that characterize the capacity of the Impact FD to tolerate a certain margin of failures or false suspicions, that is, its capacity to provide different sets of responses that lead the system to trusted states. The Impact FD is suitable for systems that exhibit node redundancy, node heterogeneity and clustering, and it allows a margin of failures that does not degrade confidence in the system. We also show that some classes of the Impact FD are equivalent to Σ and Ω, which are fundamental failure detectors for circumventing the impossibility of solving the consensus problem in asynchronous message-passing systems in the presence of failures. 
Additionally, based on synchrony assumptions and on time-based and message-pattern approaches, we present three algorithms that implement the Impact FD. Performance evaluation results using real PlanetLab traces confirm the degree of flexible applicability of our failure detector and, owing to the acceptable margin of failures, the number of false responses or suspicions can be tolerated when compared with traditional unreliable failure detectors. / Traditional unreliable failure detectors are per-process oracles that provide a list of processes suspected of having failed. This work proposes a new and flexible unreliable failure detector (FD), denoted the Impact FD, that outputs a trust level value which is the degree of confidence in the system. By expressing the relevance of each process by an impact factor value as well as a margin of acceptable failures of the system, the Impact FD enables the user to tune the failure detection configuration in accordance with the requirements of the application: in some scenarios, the failure of low impact or redundant processes does not jeopardize the confidence in the system, while the crash of a high impact process may seriously affect it. Either a softer or stricter monitoring strategy can be adopted. In particular, we define some flexibility properties that characterize the capacity of the Impact FD to tolerate a certain margin of failures or false suspicions, i.e., its capacity of providing different sets of responses that lead the system to trusted states. The Impact FD is suitable for systems that present node redundancy, heterogeneity of nodes, clustering feature, and allow a margin of failures which does not degrade the confidence in the system. We also show that some classes of the Impact FD are equivalent to Ω and Σ, which are fundamental FDs to circumvent the impossibility of solving the consensus problem in asynchronous message-passing systems in the presence of failures. 
Additionally, based on different synchrony assumptions and message-pattern or timer-based approaches, we present three algorithms which implement the Impact FD. Performance evaluation results using real PlanetLab traces confirm the degree of flexible applicability of our failure detector and, due to the accepted margin of failures, that false responses or suspicions may be tolerated when compared to traditional unreliable failure detectors. Tolerance : Faults : Software, Impact factor, Fault tolerance, Unreliable failure detector, Trust level of the system, Process relevance, Flexibility property, Margin of failures
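The abstract does not state how the trust level is computed from the impact factors. One plausible reading, assumed here purely for illustration, is that the trust level is the sum of the impact factors of the processes not currently suspected, and that the system counts as trusted while this sum stays at or above a threshold that encodes the acceptable margin of failures. The function names and the threshold parameter below are hypothetical.

```python
def trust_level(impact, suspected):
    """Sum the impact factors of all processes that are not suspected.

    `impact` maps process id -> impact factor; `suspected` is a set of ids.
    (Assumed formula; the abstract only says a trust level value is output.)
    """
    return sum(factor for proc, factor in impact.items() if proc not in suspected)


def is_trusted(impact, suspected, threshold):
    """Return True while the trust level meets the assumed threshold."""
    return trust_level(impact, suspected) >= threshold
```

With impact factors `{"a": 5, "b": 1, "c": 1}` and a threshold of 6, suspecting the redundant process `b` leaves the system trusted, whereas suspecting the high-impact process `a` does not.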
9	UMA NOVA ABORDAGEM PARA REDUÇÃO DE MENSAGENS DE CONTROLE EM DETECTORES DE DEFEITOS / A New Approach to Reduce Control Messages in Failure Detectors Turchetti, Rogério Corrêa 15 May 2006 An unreliable failure detector is a basic building block widely used to implement fault tolerance techniques in asynchronous distributed systems. The use of failure detectors stems from the impossibility of implementing deterministic agreement protocols in these environments, since it is not possible to distinguish a crashed process from a very slow process. However, the massive use of distributed computational resources calls for solutions applicable to large-scale distributed systems. In these systems, traditional failure detector algorithms can present scalability problems, such as the control message explosion problem, because a large number of messages could compromise the quality of service of failure detectors and the system scalability. The goal of this dissertation is to minimize the problem of control message explosion generated by failure detector algorithms when monitoring processes at large scale. To do that, we propose a new approach that reduces the number of control messages by reusing messages. Our approach exploits the manipulation of the interrogation period or heartbeat period, maximizing the reuse of messages, and it is organized into two strategies: ATF (Frequency Rate Adaptation), which reuses failure detector messages to suppress control messages; and AMA (Reuse of Application Messages), which reuses client application messages to suppress control messages. The resulting approach is generic, in the sense that it can be applied to any failure detector algorithm, and practical, in the sense that, to use it, traditional failure detector algorithms need only change the semantics of control messages. 
Our experimental results demonstrate that the approach reduces the number of control messages, minimizing the message explosion problem, without compromising the quality of service of the failure detector. / Unreliable failure detectors are widely used as a basic building block in implementing fault tolerance techniques in asynchronous distributed systems. Their use in these environments is motivated by the impossibility of implementing deterministic agreement protocols, since there is no way to distinguish faulty processes from slower ones. However, the massive use of computational resources demands solutions applicable to large-scale distributed systems. In this context, traditional failure detection algorithms can exhibit scalability problems, such as message explosion. The large number of messages sent can compromise the quality of service of the failure detector and the scalability of the system. This dissertation aims to minimize the problem of the explosion of control messages generated by failure detection algorithms when monitoring processes. To this end, a new approach is proposed for reducing the number of control messages by reusing messages. The approach exploits the manipulation of the period at which control messages are sent, maximizing message reuse, and comprises two strategies: ATF (Adaptação da Taxa de Freqüência), which reuses messages of the detection algorithms themselves in place of control messages; and AMA (Aproveitamento de Mensagens da Aplicação), which reuses messages from client applications for the same purpose as ATF. The result is an approach that is generic, in the sense that it can be applied to any detection algorithm, and practical, in the sense that traditional failure detector algorithms only need to change the semantics of control messages in order to use it. 
Experiments demonstrate that its application reduces the number of control messages, minimizing the message explosion problem, without compromising the quality of service of the failure detector. Failure detector, Fault tolerance, Asynchronous distributed systems, Message explosion, Reuse of messages
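The AMA strategy (letting application messages stand in for control messages) can be pictured on the sender side of a heartbeat detector: an explicit heartbeat is only needed when nothing else was sent to the monitor during the last period. The sketch below illustrates that idea under simple assumptions and is not the dissertation's algorithm; the class and method names are hypothetical.

```python
class HeartbeatSender:
    """Sends explicit heartbeats only when no other message was sent recently."""

    def __init__(self, period):
        self.period = period    # heartbeat period expected by the monitor
        self.last_sent = None   # time of the last message of any kind

    def on_application_message(self, now):
        """An application message to the monitor also serves as liveness evidence."""
        self.last_sent = now

    def tick(self, now):
        """Called once per period; return True if a heartbeat must be sent now."""
        if self.last_sent is not None and now - self.last_sent < self.period:
            # A recent message already proved liveness: suppress the heartbeat.
            return False
        self.last_sent = now
        return True
```

On the receiving side, the monitor would have to treat any message from the process as resetting its timeout, which corresponds to the change in control-message semantics that the abstract mentions.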
