21

Produktionsplanering i massproducerande MTO-företag : Med fokus på leveranssäkerhet och kostnadseffektivitet / Production planning within MTO companies with mass production : Focus on delivery dependability and cost efficiency

Forsberg, Elin, Johansson, Malin January 2019 (has links)
Syfte – Syftet med studien är att identifiera förbättringsförslag relaterade till produktionsplanering för att bidra till leveranssäkerhet och kostnadseffektivitet i producerande företag. För att besvara syftet har tre frågeställningar formulerats: 1. Vilka faktorer är kritiska inom produktionsplanering? 2. Hur säkerställs att produktionsplanering bidrar till leveranssäkerhet respektive kostnadseffektivitet? 3. Hur kan leveranssäkerhet och kostnadseffektivitet balanseras? Metod – En förstudie genomfördes för att definiera problemområdet och formulera ett syfte med undersökningen. Därefter utfördes en fallstudie och litteratursökning för att samla in empiri och teori. Fallstudien genomfördes på ett massproducerande MTO-företag och bestod av intervjuer samt dokumentstudier. Litteraturstudien skapade ett teoretiskt ramverk kring produktionsplanering, leveranssäkerhet och kostnadseffektivitet. Slutligen analyserades empirin och teorin för att generera förbättringsförslag. Resultat – I studien identifierades kritiska faktorer i teorin och empirin med koppling till produktionsplanering. Analysen visade att massproducerande MTO-företag bör säkerställa kvalitet och leveranssäkerhet innan flexibilitet och till sist kostnadseffektivitet. För att hantera de här faktorerna presenterades förbättringsförslag vilka kan påverka företags leveranssäkerhet och kostnadseffektivitet. Teoretiskt och empiriskt bidrag – Studiens teoretiska bidrag består av ett underlag för hur kritiska faktorer inom massproducerande MTO-företags produktionsplanering kan hanteras. Studien bidrar med förståelse för produktionsplaneringens inverkan på ett företags leveranssäkerhet och kostnadseffektivitet. Det empiriska bidraget utgörs av förbättringsförslag inom produktionsplaneringen för massproducerande MTO-företag. Begränsningar – Studien är en fallstudie med enfallsdesign vilket begränsar generaliserbarheten.
Resultat kan endast generaliseras till andra massproducerande MTO-företag och till viss del MTS-företag. Trovärdigheten i studien har stärkts genom många intervjurespondenter samt genom dokumentstudier. / Purpose – The purpose of the study is to identify improvements within production planning to ensure that it contributes to delivery dependability and cost efficiency in manufacturing companies. To answer the purpose, three research questions were formulated: 1. What factors are critical within production planning? 2. How can it be ensured that production plans contribute to delivery dependability and cost efficiency, respectively? 3. How can delivery dependability and cost efficiency be balanced? Method – A pilot study was conducted in order to define the problem area and formulate a purpose for the study. Then a case study and a literature review were performed to provide empirical and theoretical data. The case study was conducted at a manufacturing company with mass production and an MTO strategy. The data was collected through interviews and document studies. The literature study created a theoretical framework regarding production planning, delivery dependability and cost efficiency. Finally, the empirical and theoretical data were analysed to generate improvements. Findings – In the study, critical factors were identified within production planning. The analysis indicated that manufacturing companies with mass production and an MTO strategy should ensure quality and delivery dependability before flexibility and, last, cost efficiency. To manage these factors, improvements were presented that can strengthen delivery dependability and cost efficiency at a company. Implications – The theoretical contribution consists of guidance on how to handle critical factors within a manufacturing company with mass production and an MTO strategy.
The study contributes insight into how production planning affects delivery dependability and cost efficiency in a company. The empirical contribution consists of improvements within production planning for manufacturing companies with mass production and an MTO strategy. Limitations – The study is a single case study, which limits the generalizability. The results can only be generalized to other companies with mass production and an MTO strategy and, partially, to companies with mass production and an MTS strategy. The reliability of the study has been strengthened through several interview respondents together with document studies.
22

Diagnostic en réseau de mobiles communicants, stratégies de répartition de diagnostic en fonction de contraintes de l'application / Diagnostic of mobiles networks, strategies for the diagnostic distribution as a function of the application constraints

Sassi, Insaf 27 November 2017 (has links)
Dans la robotique mobile, le réseau de communication est un composant important du système global pour que le système accomplisse sa mission. Dans un tel type de système, appelé un système commandé en réseau sans fil (SCR sans fil ou WNCS), l’intégration du réseau sans fil dans la boucle de commande introduit des problèmes qui ont un impact sur la performance et la stabilité i.e, sur la qualité de commande (QoC). Cette QoC dépend alors de la qualité de service (QoS) et la performance du système va donc dépendre des paramètres de la QoS. C’est ainsi que l’étude de l’influence des défauts du réseau sans fil sur la QoC est cruciale. Le WNCS est un système temps réel qui a besoin d’un certain niveau de QoS pour une bonne performance. Cependant, la nature probabiliste du protocole de communication CSMA/CA utilisé dans la plupart des technologies sans fil ne garantit pas les contraintes temps réel. Il faut alors une méthode probabiliste pour analyser et définir les exigences de l’application en termes de QoS, c’est-à-dire en termes de délai, de gigue, de débit, et de perte de paquets. Une première contribution de cette thèse consiste à étudier les performances et la fiabilité d’un réseau sans fil IEEE 802.11 pour des WNCSs qui partagent le même réseau et le même serveur de commandes en développant un modèle stochastique. Ce modèle est une chaîne de Markov qui modélise la méthode d’accès au canal de communication. Ce modèle a servi pour définir les paramètres de la QoS qui peuvent garantir une bonne QoC. Nous appliquons notre approche à un robot mobile commandé par une station distante. Le robot mobile a pour mission d’atteindre une cible en évitant les obstacles. Pour garantir l’accomplissement de cette mission, une méthode de diagnostic probabiliste est primordiale puisque le comportement du système n’est pas déterministe. La deuxième contribution a été d’établir la méthode probabiliste qui sert à surveiller le bon déroulement de la mission et l’état du robot. 
C'est un réseau bayésien (RB) modulaire qui modélise les relations de dépendance cause-à-effet entre les défaillances qui ont un impact sur la QoC du système. La dégradation de la QoC peut être due soit à un problème lié à l'état interne du robot, soit à un problème lié à la QoS, soit à un problème lié au contrôleur lui-même. Les résultats du modèle markovien sont utilisés dans le RB modulaire pour définir l'espace d'état de ses variables (étude qualitative) et pour définir les probabilités conditionnelles de l'état de la QoS (étude quantitative). Le RB permet d'éviter la dégradation de la QoC en prenant la bonne décision qui assure la continuité de la mission. En effet, dans une approche de co-design, quand le RB détecte une dégradation de la QoC due à une mauvaise QoS, la station envoie un ordre au robot pour qu'il change son mode de fonctionnement ou qu'il commute sur un autre contrôleur débarqué. Notre hypothèse est que l'architecture de diagnostic est différente en fonction des modes de fonctionnement : nous optons pour un RB plus global et partagé lorsque le robot est connecté à la station et pour un RB interne au robot lorsqu'il est autonome. La commutation d'un mode de fonctionnement débarqué à un mode embarqué implique la mise à jour du RB. Un autre apport de cette thèse est la définition d'une stratégie de commutation entre les modes de diagnostic : commutation d'un RB distribué à un RB monolithique embarqué quand le réseau de communication ne fait plus partie de l'architecture du système, et vice-versa. Les résultats d'inférence et de scénario de diagnostic ont montré la pertinence de l'utilisation des RB distribués modulaires. Ils ont aussi montré la capacité du RB développé à détecter la dégradation de la QoC et de la QoS et à superviser l'état du robot. L'aspect modulaire du RB a permis de faciliter la reconfiguration de l'outil de diagnostic selon l'architecture de commande ou de communication adaptée (RB distribué ou RB monolithique embarqué).
/ In mobile robotics systems, the communication network is an important component of the overall system; it enables the system to accomplish its mission. Such a system is called a Wireless Networked Control System (WNCS), where the integration of the wireless network into the control loop introduces problems that impact its performance and stability, i.e., its quality of control (QoC). This QoC depends on the quality of service (QoS); therefore, the performance of the system depends on the parameters of the QoS. The study of the influence of wireless network defects on the QoC is crucial. A WNCS is a real-time system that requires a certain level of QoS for good performance. However, the probabilistic behavior of the CSMA/CA communication protocol used in most wireless technologies does not guarantee real-time constraints. A probabilistic method is then needed to analyze and define the application requirements in terms of QoS: delay, jitter, throughput, and packet loss. A first contribution of this thesis is to study the performance and reliability of an IEEE 802.11 wireless network for WNCSs that share the same network and the same control server by developing a stochastic model. This model is a Markov chain that models the access procedure to the communication channel. This model is used to define the QoS parameters that can guarantee a good QoC. In this thesis, we apply our approach to a mobile robot controlled by a remote station. The mobile robot aims to reach a target while avoiding obstacles, a classic example of mobile robotics applications. To ensure that its mission is accomplished, a probabilistic diagnostic method is essential because the system behavior is not deterministic. The second contribution of this thesis is to establish the probabilistic method used to monitor the robot's mission and state. It is a modular Bayesian network (BN) that models cause-and-effect dependency relationships between failures that have an impact on the system's QoC.
The QoC degradation may be due either to a problem related to the internal state of the robot, a QoS problem or a controller problem. The results of the Markov model analysis are used in the modular BN to define the state spaces of its variables (qualitative study) and to define the conditional probabilities of the QoS state (quantitative study). This approach avoids QoC degradation by making the right decision to ensure the continuity of the mission. In a co-design approach, when the BN detects a degradation of the QoC due to a bad QoS, the station sends an order to the robot to change its operation mode or to switch to another distant controller. Our hypothesis is that the diagnostic architecture depends on the operation mode. A distributed BN is used when the robot is connected to the station and a monolithic embedded BN when it is autonomous. Switching from a distributed controller to an on-board one involves updating the developed BN. Another contribution of this thesis consists in defining a switching strategy between the diagnostic modes: switching from a distributed BN to an on-board monolithic BN when the communication network is no longer part of the system architecture, and vice versa. The inference and diagnostic scenario results show the relevance of using distributed modular BNs. They also prove the ability of the developed BN to detect the degradation of QoC and QoS and to supervise the state of the robot. The modular structure of the BN facilitates the reconfiguration of the diagnostic policy according to the adapted control and communication architecture (distributed BN or on-board monolithic BN).
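The entry above builds a Markov chain of the channel-access procedure and derives QoS figures from it. As an illustrative sketch only (the three states, the transition probabilities, and the use of simple power iteration are invented for this example and are not the thesis's model), the steady-state occupancy of such a chain can be computed as:

```python
# Toy discrete-time Markov chain over channel-access states; the stationary
# distribution approximates long-run QoS figures such as channel occupancy.
STATES = ["idle", "backoff", "transmit"]

# P[i][j] = probability of moving from STATES[i] to STATES[j] in one slot.
# These numbers are invented for illustration.
P = [
    [0.6, 0.3, 0.1],   # idle    -> idle / backoff / transmit
    [0.2, 0.5, 0.3],   # backoff -> idle / backoff / transmit
    [0.7, 0.2, 0.1],   # transmit-> idle / backoff / transmit
]

def stationary(P, iters=1000):
    """Power iteration: start uniform, repeatedly apply the transition matrix."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = stationary(P)
# The long-run fraction of slots in "transmit" bounds achievable throughput.
print({s: round(p, 3) for s, p in zip(STATES, pi)})
```

In the thesis's setting the chain is far richer (it models CSMA/CA backoff under IEEE 802.11), but the same stationary-distribution computation underlies the QoS parameters it extracts.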
23

Une approche adaptative basée sur la diversité pour la gestion des fautes dans les services Web / An adaptive diversity-based approach for managing faults in Web services

Abdeldjelil, Hanane 20 November 2013 (has links)
Les services Web tolérants aux fautes sont des composants avec une grande résilience aux défaillances qui résultent de différentes fautes imprévues, par exemple des bugs logiciels ou des crashs de machine. Comme il est impossible de prévoir l'apparition d'éventuelles fautes, de nombreuses stratégies consistent à dupliquer, d'une manière passive ou active, les composants critiques (p. ex. services Web) qui interagissent durant une exécution d'application distribuée (p. ex. composition). La capacité d'une application à continuer l'exécution en présence de défaillances de composants est appelée la Tolérance aux Fautes (TF). La duplication est la solution largement utilisée pour rendre les composants tolérants aux fautes. La TF peut être assurée à travers la réplication ou la diversité. Nous nous intéressons particulièrement dans cette thèse à la diversité, et nous montrons comment un ensemble de services Web sémantiquement équivalents qui fournissent la même fonctionnalité (p. ex. prévisions météo), mais qui l'implémentent différemment, collaborent pour rendre un service Web TF. Nous illustrons les limites de la réplication (présence de fautes répliquées), et proposons la diversité comme une solution alternative. En effet, la littérature a révélé un intérêt limité dans l'utilisation de la diversité pour rendre les services Web tolérants aux fautes. / Fault-tolerant Web services are components with higher resilience to failures that result from various unexpected faults, for instance software bugs and machine crashes. Since it is impractical to predict the potential occurrence of a fault, a widely used strategy consists of duplicating, in a passive or active way, critical components (e.g., Web services) that interact during a distributed application execution (e.g., composition). The ability of this application to continue operation despite component failures is referred to as Fault Tolerance (FT).
Duplication is usually put forward as a solution to make these components fault tolerant. It is achieved through either replication or diversity. In this thesis, we are particularly interested in diversity, and we show how semantically similar Web services, i.e., services that offer the same functionality (e.g., weather forecast) but implement this functionality differently in terms of business logic and technical resources, collaborate to make Web services fault tolerant. We illustrate the limitations of replication (e.g., presence of replicated faults) and suggest diversity as an alternative solution. Our literature review revealed a limited interest in diversity for FT Web services.
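The diversity idea described above can be sketched in a few lines. This is a toy, not the thesis's architecture: the service names (`forecast_a`, `forecast_b`) and their failure behavior are invented, and real diverse services would be remote calls rather than local functions. The point is that a bug in one implementation does not fail the composite, whereas identical replicas would all share the same bug:

```python
def forecast_a(city):
    # A diverse variant may carry its own implementation fault...
    raise RuntimeError("software bug")

def forecast_b(city):
    # ...while a functionally equivalent, differently implemented one succeeds.
    return f"sunny in {city}"

def diverse_call(services, *args):
    """Try functionally equivalent variants in turn; return the first success."""
    errors = []
    for svc in services:
        try:
            return svc(*args)
        except Exception as e:
            errors.append(e)
    raise RuntimeError(f"all variants failed: {errors}")

print(diverse_call([forecast_a, forecast_b], "Lyon"))   # → sunny in Lyon
```

A replication-based scheme would instead call identical copies, which is exactly where replicated faults defeat the redundancy.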
24

ELICERE: o processo de elicitação de metas de dependabilidade para sistemas computacionais críticos: estudo de caso aplicado à área espacial. / ELICERE: a process for defining dependability goals for critical computer systems: a case study applied to the space area.

Lahoz, Carlos Henrique Netto 06 August 2009 (has links)
Os avanços tecnológicos na eletrônica e no software têm sido rapidamente assimilados pelos sistemas computacionais demandando novas abordagens para a engenharia de sistemas e de software prover produtos confiáveis, sob critérios bem estabelecidos de qualidade. Dentro deste contexto, o processo de elicitação de requisitos tem um papel estratégico no desenvolvimento de projetos. Problemas na atividade de elicitação contribuem para produzir requisitos pobres, inadequados ou mesmo inexistentes que podem causar a perda de uma missão, desastres materiais e financeiros, a extinção prematura de um projeto ou promover uma crise organizacional. Esta tese apresenta o processo de elicitação de metas de dependabilidade, chamado ELICERE, aplicado em sistemas computacionais críticos, que se fundamenta nas técnicas de engenharia de requisitos orientada a metas, chamada i*, e nas técnicas de engenharia de segurança HAZOP e FMEA, que identificam e analisam perigos operacionais de um sistema. Depois de criar os modelos do sistema usando os diagramas i*, eles são analisados através de palavras-guia baseadas no HAZOP e FMEA, de onde as metas relacionadas à dependabilidade são extraídas. Através desta abordagem interdisciplinar, ELICERE promove a identificação de metas, que atendam aos requisitos de qualidade, relativos à dependabilidade, para sistemas computacionais críticos ainda na fase de concepção de um projeto. A abordagem do estudo de caso é baseada em um estudo qualitativo e descritivo de um caso único, usando o projeto de um foguete lançador hipotético, chamado V-ALFA. A aplicação do ELICERE neste projeto espacial teve a intenção de aperfeiçoar as atividades de engenharia de requisitos do sistema computacional do Veículo Lançador de Satélites Brasileiro, e também como forma de explicar como o processo ELICERE funciona. 
/ The technological advances in electronics and software have been rapidly assimilated by computer systems, demanding new approaches for software and systems engineering to provide reliable products under well-established quality criteria. In this context, requirements engineering has a strategic role in project development. Problems in the elicitation activity contribute to producing poor, inadequate or even non-existent requirements that can cause mission losses, material or financial disasters, premature project termination, or promote an organizational crisis. This thesis introduces the dependability goals elicitation process, called ELICERE, applied to critical computer systems. It is based on a goal-oriented requirements engineering technique, called i*, and on the safety engineering techniques HAZOP and FMEA, which are applied to identify and analyze the operational hazards of a system. After creating the system models using i* diagrams, they are analyzed through guidewords based on HAZOP and FMEA, from which goals related to dependability are extracted. Through this interdisciplinary approach, ELICERE promotes the identification of goals that meet the quality requirements related to dependability for critical systems while still in the project conception phase. The case study approach is based on a qualitative and descriptive single case, using the computer system project of a hypothetical launching rocket, called V-ALFA. The ELICERE application in this space project intends to improve the requirements engineering activities in the computer system of the Brazilian Satellite Launch Vehicle, and also to explain how the ELICERE process works.
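The guideword step described above (crossing HAZOP/FMEA-style guidewords with goals taken from an i* model) can be sketched mechanically. The goal names and guidewords below are invented examples, not ELICERE's actual vocabulary; the sketch only shows the combinatorial shape of the deviation analysis:

```python
# Invented guidewords and i*-style goals for illustration only.
GUIDEWORDS = ["NO", "MORE", "LESS", "LATE"]
GOALS = ["send telemetry frame", "arm stage separation"]

def deviations(goals, guidewords):
    """Pair every goal with every guideword to form deviation questions,
    from which dependability goals would then be elicited by the analyst."""
    return [f"[{gw}] What if the goal '{goal}' deviates as '{gw}'?"
            for goal in goals for gw in guidewords]

for question in deviations(GOALS, GUIDEWORDS):
    print(question)
```

Each question is a prompt for the analyst; the elicitation of the actual dependability goal (e.g., a timing or integrity requirement) remains a human step in the process.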
25

Analysing and supporting the reliability decision-making process in computing systems with a reliability evaluation framework / Analyser et supporter le processus de prise de décision dans la fiabilité des systèmes informatiques avec un framework d'évaluation de fiabilité

Kooli, Maha 01 December 2016 (has links)
La fiabilité est devenue un aspect important de la conception des systèmes informatiques suite à la miniaturisation agressive de la technologie et au fonctionnement ininterrompu, qui introduisent un grand nombre de sources de défaillance des composants matériels. Le système matériel peut être affecté par des fautes causées par des défauts de fabrication ou des perturbations environnementales telles que les interférences électromagnétiques, les radiations externes ou les neutrons de haute énergie des rayons cosmiques et des particules alpha. Pour les systèmes embarqués et les systèmes utilisés dans des domaines critiques pour la sécurité tels que l'avionique, l'aérospatiale et le transport, la présence de ces fautes peut endommager leurs composants et conduire à des défaillances catastrophiques. L'étude de nouvelles méthodes pour évaluer la fiabilité du système permet d'aider les concepteurs à comprendre les effets des fautes sur le système, et donc de développer des produits fiables et sûrs. En fonction de la phase de conception du système, le développement de méthodes d'évaluation de la fiabilité peut réduire les coûts et les efforts de conception, et aura un impact positif sur le temps de mise sur le marché du produit. L'objectif principal de cette thèse est de développer de nouvelles techniques pour évaluer la fiabilité globale d'un système informatique complexe. L'évaluation vise les fautes conduisant à des erreurs logicielles. Ces fautes peuvent se propager à travers les différentes structures qui composent le système complet. Elles peuvent être masquées lors de cette propagation, soit au niveau technologique, soit au niveau architectural. Quand la faute atteint la partie logicielle du système, elle peut endommager les données, les instructions ou le contrôle de flux.
Ces erreurs peuvent avoir un impact sur l'exécution correcte du logiciel en produisant des résultats erronés ou en empêchant l'exécution de l'application. Dans cette thèse, la fiabilité des différents composants logiciels est analysée à différents niveaux du système (en fonction de la phase de conception), mettant l'accent sur le rôle que l'interaction entre le matériel et le logiciel joue dans le système global. Ensuite, la fiabilité du système est évaluée grâce à des méthodologies d'évaluation flexibles, rapides et précises. Enfin, le processus de prise de décision pour la fiabilité des systèmes informatiques est pris en charge avec les méthodes et les outils développés. / Reliability has become an important design aspect for computing systems due to the aggressive technology miniaturization and the uninterrupted operation that introduce a large set of failure sources for hardware components. The hardware system can be affected by faults caused by physical manufacturing defects or environmental perturbations such as electromagnetic interference, external radiations, or high-energy neutrons from cosmic rays and alpha particles. For embedded systems and systems used in safety-critical fields such as avionics, aerospace and transportation, the presence of these faults can damage their components and lead to catastrophic failures. Investigating new methods to evaluate the system reliability helps designers understand the effects of faults on the system, and thus to develop reliable and dependable products. Depending on the design phase of the system, the development of reliability evaluation methods can reduce design costs and efforts, and will positively impact product time-to-market. The main objective of this thesis is to develop new techniques to evaluate the overall reliability of a complex computing system running software. The evaluation targets faults leading to soft errors. These faults can propagate through the different structures composing the full system.
They can be masked during this propagation either at the technological or at the architectural level. When a fault reaches the software layer of the system, it can corrupt data, instructions or the control flow. These errors may impact correct software execution by producing erroneous results or by preventing the execution of the application, leading to abnormal termination or an application hang. In this thesis, the reliability of the different software components is analyzed at different levels of the system (depending on the design phase), emphasizing the role that the interaction between hardware and software plays in the overall system. Then, the reliability of the system is evaluated via a flexible, fast, and accurate evaluation framework. Finally, the reliability decision-making process in computing systems is comprehensively supported with the developed framework (methodology and tools).
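The masking-versus-corruption idea behind such a reliability evaluation can be illustrated with a toy fault-injection campaign. Everything here is a simplifying assumption, not the thesis's framework: the workload, the single-bit-flip fault model, and the two-way masked/SDC classification are invented for the sketch:

```python
import random

def program(x):
    """Toy workload: saturation can architecturally mask low-order faults."""
    y = x * 3
    return min(y, 100)

def inject(x, bit):
    """Single bit flip in the input value (a crude soft-error model)."""
    return x ^ (1 << bit)

def campaign(trials=1000, seed=1):
    """Monte-Carlo fault injection: compare faulty runs against a golden run."""
    rng = random.Random(seed)
    masked = sdc = 0
    for _ in range(trials):
        x = rng.randrange(0, 64)
        golden = program(x)
        faulty = program(inject(x, rng.randrange(0, 6)))
        if faulty == golden:
            masked += 1          # fault had no observable effect
        else:
            sdc += 1             # silent data corruption
    return masked, sdc

masked, sdc = campaign()
print(f"masking rate: {masked / (masked + sdc):.2%}")
```

A real campaign would also track crashes and hangs as separate outcome classes, and would inject into microarchitectural or software state rather than a function input.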
27

Reconfiguration dynamique des architectures orientées services

Fredj, Manel 10 December 2009 (has links) (PDF)
28

A tool for automatic formal analysis of fault tolerance

Nilsson, Markus January 2005 (has links)
The use of computer-based systems is rapidly increasing and such systems can now be found in a wide range of applications, including safety-critical applications such as cars and aircraft. To make the development of such systems more efficient, there is a need for tools for automatic safety analysis, such as analysis of fault tolerance.

In this thesis, a tool for automatic formal analysis of fault tolerance was developed. The tool is built on top of the existing development environment for the synchronous language Esterel, and provides an output that can be visualised in the Item toolkit for fault tree analysis (FTA). The development of the tool demonstrates how fault tolerance analysis based on formal verification can be automated. The generated output from the fault tolerance analysis can be represented as a fault tree that is familiar to engineers from traditional FTA. The work also demonstrates that interesting attributes of the relationship between a critical fault combination and the input signals can be generated automatically.

Two case studies were used to test and demonstrate the functionality of the developed tool. A fault tolerance analysis was performed on a hydraulic leakage detection system, which is a real industrial system, and also on a synthetic system, which was modelled for this purpose.
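The fault-tree output such a tool feeds into FTA can be characterized by its cut sets: the combinations of basic events that trigger the top event. A minimal sketch, assuming an invented AND/OR tree loosely themed on the leakage-detection case (not the thesis's actual model or algorithm):

```python
def cut_sets(node):
    """Return the cut sets (frozensets of basic events) for a fault-tree node.
    A node is ("event", name), ("and", [children]) or ("or", [children])."""
    kind = node[0]
    if kind == "event":
        return [frozenset([node[1]])]
    children = [cut_sets(c) for c in node[1]]
    if kind == "or":
        # Any child's cut set triggers an OR gate.
        return [cs for ch in children for cs in ch]
    if kind == "and":
        # An AND gate needs one cut set from every child: cross-product union.
        acc = [frozenset()]
        for ch in children:
            acc = [a | b for a in acc for b in ch]
        return acc

# Top event: leakage goes undetected if the sensor fails AND either the power
# is lost or a software bug occurs (invented structure for illustration).
tree = ("and", [("event", "sensor_fail"),
                ("or", [("event", "power_loss"), ("event", "sw_bug")])])

print(sorted(sorted(cs) for cs in cut_sets(tree)))
# → [['power_loss', 'sensor_fail'], ['sensor_fail', 'sw_bug']]
```

This enumeration does not minimize the cut sets; a production tool would additionally discard any cut set that is a superset of another.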
29

Restoring Consistency after Network Partitions

Asplund, Mikael January 2007 (has links)
The software industry is facing a great challenge. While systems get more complex and distributed across the world, users are becoming more dependent on their availability. As systems increase in size and complexity, so does the risk that some part will fail. Unfortunately, it has proven hard to tackle faults in distributed systems without a rigorous approach. Therefore, it is crucial that the scientific community can provide answers to how distributed computer systems can continue functioning despite faults.

Our contribution in this thesis concerns a special class of faults which occurs when network links fail in such a way that parts of the network become isolated; such faults are termed network partitions. We consider the problem of how systems that have integrity constraints on data can continue operating in the presence of a network partition. Such a system must act optimistically while the network is split and then perform some kind of reconciliation to restore consistency afterwards.

We have formally described four reconciliation algorithms and proven them correct. The novelty of these algorithms lies in the fact that they can restore consistency after network partitions in a system with integrity constraints and that one of the protocols allows the system to provide service during the reconciliation. We have implemented and evaluated the algorithms using simulation and as part of a partition-tolerant CORBA middleware. The results indicate that it pays off to act optimistically and that it is worthwhile to provide service during reconciliation.
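The optimistic-then-reconcile scheme described above can be sketched as a toy log merge. The state (a single balance), the non-negativity constraint, and the timestamp-ordered replay are invented simplifications for illustration; they stand in for the thesis's formally specified algorithms, not for any one of them:

```python
def constraint(balance):
    """Integrity constraint on the replicated state (invented example)."""
    return balance >= 0

def reconcile(initial, log_a, log_b):
    """Merge the operation logs of two partitions.

    Each log entry is (timestamp, delta). Operations accepted optimistically
    on either side are replayed in timestamp order against the pre-partition
    state; any operation that would violate the constraint is rejected."""
    merged = sorted(log_a + log_b)          # deterministic replay order
    state, rejected = initial, []
    for ts, delta in merged:
        if constraint(state + delta):
            state += delta
        else:
            rejected.append((ts, delta))    # cannot be applied consistently
    return state, rejected

# Balance 10 before the split; withdrawals 4 and 8 on one side, 5 on the other.
state, rejected = reconcile(10, [(1, -4), (3, -8)], [(2, -5)])
print(state, rejected)   # → 1 [(3, -8)]  (the -8 would drive the balance negative)
```

Rejected operations surface the cost of optimism: some tentatively accepted work must be compensated or reported back to clients after the partition heals.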
30

UpRight fault tolerance

Clement, Allen Grogan 13 November 2012 (has links)
Experiences with computer systems indicate an inconvenient truth: computers fail and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons including power outage, disk or memory corruption, NIC malfunction, user error, operating system and application bugs or misconfiguration, and many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best practices response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately the initial damage has already been done, and it is difficult to say if the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. 
An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, dramatically reduce throughput, or dramatically increase latency during normal operation, then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisit disparate research results from the last decade and tweak, refine, and revise the core ideas so that they fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications.
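The replica-count arithmetic underlying classical BFT state machine replication can be checked directly. This sketch uses the standard bound (tolerating f Byzantine faults requires n ≥ 3f + 1 replicas with quorums of 2f + 1, so that any two quorums intersect in at least one correct replica); note that it is the textbook bound, not the refined fault model the thesis itself develops:

```python
def bft_sizes(f):
    """Classical sizing for f Byzantine faults: replica count and quorum size."""
    n = 3 * f + 1          # minimum number of replicas
    quorum = 2 * f + 1     # size of each read/write quorum
    return n, quorum

for f in range(1, 4):
    n, q = bft_sizes(f)
    # Two quorums overlap in 2*q - n = f + 1 replicas, so even if f of the
    # overlapping replicas are faulty, at least one correct replica remains.
    assert 2 * q - n == f + 1
    print(f"f={f}: n={n}, quorum={q}")
```

The intersection argument in the loop comment is the whole reason the bound works: agreement is carried across quorums by the guaranteed correct replica in their overlap.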
