161

Algorithms for Self-Organizing Wireless Sensor Networks

Ould-Ahmed-Vall, ElMoustapha 09 April 2007 (has links)
The unique characteristics of sensor networks pose numerous challenges that must be overcome to enable their efficient use. In particular, sensor networks are energy constrained because of their reliance on battery power, and they can be composed of a large number of unreliable nodes. These characteristics make node collaboration essential to accomplishing the network task and justify the development of new algorithms to provide services such as routing, fault tolerance and naming. This work contributes to the growing field of sensor network algorithms with a new evaluation tool and two new algorithms. First, a new sensor network simulator for evaluating sensor network algorithms is discussed. It incorporates models for the different functional units composing a sensor node and characterizes the energy consumption of each. It is designed in a modular and efficient way, favoring ease of use and extension, and it allows the user to choose among different implementations of energy models, accuracy models and types of sensors. The second contribution of this thesis is a distributed algorithm to solve the unique ID assignment problem in sensor networks. Our solution starts by assigning long unique IDs and organizing nodes in a tree structure; this tree is used to compute the size of the network, and unique IDs of minimum length are then assigned. Globally unique IDs are useful for many network functions, e.g. node maintenance and security. Theoretical and simulation analysis of the ID assignment algorithm demonstrates that, when the algorithm parameters are set properly, a high percentage of nodes hold unique IDs at the termination of the algorithm. Furthermore, the algorithm terminates in a short time that scales well with the network size. The third contribution of this thesis is a general fault-tolerant event detection scheme that allows nodes to detect erroneous local decisions based on the local decisions reported by their neighbors. It can handle cases where nodes have different and dynamic accuracy levels. We prove analytically that the derived fault-tolerant estimator is optimal under the maximum a posteriori (MAP) criterion, and we derive an equivalent weighted voting scheme.
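The abstract's closing result — that the MAP-optimal detector reduces to weighted voting — can be illustrated with a short sketch. This is illustrative only (the accuracy model and all names are assumptions, not the thesis's code): for independent reporting errors, each neighbor's binary decision is weighted by the log-odds of that neighbor being correct.

```python
import math

def fault_tolerant_decision(decisions, accuracies):
    """Weighted-vote estimate of the true event status.

    decisions:  list of 0/1 local decisions reported by neighbors
    accuracies: list of p_i = Pr(node i reports correctly), p_i > 0.5
    Returns the MAP estimate under independent reporting errors.
    """
    score = 0.0
    for d, p in zip(decisions, accuracies):
        w = math.log(p / (1.0 - p))      # log-likelihood-ratio weight
        score += w if d == 1 else -w     # vote for / against the event
    return 1 if score > 0 else 0

# A node whose own decision disagrees with the weighted majority can
# flag its local decision as likely erroneous.
print(fault_tolerant_decision([1, 1, 0, 1], [0.9, 0.8, 0.6, 0.7]))  # -> 1
```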
162

A Prescription for Partial Synchrony

Sastry, Srikanth May 2011 (has links)
Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals. Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior. As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well. Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors.
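A minimal sketch of the dissertation's central move — measuring timeouts in a process's own steps rather than in real time. The heartbeat structure, class, and method names here are illustrative assumptions, not the dissertation's constructions:

```python
class StepCountingFailureDetector:
    """Failure detector whose 'clock' is the count of local steps
    (a fairness-based bound), not wall-clock time."""

    def __init__(self, peers, initial_bound=8):
        self.bound = {p: initial_bound for p in peers}  # steps allowed between heartbeats
        self.silent = {p: 0 for p in peers}             # steps since last heartbeat
        self.suspected = set()

    def on_heartbeat(self, p):
        self.silent[p] = 0
        if p in self.suspected:          # false suspicion: relax the bound
            self.suspected.discard(p)
            self.bound[p] *= 2

    def on_step(self):
        """Called once per local step of this process."""
        for p in self.silent:
            self.silent[p] += 1
            if self.silent[p] > self.bound[p]:
                self.suspected.add(p)
```

If the underlying system is fair — every live peer takes a step within a bounded number of this process's steps — the doubling rule eventually stops producing false suspicions, which is the eventual-accuracy flavor of hint such services provide.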
163

CORBA in the aspect of replicated distributed real-time databases

Milton, Robert January 2002 (has links)
A distributed real-time database (DRTDB) is a database distributed over a network of several nodes and in which transactions are associated with deadlines. The issues of concern in this kind of database are data consistency and the ability to meet deadlines. In addition, the nodes on which the database is distributed may be heterogeneous: they may be built on different platforms and written in different languages. This makes integrating the nodes difficult, since data types may be represented differently on different nodes. The Common Object Request Broker Architecture (CORBA), defined by the Object Management Group (OMG), is a distributed object computing (DOC) middleware created to overcome problems with heterogeneous sites.

The project described in this paper aims to investigate the suitability of CORBA as a middleware in a DRTDB. Two extensions to CORBA, Fault-Tolerant CORBA (FT-CORBA) and Real-Time CORBA (RT-CORBA), are of particular interest, since they provide object replication and end-to-end predictability, respectively. The project focuses on the ability of RT-CORBA to meet hard deadlines and of FT-CORBA to maintain replica consistency using replication with eventual consistency. The investigation of the combination of RT-CORBA and FT-CORBA results in two proposed architectures that meet real-time requirements and provide replica consistency with CORBA as the middleware in a DRTDB.
164

Operating System Support for Redundant Multithreading

Döbel, Björn 12 December 2014 (has links) (PDF)
Failing hardware is a fact, and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers have proposed various solutions to this issue, each with its own downsides: specialized hardware components make hardware more expensive to produce and consume additional energy at runtime; fault-tolerant algorithms and libraries enforce specific programming models on the developer; compiler-based fault tolerance requires the source code of all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I present three case studies that evaluate approaches to protecting RCB components, aiming at a software stack that is fully protected against hardware errors.
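A minimal sketch of the voting step behind triple-modular redundant execution, assuming the replicas externalize comparable outputs (e.g., system-call arguments). This illustrates the general technique, not Romain's actual master/replica protocol:

```python
def tmr_vote(outputs):
    """Majority vote over three replica outputs: detects and corrects
    any single diverging replica."""
    a, b, c = outputs
    if a == b or a == c:
        return a, [i for i, o in enumerate(outputs) if o != a]
    if b == c:
        return b, [0]
    raise RuntimeError("no majority: more than one replica diverged")

# Replica 2 suffered a bit flip; the vote masks it and names it for recovery.
value, faulty = tmr_vote([42, 42, 17])   # -> (42, [2])
```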
165

Highly available storage with minimal trust

Mahajan, Prince 05 July 2012 (has links)
Storage services form the core of modern Internet-based services spanning commercial, entertainment, and social-networking sectors. High availability is crucial for these services, as even an hour of unavailability can cost millions of dollars in lost revenue. Unfortunately, it is difficult to build highly available storage services that provide useful correctness properties. Both benign faults (system crashes, power outages, etc.) and Byzantine faults (memory or disk corruption, software or configuration errors, etc.) plague the availability of these services. Furthermore, the goal of high availability conflicts with our desire to provide good performance and strong correctness guarantees. For example, the Consistency, Availability, and Partition-resilience (CAP) theorem states that a storage service that must remain available despite network partitions cannot enforce strong consistency. Similarly, the tradeoff between latency and durability dictates that a low-latency service cannot ensure durability in the presence of data-center-wide failures. This dissertation explores the theoretical and practical limits of storage services that can be safe and live despite the presence of benign and Byzantine faults. On the practical front, we use cloud storage as a deployment model to build Depot, a highly available storage service that addresses the above challenges. Depot minimizes the trust clients have to put in the third-party storage provider. As a result, Depot clients can continue functioning despite benign or Byzantine faults of the cloud servers. Yet, Depot provides stronger availability, durability, and consistency properties than those provided by many existing cloud deployments, without incurring prohibitive performance cost. For example, in contrast to Amazon S3's eventual consistency, Depot provides a variation of causal consistency on each volume, while tolerating Byzantine faults. On the theoretical front, we explore the consistency-availability tradeoffs. Such tradeoffs have proved useful for designers in deciding how much to strengthen consistency if high availability is desired, or how much to compromise availability if strong consistency is essential. We explore the limits of these tradeoffs by attempting to answer the question: what are the semantics that can be implemented without compromising availability? We investigate this question for both fail-stop and Byzantine failure models. An immediate benefit of answering this question is that we can compare and contrast the consistency provided by Depot with that achievable by an optimal implementation. More crucially, this result complements the CAP theorem: while the CAP theorem defines a set of properties that cannot be achieved, this work identifies the limits of the properties that can be achieved.
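The per-volume causal consistency Depot provides can be illustrated with a standard version-vector delivery check. This is a sketch of the general technique only; Depot's actual protocol additionally uses signed, hash-chained update histories to tolerate Byzantine nodes:

```python
def causally_ready(update_vv, sender, local_vv):
    """Can an update with version vector `update_vv` from `sender` be
    applied here without violating causal ordering?"""
    for node, count in update_vv.items():
        if node == sender:
            if count != local_vv.get(node, 0) + 1:   # must be sender's next write
                return False
        elif count > local_vv.get(node, 0):          # depends on an unseen update
            return False
    return True

local = {"A": 3, "B": 1}
print(causally_ready({"A": 4, "B": 1}, "A", local))  # True: A's next write
print(causally_ready({"A": 5, "B": 1}, "A", local))  # False: gap in A's writes
```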
166

Fault tolerance in distributed systems : a coding-theoretic approach

Balasubramanian, Bharath 19 November 2012 (has links)
Distributed systems are rapidly increasing in importance due to the need for scalable computations on huge volumes of data. This fact is reflected in many real-world distributed applications such as Amazon's EC2 cloud computing service, Facebook's Cassandra key-value store, and Apache's Hadoop MapReduce framework. Multi-core architectures developed by companies such as Intel and AMD have further brought this to prominence, since workloads can now be distributed across many individual cores. The nodes or entities in such systems are often built using commodity hardware and are prone to physical failures and security vulnerabilities. Achieving fault tolerance in such systems is a challenging task, since it is not easy to observe and control these distributed entities. Replication is a standard approach for fault tolerance in distributed systems. The main advantage of this approach is that the backups incur very little overhead in terms of the time taken for normal operation or recovery. However, replication is grossly wasteful in terms of the number of backups required for fault tolerance. The large number of backups has two major implications. First, the total space or memory required for fault tolerance is considerably high. Second, there is a significant cost in resources such as the power required to run the backup processes. Given the large number of distributed servers employed in real-world applications, it is a hard task to provide fault tolerance while achieving both space and operational efficiency. In the world of data fault tolerance and communication, coding theory is used as the space-efficient alternative to replication. A direct application of coding theory to distributed servers, treating the servers as blocks of data, is very inefficient in terms of the updates to the backups. This is primarily because each update to a server affects many blocks in memory, all of which have to be re-encoded at the backups. This leads us to the following thesis statement: can we design a mechanism for fault tolerance in distributed systems that combines the space efficiency of coding theory with the low operational overhead of replication? We present a new paradigm to solve this problem, broadly referred to as fusion. We provide fusion-based solutions for two models of computation that are representative of a large class of applications: (i) systems modeled as deterministic finite state machines and (ii) systems modeled as programs containing data structures. For finite state machines, we use the notion of Hamming distances to present a polynomial-time algorithm to generate efficient backup state machines. For programs hosting data structures, we use a combination of erasure codes and selective replication to generate efficient backups for most commonly used data structures such as queues, array lists, linked lists, vectors and maps. We present theoretical and experimental results that demonstrate the efficiency of our schemes over replication. Finally, we use our schemes to design an efficient solution for fault tolerance in two real-world applications: Amazon's Dynamo key-value store and Google's MapReduce framework.
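The fusion idea — one coded backup covering several primaries, updated incrementally rather than re-encoded wholesale — can be sketched with a simple additive code. This is illustrative only; the thesis uses erasure codes and Hamming-distance constructions rather than plain sums:

```python
class FusedBackup:
    """One fused backup for n primaries (vs. n replicas under replication).
    The 'code' here is the elementwise sum of the primaries' state vectors."""

    def __init__(self, state_len):
        self.fused = [0] * state_len

    def apply_update(self, index, old_value, new_value):
        """A primary changed one cell: update the backup incrementally,
        without touching the other primaries."""
        self.fused[index] += new_value - old_value

    def recover(self, surviving_states):
        """Reconstruct the single crashed primary from the fused state
        and the survivors (tolerates one crash fault)."""
        lost = list(self.fused)
        for state in surviving_states:
            for i, v in enumerate(state):
                lost[i] -= v
        return lost

# Three primaries, one fused backup instead of three replicas.
primaries = [[1, 2], [3, 4], [5, 6]]
backup = FusedBackup(2)
for state in primaries:
    for i, v in enumerate(state):
        backup.apply_update(i, 0, v)
print(backup.recover([primaries[0], primaries[2]]))  # -> [3, 4]
```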
167

Development of a technique for increasing the reliability of first-level caches based on the spatial locality of memory blocks

Μαυρόπουλος, Μιχαήλ 16 May 2014 (has links)
This thesis addresses the reliability of first-level data and instruction caches. The high integration density and high operating frequency of modern integrated circuits have led to significant reliability problems, caused either by manufacturing defects or by the aging and wear-out of the circuits, with static and dynamic variations growing as technology scales. The work first estimates the performance degradation of first-level caches in the presence of permanent faults across different technology nodes. It then proposes a technique for mitigating the negative impact of defective bits, based on predicting the spatial locality of the memory blocks brought into the first-level caches. Finally, using cycle-accurate architectural simulation, it shows that the approach offers significant benefits in cache performance.
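A hedged sketch of the general idea — steering blocks predicted to have high spatial locality away from partially faulty cache ways. The placement policy and all names below are invented for illustration, not taken from the thesis:

```python
class FaultAwareCacheSet:
    """One set of a set-associative cache with known permanently
    faulty ways; a locality prediction guides block placement."""

    def __init__(self, n_ways, faulty_ways):
        self.n_ways = n_ways
        self.faulty = set(faulty_ways)   # ways with permanent faults
        self.ways = {}                   # way index -> cached block tag

    def place(self, tag, predicted_high_locality):
        free = [w for w in range(self.n_ways) if w not in self.ways]
        healthy = [w for w in free if w not in self.faulty]
        degraded = [w for w in free if w in self.faulty]
        if healthy:
            way = healthy[0]                 # prefer fully functional ways
        elif degraded and not predicted_high_locality:
            way = degraded[0]                # low-locality block tolerates faults
        else:
            return None                      # bypass rather than risk the block
        self.ways[way] = tag
        return way
```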
168

A fault-tolerant quadrotor control system

Γκούντας, Κωνσταντίνος 10 March 2015 (has links)
This thesis studies and simulates a control system capable of retaining control of a four-rotor helicopter, known as a quadrotor, when one of its actuators fails. Modeling and simulation are carried out in Matlab/Simulink. Chapter 1 gives a brief presentation of the quadrotor, describing how it operates and the most important milestones in its history, and then points to the necessity of integrating automatic control into systems that directly or indirectly affect human life, and to the importance of improving such systems by making them tolerant to faults and malfunctions. Chapter 2 studies the equations that describe the motion and orientation of the quadrotor; from these the system model is built, both for a fixed and for a movable center of mass, to be used in the simulations that evaluate the controllers. Notably, air resistance is no longer considered negligible, as it significantly affects the state of the vehicle. Chapter 3 provides the theoretical background on which the controllers are built: the PID controller is presented, along with the influence of its parameters on system behavior; the modeling of the quadrotor's faults is described; and the chapter closes with a presentation of fault-tolerant controllers. Chapter 4 presents the controller models used in the simulations: starting from a controller designed around a nominal operating point, we arrive at a robust controller that exploits the movement of the center of mass. For each controller, its parameters are given as well as its model in Simulink. Chapter 5 presents and discusses the simulation results in detail, with graphs of position and orientation throughout the flight and during the critical transient, and states the advantages and disadvantages of each implemented controller. Chapter 6 concludes the thesis, mentions similar controllers and achievements in this field, and explains how our approach differs from existing solutions.
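A textbook discrete PID loop of the kind presented in Chapter 3. The gains, time step, and roll-angle usage are illustrative assumptions, not values from the thesis:

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*∫e dt + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                 # accumulate I-term
        derivative = (error - self.prev_error) / self.dt  # finite-difference D-term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. a roll-angle loop running at 100 Hz
roll_pid = PID(kp=4.0, ki=0.5, kd=1.2, dt=0.01)
command = roll_pid.update(setpoint=0.0, measurement=0.05)
```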
169

Graph Algorithms for Network Tomography and Fault Tolerance

Gopalan, Abishek January 2013 (has links)
The massive growth and proliferation of media, content, and services on the Internet are driving the need for more network capacity as well as larger networks. With increasing bandwidth and transmission speeds, even small disruptions in service can result in a significant loss of data. Thus, it is becoming increasingly important to monitor networks for their performance and to be able to handle failures effectively. Doing so is beneficial from a network design perspective as well as in being able to provide a richer experience to the users of such networks. Network tomography refers to inference problems in large-scale networks wherein it is of interest to infer individual characteristics, such as link delays, through aggregate measurements, such as end-to-end path delays. In this dissertation, we establish a fundamental theory for a class of network tomography problems in which the link metrics of a network are modeled as additive. We establish the necessary and sufficient conditions on the network topology, and provide polynomial-time graph algorithms that quantify the extent of identifiability and identify the unknown link metrics. We develop algorithms for all graph topologies classified on the basis of their connectivity. The solutions developed in this dissertation extend beyond networking and are applicable in areas such as nano-electronics and power systems. We then develop graph algorithms to handle link failures effectively and to provide multipath routing capabilities in IP- as well as Ethernet-based networks. Our schemes guarantee recovery and are designed to pre-compute alternate next hops that can be taken upon link failures. This allows for fast re-routing, as we avoid the need to wait for (control-plane) re-computations.
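The additive-metric model can be illustrated in a few lines: each end-to-end path delay is the sum of the (unknown) delays of the links it traverses, so identifiability reduces to the rank of the routing matrix. A toy example under that standard model, not the dissertation's algorithms:

```python
import numpy as np

# Routing matrix: rows are measured paths, columns are links.
R = np.array([[1, 1, 0],    # path 1 traverses links 1 and 2
              [0, 1, 1],    # path 2 traverses links 2 and 3
              [1, 0, 1]])   # path 3 traverses links 1 and 3
path_delays = np.array([7.0, 9.0, 8.0])

# Full column rank => all link delays are identifiable from path sums.
link_delays, *_ = np.linalg.lstsq(R, path_delays, rcond=None)
print(link_delays)          # -> [3. 4. 5.]
```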
170

Formalization of systems with the possibility to insert faults

Blažaitytė, Eglė 16 August 2007 (has links)
The goal of any system's development is a working, live, and safe system that delivers the desired and reliable results. System safety is the property that no unforeseen situation occurs during the system's operation; liveness is the system's reaction to certain events and its ability to perform its assigned tasks and deliver correct decisions or results. To create a system that will satisfy its requirements, it is very important to define a formal requirements specification in advance, since the final product depends on it: how many of the situations the system may encounter are anticipated, and how it copes with the corresponding external or internal events. Such a specification can be extended with modifications that help detect potential faults in the system; taking these into account during development can give the system the property of fault tolerance. In this work, the alternating bit protocol was chosen for formalization and analyzed from the perspective of fault-tolerant software. The protocol was modeled in two ways — its behavior under ideal conditions and with injected faults — in order to make the system fault tolerant. Both cases were formalized with the PLA and DEVS formalization methods. After studying the different formalisms and adapting FDEVS to the alternating bit protocol, the FPLA formalization method was created.
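A compact sketch of the alternating bit protocol with injected message loss — the kind of fault the work adds to the model. The simulation structure below is an assumption for illustration, not the FPLA/FDEVS specification itself:

```python
import random

def abp_send(messages, loss_prob=0.3, seed=1):
    """Alternating bit protocol over a lossy channel: the sender
    retransmits until acknowledged; the receiver delivers a frame
    only when its bit alternates as expected (no duplicates)."""
    rng = random.Random(seed)
    delivered, bit, expected = [], 0, 0
    for payload in messages:
        while True:
            if rng.random() >= loss_prob:       # frame survives the channel
                if bit == expected:             # new frame, not a retransmission
                    delivered.append(payload)
                    expected ^= 1
                # receiver acknowledges with the frame's bit
                if rng.random() >= loss_prob:   # ACK survives the channel
                    break                       # sender sees ACK, moves on
        bit ^= 1
    return delivered

# Despite injected losses, every message is delivered exactly once, in order.
assert abp_send(["a", "b", "c"]) == ["a", "b", "c"]
```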
