121

Cross-Layer Fault-Tolerant Design and Analysis for High Manufacturing Yield and System Reliability

Guo, Jianghao 26 May 2016 (has links)
No description available.
122

Implementation of Logic Fault Tolerance on a Dynamically Reconfigurable FPGA

Jayarama, Kiran January 2016 (has links)
No description available.
123

A Foundation for Fault Tolerant Components

Leal, William Milo 17 December 2001 (has links)
No description available.
124

Scalable design of fault-tolerance for wireless sensor networks

Demirbas, Murat 29 September 2004 (has links)
No description available.
125

High performance and network fault tolerant MPI with multi-pathing over InfiniBand

Vishnu, Abhinav 11 December 2007 (has links)
No description available.
126

Network Fault Resilient MPI for Multi-Rail InfiniBand Clusters

Pai Raikar, Siddhesh Prakash Sunita January 2011 (has links)
No description available.
127

FEASIBILITY STUDIES OF STATISTIC MULTIPLEXED COMPUTING

Celik, Yasin January 2018 (has links)
In 2012, when Professor Shi introduced me to the concept of Statistic Multiplexed Computing (SMC), I was skeptical. It contradicted everything I have learned and heard about distributed and parallel computing. However, I did believe that unhandled failures in any application will negatively impact its scalability. For that, I agreed to take on the feasibility study of SMC for practical applications. After six+ years research and experimentations, it became clear to me that the most widely believed misconception is “either performance or reliability” when upscaling a distributed application. This conception was the result of the direct use of hop-by-hop communication protocols in distributed application construction. Terminology: Hop-by-hop data protocol is a two-sided reliable lossless data communication protocol for transmitting data between a sender and a receiver. Either the sender or the receiver crash will cause data losses. Examples: MPI, RPC, RMI, OpenMP. End-to-end data protocol is a single-sided reliable lossless data communication protocol for transmitting data between application programs. All runtime available processors, networks and storage will be automatically dispatched to the best effort support of the reliable communication regardless transient and permanent device failures. Examples: HDFS, Blockchain, Fabric and SMC. Active end-to-end data protocol is a single-sided reliable lossless data communication pro- tocol for transmitting data and automatically synchronizing application programs. Example: SMC (AnkaCom, AnkaStore (this dissertation)). Unlike the hop-by-hop protocols, the use of end-to-end protocol forms an application- dependent overlay network. An overlay network for distributed and parallel computing application, such as Blockchain, has been proven to defy the “common wisdom” for two important distributed computing challenges: a) Extreme scale computing without single-point failures is practically feasible. Thus, all transaction or data losses can be eliminated. b) Extreme scale synchronized transaction replication is practically feasible. Thus, the CAP conjecture and theorem become irrelevant. Unlike passive overlay networks, such as the HDFS and Blockchain, this dissertation study proves that an active overlay network can deliver higher performance, higher reliability and security at the same time as the application up scales. Although application-level security is not part of this dissertation, it is easy to see that application-level end-to-end protocols will fundamentally eliminate the “man-in-the-middle” attacks. This will nullify many well-known attacks. With the zero-single-point failure and zero impact synchronous replication features, SMC applications are naturally resistant to DDoS and ransomware attacks. This dissertation explores practical implementations of the SMC concept for compute intensive (CI) and data intensive (DI) applications. This defense will disclose the details of CI and DI runtime implementations and results of inductive computational experiments. The computational environments include the NSF Chameleon bare-metal HPC cloud and Temple’s TCloud cluster. / Computer and Information Science
128

Decentralized Crash-Resilient Runtime Verification

Kazemlou, Shokoufeh January 2017 (has links)
This is the final revision of my M.Sc. Thesis. / Runtime Verification is a technique to extract information from a running system in order to detect executions violating a given correctness specification. In this thesis, we study distributed synchronous/asynchronous runtime verification of systems. In our setting, there is a set of distributed monitors that have only partial views of a large system and are subject to failures. In this context, it is unavoidable that monitors may have different views of the underlying system, and therefore may have different valuations of the correctness property. In this thesis, we propose an automata-based synchronous monitoring algorithm that copes with f crash failures in a distrbuted setting. The algorithm solves the synchronous monitoring problem in f + 1 rounds of communication, and significantly reduces the message size overhead. We also propose an algorithm for distributed crash-resilient asynchronous monitoring that consistently monitors the system under inspection without any communication between monitors. Each local monitor emits a verdict set solely based on its own partial observation, and the intersection of the verdict sets will be the same as the verdict computed by a centralized monitor that has full view of the system. / Thesis / Master of Science (MSc)
129

Challenges with Providing Reliability Assurance for Self-Adaptive Cyber-Physical Systems

Riaz, Sana, Kabir, Sohag, Campean, Felician, Mokryani, Geev, Dao, Cuong D., Angarita-Marquez, Jorge L., Al-Ja'afreh, Mohammad A.A. 03 February 2023 (has links)
Self-adaptive systems are evolving systems that can adjust their behaviour to accommodate dynamic requirements or to better serve their goal. These systems can vary in their architecture, operation, or adaptation strategies depending on the application. Moreover, evaluation can happen in different ways depending on the system architecture and its requirements. Because of their dynamism and complexity, self-adaptive systems are prone to situations such as adaptation faults, inconsistencies in context, or low performance on tasks. It is therefore important to have reliability assurance for the system, to monitor situations that can compromise system functionality. In this paper, we provide a brief background on the different types of self-adaptive systems and the various ways a system can evolve. We discuss the mechanisms that have been applied over the last two decades for reliability evaluation of such systems, and we identify challenges and limitations as research opportunities related to the reliability evaluation of self-adaptive systems. / This research was undertaken as part of the "Model-based Reliability Evaluation for Autonomous Systems with Evolving Architectures" project funded by the University of Bradford under the SURE Grant scheme.
130

A Low-latency Consensus Algorithm for Geographically Distributed Systems

Arun, Balaji 15 May 2017 (has links)
This thesis presents Caesar, a novel multi-leader Generalized Consensus protocol for geographically replicated systems. Caesar achieves near-perfect availability, provides high performance (low latency and high throughput) compared to the existing state of the art, and tolerates replica failures. Recently, a number of state-of-the-art consensus protocols implementing the Generalized Consensus definition have been proposed. However, the major limitation of these existing approaches is significant performance degradation when the application workload produces conflicting requests. Caesar's main goal is to overcome this limitation by changing the way a fast decision is taken: its ordering protocol does not reject a fast decision for a client request if a quorum of nodes reply with different dependency sets for that request. It switches to a slow decision only if there is no chance to agree on the proposed order for that request. Caesar achieves this using a combination of wait conditions and logical timestamps. The effectiveness of Caesar is demonstrated through an evaluation study performed on Amazon's EC2 infrastructure using 5 geo-replicated sites. Caesar outperforms multi-leader competitors (e.g., EPaxos) by as much as 1.7x in the presence of 30% conflicting requests, and single-leader competitors (e.g., Multi-Paxos) by as much as 3.5x. Unlike existing protocols, Caesar is also resilient under heavy client loads. / Master of Science
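The fast-decision rule described in the abstract can be caricatured in a few lines of Python. This is a loose, hypothetical sketch of the quorum check only, not the Caesar protocol: reply contents, timestamps, and replica names are invented for illustration.

```python
# Hypothetical sketch of a Caesar-style fast-path check: differing
# dependency sets from the fast quorum are merged rather than rejected;
# only a refused logical timestamp forces the slow path.

def fast_path_decision(replies, proposed_ts):
    # 'replies' are (timestamp_accepted, dependency_set) pairs collected
    # from a fast quorum of replicas for one client request.
    if all(ok for ok, _ in replies):
        deps = set().union(*(d for _, d in replies))  # merge, don't reject
        return ("fast", proposed_ts, deps)
    # Some replica could not accept the proposed logical timestamp, so
    # there is no chance to agree on this order: fall back to the slow phase.
    return ("slow", None, None)

# Quorum members disagree on dependencies, yet the decision stays fast.
print(fast_path_decision([(True, {"req1"}), (True, {"req2"})], proposed_ts=7))
```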
