111 |
Towards Reliable Federated Learning: Decentralization and Fault Tolerance. Zhilin Wang (17805221), 04 December 2024
<p dir="ltr">In recent years, Federated Learning (FL) has emerged as a promising approach for training machine learning models across distributed data sources while preserving privacy. However, traditional FL faces significant challenges in reliabilities, including the risk of the single point of failure and vulnerabilities to adversarial attacks. </p><p dir="ltr">This research proposes an innovative framework, Blockchain-based FL(BCFL), leveraging blockchain to decentralize the FL system and enhance its reliability. To optimize BCFL in resource-constrained environments, we design incentive mechanisms and resource allocation schemes to maximize computational efficiency for clients engaging in both training and mining tasks. Additionally, we introduce a dual-task resource allocation scheme specifically tailored for Mobile Edge Computing (MEC), enabling edge servers to manage both BCFL and offloading tasks efficiently. To address the inherent risk of client dropout in distributed learning, we propose the HieAvg algorithm within a decentralized hierarchical FL framework, mitigating the impact of stragglers through historical weight-based aggregation. This research also introduces the Faker attack, a novel model poisoning approach that exploits weaknesses in similarity metrics commonly used in FL defenses. In response, we develop the Similarity of Partial Parameters (SPP) defense, a random parameter selection strategy that disrupts the predictability of similarity evaluations, offering robust protection against adaptive attacks.</p><p dir="ltr">Our research provides practical strategies to fortify FL systems against reliability vulnerabilities. This work lays the foundation for more secure, reliable, and efficient FL in various environments through decentralized architectures and novel fault </p>
|
112 |
Deactivation Diagram Development for Naval Ship System Vulnerability Analysis. Snyder, Daniel Joseph, 17 June 2019
System architecture analyses of distributed ship systems offer a practical view of system behavior over all operational states; however, the effectiveness of these analyses can be limited by computational performance or capability. Deactivation diagrams provide an alternative view to conventional system architecture descriptions, allowing for rapid analysis of system connectivity and flow based on precomputed single-state system descriptions. This thesis explores the development of system deactivation diagrams and their use in early-stage naval ship system design. Software tools developed in C++ and VBA as part of this research support the Virginia Tech (VT) Naval Ship Design Concept and Requirements Exploration (C&RE) process and tools utilizing the U.S. Navy's Leading-Edge Architecture for Prototyping Systems (LEAPS) framework database. These tools incorporate automated path-finding algorithms based on proven network theory and effective computational methods for performing ship system deactivation analysis. Results from this approach extend readily to studies in naval ship system vulnerability, flow optimization, network architecture, and other system analyses. Supplementary work on interfacing the LEAPS framework libraries with deactivation analyses has demonstrated the capability to generate deactivation diagrams from complex LEAPS ship system databases and paved the way for future incorporation of LEAPS into research work at Virginia Tech. / Master of Science / As the development of new ships becomes more technically complex due to the increased incorporation of redundant and interdependent ship systems, there is a greater need for advanced tools to support future ship system design. Ship operational capabilities rely on the resiliency of onboard systems in all situations, including damaged conditions, and require comprehensive design evaluation to identify weaknesses in system concepts. This thesis details the development of a computational approach to ship system analysis using precomputed deactivation diagrams for early-stage naval ship system design. Deactivation diagrams are a unique way of looking at the interconnectivity of system components and offer a consolidated view of complex network architecture that significantly simplifies and accelerates subsequent analyses. Developments in computational algorithms for ship system connectivity presented in this thesis aid the automated development of deactivation diagrams and support system flow and vulnerability analyses, with particular regard to ongoing work on the Virginia Tech (VT) Naval Ship Design Concept and Requirements Exploration (C&RE) process. Additional thesis work referencing the U.S. Navy's Leading-Edge Architecture for Prototyping Systems (LEAPS) database framework has demonstrated the capability to generate deactivation diagrams from complex LEAPS ship system databases and paved the way for future incorporation of LEAPS into research work at VT.
|
113 |
A Distributed Software Framework for the Virginia Tech Ground Station. David, Paul Uri, 23 November 2015
The key goal in this work is to enable a flexible ground station that is not constrained to a particular mission or set of hardware. In addition, the concepts and software produced in this thesis will play a significant role in educating engineers and students by providing critical infrastructure and a sandbox for ground station operations. Key pieces of software were developed in this work to create a flexible and robust software-defined ground station. Several digital transmission modes were developed to allow communication between the ground station and common amateur radio CubeSats and SmallSats. To handle distributed tasks and processes at a ground station with multiple servers and controllers, a specialized actor framework was written in Python for ease of use. Actors can send messages to one another over a network, and they maintain their own memory to avoid the synchronization problems that come with shared memory. In addition to the software developed in this work, a novel Peer-to-Peer (P2P) protocol for a network of ground stations is proposed to increase coverage of and access to spacecraft without requiring centralized server infrastructure. This protocol provides the means to scale the developed software architecture beyond a single ground station. Since the Virginia Tech Ground Station (VTGS) will have many concurrent processes running across multiple servers, it was necessary to apply the actor model to simplify the design of the system. The purpose of this thesis is to describe the developed software for the VTGS as well as the P2P protocol for a larger network of ground stations. There are three primary repositories: planck-dsp, gr-vtgs, and pystation. The planck-dsp library and the gr-vtgs Out-Of-Tree (OOT) module make up the primary digital signal processing and communications toolboxes, where GNU Radio serves as the scheduler for signal processing blocks used in flow graphs. The pystation module is the extensible software actor framework that connects various systems both locally and remotely; it is also responsible for scheduling and handling ground station requests. While the software was primarily created for the VTGS, it is general enough to apply to other ground station implementations. / Master of Science
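The actor pattern described above (private state, message passing instead of shared memory) is straightforward to sketch. The fragment below is a minimal Go illustration of the model, not the pystation API itself (which is written in Python); the actor and message types are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// message is what actors exchange; actors never share memory directly.
type message struct {
	from    string
	payload string
}

// actor owns its state exclusively and serializes access to it by
// processing one message at a time from its mailbox.
type actor struct {
	name    string
	mailbox chan message
	seen    int // private state; only this actor's goroutine touches it
}

func newActor(name string) *actor {
	a := &actor{name: name, mailbox: make(chan message, 16)}
	go a.run()
	return a
}

func (a *actor) run() {
	for msg := range a.mailbox {
		a.seen++ // safe without locks: no memory is shared
		fmt.Printf("%s received %q from %s (total: %d)\n",
			a.name, msg.payload, msg.from, a.seen)
	}
}

// send delivers a message to another actor's mailbox. In a networked
// framework the mailbox would sit behind a socket; a channel stands in here.
func (a *actor) send(to *actor, payload string) {
	to.mailbox <- message{from: a.name, payload: payload}
}

func main() {
	scheduler := newActor("scheduler")
	radio := newActor("radio")
	scheduler.send(radio, "tune to 437.5 MHz")
	radio.send(scheduler, "pass scheduled 14:02 UTC")
	time.Sleep(100 * time.Millisecond) // let the toy mailboxes drain
}
```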
|
114 |
On Improving Distributed Transactional Memory through Nesting, Partitioning and Ordering. Turcu, Alexandru, 03 March 2015
Distributed Transactional Memory (DTM) is an emerging, alternative concurrency control model that aims to overcome the challenges of distributed lock-based synchronization. DTM employs transactions in order to guarantee consistency in a concurrent execution. When two or more transactions conflict, all but one must be delayed or rolled back.
Transactional Memory supports code composability by nesting transactions. Nesting, however, can also be used as a strategy to improve performance. The closed nesting model enables partial rollback by allowing a sub-transaction to abort without aborting its parent, thus reducing the amount of work that needs to be retried. In the open nesting model, sub-transactions can commit to the shared state independently of their parents; this reduces isolation and increases concurrency.
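To make the partial-rollback idea concrete, here is a toy write-buffer model in Go. It is a sketch of closed nesting in general, not of the TFA-based algorithms below; the txn type and merge-on-commit rule are assumptions of the illustration:

```go
package main

import "fmt"

// txn is a toy closed-nested transaction: each transaction buffers its
// writes locally and merges them into its parent only on commit.
type txn struct {
	parent *txn
	writes map[string]int
}

func begin(parent *txn) *txn {
	return &txn{parent: parent, writes: make(map[string]int)}
}

func (t *txn) write(key string, val int) { t.writes[key] = val }

// commit merges the child's buffered writes into the parent, so the
// parent can still abort everything later if it must.
func (t *txn) commit() {
	if t.parent != nil {
		for k, v := range t.writes {
			t.parent.writes[k] = v
		}
	}
}

// abort drops only this buffer; the parent's work is untouched, which
// is the partial rollback that closed nesting enables.
func (t *txn) abort() { t.writes = nil }

func main() {
	parent := begin(nil)
	parent.write("x", 1)

	child := begin(parent)
	child.write("y", 2)
	child.abort() // conflict: retry only the child's work

	retry := begin(parent)
	retry.write("y", 3)
	retry.commit()

	fmt.Println(parent.writes) // map[x:1 y:3]
}
```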
Our first main contribution in this dissertation is a pair of extensions to the existing Transactional Forwarding Algorithm (TFA): N-TFA and TFA-ON, which support closed nesting and open nesting, respectively. We additionally extend the existing SCORe algorithm with support for open nesting (we call the result SCORe-ON). We implement these algorithms in a Java DTM framework and evaluate them. This represents the first study of transaction nesting in the context of DTM, and contributes the first DTM implementation that supports closed nesting or open nesting.
Closed nesting, through our N-TFA implementation, proved insufficient for any significant throughput improvement. It ran on average 2% faster than flat nesting, while performance on individual tests varied between a 42% slowdown and an 84% speedup. The workloads that benefit most from closed nesting are characterized by short transactions with between two and five sub-transactions.
Open nesting, as exemplified by our TFA-ON and SCORe-ON implementations, showed promising results. We determined performance improvement to be a trade-off between the overhead of additional commits and the fundamental conflict rate. For write-intensive, high-conflict workloads, open nesting may not be appropriate, and we observed a maximum speedup of 30%. On the other hand, for lower fundamental-conflict workloads, open nesting enabled speedups of up to 167% in our tests.
In addition to the two nesting algorithms, we also develop Hyflow2, a high-performance DTM framework for the Java Virtual Machine, written in Scala. It has a clean Scala API and a compatibility Java API. Hyflow2 was on average two times faster than Hyflow on high-contention workloads, and up to 16 times faster in low-contention workloads.
Our second main contribution for improving DTM performance is automated data partitioning. Modern transactional processing systems need to be fast and scalable, and many such systems have therefore settled for weak consistency models. It is, however, possible to achieve strong consistency, high scalability, and high performance all at once by using fine-grained partitions and light-weight concurrency control that avoids superfluous synchronization and other overheads such as lock management. Independent transactions are one such mechanism; they rely on good partitions and appropriately defined transactions. On the downside, it is not usually straightforward to determine optimal partitioning schemes, especially when dealing with non-trivial amounts of data. Our work attempts to solve this problem by automating the partitioning process, choosing the correct transactional primitive, and routing transactions appropriately.
Our third main contribution is Alvin, a system for managing concurrently running transactions on a geographically replicated data store. Alvin supports general-purpose transactions and guarantees strong consistency criteria. Through a novel partial order broadcast protocol, Alvin maximizes the parallelism of ordering and local transaction processing, resulting in low client-perceived latency. Alvin can process read-only transactions either locally or globally, according to the desired consistency criterion. Conflicting transactions are ordered across all sites. We built Alvin in the Go programming language. We conducted our evaluation study on Amazon EC2 infrastructure and compared against Paxos- and EPaxos-based state machine replication protocols. Our results reveal that Alvin provides significant speed-ups for read-dominated TPC-C workloads: as much as 4.8x when compared to EPaxos on 7 datacenters, and up to 26% in write-intensive workloads.
Our fourth and final contribution is M2Paxos, a multi-leader implementation of Generalized Consensus. Single-leader consensus protocols are known to stop scaling once the leader reaches its saturation point. Ordering commands based on conflicts is appealing due to the potentially higher parallelism, but is imperfect due to the higher quorum sizes required for fast decisions and the need to compare commands and track their dependencies. M2Paxos, on the other hand, exploits fast decisions (i.e., delivery of a command in two communication delays) by leveraging a classic quorum size, matching a majority of the deployed nodes. M2Paxos does not establish command dependencies based on conflicts; instead, it binds accessed objects to nodes, ensuring that commands operating on the same object will be ordered by the same node. Our evaluation study of M2Paxos (also built in Go) confirms the effectiveness of this approach, with up to 7x improvements in performance over state-of-the-art consensus and generalized consensus algorithms. / Ph. D.
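The object-to-node binding at the heart of this design can be sketched in isolation. In the hypothetical Go fragment below, each object has a single owner node that assigns per-object sequence numbers, so commands on the same object are always ordered by the same node; ownership acquisition, epochs, and replication are omitted:

```go
package main

import "fmt"

// ownershipMap binds each object to the node responsible for ordering
// all commands that access it (acquisition and forwarding are elided).
type ownershipMap map[string]int

type command struct {
	object string
	op     string
}

// node orders commands for the objects it owns by assigning
// monotonically increasing sequence numbers per object.
type node struct {
	id  int
	seq map[string]int
}

func (n *node) order(c command) int {
	n.seq[c.object]++
	return n.seq[c.object]
}

func main() {
	owners := ownershipMap{"acct:A": 0, "acct:B": 1}
	nodes := []*node{
		{id: 0, seq: map[string]int{}},
		{id: 1, seq: map[string]int{}},
	}

	cmds := []command{
		{"acct:A", "deposit 50"},
		{"acct:B", "withdraw 20"},
		{"acct:A", "withdraw 10"}, // same object => same orderer as cmd 1
	}
	for _, c := range cmds {
		owner := nodes[owners[c.object]]
		fmt.Printf("node %d orders %q on %s as #%d\n",
			owner.id, c.op, c.object, owner.order(c))
	}
}
```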
|
115 |
Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases. Khan, Mohammed Saquib Akmal, 26 January 2015
Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. Processing in such domains is highly data- and compute-intensive, and high-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges for data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics, which is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities for the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges.
In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data, and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize, and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch (CM Sketch) [33][34][35].
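For readers unfamiliar with it, CM Sketch is compact enough to sketch directly. The Go version below is a generic textbook implementation (not MADlib's): counts land in d rows of w counters, and a point query takes the minimum across rows, which may overestimate but never underestimates a frequency:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cmSketch is a textbook Count-Min Sketch: d hash rows of w counters.
type cmSketch struct {
	d, w   int
	counts [][]uint64
}

func newCMSketch(d, w int) *cmSketch {
	c := &cmSketch{d: d, w: w, counts: make([][]uint64, d)}
	for i := range c.counts {
		c.counts[i] = make([]uint64, w)
	}
	return c
}

// hash derives row-specific hashes by salting FNV-1a with the row index.
func (c *cmSketch) hash(item string, row int) int {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, item)
	return int(h.Sum64() % uint64(c.w))
}

// Add increments every row's counter for the item.
func (c *cmSketch) Add(item string, count uint64) {
	for row := 0; row < c.d; row++ {
		c.counts[row][c.hash(item, row)] += count
	}
}

// Estimate returns the minimum across rows: an upper bound on the true
// count that is tight with high probability.
func (c *cmSketch) Estimate(item string) uint64 {
	min := uint64(0)
	for row := 0; row < c.d; row++ {
		v := c.counts[row][c.hash(item, row)]
		if row == 0 || v < min {
			min = v
		}
	}
	return min
}

func main() {
	s := newCMSketch(4, 1024)
	s.Add("flu:blacksburg", 3)
	s.Add("flu:blacksburg", 2)
	s.Add("flu:roanoke", 1)
	fmt.Println(s.Estimate("flu:blacksburg")) // >= 5, likely exactly 5
}
```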
DiceX enables a new style of Big Data processing, which is centered around the use of clustered databases and exploits supercomputing resources. It can effectively exploit the cores, memory and compute nodes of supercomputers to scale processing of spatio-temporal queries on datasets of large volume. Thus, it provides a scalable and efficient tool for data management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can be easily extended to different data-intensive domains facing similar issues and challenges.
We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF - Network Science and Engineering Grant CNS-1011769, NIH and NIGMS - Models of Infectious Disease Agent Study Grant 5U01GM070694-11.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. / Master of Science
|
116 |
A Low-latency Consensus Algorithm for Geographically Distributed Systems. Arun, Balaji, 15 May 2017
This thesis presents Caesar, a novel multi-leader Generalized Consensus protocol for geographically replicated systems. Caesar achieves near-perfect availability, provides high performance (low latency and high throughput) compared to the existing state of the art, and tolerates replica failures. Recently, a number of state-of-the-art consensus protocols that implement the Generalized Consensus definition have been proposed. However, the major limitation of these existing approaches is significant performance degradation when the application workload produces conflicting requests. Caesar's main goal is to overcome this limitation by changing the way a fast decision is taken: its ordering protocol does not reject a fast decision for a client request if a quorum of nodes reply with different dependency sets for that request; it only switches to a slow decision if there is no chance to agree on the proposed order for that request. Caesar achieves this using a combination of wait conditions and logical timestamps. The effectiveness of Caesar is demonstrated through an evaluation study performed on Amazon's EC2 infrastructure using 5 geo-replicated sites. Caesar outperforms other multi-leader competitors (e.g., EPaxos) by as much as 1.7x in the presence of 30% conflicting requests, and single-leader protocols (e.g., Multi-Paxos) by as much as 3.5x. The protocol is also resistant to heavy client loads, unlike existing protocols. / Master of Science / Today, there exists a plethora of online services (e.g., Facebook, Google) that serve millions of users daily. Usually, each of these services has multiple subcomponents that work cohesively to deliver a rich user experience. One vital component prevalent in these services is the one that maintains the shared state. One example of a shared-state component is a database, which enables operations on structured data. Such shared states are replicated across multiple server nodes, and even across multiple data centers, to guarantee availability (if a node fails, other nodes can still serve requests on the shared state), low latency (placing a copy of the shared state in a datacenter closer to the users reduces the time required to serve them), and scalability (the bottleneck of a single server node being unable to serve millions of concurrent requests can be alleviated by having multiple nodes serve users at the same time). These replicated shared states need to be kept consistent, i.e., every copy of the shared state must be the same on all replicating nodes, and maintaining this consistency requires that these nodes communicate with each other and reach agreement on the order in which operations on the shared data should be applied. In that regard, this thesis proposes Caesar, a consensus protocol with the aforementioned guarantees that eases the deployment of services that contain a shared state. It addresses the problem of performance degradation in existing approaches when the same parts of the shared state are accessed by multiple users connected to different server nodes. The effectiveness of Caesar is demonstrated through an evaluation study performed by deploying the protocol on five of Amazon's data centers around the world. Caesar outperforms the existing state of the art by as much as 3.5x and is also resistant to heavy client loads, unlike existing protocols.
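The fast/slow decision rule described above can be paraphrased in a few lines. The Go fragment below is a heavy simplification under assumed types (reply, decide); the real protocol additionally involves wait conditions, timestamp retries, and recovery:

```go
package main

import "fmt"

// reply models what a replica returns for a proposed command: whether
// it accepts the proposed logical timestamp, and its dependency set.
type reply struct {
	timestampOK bool
	deps        map[string]bool
}

// decide sketches the fast/slow rule in very simplified form: a fast
// decision survives even when quorum replies carry different dependency
// sets (they are unioned); the slow path is taken only when too few
// replicas could accept the proposed order at all.
func decide(replies []reply, quorum int) (fast bool, deps map[string]bool) {
	deps = make(map[string]bool)
	accepted := 0
	for _, r := range replies {
		if !r.timestampOK {
			continue
		}
		accepted++
		for d := range r.deps {
			deps[d] = true // differing dep sets do not kill the fast path
		}
	}
	return accepted >= quorum, deps
}

func main() {
	replies := []reply{
		{true, map[string]bool{"c1": true}},
		{true, map[string]bool{"c2": true}}, // different deps: still fast
		{true, nil},
	}
	fast, deps := decide(replies, 3)
	fmt.Println(fast, deps) // true map[c1:true c2:true]
}
```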
|
117 |
Using Application Benefit for Proactive Resource Allocation in Asynchronous Real-Time Distributed Systems. Hegazy, Tamir A., 12 October 2001
This thesis presents two proactive resource allocation algorithms, RBA* and OBA, for asynchronous real-time distributed systems. The algorithms consider an application model where timeliness requirements are expressed using Jensen's benefit functions, and propose adaptation functions to describe anticipated workloads over future time intervals. Furthermore, an adaptation model is considered in which processes are replicated to share workload increases, along with a real-time Ethernet system model in which message collisions are resolved. Given such models, the objective is to maximize aggregate application benefit and minimize the aggregate missed-deadline ratio. Since determining the optimal allocation is computationally intractable, the algorithms heuristically compute an allocation that is as "close" as possible to the optimal one. While RBA* analyzes process response times to determine the allocation, OBA analyzes processor overloads to compute the decision much faster. RBA* incurs a quadratic amortized complexity in terms of subtask arrivals for its most computationally intensive component when DASA is used as the underlying process-scheduling algorithm, whereas OBA incurs a logarithmic amortized complexity for the corresponding component. To study how different process-scheduling and message-scheduling algorithms affect the performance of the algorithms, and to compare their performance, benchmark-driven experiments were conducted. The experimental results reveal that RBA* produces higher aggregate benefit and a lower missed-deadline ratio when DASA is used for process scheduling and message scheduling. Furthermore, RBA* produces higher aggregate benefit and a lower missed-deadline ratio than OBA, confirming the intuition that accurate response time analysis leads to better results. / Master of Science
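As a rough illustration of the application model, the Go sketch below shows a hypothetical step-shaped benefit function and the aggregate-benefit objective that such allocators heuristically maximize; the exact shape of the thesis's benefit functions is not given here, so the step form is an assumption of the illustration:

```go
package main

import "fmt"

// benefitFn maps a completion time to the application benefit accrued,
// in the style of Jensen's benefit (time/utility) functions.
type benefitFn func(completion float64) float64

// step returns a hypothetical benefit function: full benefit before the
// deadline, linearly decayed benefit until a drop-dead time, then zero.
func step(full, deadline, dropDead float64) benefitFn {
	return func(t float64) float64 {
		switch {
		case t <= deadline:
			return full
		case t <= dropDead:
			return full * (dropDead - t) / (dropDead - deadline)
		default:
			return 0
		}
	}
}

// aggregateBenefit is the objective an allocator evaluates for a
// candidate allocation, given predicted completion times.
func aggregateBenefit(fns []benefitFn, completions []float64) float64 {
	total := 0.0
	for i, f := range fns {
		total += f(completions[i])
	}
	return total
}

func main() {
	fns := []benefitFn{step(10, 1.0, 2.0), step(5, 0.5, 1.5)}
	fmt.Println(aggregateBenefit(fns, []float64{0.8, 1.0})) // 10 + 2.5 = 12.5
}
```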
|
118 |
An Efficient Parallel Three-Level Preconditioner for Linear Partial Differential Equations. Yao, Aixiang I Song, 26 February 1998
The primary motivation of this research is to develop and investigate parallel preconditioners for linear elliptic partial differential equations. Three preconditioners are studied: a block-Jacobi preconditioner (BJ), a two-level tangential preconditioner (D0), and a three-level preconditioner (D1). Performance and scalability on a distributed memory parallel computer are considered, and communication cost and redundancy are explored as well.
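As background, block-Jacobi preconditioning has a standard textbook form, given below; the D0 and D1 variants presumably add coarse-level corrections on top of this idea. Partitioning the discretized operator A by subdomain:

```latex
A =
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1k} \\
A_{21} & A_{22} & \cdots & A_{2k} \\
\vdots &        & \ddots & \vdots \\
A_{k1} & A_{k2} & \cdots & A_{kk}
\end{pmatrix},
\qquad
M_{\mathrm{BJ}} =
\begin{pmatrix}
A_{11} &        &        \\
       & \ddots &        \\
       &        & A_{kk}
\end{pmatrix}
```

The Krylov solver is then applied to the preconditioned system $M_{\mathrm{BJ}}^{-1} A x = M_{\mathrm{BJ}}^{-1} b$, so each block solve is local to one processor and requires no communication, which is what makes BJ a natural parallel baseline.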
After experiments and analysis, we find that the three-level preconditioner D1 is the most efficient and scalable parallel preconditioner, compared to BJ and D0. The D1 preconditioner reduces both the number of iterations and computational time substantially. A new hybrid preconditioner is suggested which may combine the best features of D0 and D1. / Master of Science
|
119 |
Adaptive Knowledge Exchange with Distributed Partial Models@Run.time. Werner, Christopher, 11 January 2016
The growing number of robotics applications in which multiple robots pursue a common goal calls for a dedicated look at the interaction between these robots with respect to the data exchange it entails; this exchange must be carried out efficiently while guaranteeing the safety of the overall system. This master's thesis presents a simulation environment that evaluates robot constellations against test scenarios and exchange strategies and delivers measurement results. The thesis first examines three data exchange methods, then surveys publications in which data exchange is performed, and investigates simulators for their suitability as a basis for the simulation environment. The subsequent chapters explain the concept and implementation of the test environment, in which robots are described as a set of hardware components and goals. The experimental setup comprises the various environments, test scenarios, and robot configurations, and forms the basis for the evaluation of the test results.
|
120 |
Fog Computing with Go: A Comparative Study. Butterfield, Ellis H, 01 January 2016
The Internet of Things is a recent computing paradigm, defined by networks of highly connected things (sensors, actuators, and smart objects) communicating across networks of homes, buildings, vehicles, and even people. The Internet of Things brings with it a host of new problems, from managing security on constrained devices to processing never-before-seen amounts of data. While cloud computing might be able to keep up with current data processing and computational demands, it is unclear whether it can be extended to the requirements brought forth by the Internet of Things.
Fog computing provides an architectural solution to some of these problems by introducing a layer of intermediary nodes within what is called an edge network, separating local object networks from the Cloud. These edge nodes provide interoperability, real-time interaction, routing, and, if necessary, computational delegation to the Cloud.
This paper evaluates Go, a distributed systems language developed by Google, against the requirements set forth by Fog computing. Methodologies similar to those of previous literature are simulated and benchmarked against in order to assess the viability of Go in the edge nodes of a Fog computing architecture.
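As a flavor of the edge-node workload such a benchmark would exercise, here is a small hypothetical Go sketch: concurrent ingestion of sensor readings, local aggregation, and periodic delegation of summaries toward the Cloud. The reading type and channel layout are assumptions of the illustration, not code from the paper:

```go
package main

import (
	"fmt"
	"time"
)

// reading is a sensor sample arriving at the edge node.
type reading struct {
	sensor string
	value  float64
}

// edgeNode aggregates readings locally and periodically forwards a
// summary upstream, standing in for delegation to the Cloud.
func edgeNode(in <-chan reading, upstream chan<- string) {
	sums := map[string]float64{}
	counts := map[string]int{}
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case r, ok := <-in:
			if !ok {
				return
			}
			sums[r.sensor] += r.value
			counts[r.sensor]++
		case <-ticker.C:
			for s, sum := range sums {
				upstream <- fmt.Sprintf("%s avg=%.2f over %d samples",
					s, sum/float64(counts[s]), counts[s])
			}
			sums, counts = map[string]float64{}, map[string]int{}
		}
	}
}

func main() {
	in := make(chan reading, 64)
	upstream := make(chan string, 8)
	go edgeNode(in, upstream)

	for i := 0; i < 10; i++ {
		in <- reading{sensor: "temp-1", value: 20 + float64(i)}
	}
	fmt.Println(<-upstream) // e.g. "temp-1 avg=24.50 over 10 samples"
}
```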
|