1

Pipelined Byzantine Fault Tolerance and Applications

Adithya Bhat (17583018) 07 December 2023
<p dir="ltr">Practically, Byzantine faults are not assumed in cloud applications. Byzantine fault-tolerance adds significant cryptographic, communication, throughput, and latency overheads to applications, contributing to the resistance towards its widespread adoption. Existing Byzantine-fault tolerant protocols focus on optimal latency or optimal communication while ignoring the throughput and cryptographic overheads.</p><p dir="ltr">In this thesis, we explore pipelining for Byzantine fault-tolerant applications. Pipelining tasks is a common optimization in distributed systems that involves executing tasks in stages. The idea is that instead of executing a task in an iteration as an atomic unit, we split the execution into stages and execute all stages of <i>different</i> tasks per iteration. We observe significant performance benefits if executing later stages of a task helps other tasks in earlier stages, saving effort in each stage. The length of the pipeline, i.e., the number of stages, determines the latency of an individual task. However, if the pipeline improves the execution of every stage enough, then the latency improves.</p><p dir="ltr">We primarily explore three Byzantine Fault Tolerant (BFT) applications with pipelining: (i) unique chain-based State Machine Replication protocols: <i>Apollo</i>, <i>Artemis</i>, <i>Leto</i>, and <i>Zeus</i>, and (ii) energy-efficient State Machine Replication: <i>EESMR</i>. (iii) random beacon protocols: <i>GRandPiper</i>, <i>BRandPiper</i>, and <i>OptRand</i>. We design them with a pipeline-first approach to improve the throughput, cryptographic, and communication costs at every stage of the pipeline. With respect to latency, we show (i) pipelined SMR protocols where our pipeline stages have constant cryptographic and linear communication costs allowing our protocols to outperform state-of-the-art BFT-SMR protocols in throughput. (ii) pipelined SMR protocols with techniques to make each stage of the pipeline independent, thus achieving demonstrable energy efficiency while allowing an unbounded number of non-interactive parallel proposals. (iii) reduced latencies for reconfiguration-friendly random beacons by using two pipelines: an SMR pipeline to commit and a beacon pipeline to produce random numbers and decoupling the two pipelines thereby removing the impact of the high-latency SMR pipeline on the latency of the randomness output by the system. </p>
2

FINITE SAMPLE GUARANTEES FOR LEARNING THE DYNAMICS OF SYSTEMS

Lei Xin (17410485) 20 November 2023
<p dir="ltr">The problem of system identification is to learn the system dynamics from data. While classical system identification theories focused primarily on achieving asymptotic consistency, recent efforts have sought to characterize the number of samples needed to achieve a desired level of accuracy in the learned model. This thesis focuses on finite sample analysis for identifying/learning dynamical systems.</p><p dir="ltr">In the first part of this thesis, we provide novel results on finite sample analysis for learning different linear systems. We first consider the system identification problem of a fully observed system (i.e., all states of the system can be perfectly measured), leveraging data generated from an auxiliary system that shares ``similar" dynamics. We provide insights on the benefits of using the auxiliary data, and guidelines on selecting the weight parameter during the model training process. Subsequently, we consider the system identification problem for a partially observed autonomous linear system, where only a subset of states and multiple short trajectories of the system can be observed. We present a finite sample error bound and characterize the learning rate. </p><p dir="ltr">In the second part of this thesis, we explore the practical usage of finite sample analysis under several different scenarios. We first consider a parameter learning problem in a distributed setting, where a group of agents wishes to collaboratively learn the underlying model. We propose a distributed parameter estimation algorithm and provide finite time bounds on the estimation error. We show that our analysis allows us to determine a time at which the communication can be stopped (due to the costs associated with communications), while meeting a desired estimation accuracy. Subsequently, we consider the problem of online change point detection for a linear system, where the user observes data in an online manner, and the goal is to determine when the underlying system dynamics change. We provide an online change point detection algorithm, and a data-dependent threshold that allows one to achieve a pre-specified upper bound on the probability of making a false alarm. We further provide a finite-sample-based lower bound for the probability of detecting a change point with a certain delay.</p><p dir="ltr">Finally, we extend the results to linear model identification from non-linear systems. We provide a data acquisition algorithm followed by a regularized least squares algorithm, along with an associated finite sample error bound on the learned linearized dynamics. Our error bound demonstrates a trade-off between the error due to nonlinearity and the error due to noise, and shows that one can learn the linearized dynamics with arbitrarily small error given sufficiently many samples.</p>
3

Parameterized Verification and Synthesis for Distributed Agreement-Based Systems

Nouraldin Jaber (13796296) 19 September 2022
Distributed agreement-based systems use common distributed agreement protocols, such as leader election and consensus, as building blocks for their target functionality: processes in these systems may need to agree on a leader, on the members of a group, on owners of locks, or on updates to replicated data. Such systems are common and potentially permit modular, scalable verification approaches that mimic their modular design. Interestingly, while many verification efforts target agreement protocols themselves, little attention has been given to the distributed systems that build on top of these protocols.

In this work, we aim to develop a fully automated, modular, and usable parameterized verification approach for distributed agreement-based systems. To do so, we need to overcome the following challenges. First, fully automated parameterized verification, i.e., algorithmically checking whether the system is correct for any number of processes, is a well-known undecidable problem. Second, to enable modular verification that leverages the inherently modular nature of these agreement-based systems, we need to support abstractions of agreement protocols; such abstractions can replace the protocols' implementations when verifying the overall system, enabling modular reasoning. Finally, even when verification is fully automated, a system designer still needs assistance in modeling their distributed agreement-based systems.

We systematically tackle these challenges through the following contributions.

First, we support efficient, decidable verification of distributed agreement-based systems by developing a computational model, the GSP model, for reasoning about distributed (agreement-based) systems that admits decidability and cutoff results. Cutoff results enable practical verification by reducing the parameterized verification problem to verifying a system with a fixed, finite number of processes. The GSP model supports generalized communication primitives and global guards, both of which are essential for abstracting agreement protocols.

Then, we address usability and modularity by developing QuickSilver, a framework tailored for modeling and modular parameterized verification of distributed agreement-based systems. QuickSilver provides an intuitive domain-specific language, Mercury, equipped with two agreement primitives capable of abstracting away agreement protocols when modeling agreement-based systems, enabling modular verification. QuickSilver extends the decidability and cutoff results of the GSP model to provide fully automated, efficient parameterized verification for a large class of systems modeled in Mercury.

Finally, we leverage synthesis techniques to further enhance the usability of our approach and propose Cinnabar, a tool that supports synthesis of distributed agreement-based systems with efficiently decidable parameterized verification. Cinnabar allows a system designer to provide a sketch of their Mercury model and uses a counterexample-guided synthesis procedure to search for model completions that both belong to the efficiently decidable fragment of Mercury and are correct.

We evaluate our contributions on various distributed agreement-based systems adapted from real-world applications, such as a data store, a lock service, a surveillance system, a pathfinding algorithm for mobile robots, and more.
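
As a rough, hedged sketch of the synthesis-plus-cutoff workflow described above (hypothetical hole names and spec, and a simplified enumerate-and-check loop standing in for true counterexample-guided search; not Cinnabar's actual interface), the code below searches completions of a sketch and accepts one only if it model-checks at a cutoff size, which, by an assumed cutoff result, implies correctness for any number of processes.

```python
from itertools import product

# Sketch of cutoff-based synthesis (illustrative only). A model "sketch" has
# holes; we search completions and keep only those that model-check at the
# cutoff size. By an assumed cutoff theorem, passing at the cutoff implies
# correctness for every number of processes.

HOLES = {"guard": [1, 2, 3], "action": ["retry", "abort"]}
CUTOFF = 4  # process count assumed sufficient by the cutoff result

def model_check(completion, n=CUTOFF):
    # Stand-in for a real finite-state model checker: a made-up spec
    # ("guard must exceed 1 and losers must retry") just drives the loop.
    return completion["guard"] > 1 and completion["action"] == "retry"

def synthesize():
    for values in product(*HOLES.values()):
        completion = dict(zip(HOLES.keys(), values))
        if model_check(completion):
            return completion  # verified at the cutoff, hence for all n
    return None

print(synthesize())  # e.g. {'guard': 2, 'action': 'retry'}
```
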
4

Efficient in-situ workflows for time-critical applications on heterogeneous ecosystems

Feng Li (16627272) 21 July 2023
In-situ workflows are a special class of scientific workflows in which different component applications (such as simulation, visualization, and analysis) run concurrently, with data flowing continuously between components throughout the workflow's lifetime. Traditionally, simulations write large amounts of output data to persistent storage, which is later read back for analysis and visualization. In-situ workflows instead let analysis and visualization components consume simulation data while the simulations are still running, reducing the I/O overhead. Recent research has focused on data transport libraries that help compose a group of applications into an integral in-situ workflow. However, only a few performance-oriented studies of in-situ workflows exist, and most focus on workflows with simple structures (e.g., a single producer and a single consumer) without considering heterogeneous environments. Efficiently utilizing heterogeneous computing resources such as multiple clouds and HPC systems can significantly accelerate real-world in-situ workflows and benefit applications that require both significant computational power and real-time outputs (e.g., identifying abnormal patterns in fluid dynamics). The goal of this dissertation is to provide resource planning algorithms and runtime support that improve in-situ workflow performance in heterogeneous environments.

This dissertation first investigates emerging applications of in-situ workflows, which usually include parallel simulation, visualization, and analysis components. Two representative real-world in-situ workflows are studied in detail: a real-time CFD machine-learning/visualization workflow and a wildfire-spreading workflow. These workflows showcase the capabilities of in-situ workflows, e.g., decoupled and accelerated computation and fast, near-real-time response; however, general in-situ workflows still lack resource planning and runtime support. For resource planning, I first formulate the optimization problem, then design and implement a heuristic algorithm called SNL (Scheduled-Neighbor-Lookup). SNL considers the pipelined execution pattern of in-situ workflows and guides the resource planning of complex in-situ workflows toward higher workflow throughput. For runtime support, I design and implement INSTANT, a runtime framework to configure, plan, launch, and monitor in-situ workflows in distributed computing environments. INSTANT provides intuitive interfaces to compose abstract in-situ workflows, manages intra-site and cross-site data transfers with ADIOS2, and supports resource planning using profiled performance data. Experiments with the two use cases show that INSTANT can efficiently streamline the orchestration of complex in-situ workflows, and its resource planning capability allows INSTANT to plan and carry out fast workflow executions at different levels of computing resource availability.
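
As a minimal, hedged illustration of in-situ coupling (an in-process queue standing in for a real transport such as ADIOS2; not INSTANT's actual interfaces), the sketch below lets an analysis component consume simulation steps while the simulation is still running, with a bounded buffer providing back-pressure instead of round-tripping through persistent storage.

```python
import queue
import threading

# Toy in-situ producer/consumer coupling (illustrative only): the analysis
# component consumes simulation steps as they are produced, rather than
# reading them back from disk after the simulation finishes.

steps = queue.Queue(maxsize=4)  # bounded buffer applies back-pressure

def simulation(n_steps=8):
    for t in range(n_steps):
        field = [t * 0.1] * 1000          # stand-in for a simulation field
        steps.put((t, field))             # publish the step in situ
    steps.put(None)                       # end-of-stream marker

def analysis():
    while (item := steps.get()) is not None:
        t, field = item
        print(f"step {t}: mean = {sum(field) / len(field):.2f}")

sim = threading.Thread(target=simulation)
ana = threading.Thread(target=analysis)
sim.start(); ana.start()
sim.join(); ana.join()
```

In a real deployment the two components would be separate parallel applications, possibly on different sites, which is exactly where INSTANT's cross-site transfer management and resource planning come in.
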
5

Performance and Cost Optimization for Distributed Cloud-native Systems

Ashraf Y Mahgoub (13169517) 28 July 2022
NoSQL datastores provide a set of features demanded by high-performance computing (HPC) applications, such as scalability, availability, and schema flexibility. HPC applications, such as metagenomics and other big-data systems, need to store and analyze huge volumes of semi-structured data, and they often rely on NoSQL-based datastores. Optimizing these databases is a challenging endeavor, with over 50 configuration parameters in Cassandra alone. As the application executes, database workloads can change rapidly over time (e.g., from read-heavy to write-heavy), and a system tuned for one phase of the workload becomes suboptimal when the workload changes.
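
As a hedged sketch of why workload phase changes matter for tuning (hypothetical parameter names and values; a real tuner would search Cassandra's actual configuration space of dozens of knobs), the snippet below switches between two hand-picked configurations when the observed read/write mix changes.

```python
# Toy phase-aware database re-tuning (illustrative only; parameter names and
# values are hypothetical stand-ins for Cassandra's real configuration knobs).

def classify_phase(reads: int, writes: int) -> str:
    return "read-heavy" if reads >= writes else "write-heavy"

# Two hand-picked configurations, one per workload phase.
CONFIGS = {
    "read-heavy":  {"compaction": "leveled", "key_cache_mb": 512},
    "write-heavy": {"compaction": "size-tiered", "key_cache_mb": 64},
}

def retune(window):
    """Pick a configuration for an observed (reads, writes) window."""
    phase = classify_phase(*window)
    print(f"phase={phase} -> apply {CONFIGS[phase]}")

retune((9_000, 1_000))   # read-heavy phase
retune((2_000, 8_000))   # workload shifted: re-tune for writes
```

A production tuner must also weigh the cost of applying a new configuration (e.g., cache warm-up or compaction churn) against the benefit, which is what makes rapid phase changes hard.
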
6

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

Mengchi Zhang (13109886) 28 July 2022
Productive programmable accelerators, like GPUs, have been developed over generations to support richer programming features. Ever-increasing performance improves the usability of programming features on GPUs, and these features in turn ease the porting of code and data structures from CPU to GPU. However, GPU programming features such as function calls and runtime polymorphism have not been well explored or optimized.

I identify efficient and productive GPU programming as a potential area to exploit. Although many programming paradigms are well studied and efficiently supported on CPU architectures, their performance on novel accelerators like GPUs has received little study, evaluation, or optimization. For instance, programming with functions is a commonplace paradigm that gives software programs modularity and simplifies code through reuse. A large body of work has been proposed to alleviate function-calling overhead on CPUs, but few papers have addressed its deficiencies on GPUs. Polymorphism, in turn, lets an object's behavior vary at runtime; a body of work targets efficient polymorphism on CPUs, but no work has discussed this feature in a GPU context.

In this dissertation, I discuss these two programming features on GPU architectures. First, I performed the first study to identify the deficiencies of GPU polymorphism. I created micro-benchmarks to evaluate virtual function overhead in controlled settings, and the first GPU polymorphic benchmark suite, ParaPoly, to investigate real-world scenarios. The micro-benchmarks indicated that virtual function overhead is usually negligible but can cause up to a 7x slowdown. Virtual functions in ParaPoly show a geometric mean of 77% overhead on GPUs compared to inlined versions of the same functions. Second, I proposed two novel techniques that determine an object's type solely from its address pointer to improve GPU polymorphism. The first technique, Coordinated Object Allocation and function Lookup (COAL), is a software-only technique that uses the object's address to determine its type. The second technique, TypePointer, requires hardware modification to embed the object's type information in its address pointer. COAL achieves 80% and 6% improvements, and TypePointer 90% and 12%, over contemporary CUDA and our type-based SharedOA, respectively.

Considering the growth of GPU programs, function calls are becoming a pervasive paradigm used consistently on GPUs. I also identified the overhead of excessive register spilling caused by function calls on GPUs. To diminish this cost, I proposed a novel Massively Multithreaded Register Windowing technique with Variable Size Register Windows and Register-Conscious Warp Scheduling. These techniques improve representative workloads by a geometric mean of 1.18x with only 1.8% hardware storage overhead.
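
To illustrate the address-based type lookup idea behind COAL (a Python simulation under assumed semantics; the real technique coordinates CUDA object allocation and function lookup on the GPU), the sketch below allocates each type from its own contiguous region so that a binary search over region bases recovers an object's type from its address alone, with no per-object type tag to dereference.

```python
import bisect

# Toy model of type-segregated allocation: every object of a given type
# lives in one contiguous address region, so the address alone identifies
# the type. Illustrative only, not COAL's actual implementation.

REGION_SIZE = 1000
regions = []       # sorted list of region base addresses
region_type = {}   # base address -> type name
next_base = 0

def make_region(type_name):
    global next_base
    base = next_base
    next_base += REGION_SIZE
    bisect.insort(regions, base)
    region_type[base] = type_name
    return base

def type_of(addr):
    # One binary search over region bases replaces a per-object type load.
    base = regions[bisect.bisect_right(regions, addr) - 1]
    return region_type[base]

circle_base = make_region("Circle")
square_base = make_region("Square")
print(type_of(circle_base + 17))   # -> Circle
print(type_of(square_base + 999))  # -> Square
```

The payoff on a GPU is that threads in a warp can resolve call targets without scattered loads of vtable pointers, which is the memory behavior that makes naive virtual dispatch expensive there.
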
7

Scalable and Energy-Efficient SIMT Systems for Deep Learning and Data Center Microservices

Mahmoud Khairy A. Abdallah (12894191) 04 July 2022
Moore's law is dead. The physical and economic principles that enabled an exponential rise in transistors per chip have reached their breaking point. As a result, the high-performance computing (HPC) domain and cloud data centers are encountering significant energy, cost, and environmental hurdles that have led them to embrace custom hardware/software solutions. Single Instruction Multiple Thread (SIMT) accelerators, like Graphics Processing Units (GPUs), are compelling solutions for achieving considerable energy efficiency while preserving programmability in the twilight of Moore's law.

In the HPC and deep learning (DL) domains, the death of single-chip GPU performance scaling will usher in a renaissance of multi-chip Non-Uniform Memory Access (NUMA) scaling. Advances in silicon interposers and other inter-chip signaling technologies will enable single-package systems composed of multiple chiplets that continue to scale even as per-chip transistor counts do not. Given this evolving, massively parallel NUMA landscape, the placement of data on each chiplet, or discrete GPU card, and the scheduling of the threads that use that data are critical factors in system performance and power consumption.

Beyond the supercomputer space, general-purpose compute units remain the main driver of a data center's total cost of ownership (TCO). CPUs consume 60% of the total data center power budget, half of which comes from the CPU pipeline's frontend. Coupled with this hardware efficiency crisis is an increased desire for programmer productivity, flexible scalability, and nimble software updates, which has led to the rise of software microservices. Consequently, single servers are now packed with many threads executing the same, relatively small task on different data.

In this dissertation, I discuss these paradigm shifts, addressing the following concerns: (1) how do we overcome the non-uniform memory access overhead of next-generation multi-chiplet GPUs in the era of DL-driven workloads? (2) how can we improve the energy efficiency of data center CPUs in light of the evolution of microservices and request similarity? and (3) how can we study such rapidly evolving systems with accurate and extensible SIMT performance modeling?
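
As a hedged toy model of the chiplet data-placement and thread-scheduling concern raised above (an invented policy, not the dissertation's scheduler), the sketch below places each thread block on the chiplet that owns most of its pages, maximizing local accesses under simple interleaved page placement.

```python
# Toy locality-aware scheduling for a multi-chiplet GPU (illustrative model
# only): co-locating a thread block with the chiplet holding most of its
# data avoids slow inter-chiplet (NUMA) accesses.

N_CHIPLETS = 4

def home_chiplet(page: int) -> int:
    return page % N_CHIPLETS  # simple interleaved page placement

def schedule(block_pages):
    """Place each thread block on the chiplet owning most of its pages."""
    for block, pages in enumerate(block_pages):
        counts = [0] * N_CHIPLETS
        for p in pages:
            counts[home_chiplet(p)] += 1
        target = counts.index(max(counts))
        local = counts[target] / len(pages)
        print(f"block {block} -> chiplet {target} ({local:.0%} local accesses)")

schedule([[0, 4, 8, 1], [2, 6, 10, 14], [3, 3, 7, 12]])
```

Real systems must solve placement and scheduling jointly, since where data lands determines which schedule is local, and vice versa; that coupling is what makes the NUMA question above hard.
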
8

New Approaches Towards Online, Distributed, and Robust Learning of Statistical Properties of Data

Tong Yao (16644750) 07 August 2023
In this thesis, we present algorithms that allow agents to estimate certain statistical properties in a robust, online, and distributed manner. Each agent receives a sequence of observations and, through communication, the agents collectively infer properties of the data gathered by all of them.

In the first part of the thesis, we provide algorithms to infer the correlations between interacting entities from large datasets. Gaussian graphical models are well studied as representations of the relationships between the random variables that generate data, and numerous algorithms have been proposed to learn the dependencies in such models. However, existing algorithms typically process data in a batch at a central location, limiting their applicability in scenarios where data arrive in real time and are gathered by different agents.

To address these challenges, we first propose an online sparse inverse covariance algorithm that infers the static network structure (i.e., the dependencies between nodes) in real time from time-series data at a centralized location. We then propose a distributed algorithm that cooperatively learns the network structure in real time from data collected by distributed agents. We characterize the theoretical convergence properties and provide simulations using synthetic datasets and real-world hurricane Twitter datasets from disaster management applications.

The second part of this thesis addresses the robustness of online and distributed learning under arbitrary data corruption. We propose online and distributed algorithms for robust mean, covariance, and sparse inverse covariance estimation that operate effectively even in the presence of adversarial data attacks. We provide theoretical bounds on the error and convergence rate of these methods and evaluate their performance under various settings.

Finally, we consider the problem of classification with a network of heterogeneous and partially informative agents, each receiving local data from an underlying true class and equipped with a classifier that distinguishes only a subset of the entire set of classes. We propose an iterative algorithm that uses the posterior probabilities of any classifier and recursively updates each agent's local belief based on its local signals and the belief information from its neighbors. We then adopt a novel distributed min-rule to update each agent's global belief and enable all agents to learn the true class. We analyze the convergence properties of the proposed algorithm and demonstrate and compare its performance with local averaging and global average consensus through simulations and on a visual image dataset.
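
As a minimal, hedged sketch of the min-rule belief fusion described above (simplified likelihoods; the thesis's algorithm also handles partially informative classifiers and comes with convergence guarantees), each agent below performs a local Bayesian-style update and then takes an element-wise minimum over its neighbors' beliefs before renormalizing.

```python
import numpy as np

# Toy min-rule belief update for distributed classification (illustrative
# only). Each agent fuses its neighbors' beliefs by an element-wise minimum,
# then renormalizes; evidence repeatedly favors the true class.

rng = np.random.default_rng(1)
n_agents, n_classes, true_class = 4, 3, 0
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a line graph

beliefs = np.full((n_agents, n_classes), 1.0 / n_classes)

for step in range(50):
    # Local Bayesian-style update from a noisy local likelihood.
    lik = np.full((n_agents, n_classes), 0.8)
    lik[:, true_class] = 1.0 + rng.random(n_agents)  # evidence favors class 0
    local = beliefs * lik
    local /= local.sum(axis=1, keepdims=True)
    # Min-rule fusion with neighbors, then renormalize.
    fused = np.array([np.minimum.reduce([local[j] for j in [i] + neighbors[i]])
                      for i in range(n_agents)])
    beliefs = fused / fused.sum(axis=1, keepdims=True)

print(beliefs.round(3))  # each row concentrates on the true class
```

The minimum acts as a conservative fusion rule: an agent only stays confident in a class if every neighbor is also confident in it, which is what lets the true class propagate through partially informative networks.
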
9

HDArray: PARALLEL ARRAY INTERFACE FOR DISTRIBUTED HETEROGENEOUS DEVICES

Hyun Dok Cho (18620491) 30 May 2024
<p dir="ltr">Heterogeneous clusters with nodes containing one or more accelerators, such as GPUs, have become common. While MPI provides inter-address space communication, and OpenCL provides a process with access to heterogeneous computational resources, programmers are forced to write hybrid programs that manage the interaction of both of these systems. This paper describes an array programming interface that provides users with automatic and manual distributions of data and work. Using work distribution and kernel def and use information, communication among processes and devices in a process is performed automatically. By providing a unified programming model to the user, program development is simplified.</p>
10

Multi-Agent-Based Collaborative Machine Learning in Distributed Resource Environments

Ahmad Esmaeili (19153444) 18 July 2024
<p dir="ltr">This dissertation presents decentralized and agent-based solutions for organizing machine learning resources, such as datasets and learning models. It aims to democratize the analysis of these resources through a simple yet flexible query structure, automate common ML tasks such as training, testing, model selection, and hyperparameter tuning, and enable privacy-centric building of ML models over distributed datasets. Based on networked multi-agent systems, the proposed approach represents ML resources as autonomous and self-reliant entities. This representation makes the resources easily movable, scalable, and independent of geographical locations, alleviating the need for centralized control and management units. Additionally, as all machine learning and data mining tasks are conducted near their resources, providers can apply customized rules independently of other parts of the system. </p><p><br></p>
