591 |
DecaFS: A Modular Distributed File System to Facilitate Distributed Systems Education. Meth, Halli Elaine, 01 June 2014.
Data quantity, speed requirements, reliability constraints, and other factors encourage industry developers to build distributed systems and use distributed services. Software engineers are therefore exposed to distributed systems and services daily in the workplace. However, distributed computing is hard to teach in Computer Science courses due to the complexity distribution brings to all problem spaces. This presents a gap in education where students may not fully understand the challenges introduced with distributed systems. Teaching students distributed concepts would help better prepare them for industry development work.
DecaFS, the Distributed Educational Component Adaptable File System, is a modular distributed file system designed for educational use. The goal of the system is to teach distributed computing concepts to undergraduate and graduate students by allowing them to develop small, digestible portions of the system. The system is divided into layers, and each layer into modules, so that students can build or modify different components in small, assignment-sized portions. Students can replace modules or entire layers by following the DecaFS APIs and recompiling the system. This allows the behavior of the DFS (Distributed File System) to change based on student implementation, while providing base functionality for students to work from.
Our implementation includes a code base of core DecaFS modules that students can work from, as well as basic implementations of non-core modules. These basic non-core modules can be modified to implement more complex distribution techniques without modifying the core modules. We have shown the feasibility of developing a modular DFS while adhering to requirements such as configurable sizes (file, stripe, chunk) and support for multiple data replication strategies.
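To make the module-swapping idea concrete, here is a minimal sketch of what a pluggable replication-strategy module might look like. The interface and names (ReplicationStrategy, place_replicas, etc.) are hypothetical illustrations of the design pattern, not the actual DecaFS API.

```python
# Hypothetical sketch of a pluggable replication module, in the spirit of
# DecaFS's layer/module design. Names and signatures are illustrative only.
from abc import ABC, abstractmethod

class ReplicationStrategy(ABC):
    """A swappable non-core module: decides where chunk replicas live."""

    @abstractmethod
    def place_replicas(self, chunk_id, nodes, copies):
        ...

class RoundRobinReplication(ReplicationStrategy):
    """A basic implementation a student might start from and later replace."""
    def place_replicas(self, chunk_id, nodes, copies):
        start = chunk_id % len(nodes)
        return [nodes[(start + i) % len(nodes)] for i in range(copies)]

# A student swaps in a different module behind the same interface; the core
# layers only ever see ReplicationStrategy, never the concrete class.
strategy = RoundRobinReplication()
print(strategy.place_replicas(chunk_id=7, nodes=["n0", "n1", "n2", "n3"], copies=2))
```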
|
592 |
Modeling a distributed energy system for California electricity production through 2050. Azad, Vikas, 01 January 2012.
Recent research shows that combining distributed generation (DG) with renewable resources will reduce fossil fuel dependency and carbon dioxide (CO2) emissions. This thesis presents a framework to evaluate the benefits of DG in terms of CO2 emissions and transmission line losses relative to centralized power production through 2050. Due to the availability of complete data, the Sacramento Municipal Utility District (SMUD) in California is the main focus of this thesis; however, other utility companies such as PG&E, SDG&E, and SCE are also discussed. The test results based on SMUD show a decrease of about 11% to 4% in line losses when a 500 MW DG is placed at the consumption site. This thesis also shows that by adding a 40 MW DG at the central location, CO2 can be reduced by 71% compared to current standard business practices. By adding 40 MW of DG every year near consumers, SMUD can eliminate in-house electricity generation, thus completely eliminating CO2 emissions by 2034.
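A back-of-the-envelope check of the 2034 build-out timeline, using an assumed capacity figure (the thesis's actual SMUD numbers are not reproduced in the abstract):

```python
# Rough sanity check of the 40 MW/year build-out, assuming (hypothetically)
# that SMUD's in-house generation to displace is on the order of 900 MW.
# The true figure comes from the thesis's SMUD data, not shown here.
inhouse_capacity_mw = 900   # assumed, for illustration only
dg_added_per_year_mw = 40   # from the abstract
start_year = 2012

years_needed = -(-inhouse_capacity_mw // dg_added_per_year_mw)  # ceiling division
print(f"In-house generation displaced by {start_year + years_needed}")
# -> 2035 with these assumed numbers, the same order as the abstract's 2034.
```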
|
593 |
Provably efficient algorithms for decentralized optimization. Liu, Changxin, 31 August 2021.
Decentralized multi-agent optimization has emerged as a powerful paradigm with broad applications in engineering design, including federated machine learning and control of networked systems. In these setups, a group of agents is connected via a network with general topology. Under the communication constraint, they aim to solve a global optimization problem that is characterized collectively by their individual interests. Of particular importance are the computational and communication efficiency of decentralized optimization algorithms. Due to the heterogeneity of local objective functions, fostering cooperation across the agents over a possibly time-varying network is challenging yet necessary to achieve fast convergence to the global optimum. Furthermore, real-world communication networks are subject to congestion and bandwidth limits. To ease this difficulty, it is highly desirable to design communication-efficient algorithms that proactively reduce the utilization of network resources. This dissertation tackles four concrete settings in decentralized optimization and develops a provably efficient algorithm for each.
Chapter 1 presents an overview of decentralized optimization, introducing preliminaries, problem settings, and state-of-the-art algorithms. Chapter 2 introduces the notation and reviews key concepts used throughout the dissertation. In Chapter 3, we investigate non-smooth cost-coupled decentralized optimization and a special instance, namely the dual form of constraint-coupled decentralized optimization. We develop a decentralized subgradient method with double averaging that guarantees last-iterate convergence, which is crucial for solving decentralized dual Lagrangian problems with a convergence rate guarantee. Chapter 4 studies composite cost-coupled decentralized optimization in stochastic networks, for which existing algorithms do not guarantee linear convergence. We propose a new decentralized dual averaging (DDA) algorithm to solve this problem. Under a rather mild condition on the stochastic networks, we show that the proposed DDA attains an $\mathcal{O}(1/t)$ rate of convergence in the general case and a global linear rate of convergence if each local objective function is strongly convex. Chapter 5 tackles the smooth cost-coupled decentralized constrained optimization problem. We leverage the extrapolation technique and the average consensus protocol to develop an accelerated DDA algorithm, whose rate of convergence is proved to be $\mathcal{O}\left( \frac{1}{t^2}+ \frac{1}{t(1-\beta)^2} \right)$, where $\beta$ denotes the second largest singular value of the mixing matrix. To proactively reduce the utilization of network resources, a communication-efficient decentralized primal-dual algorithm based on an event-triggered broadcasting strategy is developed in Chapter 6. In this algorithm, each agent locally decides whether to broadcast by comparing a pre-defined threshold with the deviation between its current iterate and its last broadcast one. Provided that the threshold sequence is summable over time, we prove an $\mathcal{O}(1/t)$ rate of convergence for convex composite objectives. For strongly convex and smooth problems, linear convergence is guaranteed if the threshold sequence diminishes geometrically. Finally, Chapter 7 provides concluding remarks and directions for future research.
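A minimal sketch of the event-triggered broadcast rule described for Chapter 6, under stated assumptions (the dissertation's exact algorithm and threshold choice are not reproduced in the abstract): an agent transmits only when its iterate has drifted far enough from the one its neighbors last received.

```python
import numpy as np

def maybe_broadcast(x_current, x_last_broadcast, t, c=1.0, rho=0.9):
    """Event-triggered rule (sketch): broadcast only if the deviation from
    the last broadcast iterate exceeds a decaying threshold. A geometric
    threshold c * rho**t is summable over time, the kind of condition the
    abstract's convergence guarantees rest on."""
    threshold = c * rho**t
    if np.linalg.norm(x_current - x_last_broadcast) > threshold:
        return True, x_current          # transmit; neighbors refresh their copy
    return False, x_last_broadcast      # stay silent; save network resources

# Example: an agent whose state drifts slowly triggers few transmissions.
x_last = np.zeros(3)
for t in range(5):
    x_now = x_last + 0.01 * (t + 1) * np.ones(3)
    sent, x_last = maybe_broadcast(x_now, x_last, t)
    print(t, "sent" if sent else "silent")
```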
|
594 |
HopsWorks: A project-based access control model for Hadoop. Moré, Andre; Gebremeskel, Ermias, January 2015.
The growth in global data-gathering capacity is producing a vast amount of data that is growing at an ever-increasing rate. Properly analyzed, this data can represent a great opportunity for businesses, but processing it is a resource-intensive task. Sharing can increase efficiency through reusability, but legal and ethical questions arise when data is shared. The purpose of this thesis is to gain an in-depth understanding of the different access control methods that can be used to facilitate sharing, and to choose one to implement on a platform that lets users analyze, share, and collaborate on datasets. The resulting platform uses project-based access control at the API level and fine-grained role-based access control on the file system, giving the data owner full control over the shared data.
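A minimal sketch of how a project-level check at the API layer can compose with a role-based check below it; the roles, data, and helper names here are invented for illustration and are not HopsWorks's actual implementation.

```python
# Hypothetical two-level check in the spirit of the thesis's design:
# project membership gates the API, roles gate file-system operations.
PROJECT_MEMBERS = {"astro": {"alice": "data_owner", "bob": "data_scientist"}}

ROLE_PERMISSIONS = {
    "data_owner": {"read", "write", "share"},
    "data_scientist": {"read"},        # can analyze, but not re-share
}

def authorize(user, project, action):
    role = PROJECT_MEMBERS.get(project, {}).get(user)
    if role is None:
        return False                             # API level: not a project member
    return action in ROLE_PERMISSIONS[role]      # FS level: the role decides

assert authorize("alice", "astro", "share")
assert not authorize("bob", "astro", "share")    # sharing stays with the owner
```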
|
595 |
Overcoming local optima in control and optimization of cooperative multi-agent systems. Welikala, Shirantha, 15 May 2021.
A cooperative multi-agent system is a collection of interacting agents deployed in a mission space, where each agent controls its local state so that the fleet collectively optimizes a common global objective. While optimization problems associated with multi-agent systems aim to determine a fixed set of globally optimal agent states, control problems aim to obtain the set of globally optimal agent controls. The non-convexities inherent in these problems result in multiple local optima. This dissertation explores systematic techniques for either escaping or avoiding poor local optima in search of provably better (still local) optima.
First, for multi-agent optimization problems with iterative gradient-based solutions, a distributed approach to escaping local optima is proposed based on the concept of boosting functions. These functions temporarily transform the gradient components at a local optimum into a set of boosted non-zero gradient components in a systematic manner, making the approach more effective than methods that randomly perturb gradient components. A novel variable step size adjustment scheme is also proposed to establish the convergence of this distributed boosting process. The developed boosting concepts are successfully applied to the class of coverage problems.
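A toy sketch of the boosting idea with an assumed boosting function (the dissertation's actual boosting functions and step-size scheme are more elaborate): at a near-stationary point, the vanished gradient is replaced by a systematic, structure-aware direction rather than a random perturbation.

```python
import numpy as np

def boosted_gradient(grad, toward_neighbor, eps=1e-6, kappa=2.0):
    """Toy boosting function (assumed form). At a local optimum the true
    gradient is ~0; substitute a non-zero direction derived from problem
    structure (here: away from a crowded neighbor) instead of noise."""
    if np.linalg.norm(grad) > eps:
        return grad                              # normal gradient step
    d = -toward_neighbor                         # systematic escape direction
    return kappa * d / (np.linalg.norm(d) + eps)

# In a coverage problem, agents can bunch up at a local optimum where the
# gradient vanishes; boosting restores a useful direction that spreads them.
grad_at_optimum = np.zeros(2)
toward_neighbor = np.array([0.6, 0.8])           # unit vector toward a peer
print(boosted_gradient(grad_at_optimum, toward_neighbor))   # -> [-1.2 -1.6]
```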
Second, as a means of avoiding convergence to poor local optima in multi-agent optimization, the use of greedy algorithms in generating effective initial conditions is explored. Such greedy methods are computationally cheap and can often exploit submodularity properties of the problem to provide performance bound guarantees to the obtained solutions. For the class of submodular maximization problems, two new performance bounds are proposed and their effectiveness is illustrated using the class of coverage problems.
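For context, below is a sketch of the standard greedy routine for cardinality-constrained submodular maximization, the setting in which the classic (1 - 1/e) guarantee holds; the dissertation's two new performance bounds are not reproduced here.

```python
def greedy_max(ground_set, f, k):
    """Standard greedy for max f(S) subject to |S| <= k. For monotone
    submodular f this classically achieves at least (1 - 1/e) of the
    optimum, which is why it yields cheap, well-founded initial conditions."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in ground_set - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        S.add(best)
    return S

# Toy coverage instance: each element covers a set of points.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
coverage = lambda S: len(set().union(*[covers[e] for e in S]))
print(greedy_max(set(covers), coverage, k=2))    # -> {'a', 'c'}, covering all 6 points
```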
Third, a class of multi-agent control problems termed Persistent Monitoring on Networks (PMN) is considered, where a team of agents traverses a set of nodes (targets) interconnected according to a network topology, aiming to minimize a measure of overall node state. For this class of problems, a gradient-based parametric control solution developed in prior work relies heavily on the initial selection of its parameters, which often leads to poor local optima. To overcome this initialization challenge, the PMN system's asymptotic behavior is analyzed, and an off-line greedy algorithm is proposed to systematically generate an effective set of initial parameters.
Finally, for the same class of PMN problems, a computationally efficient, distributed, on-line Event-Driven Receding Horizon Control (RHC) solution is proposed as an alternative. This RHC solution is parameter-free, as it automatically optimizes its planning horizon length, and gradient-free, as it uses explicitly derived solutions for each RHC problem invoked at each agent upon each event of interest. Hence, unlike the gradient-based parametric control solutions, the proposed RHC solution does not force the agents to converge to one particular behavior that is likely to be a poor local optimum; instead, it keeps the agents actively searching for the optimal behavior.
In each of these four parts of the thesis, an interactive simulation platform is developed (and made available online) to generate extensive numerical examples that highlight the respective contributions made compared to the state of the art.
|
596 |
Estimation et optimisation distribuée dans les réseaux asynchrones (Distributed estimation and optimization in asynchronous networks). Iutzeler, Franck, 06 December 2013.
This thesis addresses the distributed estimation and optimization of a global value of interest over a network, using only local and asynchronous (sometimes wireless) communications. Motivated by applications ranging from cloud computing to wireless sensor networks via machine learning, we design new algorithms and theoretically study three problems of very different natures: the propagation of the maximal initial value, the estimation of the average of the initial values, and finally distributed optimization.
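For intuition, here is a sketch of the classic randomized pairwise gossip scheme that underlies this style of asynchronous averaging; it is a standard construction, not the thesis's specific algorithms.

```python
import random

# Asynchronous pairwise gossip averaging on a ring of n nodes (standard
# scheme). Each wake-up, one random pair of neighbors averages their values;
# all values converge to the global mean using only local exchanges.
n = 8
values = [float(i) for i in range(n)]            # initial values 0..7, mean 3.5
edges = [(i, (i + 1) % n) for i in range(n)]     # ring topology

random.seed(0)
for _ in range(2000):                            # each iteration = one wake-up
    i, j = random.choice(edges)                  # one local, asynchronous exchange
    values[i] = values[j] = (values[i] + values[j]) / 2

print(values)  # all entries are now close to the true average, 3.5
```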
|
597 |
Conservation Voltage Reduction of Active Distribution Systems with Networked Microgrids. Constante Flores, Gonzalo Esteban, 12 October 2018.
No description available.
|
598 |
Coded Computation for Speeding up Distributed Machine Learning. Wang, Sinong, 11 July 2019.
No description available.
|
599 |
Model-Based Autonomic Performance Management of Distributed Enterprise Systems and Applications. Mehrotra, Rajat, 14 December 2013.
Distributed computing systems (DCS) host a wide variety of enterprise applications in dynamic and uncertain operating environments. These applications require stringent reliability, availability, and quality of service (QoS) guarantees to maintain their service level agreements (SLAs). Due to the growing size and complexity of DCS, an autonomic performance management system is required to maintain the SLAs of these applications. This dissertation develops a model-based autonomic performance management structure for applications hosted in DCS. A systematic application performance modeling approach is introduced to define the dependency relationships among the system parameters that impact application performance. The developed application performance model is used by a model-based predictive controller to manage the application's multi-dimensional QoS objectives. A distributed control structure is also developed to provide scalability for performance management and to eliminate the need for approximate behavior modeling in the hierarchical arrangement of DCS. A distributed monitoring system is also introduced to track computational resource utilization, application performance statistics, and scientific application execution in a DCS with minimum latency and controllable resource overhead. The developed monitoring system is self-configuring, self-aware, and fault-tolerant, and it can also be deployed for monitoring DCS with heterogeneous computing systems. Finally, a configurable autonomic performance management system is developed using model-integrated computing methodologies, which allow administrators to define the initial settings of the application, the QoS objectives, the placement of system components, and the interaction among these components in a graphical domain-specific modeling environment. This configurable performance management system facilitates reuse of the same components, algorithms, and application performance models across different deployment settings.
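A toy sketch of the model-based control loop the abstract describes, with an assumed linear performance model (the dissertation's actual models and predictive controller are far richer): the controller predicts response time from a learned model and picks the cheapest allocation that meets the QoS target.

```python
# Toy model-predictive step for one QoS dimension (response time), with an
# assumed performance model r = a / cores + b. Illustrative only.
def predicted_response_ms(cores, a=400.0, b=20.0):
    return a / cores + b                 # learned dependency model, assumed form

def choose_allocation(target_ms, max_cores=16):
    """Pick the cheapest allocation whose predicted response meets the SLA."""
    for cores in range(1, max_cores + 1):
        if predicted_response_ms(cores) <= target_ms:
            return cores
    return max_cores                     # saturate: SLA not reachable

# One autonomic loop step: monitor -> predict -> actuate.
sla_target_ms = 70.0
cores = choose_allocation(sla_target_ms)
print(cores, predicted_response_ms(cores))   # -> 8 cores, 70.0 ms predicted
```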
|
600 |
Ablation Programming for Machine Learning. Sheikholeslami, Sina, January 2019.
As machine learning systems are being used in an increasing number of applications, from analysis of satellite sensory data and health-care analytics to smart virtual assistants and self-driving cars, they are also becoming more and more complex. This means that more time and computing resources are needed to train the models, and the number of design choices and hyperparameters increases as well. Due to this complexity, it is usually hard to explain the effect of each design choice or component of the machine learning system on its performance. A simple approach to addressing this problem is to perform an ablation study: a scientific examination of a machine learning system intended to give insight into the effects of its building blocks on its overall performance. However, ablation studies are currently not part of standard machine learning practice. One of the key reasons for this is that performing an ablation study currently requires major modifications to the code as well as extra compute and time resources. On the other hand, experimentation with a machine learning system is an iterative process consisting of several trials. A popular approach is to run these trials in parallel on an Apache Spark cluster. Since Apache Spark follows the Bulk Synchronous Parallel model, parallel execution of trials proceeds in stages separated by barriers: to execute a new set of trials, all trials from the previous stage must have finished. As a result, a lot of time and computing resources are usually wasted on unpromising trials that could have been stopped soon after they started. We address these challenges by introducing Maggy, an open-source framework for asynchronous and parallel hyperparameter optimization and ablation studies with Apache Spark and TensorFlow. This framework allows for better resource utilization, and supports ablation studies and hyperparameter optimization in a unified and extendable API.
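To illustrate the resource-waste argument, here is a generic sketch of trial execution with early stopping of unpromising trials; the function names and stopping rule are invented for illustration and are not Maggy's actual API.

```python
import random

# Generic early-stopping loop for a batch of trials (invented names; not
# Maggy's API). A monitor compares each trial's per-epoch metric with the
# best value any trial reached at the same epoch, and stops trials that fall
# too far behind, freeing cluster resources instead of waiting for a
# synchronous stage barrier.
random.seed(1)
best_at_epoch = {}                                # epoch -> best metric seen

def run_trial(trial_id, lr, epochs=10, gap=0.2):
    acc = 0.0
    for epoch in range(epochs):
        acc += lr * random.uniform(0.5, 1.0)      # stand-in for real training
        if best_at_epoch.get(epoch, 0.0) - acc > gap:
            return f"trial {trial_id} (lr={lr}) stopped at epoch {epoch}, acc={acc:.2f}"
        best_at_epoch[epoch] = max(best_at_epoch.get(epoch, 0.0), acc)
    return f"trial {trial_id} (lr={lr}) completed, acc={acc:.2f}"

for trial_id, lr in enumerate([0.08, 0.02, 0.09, 0.01]):
    print(run_trial(trial_id, lr))                # weak settings die early
```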
|