Spelling suggestions: "subject:"distributed computing atemsystem"" "subject:"distributed computing systsystem""
1 |
Distributed Computation With Communication Delays: Design And Analysis Of Load Distribution StrategiesBharadwaj, V 06 1900 (has links)
Load distribution problems in distributed computing networks have attracted much attention in the literature. A major objective in these studies is to distribute the processing load so as to minimize the time of processing of the entire load. In general, the processing load can be indivisible or divisible. An indivisible load has to be processed in its entirety on a single processor. On the other hand, a divisible load can be partitioned and processed on more than one processor. Divisible loads are either modularly divisible or arbitrarily divisible. Modularly divisible loads can be divided into pre-defined modules and cannot be further sub-divided. Further, precedence relations between modules may exist. Arbitrarily divisible loads can be divided into several fractions of arbitrary lengths which usually do not have any precedence relations. Such type of loads are characterized by their large volume and the property that each data element requires an identical and independent processing. One of the important problems here is to obtain an optimal load distribution, which minimizes the processing time when the distribution is subject to communication delays in the interconnecting links. A specific application in which such loads are encountered is in edge-detection of images. Here the given image frame can be arbitrarily divided into many sub-frames and each of these can be independently processed. Other applications include processing of massive experimental data. The problems associated with the distribution of such arbitrarily divisible loads are usually analysed in the framework of what is known as divisible job theory.
The research work reported in this thesis is a contribution in the area of distributing arbitrarily divisible loads in distributed computing systems subject to communication delays. The main objective in this work is to design and analyseload distribution strategies to minimize the processing time of the entire load in a given network. Two types of networks are considered, namely (i) single-level tree (or star) network and (ii) linear network. In both the networks we assume that there is a non-zero delay associated with load transfer. Further, the processors in the network may or may not be equipped with front-ends (Le., communication co-processors). The main contributions in this thesis are summarized below.
First, a mathematical formulation of the load distribution problem in single-level tree and linear networks is presented. In both the networks, it is assumed that there are (m +1) processors and m communication links. In the case of single-level tree networks, the load to be processed is assumed to originate at the root processor, which divides the load into (m +1) fractions, keeps its own share of the load for processing, and distributes the rest to the child processors one at a time and in a fixed sequence. In all the earlier studies in the literature, it had been assumed that for a load distribution to be optimal, it should be such that all the processors must stop computing at the same time. In this thesis, it is shown that this assumption is in general not true, and holds only for a restricted class of single-level tree networks which satisfy a certain condition. The concept of an equivalent network is introduced to obtain a precise formulation of this condition in terms of the processor and link speed parameters. It is shown that this condition can be used to identify processor-link pairs which can be eliminated from a given network (i.e., these processors need not be given any computational load) without degrading its time performance. It is proved that the resultant reduced network (a network from which these inefficient processor-link pairs have been removed) gives the optimal time performance if and only if the load distribution is such that all the processors stop computing at the same time instant. These results are first proved for the case when the root processor is equipped with a front-end and then extended to the case when it is not. In the latter case, an additional condition, between the speed of the root processor and the speed of each of the links, to be satisfied by the network is specified. An optimal sequence for applying these conditions is also obtained. In the case of linear networks the processing load is assumed to originate at the processor situated at one end of the network. Each processor in the network keeps its own load fraction for computing and transmits the rest to its successor. Here too, in all the earlier studies in the literature, it has been assumed that for the processing time to be a minimum, the load distribution must be such that all the processors must stop computing at the same instant in time. Though this condition has been proved by others to be both necessary and sufficient, a different and more rigorous proof, similar to the case of single-level tree network, is presented here. Finally, the effect of inaccurate modelling on the processing time and on the above conditions are discussed through an illustrative example and it is shown that the model adopted in this thesis gives reasonably accurate results.
In the case of single-level tree networks, so far it has been assumed that the root processor distributes the processing load in a fixed sequence. However, since there are m child processors, a total of m! different sequences of load distribution are possible. Using the closed-form derived for the processing time, it is proved here that the optimal sequence of load distribution follows the decreasing order of link speeds. Further, if physical rearrangement of processors and links is allowed, then it is shown that the optimal arrangement follows a decreasing order of link and processor speeds with the fastest processor at the root. The entire analysis is first done for the case when the root processor is equipped with a front-end, and then extended to the case when it is not. In the without front-end case, it is shown that the same optimal sequencing result holds. However, in an optimal arrangement, the root processor need not be the fastest. In this case an algorithm has been proposed for obtaining optimal arrangement. Illustrative examples are given for all the cases considered.
Next, a new strategy of load distribution is proposed by which the processing time obtained in earlier studies can be further minimized. Here the load is distributed by the root processor to a child processor in more than one installment (instead of in a single installment) such that the processing time is further minimized. First; the case in which all the processors are equipped :tn front-ends is considered. Recursive equations are obtained for a heterogeneous network and these are solved for the special case of a homogeneous network (having identical processors and identical links). Using this closed-form solution, the ultimate limits of performance are explored through an asymptotic analysis with respect to the number of installments and number of processors in the network. Trade-off relationships between the number of installments and the number of processors in the network are also presented. These results are then extended to the case when the processors are not equipped with front-ends. Finally, the efficiency of this new strategy of load distribution is demonstrated by comparing it with the existing single-installment strategy in the literature.
The multi-installment strategy explained above is then applied to linear net-As. Here, .the processing load is assumed to originate at one extreme end of the network, First the case when all the processors are equipped with front-ends is considered. Recursive equations for a heterogeneous network are obtained and these are solved for the special case of a homogeneous network. Using this closed form solution, an asymptotic analysis is performed with respect to the number of installments. However, the asymptotic results with respect to the number of processors was obtained computationally since analytical results could not be obtained. It is found that for a given network, once the number of installments is fixed, there is an optimum number of processors to be used in the network, beyond which the time performance degrades. Trade-off relationships between the number of installments and the number of processors is obtained. These results are then extended to the case when the processors are not equipped with front-ends. Comparisions with the existing single-installment strategy is also done.
The single-installment strategy discussed in the literature has the disadvantage that the front-ends of the processors are not utilized efficiently in a linear network. This is due to the fact that a processor starts computing its own load fraction only after the entire load to be communicated through its front-end has been received. In this thesis, a new strategy is proposed in which a processor starts computing as soon as it receives its load fraction, simultaneously allowing its front-end to receive and transmit load to its successors. Recursive equations are developed and solved for the special case of a heterogeneous network in which the processors and links are arranged in the decreasing order of speeds. Further, it is shown that in this strategy, if the processing load originates in the interior of the network, the sequence of load distribution should- be such that the load should be first distributed to the side with a lesser number of processors. An expression for the optimal load origination point in the network is derived. A comparative study of this strategy with an earlier strategy is also presented. Finally, it is shown that even though the analysis is carried out for a special case of a heterogeneous network, this load distribution strategy can also be applied to a linear network in which the processors and links are arbitrarily arranged and still obtain a significant improvement in the time performance.
|
2 |
Novel Concepts In Divisible Load Scheduling With Realistic System ConstraintsSuresh, S 04 1900 (has links) (PDF)
No description available.
|
3 |
Automatic Reasoning Techniques for Non-Serializable Data-Intensive ApplicationsGowtham Kaki (7022108) 14 August 2019 (has links)
<div>
<div>
<div>
<p>The performance bottlenecks in modern data-intensive applications have induced
database implementors to forsake high-level abstractions and trade-off simplicity and
ease of reasoning for performance. Among the first casualties of this trade-off are the
well-known ACID guarantees, which simplify the reasoning about concurrent database
transactions. ACID semantics have become increasingly obsolete in practice due
to serializable isolation – an integral aspect of ACID, being exorbitantly expensive.
Databases, including the popular commercial offerings, default to weaker levels of
isolation where effects of concurrent transactions are visible to each other. Such weak
isolation guarantees, however, are extremely hard to reason about, and have led to
serious safety violations in real applications. The problem is further complicated
in a distributed setting with asynchronous state replications, where high availability
and low latency requirements compel large-scale web applications to embrace weaker
forms of consistency (e.g., eventual consistency) besides weak isolation. Given the
serious practical implications of safety violations in data-intensive applications, there
is a pressing need to extend the state-of-the-art in program verification to reach non-
serializable data-intensive applications operating in a weakly-consistent distributed
setting.
</p>
<p>This thesis sets out to do just that. It introduces new language abstractions, program logics, reasoning methods, and automated verification and synthesis techniques
that collectively allow programmers to reason about non-serializable data-intensive
applications in the same way as their serializable counterparts. The contributions
</p>
</div>
</div>
<div>
<div>
<p>xi
</p>
</div>
</div>
</div>
<div>
<div>
<div>
<p>made are broadly threefold. Firstly, the thesis introduces a uniform formal model to
reason about weakly isolated (non-serializable) transactions on a sequentially consistent (SC) relational database machine. A reasoning method that relates the semantics
of weak isolation to the semantics of the database program is presented, and an automation technique, implemented in a tool called ACIDifier is also described. The
second contribution of this thesis is a relaxation of the machine model from sequential
consistency to a specifiable level of weak consistency, and a generalization of the data
model from relational to schema-less or key-value. A specification language to express
weak consistency semantics at the machine level is described, and a bounded verification technique, implemented in a tool called Q9 is presented that bridges the gap
between consistency specifications and program semantics, thus allowing high-level
safety properties to be verified under arbitrary consistency levels. The final contribution of the thesis is a programming model inspired by version control systems that
guarantees correct-by-construction <i>replicated data types</i> (RDTs) for building complex distributed applications with arbitrarily-structured replicated state. A technique
based on decomposing inductively-defined data types into <i>characteristic relations</i> is
presented, which is used to reason about the semantics of the data type under state
replication, and eventually derive its correct-by-construction replicated variant automatically. An implementation of the programming model, called Quark, on top of
a content-addressable storage is described, and the practicality of the programming
model is demonstrated with help of various case studies.
</p>
</div>
</div>
</div>
|
4 |
HIGHER ORDER OPTIMIZATION TECHNIQUES FOR MACHINE LEARNINGSudhir B. Kylasa (5929916) 09 December 2019 (has links)
<div>
<div>
<div>
<p>First-order methods such as Stochastic Gradient Descent are methods of choice
for solving non-convex optimization problems in machine learning. These methods
primarily rely on the gradient of the loss function to estimate descent direction.
However, they have a number of drawbacks, including converging to saddle points
(as opposed to minima), slow convergence, and sensitivity to parameter tuning. In
contrast, second order methods that use curvature information in addition to the
gradient, have been shown to achieve faster convergence rates, theoretically. When
used in the context of machine learning applications, they offer faster (quadratic)
convergence, stability to parameter tuning, and robustness to problem conditioning.
In spite of these advantages, first order methods are commonly used because of their
simplicity of implementation and low per-iteration cost. The need to generate and
use curvature information in the form of a dense Hessian matrix makes each iteration
of second order methods more expensive.
</p><p><br></p>
<p>In this work, we address three key problems associated with second order methods
– (i) what is the best way to incorporate curvature information into the optimization
procedure; (ii) how do we reduce the operation count of each iteration in a second order method, while maintaining its superior convergence property; and (iii) how do we
leverage high-performance computing platforms to significant accelerate second order
methods. To answer the first question, we propose and validate the use of Fisher
information matrices in second order methods to significantly accelerate convergence.
The second question is answered through the use of statistical sampling techniques
that suitably sample matrices to reduce per-iteration cost without impacting convergence. The third question is addressed through the use of graphics processing units
(GPUs) in distributed platforms to deliver state of the art solvers.</p></div></div></div><div><div><div>
<p>Through our work, we show that our solvers are capable of significant improvement
over state of the art optimization techniques for training machine learning models.
We demonstrate improvements in terms of training time (over an order of magnitude
in wall-clock time), generalization properties of learned models, and robustness to
problem conditioning.
</p>
</div>
</div>
</div>
|
Page generated in 0.1046 seconds