1 |
Operating system support for warehouse-scale computingSchwarzkopf, Malte January 2018 (has links)
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference. In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources. First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining storable, communicable identifier capabilities, and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects while supporting diverse application-level consistency policies. Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, finegrained isolation of resources, while permitting flexible sharing. Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.
|
2 |
Flexible and efficient computation in large data centresGog, Ionel Corneliu January 2018 (has links)
Increasingly, online computer applications rely on large-scale data analyses to offer personalised and improved products. These large-scale analyses are performed on distributed data processing execution engines that run on thousands of networked machines housed within an individual data centre. These execution engines provide, to the programmer, the illusion of running data analysis workflows on a single machine, and offer programming interfaces that shield developers from the intricacies of implementing parallel, fault-tolerant computations. Many such execution engines exist, but they embed assumptions about the computations they execute, or only target certain types of computations. Understanding these assumptions involves substantial study and experimentation. Thus, developers find it difficult to determine which execution engine is best, and even if they did, they become “locked in” because engineering effort is required to port workflows. In this dissertation, I first argue that in order to execute data analysis computations efficiently, and to flexibly choose the best engines, the way we specify data analysis computations should be decoupled from the execution engines that run the computations. I propose an architecture for decoupling data processing, together with Musketeer, my proof-of-concept implementation of this architecture. In Musketeer, developers express data analysis computations using their preferred programming interface. These are translated into a common intermediate representation from which code is generated and executed on the most appropriate execution engine. I show that Musketeer can be used to write data analysis computations directly, and these can execute on many execution engines because Musketeer automatically generates code that is competitive with optimised hand-written implementations. The diverse execution engines cause different workflow types to coexist within a data centre, opening up both opportunities for sharing and potential pitfalls for co-location interference. However, in practice, workflows are either placed by high-quality schedulers that avoid co-location interference, but choose placements slowly, or schedulers that choose placements quickly, but with unpredictable workflow run time due to co-location interference. In this dissertation, I show that schedulers can choose high-quality placements with low latency. I develop several techniques to improve Firmament, a high-quality min-cost flow-based scheduler, to choose placements quickly in large data centres. Finally, I demonstrate that Firmament chooses placements at least as good as other sophisticated schedulers, but at the speeds associated with simple schedulers. These contributions enable more efficient and effective use of data centres for large-scale computation than current solutions.
|
3 |
Scheduling with Space-Time Soft Constraints In Heterogeneous Cloud DatacentersTumanov, Alexey 01 August 2016 (has links)
Heterogeneity in modern datacenters is on the rise, in hardware resource characteristics, in workload characteristics, and in dynamic characteristics (e.g., a memoryresident copy of input data). As a result, which machines are assigned to a given job can have a significant impact. For example, a job may run faster on the same machine as its input data or with a given hardware accelerator, while still being runnable on other machines, albeit less efficiently. Heterogeneity takes on more complex forms as sets of resources differ in the level of performance they deliver, even if they consist of identical individual units, such as with rack-level locality. We refer to this as combinatorial heterogeneity. Mixes of jobs with strict SLOs on completion time and increasingly available runtime estimates in production datacenters deepen the challenge of matching the right resources to the right workloads at the right time. In this dissertation, we hypothesize that it is possible and beneficial to simultaneously leverage all of this information in the form of declaratively specified spacetime soft constraints. To accomplish this, we first design and develop our principal building block—a novel Space-Time Request Language (STRL). It enables the expression of jobs’ preferences and flexibility in a general, extensible way by using a declarative, composable, intuitive algebraic expression structure. Second, building on the generality of STRL, we propose an equally general STRL Compiler that automatically compiles STRL expressions into Mixed Integer Linear Programming (MILP) problems that can be aggregated and solved to maximize the overall value of shared cluster resources. These theoretical contributions form the foundation for the system we architect, called TetriSched, that instantiates our conceptual contributions: (a) declarative soft constraints, (b) space-time soft constraints, (c) combinatorial constraints, (d) orderless global scheduling, and (e) in situ preemption. We also propose a set of mechanisms that extend the scope and the practicality of TetriSched’s deployment by analyzing and improving on its scalability, enabling and studying the efficacy of preemption, and featuring a set of runtime mis-estimation handling mechanisms to address runtime prediction inaccuracy. In collaboration with Microsoft, we adapt some of these ideas as we design and implement a heterogeneity-aware resource reservation system called Aramid with support for ordinal placement preferences targeting deployment in production clusters at Microsoft scale. A combination of simulation and real cluster experiments with synthetic and production-derived workloads, a range of workload intensities, degrees of burstiness, preference strengths, and input inaccuracies support our hypothesis that leveraging space-time soft constraints (a) significantly improves scheduling quality and (b) is possible to achieve in a practical deployment.
|
4 |
Integrated Scheduling For Clustered VLIW ProcessorsNagpal, Rahul 12 1900 (has links)
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Scheduling for clustered architectures involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). Various clustered VLIW configurations, connectivity types, and inter-cluster communication models present different performance trade-offs to a scheduler. The scheduler is responsible for resolving the conflicting requirements of exploiting the parallelism offered by the hardware and limiting the communication among clusters to achieve better performance.
Earlier proposals for cluster scheduling fall into two main categories, viz., phase-decoupled scheduling and phase-coupled scheduling and they focus on clustered architectures which provide inter-cluster communication by an explicit inter-cluster copy operation. However, modern commercial clustered architectures provide snooping capabilities (apart from the support for inter-cluster communication using an explicit MV operation) by allowing some of the functional units to read operands from the register file of some of the other clusters without any extra delay. The phase-decoupled approach of scheduling suffers from the well known phase-ordering problem which becomes severe for such a machine model (with snooping) because communication and resource constraints are tightly coupled and thus are exposed only during scheduling. Tight integration of communication and resource constraints further requires taking into account the resource and communication requirements of other instructions ready to be scheduled in the current cycle while binding an instruction, in order to carry out effective binding. However, earlier proposals on integrated scheduling consider instructions and clusters for binding using a fixed order and thus they show different widely varying performance characteristics in terms of execution time and code size. Other shortcomings of earlier integrated algorithms (that lead to suboptimal cluster scheduling decisions) are due to non-consideration of future communication (that may arise due to a binding) and functional unit binding.
In this thesis, we propose a pragmatic scheme and also a generic graph matching based framework for cluster scheduling based on a generic and realistic clustered machine model. The proposed scheme effectively utilizes the exact knowledge of available communication slots, functional units, and load on different clusters as well as future resource and communication requirements known only at schedule time to attain significant performance improvement without code size penalty over earlier algorithms. The proposed graph matching based framework for cluster scheduling resolves the phase-ordering and fixed-ordering problem associated with scheduling on clustered VLIW architectures. The framework provides a mechanism to exploit the slack of instructions
by dynamically varying the freedom available in scheduling an instruction and hence the cost of scheduling an instruction using different alternatives to reduce the inter-cluster communication. An experimental evaluation of the proposed framework and some of the earlier proposals is presented in the context of a state-of-art commercial clustered architecture.
|
5 |
EXPLOITING THE SPATIAL DIMENSION OF BIG DATA JOBS FOR EFFICIENT CLUSTER JOB SCHEDULINGAkshay Jajoo (9530630) 16 December 2020 (has links)
With the growing business impact of distributed big data analytics jobs, it has become crucial to optimize their execution and resource consumption. In most cases, such jobs consist of multiple sub-entities called tasks and are executed online in a large shared distributed computing system. The ability to accurately estimate runtime properties and coordinate execution of sub-entities of a job allows a scheduler to efficiently schedule jobs for optimal scheduling. This thesis presents the first study that highlights spatial dimension, an inherent property of distributed jobs, and underscores its importance in efficient cluster job scheduling. We develop two new classes of spatial dimension based algorithms to<br>address the two primary challenges of cluster scheduling. First, we propose, validate, and design two complete systems that employ learning algorithms exploiting spatial dimension. We demonstrate high similarity in runtime properties between sub-entities of the same job by detailed trace analysis on four different industrial cluster traces. We identify design challenges and propose principles for a sampling based learning system for two examples, first for a coflow scheduler, and second for a cluster job scheduler.<br>We also propose, design, and demonstrate the effectiveness of new multi-task scheduling algorithms based on effective synchronization across the spatial dimension. We underline and validate by experimental analysis the importance of synchronization between sub-entities (flows, tasks) of a distributed entity (coflow, data analytics jobs) for its efficient execution. We also highlight that by not considering sibling sub-entities when scheduling something it may also lead to sub-optimal overall cluster performance. We propose, design, and implement a full coflow scheduler based on these assertions.
|
Page generated in 0.1218 seconds