1 |
Operating system support for warehouse-scale computing. Schwarzkopf, Malte, January 2018
Modern applications are increasingly backed by large-scale data centres. Systems software in these data centre environments, however, faces substantial challenges: the lack of uniform resource abstractions makes sharing and resource management inefficient, infrastructure software lacks end-to-end access control mechanisms, and work placement ignores the effects of hardware heterogeneity and workload interference. In this dissertation, I argue that uniform, clean-slate operating system (OS) abstractions designed to support distributed systems can make data centres more efficient and secure. I present a novel distributed operating system for data centres, focusing on two OS components: the abstractions for resource naming, management and protection, and the scheduling of work to compute resources. First, I introduce a reference model for a decentralised, distributed data centre OS, based on pervasive distributed objects and inspired by concepts in classic 1980s distributed OSes. Translucent abstractions free users from having to understand implementation details, but enable introspection for performance optimisation. Fine-grained access control is supported by combining storable, communicable identifier capabilities, and context-dependent, ephemeral handle capabilities. Finally, multi-phase I/O requests implement optimistically concurrent access to objects while supporting diverse application-level consistency policies. Second, I present the DIOS operating system, an implementation of my model as an extension to Linux. The DIOS system call API is centred around distributed objects, globally resolvable names, and translucent references that carry context-sensitive object meta-data. I illustrate how these concepts support distributed applications, and evaluate the performance of DIOS in microbenchmarks and a data-intensive MapReduce application. I find that it offers improved, fine-grained isolation of resources, while permitting flexible sharing.
Third, I present the Firmament cluster scheduler, which generalises prior work on scheduling via minimum-cost flow optimisation. Firmament can flexibly express many scheduling policies using pluggable cost models; it makes high-quality placement decisions based on fine-grained information about tasks and resources; and it scales the flow-based scheduling approach to very large clusters. In two case studies, I show that Firmament supports policies that reduce colocation interference between tasks and that it successfully exploits flexibility in the workload to improve the energy efficiency of a heterogeneous cluster. Moreover, my evaluation shows that Firmament scales the minimum-cost flow optimisation to clusters of tens of thousands of machines while still making sub-second placement decisions.
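To illustrate the idea of scheduling via minimum-cost flow mentioned in this abstract, here is a minimal sketch. It is not Firmament's implementation: the network layout (source, one node per task, one node per machine, sink), the function names `min_cost_flow` and `schedule`, and the cost values are all assumptions chosen for illustration, and the solver is a plain successive-shortest-paths routine rather than the optimised algorithms a production scheduler would need.

```python
def min_cost_flow(num_nodes, edges, source, sink):
    """Successive shortest paths; edges are (u, v, capacity, cost) tuples."""
    adj = [[] for _ in range(num_nodes)]
    head, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge at an even index, its residual twin at the next odd index.
        adj[u].append(len(head)); head.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(head)); head.append(u); cap.append(0); cost.append(-w)
    inf = float("inf")
    total_cost = 0
    while True:
        # Bellman-Ford tolerates the negative costs on residual edges.
        dist = [inf] * num_nodes
        via = [-1] * num_nodes  # edge used to reach each node
        dist[source] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            return total_cost, cap  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow.
        push, v = inf, sink
        while v != source:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = sink
        while v != source:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        total_cost += push * dist[sink]


def schedule(costs, slots):
    """costs[t][m]: hypothetical cost of task t on machine m; slots[m]: capacity."""
    num_tasks, num_machines = len(costs), len(slots)
    source, sink = 0, 1 + num_tasks + num_machines
    edges = [(source, 1 + t, 1, 0) for t in range(num_tasks)]
    arc_index = {}
    for t in range(num_tasks):
        for m in range(num_machines):
            arc_index[(t, m)] = len(edges)
            edges.append((1 + t, 1 + num_tasks + m, 1, costs[t][m]))
    edges += [(1 + num_tasks + m, sink, slots[m], 0) for m in range(num_machines)]
    total, residual = min_cost_flow(sink + 1, edges, source, sink)
    # A task-to-machine arc with no residual capacity carries one unit of flow.
    placement = {t: m for (t, m), k in arc_index.items() if residual[2 * k] == 0}
    return placement, total


placement, total = schedule(costs=[[1, 5], [2, 1], [4, 3]], slots=[1, 2])
print(placement, total)
```

In this toy setting a pluggable cost model amounts to a different way of filling in the `costs` matrix; the flow solver itself stays unchanged.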
|
2 |
Optimisation énergétique du rafraichissement des datacenters / Energy optimization of datacenters cooling process. Durand-Estebe, Baptiste, 04 July 2014
With the spread of electronic devices and the rapid growth of web-based IT services, the energy consumption of datacenters worldwide has become a major energy and economic concern. A datacenter is the infrastructure that hosts computer servers and keeps them running continuously. It is designed to provide a suitable thermal environment and an uninterrupted power supply to the IT equipment, in order to guarantee a high level of reliability. However, the constant activity of the servers releases a large amount of heat and requires continuous cooling. The objective of this work is to study the physical phenomena involved in an operating datacenter, in order to optimize its operation and reduce its energy consumption. Using numerical simulation, we study the air flow and heat transfer in the server room. To quantify the impact of new-generation servers on the cooling process, we propose a numerical model that simulates the behavior of “blade” servers. Then, using a Proper Orthogonal Decomposition (POD) method coupled with the TRNSYS software, we develop a new “transversal” model that simulates the behavior of a complete datacenter, from the servers to the cooling plant. This model is used to design and test a new adaptive regulation strategy, which continuously optimizes the system to ensure a safe thermal environment while providing large energy savings.
|
3 |
Uma estratégia de gerenciamento de infraestrutura de datacenters baseada em técnicas de monitoramento distribuído e controle centralizado / A Datacenter Infrastructure Management strategy based on distributed monitoring and centralized control. Thiago Teixeira Sá, 29 August 2013
Cloud Computing is emerging as a paradigm for using computational resources in which hardware infrastructure, software, and platforms for developing new applications are offered as services available on a global scale. These services are delivered through large-scale datacenters, where virtualization technologies are routinely used to share resources. In this context, efficient management of the datacenter infrastructure can lead to significant reductions in operating costs. This work presents a management strategy for virtualized datacenter infrastructure that applies distributed monitoring techniques combined with centralized control actions. The strategy seeks to reduce the overload effects observed in traditional management models, which rely on a single controller node that accumulates both control and monitoring responsibilities. It thereby aims to increase the scalability of the infrastructure and improve its energy efficiency without compromising the Quality of Service (QoS) offered to the end user. The performance of the proposed strategy is analyzed through multiple simulation experiments carried out with tools designed specifically for modeling computational clouds, with support for representing the energy consumption of the simulated infrastructure.
|
4 |
Theory and Practice in Cloud Datacenters with Distributed Schedulers. Alshahrani, Reem Abdullah, 06 August 2019
No description available.
|
5 |
Scheduling with Space-Time Soft Constraints In Heterogeneous Cloud Datacenters. Tumanov, Alexey, 01 August 2016
Heterogeneity in modern datacenters is on the rise, in hardware resource characteristics, in workload characteristics, and in dynamic characteristics (e.g., a memory-resident copy of input data). As a result, which machines are assigned to a given job can have a significant impact. For example, a job may run faster on the same machine as its input data or with a given hardware accelerator, while still being runnable on other machines, albeit less efficiently. Heterogeneity takes on more complex forms as sets of resources differ in the level of performance they deliver, even if they consist of identical individual units, such as with rack-level locality. We refer to this as combinatorial heterogeneity. Mixes of jobs with strict SLOs on completion time and increasingly available runtime estimates in production datacenters deepen the challenge of matching the right resources to the right workloads at the right time. In this dissertation, we hypothesize that it is possible and beneficial to simultaneously leverage all of this information in the form of declaratively specified space-time soft constraints. To accomplish this, we first design and develop our principal building block: a novel Space-Time Request Language (STRL). It enables the expression of jobs’ preferences and flexibility in a general, extensible way by using a declarative, composable, intuitive algebraic expression structure. Second, building on the generality of STRL, we propose an equally general STRL Compiler that automatically compiles STRL expressions into Mixed Integer Linear Programming (MILP) problems that can be aggregated and solved to maximize the overall value of shared cluster resources. These theoretical contributions form the foundation for the system we architect, called TetriSched, that instantiates our conceptual contributions: (a) declarative soft constraints, (b) space-time soft constraints, (c) combinatorial constraints, (d) orderless global scheduling, and (e) in situ preemption.
We also propose a set of mechanisms that extend the scope and the practicality of TetriSched’s deployment by analyzing and improving on its scalability, enabling and studying the efficacy of preemption, and featuring a set of runtime mis-estimation handling mechanisms to address runtime prediction inaccuracy. In collaboration with Microsoft, we adapt some of these ideas as we design and implement a heterogeneity-aware resource reservation system called Aramid with support for ordinal placement preferences targeting deployment in production clusters at Microsoft scale. A combination of simulation and real cluster experiments with synthetic and production-derived workloads, a range of workload intensities, degrees of burstiness, preference strengths, and input inaccuracies support our hypothesis that leveraging space-time soft constraints (a) significantly improves scheduling quality and (b) is possible to achieve in a practical deployment.
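The notion of space-time soft constraints can be made concrete with a small sketch. The code below is not STRL syntax and does not use a MILP solver: the option tuples, the partition names, and the function `best_plan` are hypothetical, and an exhaustive search stands in for the MILP formulation described in the abstract. Each job lists alternative placements (which resource partition, how many machines, when, for how long) with a value attached, and the planner picks at most one option per job to maximize total value without exceeding capacity in any time slot.

```python
from itertools import product


def best_plan(jobs, capacity, horizon):
    """Pick at most one option per job to maximize total value.

    jobs: {job_name: [(partition, machines, start, duration, value), ...]}
    capacity: {partition: number of machines}
    horizon: number of discrete time slots considered
    """
    names = list(jobs)
    best_value, best_choice = 0, {}
    # Exhaustive search over every combination, including "leave job unscheduled".
    for choice in product(*[[None] + jobs[name] for name in names]):
        used = {(p, t): 0 for p in capacity for t in range(horizon)}
        value, feasible = 0, True
        for option in choice:
            if option is None:
                continue
            partition, machines, start, duration, option_value = option
            if start + duration > horizon:
                feasible = False
                break
            for t in range(start, start + duration):
                used[(partition, t)] += machines
                if used[(partition, t)] > capacity[partition]:
                    feasible = False
            value += option_value
        if feasible and value > best_value:
            best_value, best_choice = value, dict(zip(names, choice))
    return best_value, best_choice


# Hypothetical workload: both jobs prefer the two GPU machines right away,
# but accept a later start or slower CPU machines for a lower value.
jobs = {
    "A": [("gpu", 2, 0, 2, 10), ("cpu", 2, 0, 3, 6)],
    "B": [("gpu", 2, 0, 2, 8), ("gpu", 2, 2, 2, 5), ("cpu", 2, 0, 3, 4)],
}
value, plan = best_plan(jobs, capacity={"gpu": 2, "cpu": 2}, horizon=4)
print(value, plan)
```

Because both jobs cannot hold the GPU partition at the same time, the best plan here defers job B to the later GPU slot rather than moving either job to the CPU partition, which is the kind of trade-off a global, time-aware formulation can find.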
|
6 |
Reducing the Cost of Operating a Datacenter Network. Curtis, Andrew, January 2012
Datacenters are a significant capital expense for many enterprises. Yet, they are difficult to manage and are hard to design and maintain. The initial design of a datacenter network tends to follow vendor guidelines, but subsequent upgrades and expansions to it are mostly ad hoc, with equipment being upgraded piecemeal after its amortization period runs out and equipment acquisition is tied to budget cycles rather than changes in workload.
These networks are also brittle and inflexible. They tend to be manually managed, and cannot perform dynamic traffic engineering.
The high-level goal of this dissertation is to reduce the total cost of owning a datacenter by improving its network. To achieve this, we make the following contributions. First, we develop an automated, theoretically well-founded approach to planning cost-effective datacenter upgrades and expansions. Second, we propose a scalable traffic management framework for datacenter networks. Together, we show that these contributions can significantly reduce the cost of operating a datacenter network.
To design cost-effective network topologies, especially as the network expands over time, updated equipment must coexist with legacy equipment, which makes the network heterogeneous. However, heterogeneous high-performance network designs are not well understood. Our first step, therefore, is to develop the theory of heterogeneous Clos topologies. Using our theory, we propose an optimization framework, called LEGUP, which designs a heterogeneous Clos network to implement in a new or legacy datacenter. Although effective, LEGUP imposes a certain amount of structure on the network. To deal with situations when this is infeasible, our second contribution is a framework, called REWIRE, which uses optimization to design unstructured DCN topologies. Our results indicate that these unstructured topologies have up to 100-500% more bisection bandwidth than a fat-tree for the same dollar cost.
Our third contribution is two frameworks for datacenter network traffic engineering. Because of the multiplicity of end-to-end paths in DCN fabrics, such as Clos networks and the topologies designed by REWIRE, careful traffic engineering is needed to maximize throughput. This requires timely detection of elephant flows (flows that carry a large amount of data) and management of those flows. Previously proposed approaches incur high monitoring overheads, consume significant switch resources, or have long detection times.
We make two proposals for elephant flow detection. First, in the Mahout framework, we suggest that such flows be detected by observing the end hosts' socket buffers, which provide efficient visibility of flow behavior. Second, in the DevoFlow framework, we add efficient stats-collection mechanisms to network switches. Using simulations and experiments, we show that these frameworks reduce traffic engineering overheads by at least an order of magnitude while still providing near-optimal performance.
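The end-host detection idea can be sketched in a few lines. This is only an illustration of threshold-based classification, not the Mahout implementation: the function name `detect_elephants`, the flow identifiers, and the 100 KiB default threshold are assumptions made for the example.

```python
def detect_elephants(socket_backlog_bytes, threshold_bytes=100 * 1024):
    """Return the flows whose send-buffer backlog reaches the threshold.

    socket_backlog_bytes: {flow_id: bytes currently queued in the socket buffer}
    threshold_bytes: hypothetical cut-off above which a flow counts as an elephant
    """
    return sorted(
        flow for flow, backlog in socket_backlog_bytes.items()
        if backlog >= threshold_bytes
    )


# Hypothetical snapshot of per-flow socket buffer occupancy on one host.
snapshot = {"10.0.0.1:5001": 4096, "10.0.0.2:80": 512 * 1024, "10.0.0.3:22": 200}
print(detect_elephants(snapshot))
```

The appeal of observing the buffer at the end host is that a large backlog shows up there before the flow has sent much traffic, so no per-flow polling of switches is needed to notice it.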
|
7 |
Harnessing Data Parallel Hardware for Server Workloads. Agrawal, Sandeep R., January 2015
Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on using general-purpose processors; however, these might not be the most efficient platforms for server workloads. Data parallel hardware achieves high energy efficiency by amortizing instruction costs across multiple data streams, and high throughput by enabling massive parallelism across independent threads. These benefits are considered traditionally applicable to scientific workloads, and common server tasks like page serving or search are considered unsuitable for a data parallel execution model.
Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. For a high enough arrival rate, a server has the opportunity to launch cohorts of similar requests on data parallel hardware, improving server performance and power/energy efficiency. We present a framework, called Rhythm, for high throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECWeb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluation.
Similarity search is another ubiquitous server workload that involves identifying the nearest neighbors to a given query across a large number of points. We explore the performance, power and dollar benefits of using accelerators to perform similarity search for query cohorts in very high dimensions under tight deadlines, and demonstrate an implementation on GPUs that searches across a corpus of billions of documents and is significantly cheaper than commercial deployments. We show that with software and system modifications, data parallel designs can greatly outperform common task parallel implementations.
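The cohort idea from this abstract can be sketched as a simple batcher. This is an illustration only, not Rhythm's design: the class name `CohortBatcher`, the request types, and the fixed cohort size are assumptions, and a real server would presumably also flush partially filled cohorts on a timeout, which is omitted here.

```python
from collections import defaultdict


class CohortBatcher:
    """Group incoming requests by type until a full cohort can be launched."""

    def __init__(self, cohort_size):
        self.cohort_size = cohort_size
        self.pending = defaultdict(list)  # request type -> queued payloads

    def add(self, request_type, payload):
        """Queue a request; return a full cohort when one is ready, else None."""
        bucket = self.pending[request_type]
        bucket.append(payload)
        if len(bucket) == self.cohort_size:
            self.pending[request_type] = []
            return bucket  # would be handed to one data parallel launch
        return None


batcher = CohortBatcher(cohort_size=3)
for kind, user in [("login", "u1"), ("balance", "u2"), ("login", "u3"), ("login", "u4")]:
    cohort = batcher.add(kind, user)
    if cohort is not None:
        print(kind, cohort)
```

Requests of the same type follow similar execution paths, so handing them over together is what lets data parallel hardware run them in lockstep.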
|
9 |
Études du refroidissement par free cooling indirect d’un bâtiment exothermique : application au centre de données / Indirect free cooling studies in an exothermic building: application to data centers. Kaced, Yazid, 06 September 2018
A data center is a facility that houses telecommunication equipment, network infrastructure, servers, and computers. This equipment dissipates a very large amount of heat, which must be removed by cooling systems. Telecommunication standards impose restricted ranges of temperature and humidity, which leads to very high energy consumption devoted to air conditioning. Reducing this consumption is a real challenge. Among the proposed cooling solutions is free cooling, which cools equipment using outside air when climatic conditions are favorable. The work carried out during this thesis is based on experiments conducted within a building under real climatic conditions in order to study the cooling of telecom cabinets. During the study, the building configuration was modified, an indirect "free cooling" system was set up, and extensive instrumentation was installed. The objectives are to determine performance coefficients from measurement sequences, and to develop and validate a numerical model that predicts the thermo-aeraulic behavior of this type of solution in use. Initially, experiments are carried out with power dissipated inside the building and cooling provided only by outside air circulating within three walls. Then, significant modifications were made to the building to introduce a closed-loop internal air circulation that removes the heat dissipated inside the cabinets by passing air through them. To obtain a convincing database, measurements were conducted with one and then several cabinets under different conditions. Varying the operating parameters makes it possible to better understand how the installation operates and to define the energy optimization parameters. Numerical models are developed with TRNSYS / TRNFLOW. Comparing the simulations with the measurements shows the relevance of the approach.
|
10 |
Contributions à la mise en place d'une infrastructure de Cloud Computing à large échelle / Contributions to massively distributed Cloud Computing infrastructures. Pastor, Jonathan, 18 October 2016
The continuous growth in demand for computing power has led to the triumph of the Cloud Computing model. Customers who need computing power obtain it over the Internet from providers of Cloud Computing infrastructures. To achieve economies of scale, these infrastructures are increasingly large and concentrated in a few attractive places, leading to problems such as energy supply, fault tolerance, and distance from most of their end users. In this thesis we studied the implementation of a fully distributed and decentralized IaaS system operating a network of micro data centers deployed on the Internet backbone, using a version of OpenStack modified during the thesis to support non-relational databases non-intrusively. Experiments on Grid’5000 showed interesting performance results, limited however by the fact that OpenStack does not natively take advantage of geographically distributed operation. We therefore studied how to take network locality into account to improve the performance of distributed services by favoring collaborations between nearby nodes. A prototype of the DVMS virtual machine placement algorithm, working on an unstructured topology based on the Vivaldi algorithm, has been validated on Grid’5000. This prototype won first prize at the large-scale challenge of the Grid’5000 spring school in 2014. Finally, this work led us to participate in the development of the VMPlaceS simulator.
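The Vivaldi algorithm named in this abstract assigns each node synthetic coordinates so that distances between coordinates approximate measured network latencies. The sketch below shows a simplified single update step with a constant step size; it is not the code used in the thesis, and the function name `vivaldi_update`, the `delta` value, and the sample coordinates are assumptions for illustration. The full algorithm also adapts the step size using per-node error estimates, which is omitted here.

```python
import math


def vivaldi_update(own, peer, measured_rtt, delta=0.25):
    """Move `own` coordinates so their distance to `peer` better matches the RTT.

    own, peer: coordinate lists of equal length
    measured_rtt: latency measured between the two nodes
    delta: constant step size (simplification; assumed value)
    """
    diff = [a - b for a, b in zip(own, peer)]
    distance = math.sqrt(sum(d * d for d in diff))
    if distance == 0:
        # Coincident nodes: pick an arbitrary direction to separate them.
        direction = [1.0] + [0.0] * (len(own) - 1)
    else:
        direction = [d / distance for d in diff]
    error = measured_rtt - distance  # positive: nodes are placed too close
    return [a + delta * error * u for a, u in zip(own, direction)]


# A node at the origin measures an RTT of 10 to a peer placed 5 units away,
# so it moves directly away from the peer.
print(vivaldi_update([0.0, 0.0], [3.0, 4.0], measured_rtt=10.0))
```

Once coordinates have converged, a node can favor nearby collaborators by comparing coordinate distances instead of probing every candidate.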
|