• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 12
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 34
  • 34
  • 11
  • 10
  • 8
  • 8
  • 6
  • 6
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A compiler-based leakage reduction technique by power-gating functional units in embedded microprocessors

Roy, Soumyaroop 01 June 2006 (has links)
Power-gating is a technique investigated widely for reducing leakage energy in the functional units of microprocessors at the architectural level. Effective power-gating involves deactivating idle functional units for sustained periods incurring little or no performance degradation. Accurate prediction of long idle periods is essential, which, in turn, depends on the application program characteristics.In this thesis, we propose a compiler-based leakage reduction technique for embedded architectures by exploiting the well-known attributes of embedded applications, namely, small code size and intensive loops. From the control flow graph (CFG) representation of the source program, we construct a forest of loop hierarchy trees (LHTs), which capture the nesting loop properties of the program. As an LHT satisfies the partial ordering on the loop nesting, we exploit this property to identify maximal subgraphs (of functional unit idleness) in the original program. For each subgraph so found, a sleep instruction is introduced at the entry point of the corresponding code segement, thus optimizing the number of sleep instructions. The sleep instruction has one operand, a bit-vector comprised of ON/OFF controlbits for all functional units in the data path. Our target architecture is a modified ARM processor model comprising of functional units with power-gating ability. We obtained an average leakage energy reduction of 34.1% for 12 benchmarks chosen from the MiBench suite, with range of 19.5% and standard deviation of 6.5%.
22

Topics in Power and Performance Optimization of Embedded Systems

January 2011 (has links)
abstract: The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments. In particular, limited energy resources constrain performance in embedded systems often reliant on independent fuel or battery supplies. Ultimately, mitigating energy consumption without sacrificing performance in these systems is paramount. In this work power/performance optimization emphasizing prevailing data centric applications including video and signal processing is addressed for energy constrained embedded systems. Frameworks are presented which exchange quality of service (QoS) for reduced power consumption enabling power aware energy management. Power aware systems provide users with tools for precisely managing available energy resources in light of user priorities, extending availability when QoS can be sacrificed. Specifically, power aware management tools for next generation bistable electrophoretic displays and the state of the art H.264 video codec are introduced. The multiprocessor system on chip (MPSoC) paradigm is examined in the context of next generation many-core hand-held computing devices. MPSoC architectures promise to breach the power/performance wall prohibiting advancement of complex high performance single core architectures. Several many-core distributed memory MPSoC architectures are commercially available, while the tools necessary to effectively tap their enormous potential remain largely open for discovery. Adaptable scalability in many-core systems is addressed through a scalable high performance multicore H.264 video decoder implemented on the representative Cell Broadband Engine (CBE) architecture. The resulting agile performance scalable system enables efficient adaptive power optimization via decoding-rate driven sleep and voltage/frequency state management. The significant problem of mapping applications onto these architectures is additionally addressed from the perspective of instruction mapping for limited distributed memory architectures with a code overlay generator implemented on the CBE. Finally runtime scheduling and mapping of scalable applications in multitasking environments is addressed through the introduction of a lightweight work partitioning framework targeting streaming applications with low latency and near optimal throughput demonstrated on the CBE. / Dissertation/Thesis / Ph.D. Computer Science 2011
23

Leakage Temperature Dependency Aware Real-Time Scheduling for Power and Thermal Optimization

Chaturvedi, Vivek 26 March 2013 (has links)
Catering to society’s demand for high performance computing, billions of transistors are now integrated on IC chips to deliver unprecedented performances. With increasing transistor density, the power consumption/density is growing exponentially. The increasing power consumption directly translates to the high chip temperature, which not only raises the packaging/cooling costs, but also degrades the performance/reliability and life span of the computing systems. Moreover, high chip temperature also greatly increases the leakage power consumption, which is becoming more and more significant with the continuous scaling of the transistor size. As the semiconductor industry continues to evolve, power and thermal challenges have become the most critical challenges in the design of new generations of computing systems. In this dissertation, we addressed the power/thermal issues from the system-level perspective. Specifically, we sought to employ real-time scheduling methods to optimize the power/thermal efficiency of the real-time computing systems, with leakage/ temperature dependency taken into consideration. In our research, we first explored the fundamental principles on how to employ dynamic voltage scaling (DVS) techniques to reduce the peak operating temperature when running a real-time application on a single core platform. We further proposed a novel real-time scheduling method, “M-Oscillations” to reduce the peak temperature when scheduling a hard real-time periodic task set. We also developed three checking methods to guarantee the feasibility of a periodic real-time schedule under peak temperature constraint. We further extended our research from single core platform to multi-core platform. We investigated the energy estimation problem on the multi-core platforms and developed a light weight and accurate method to calculate the energy consumption for a given voltage schedule on a multi-core platform. Finally, we concluded the dissertation with elaborated discussions of future extensions of our research.
24

Ant colony optimisation algorithms for solving multi-objective power-aware metrics for mobile ad hoc networks

Constantinou, Demetrakis 01 July 2011 (has links)
A mobile ad hoc network (MANET) is an infrastructure-less multi-hop network where each node communicates with other nodes directly or indirectly through intermediate nodes. Thus, all nodes in a MANET basically function as mobile routers participating in some routing protocol required for deciding and maintaining the routes. Since MANETs are infrastructure-less, self-organizing, rapidly deployable wireless networks, they are highly suitable for applications such as military tactical operations, search and rescue missions, disaster relief operations, and target tracking. Building such ad-hoc networks poses a significant technical challenge because of energy constraints and specifically in relation to the application of wireless network protocols. As a result of its highly dynamic and distributed nature, the routing layer within the wireless network protocol stack, presents one of the key technical challenges in MANETs. In particular, energy efficient routing may be the most important design criterion for MANETs since mobile nodes are powered by batteries with limited capacity and variable recharge frequency, according to application demand. In order to conserve power it is essential that a routing protocol be designed to guarantee data delivery even should most of the nodes be asleep and not forwarding packets to other nodes. Load distribution constitutes another important approach to the optimisation of active communication energy. Load distribution enables the maximisation of the network lifetime by facilitating the avoidance of over-utilised nodes when a route is in the process of being selected. Routing algorithms for mobile networks that attempt to optimise routes while at- tempting to retain a small message overhead and maximise the network lifetime has been put forward. However certain of these routing protocols have proved to have a negative impact on node and network lives by inadvertently over-utilising the energy resources of a small set of nodes in favour of others. The conservation of power and careful sharing of the cost of routing packets would ensure an increase in both node and network lifetimes. This thesis proposes simultaneously, by using an ant colony optimisation (ACO) approach, to optimise five power-aware metrics that do result in energy-efficient routes and also to maximise the MANET's lifetime while taking into consideration a realistic mobility model. By using ACO algorithms a set of optimal solutions - the Pareto-optimal set - is found. This thesis proposes five algorithms to solve the multi-objective problem in the routing domain. The first two algorithms, namely, the energy e±ciency for a mobile network using a multi-objective, ant colony optimisation, multi-pheromone (EEMACOMP) algorithm and the energy efficiency for a mobile network using a multi-objective, ant colony optimisation, multi-heuristic (EEMACOMH) algorithm are both adaptations of multi-objective, ant colony optimisation algorithms (MOACO) which are based on the ant colony system (ACS) algorithm. The new algorithms are constructive which means that in every iteration, every ant builds a complete solution. In order to guide the transition from one state to another, the algorithms use pheromone and heuristic information. The next two algorithms, namely, the energy efficiency for a mobile network using a multi-objective, MAX-MIN ant system optimisation, multi-pheromone (EEMMASMP) algorithm and the energy efficiency for a mobile network using a multi-objective, MAX- MIN ant system optimisation, multi-heuristic (EEMMASMH) algorithm, both solve the above multi-objective problem by using an adaptation of the MAX-MIN ant system optimisation algorithm. The last algorithm implemented, namely, the energy efficiency for a mobile network using a multi-objective, ant colony optimisation, multi-colony (EEMACOMC) algorithm uses a multiple colony ACO algorithm. From the experimental results the final conclusions may be summarised as follows:<ul><li> Ant colony, multi-objective optimisation algorithms are suitable for mobile ad hoc networks. These algorithms allow for high adaptation to frequent changes in the topology of the network. </li><li> All five algorithms yielded substantially better results than the non-dominated sorting genetic algorithm (NSGA-II) in terms of the quality of the solution. </li><li> All the results prove that the EEMACOMP outperforms the other four ACO algorithms as well as the NSGA-II algorithm in terms of the number of solutions, closeness to the true Pareto front and diversity. </li></ul> / Thesis (PhD)--University of Pretoria, 2010. / Computer Science / unrestricted
25

Mac Layer And Routing Protocols For Wireless Ad Hoc Networks With Asymmetric Links And Performance Evaluation Studies

Wang, Guoqiang 01 January 2007 (has links)
In a heterogeneous mobile ad hoc network (MANET), assorted devices with different computation and communication capabilities co-exist. In this thesis, we consider the case when the nodes of a MANET have various degrees of mobility and range, and the communication links are asymmetric. Many routing protocols for ad hoc networks routinely assume that all communication links are symmetric, if node A can hear node B and node B can also hear node A. Most current MAC layer protocols are unable to exploit the asymmetric links present in a network, thus leading to an inefficient overall bandwidth utilization, or, in the worst case, to lack of connectivity. To exploit the asymmetric links, the protocols must deal with the asymmetry of the path from a source node to a destination node which affects either the delivery of the original packets, or the paths taken by acknowledgments, or both. Furthermore, the problem of hidden nodes requires a more careful analysis in the case of asymmetric links. MAC layer and routing protocols for ad hoc networks with asymmetric links require a rigorous performance analysis. Analytical models are usually unable to provide even approximate solutions to questions such as end-to-end delay, packet loss ratio, throughput, etc. Traditional simulation techniques for large-scale wireless networks require vast amounts of storage and computing cycles rarely available on single computing systems. In our search for an effective solution to study the performance of wireless networks we investigate the time-parallel simulation. Time-parallel simulation has received significant attention in the past. The advantages, as well as, the theoretical and practical limitations of time-parallel simulation have been extensively researched for many applications when the complexity of the models involved severely limits the applicability of analytical studies and is unfeasible with traditional simulation techniques. Our goal is to study the behavior of large systems consisting of possibly thousands of nodes over extended periods of time and obtain results efficiently, and time-parallel simulation enables us to achieve this objective. We conclude that MAC layer and routing protocols capable of using asymmetric links are more complex than traditional ones, but can improve the connectivity, and provide better performance. We are confident that approximate results for various performance metrics of wireless networks obtained using time-parallel simulation are sufficiently accurate and able to provide the necessary insight into the inner workings of the protocols.
26

Large scale platform : Instantiable models and algorithmic design of communication schemes

Uznanski, Przemyslaw 11 October 2013 (has links) (PDF)
The increasing popularity of Internet bandwidth-intensive applications prompts us to consider followingproblem: How to compute efficient collective communication schemes on large-scale platform?The issue of designing a collective communication in the context of a large scale distributed networkis a difficult and a multi-level problem. A lot of solutions have been extensively studied andproposed. But a new, comprehensive and systematic approach is required, that combines networkmodels and algorithmic design of solutions.In this work we advocate the use of models that are able to capture real-life network behavior,but also are simple enough that a mathematical analysis of their properties and the design of optimalalgorithms is achievable.First, we consider the problem of the measuring available bandwidth for a given point-topointconnection. We discuss how to obtain reliable datasets of bandwidth measurements usingPlanetLab platform, and we provide our own datasets together with the distributed software usedto obtain it. While those datasets are not a part of our model per se, they are necessary whenevaluating the performance of various network algorithms. Such datasets are common for latencyrelatedproblems, but very rare when dealing with bandwidth-related ones.Then, we advocate for a model that tries to accurately capture the capabilities of a network,named LastMile model. This model assumes that essentially the congestion happens at the edgesconnecting machines to the wide Internet. It has a natural consequence in a bandwidth predictionalgorithm based on this model. Using datasets described earlier, we prove that this algorithm is ableto predict with an accuracy comparable to best known network prediction algorithm (DistributedMatrix Factorization) available bandwidth between two given nodes. While we were unable toimprove upon DMF algorithm in the field of point-to-point prediction, we show that our algorithmhas a clear advantage coming from its simplicity, i.e. it naturally extends to the network predictionsunder congestion scenario (multiple connections sharing a bandwidth over a single link). We areactually able to show, using PlanetLab datasets, that LastMile prediction is better in such scenarios.In the third chapter, we propose new algorithms for solving the large scale broadcast problem.We assume that the network is modeled by the LastMile model. We show that under thisassumption, we are able to provide algorithms with provable, strong approximation ratios. Takingadvantage of the simplicity and elasticity of the model, we can even extend it, so that it captures theidea of connectivity artifacts, in our case firewalls preventing some nodes to communicate directlybetween each other. In the extended case we are also able to provide approximation algorithmswith provable performance.The chapters 1 to 3 form three successful steps of our program to develop from scratch amathematical network communication model, prove it experimentally, and show that it can beapplied to develop algorithms solving hard problems related to design of communication schemesin networks.In the chapter 4 we show how under different network cost models, using some simplifyingassumptions on the structure of network and queries, one can design very efficient communicationschemes using simple combinatorial techniques. This work is complementary to the previous chapter in the sense that previously when designing communication schemes, we assumed atomicityof connections, i.e. that we have no control over routing of simple connections. In chapter 4 weshow how to solve the problem of an efficient routing of network request, given that we know thetopology of the network. It shows the importance of instantiating the parameters and the structureof the network in the context of designing efficient communication schemes.
27

Διαχείριση κοινόχρηστων πόρων σε πολυεπεξεργαστικά συστήματα ενός ολοκληρωμένου

Πετούμενος, Παύλος 06 October 2011 (has links)
Στην παρούσα διατριβή προτείνονται μέθοδοι διαχείρισης των κοινόχρηστων πόρων σε υπολογιστικά συστήματα όπου πολλαπλοί επεξεργαστές μοιράζονται το ίδιο ολοκληρωμένο (Chip Multiprocessors – CMPs). Ενώ μέχρι πρόσφατα ο σχεδιασμός ενός υπολογιστικού συστήματος στόχευε στην ικανοποίηση των απαιτήσεων μόνο μίας εφαρμογής ανά χρονική περίοδο, τώρα πια απαιτείται και η εξισορρόπηση των απαιτήσεων διαφορετικών εφαρμογών που ανταγωνίζονται για την κατοχή των ίδιων πόρων. Σε πολλές περιπτώσεις, όμως, αυτό δεν αρκεί από μόνο του. Ακόμη και αν επιτευχθεί κάποιος ιδανικός διαμοιρασμός του πόρου, αν δεν βελτιστοποιηθεί ο τρόπος με τον οποίο χρησιμοποιούν οι επεξεργαστές τον κοινόχρηστο πόρο, δεν θα καταφέρει να εξυπηρετήσει ικανοποιητικά το αυξημένο φορτίο. Για να αντιμετωπιστούν τα προβλήματα που πηγάζουν από τον διαμοιρασμό των κοινόχρηστων πόρων, στην παρούσα εργασία προτείνονται τρεις εναλλακτικοί μηχανισμοί διαχείρισης. Η πρώτη μεθοδολογία εισάγει μία νέα θεωρητική μοντελοποίηση του διαμοιρασμού της κρυφής μνήμης, η οποία μπορεί να χρησιμοποιηθεί παράλληλα με την εκτέλεση των προγραμμάτων που διαμοιράζονται την κρυφή μνήμη. Η μεθοδολογία αξιοποιεί στην συνέχεια αυτήν την μοντελοποίηση, για να ελέγξει τον διαμοιρασμό της κρυφής μνήμης και να επιτύχει δικαιοσύνη στο πως κατανέμεται ο χώρος της κρυφής μνήμης μεταξύ των επεξεργαστών. Η δεύτερη μεθοδολογία παρουσιάζει μία νέα τεχνική για την πρόβλεψη της τοπικότητας των προσπελάσεων της κρυφής μνήμης. Καθώς η τοπικότητα είναι η βασική παράμετρος που καθορίζει την χρησιμότητα των δεδομένων της κρυφής μνήμης, χρησιμοποιώντας αυτήν την τεχνική πρόβλεψης μπορούν να οδηγηθούν μηχανισμοί διαχείρισης που βελτιώνουν την αξιοποίηση του χώρου της κρυφής μνήμης. Στα πλαίσια της μεθοδολογίας παρουσιάζουμε έναν τέτοιο μηχανισμό, ο οποίος στοχεύει στην ελαχιστοποίηση των αστοχιών της κρυφής μνήμης μέσω μίας νέας πολιτικής αντικατάστασης. Η τελευταία μεθοδολογία που παρουσιάζεται είναι μία μεθοδολογία για την μείωση της κατανάλωσης ενέργειας της ουράς εντολών, που είναι μία από τις πιο ενεργειακά απαιτητικές δομές του επεξεργαστή. Στα πλαίσια της μεθοδολογίας, δείχνεται ότι το κλειδί για την αποδοτική μείωση της κατανάλωσης ενέργειας της ουράς εντολών βρίσκεται στην αλληλεπίδραση της με το υποσύστημα μνήμης. Με βάση αυτό το συμπέρασμα, παρουσιάζουμε έναν νέο μηχανισμό δυναμικής διαχείρισης του μεγέθους της ουράς εντολών, ο οποίος συνδυάζει επιθετική μείωση της κατανάλωσης ενέργειας του επεξεργαστή με διατήρηση της υψηλής απόδοσής του. / This dissertation proposes methodologies for the management of shared resources in chip multi-processors (CMP). Until recently, the design of a computing system had to satisfy the computational and storage needs of a single program during each time period. Now instead, the designer has to balance the, perhaps conflicting, needs of multiple programs competing for the same resources. But, in many cases, even this is not enough. Even if we could invent a perfect way to manage sharing, without optimizing the way that each processor uses the shared resource, the resource could not deal efficiently with the increased load. In order to handle the negative effects of resource sharing, this dissertation proposes three management mechanisms. The first one introduces a novel theoretical model of the sharing of the shared cache, which can be used at run-time. Furthermore, out methodology uses the model to control sharing and to achieve a sense of justice in the way the cache is shared among the processors. Our second methodology presents a new technique for predicting the locality of cache accesses. Since locality determines, almost entirely, the usefulness of cache data, our technique can be used to drive any management mechanism which strives to improve the efficiency of the cache. As part of our methodology, we present such a mechanism, a new cache replacement policy which tries to minimize cache misses by near-optimal replacement decisions. The last methodology presented in this dissertation, targets the energy consumption of the processor. To that end, our methodology shows that the key to reducing the power consumption of the Issue Queue, without disproportional performance degradation, lies at the interaction of the Issue Queue with the memory subsystem: as long as the management of the Issue Queue doesn’t reduce the utilization of the memory subsystem, the effects of the management on the processor’s performance will be minimal. Based on this conclusion, we introduce a new mechanism for dynamically resizing the Issue Queue, which achieves aggressive downsizing and energy savings with almost no performance degradation.
28

Study and Reduction of Power Consumption during Test of Digital Circuits / Etude et Réduction de la Consommation de Puissance Durant le Test de Circuits Digitaux

Wu, Fangmei 12 October 2011 (has links)
Cette thèse concerne l'étude et la réduction de la consommation de puissance durant le test par scan des circuits digitaux. Afin de détecter les défauts de délai de transition, les deux principales structures sont utilisés dans la pratique: Launch-Off-Shift (LOS) et launch-Off-Capture (LOC). L'ensemble des travaux réalises montre que le test LOS est plus efficace que le test LOC en terme de couverture de fautes de transition et la longueur de test. Toutefois, le test LOS nécessite une puissance plus élevée lors du launch-to-capture (LTC) du cycle, notamment en terme de consommation de puissance de pic. Ainsi, nous proposons une nouvelle approche de génération de vecteurs de test LOS basée sur la consommation. La technique proposée est capable de réduire et d'évaluer la puissance de pic de test se rapprochant le plus possible de la puissance fonctionnelle. Les avantages qui en résultent permettent de résoudre le problème lié à la perte de rendement et de s'abstenir du test se produisant lorsque la puissance de test est trop faible par rapport à la puissance fonctionnelle. / This thesis relates to study and reduction of power consumption during at-speed scan delay testing for digital circuits. To detect transition delay faults, two main testing schemes are used in practice: Launch-Off-Shift (LOS) and Launch-Off-Capture (LOC). In this work, we prove that LOS testing is more efficient than LOC testing in terms of transition fault coverage (TFC) and test length. However, LOS presents higher power during the launch-to-capture (LTC) cycle, especially in terms of peak power. For this purpose, we propose a novel power-aware test pattern generation technique for LOS testing. The proposed approach is able to reduce and map the test peak power as close as possible to the functional power. The important feature of this framework is that, in additional to solving the yield loss problem, it also avoids test escape that may occur when test power is too much reduced compared to functional power.
29

Large scale platform : Instantiable models and algorithmic design of communication schemes / Modélisation des communications sur plates-formes à grande echelles

Uznanski, Przemyslaw 11 October 2013 (has links)
La popularité croissante des applications Internet très gourmandes en bande passante (P2P, streaming,...) nous pousse à considérer le problème suivant :Comment construire des systèmes de communications collectives efficaces sur une plateforme à grande échelle ? Le développement de schéma de communications collectives dans le cadre d'un réseau distribué à grande échelle est une tâche difficile, qui a été largement étudiée et dont de multiples solutions ont été proposées. Toutefois, une nouvelle approche globale et systématique est nécessaire, une approche qui combine des modèles de réseaux et la conception algorithmique.Dans ce mémoire nous proposons l'utilisation de modèles capables de capturer le comportement d'un réseau réel et suffisamment simples pour que leurs propriétés mathématiques puissentêtre étudiées et pour qu'il soit possible de créer des algorithmesoptimaux. Premièrement, nous considérons le problème d'évaluation de la bande passante disponible pour une connexion point-à-point donnée. Nousétudions la façon d'obtenir des jeux de données de bande passante, utilisant plateforme PlanetLab. Nous présentons aussi nos propres jeux de données, jeux obtenus avec bedibe, un logiciel que nous avons développé. Ces données sont nécessaires pour évaluer les performances des différents algorithmesde réseau. Bien qu'on trouve de nombreux jeux de données de latence,les jeux de données de bande passante sont très rares. Nous présentons ensuite un modèle, appelé LastMile, qui estime la bande passante. En profitant des jeux de données décrits précédemment, nous montrons que cet algorithme est capable de prédire la bande passante entre deux noeuds donnés avec une précision comparable au meilleur algorithme connu de prédiction (DMF). De plus le modèle LastMile s'étend naturellement aux prédictions dans le scénario de congestion (plusieurs connexions partageant un même lien). Nous sommes effectivement en mesure de démontrer, à l'aide des ensembles de données PlanetLab, que la prédiction LastMile est préférable dans des tels scénarios.Dans le troisième chapitre, nous proposons des nouveaux algorithmes pour résoudre le problème de diffusion. Nous supposons que le réseau est modélisé par le modèle LastMile. Nous montrons que, sous cette hypothèse, nous sommes en mesure de fournir des algorithmes avec des ratios d'approximation élevés. De plus nous étendons le modèle LastMile, de manière à y intégrer des artéfacts de connectivité, dans notre cas ce sont des firewalls qui empêchent certains nœuds de communiquer directement entre eux. Dans ce dernier cas, nous sommes également en mesure de fournir des algorithmes d'approximation avec des garanties de performances prouvables. Les chapitres 1 à 3 forment les trois étapes accomplies de notre programme qui visent trois buts. Premièrement, développer à partir dezéro un modèle de réseau de communication. Deuxièmement, prouver expérimentalement sa performance. Troisièmement, montrer qu'il peut être utilisé pour développer des algorithmes qui résolvent les problèmes de communications collectives. Dans le 4e chapitre, nous montrons comment on peut concevoir dessystèmes de communication efficaces, selon différents modèles decoûts, en utilisant des techniques combinatoires,tout en utilisant des hypothèses simplificatrices sur la structure duréseau et les requêtes. Ce travail est complémentaire au chapitre précédent puisque auparavant, nous avons adopté l'hypothèse que les connectionsétaient autonomes (i.e. nous n'avons aucun contrôle sur le routage des connexions simples). Dans le chapitre 4, nous montrons comment résoudre le problème du routage économe en énergie, étant donnée une topologie fixée. / The increasing popularity of Internet bandwidth-intensive applications prompts us to consider followingproblem: How to compute efficient collective communication schemes on large-scale platform?The issue of designing a collective communication in the context of a large scale distributed networkis a difficult and a multi-level problem. A lot of solutions have been extensively studied andproposed. But a new, comprehensive and systematic approach is required, that combines networkmodels and algorithmic design of solutions.In this work we advocate the use of models that are able to capture real-life network behavior,but also are simple enough that a mathematical analysis of their properties and the design of optimalalgorithms is achievable.First, we consider the problem of the measuring available bandwidth for a given point-topointconnection. We discuss how to obtain reliable datasets of bandwidth measurements usingPlanetLab platform, and we provide our own datasets together with the distributed software usedto obtain it. While those datasets are not a part of our model per se, they are necessary whenevaluating the performance of various network algorithms. Such datasets are common for latencyrelatedproblems, but very rare when dealing with bandwidth-related ones.Then, we advocate for a model that tries to accurately capture the capabilities of a network,named LastMile model. This model assumes that essentially the congestion happens at the edgesconnecting machines to the wide Internet. It has a natural consequence in a bandwidth predictionalgorithm based on this model. Using datasets described earlier, we prove that this algorithm is ableto predict with an accuracy comparable to best known network prediction algorithm (DistributedMatrix Factorization) available bandwidth between two given nodes. While we were unable toimprove upon DMF algorithm in the field of point-to-point prediction, we show that our algorithmhas a clear advantage coming from its simplicity, i.e. it naturally extends to the network predictionsunder congestion scenario (multiple connections sharing a bandwidth over a single link). We areactually able to show, using PlanetLab datasets, that LastMile prediction is better in such scenarios.In the third chapter, we propose new algorithms for solving the large scale broadcast problem.We assume that the network is modeled by the LastMile model. We show that under thisassumption, we are able to provide algorithms with provable, strong approximation ratios. Takingadvantage of the simplicity and elasticity of the model, we can even extend it, so that it captures theidea of connectivity artifacts, in our case firewalls preventing some nodes to communicate directlybetween each other. In the extended case we are also able to provide approximation algorithmswith provable performance.The chapters 1 to 3 form three successful steps of our program to develop from scratch amathematical network communication model, prove it experimentally, and show that it can beapplied to develop algorithms solving hard problems related to design of communication schemesin networks.In the chapter 4 we show how under different network cost models, using some simplifyingassumptions on the structure of network and queries, one can design very efficient communicationschemes using simple combinatorial techniques. This work is complementary to the previous chapter in the sense that previously when designing communication schemes, we assumed atomicityof connections, i.e. that we have no control over routing of simple connections. In chapter 4 weshow how to solve the problem of an efficient routing of network request, given that we know thetopology of the network. It shows the importance of instantiating the parameters and the structureof the network in the context of designing efficient communication schemes.
30

I/O Aware Power Shifting

Savoie, Lee, Lowenthal, David K., Supinski, Bronis R. de, Islam, Tanzima, Mohror, Kathryn, Rountree, Barry, Schulz, Martin 05 1900 (has links)
Power limits on future high-performance computing (HPC) systems will constrain applications. However, HPC applications do not consume constant power over their lifetimes. Thus, applications assigned a fixed power bound may be forced to slow down during high-power computation phases, but may not consume their full power allocation during low-power I/O phases. This paper explores algorithms that leverage application semantics-phase frequency, duration and power needs-to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance. We design novel techniques that include explicit staggering of applications to improve power shifting. Compared to executing without power shifting, our algorithms can improve average performance by up to 8% or improve performance of a single, high-priority application by up to 32%.

Page generated in 0.0572 seconds