61 |
Virtual time-aware virtual machine systems. Yoginath, Srikanth B. 27 August 2014 (has links)
Discrete dynamic system models that track, maintain, utilize, and evolve virtual time are referred to as virtual time systems (VTS). The realization of VTS using virtual machine (VM) technology offers several benefits including fidelity, scalability, interoperability, fault tolerance and load balancing. The usage of VTS with VMs appears in two ways: (a) VMs within VTS, and (b) VTS over VMs. The former is prevalent in high-fidelity cyber infrastructure simulations and cyber-physical system simulations, wherein VMs form a crucial component of VTS. The latter appears in the popular Cloud computing services, where VMs are offered as computing commodities and the VTS utilizes VMs as parallel execution platforms.
Prior to the work presented here, the simulation community using VMs within VTS (specifically, cyber infrastructure simulations) had little awareness of a fundamental virtual time-ordering problem. The correctness problem went largely unnoticed and unaddressed because the effects of fair-share multiplexing of VMs on the virtual-time evolution of VMs within VTS had not been recognized. The dissertation research reported here demonstrated the latent incorrectness of existing methods, defined key correctness benchmarks, quantitatively measured the incorrectness, proposed and implemented novel algorithms to overcome it, and optimized the solutions to execute without a performance penalty. In fact, our novel correctness-enforcing design yields better runtime performance than the traditional (incorrect) methods.
Similarly, VTS execution over VM platforms such as Cloud computing services incurs large performance degradation, which was not known until our research uncovered the fundamental mismatch between the scheduling needs of VTS execution and those of traditional parallel workloads. Consequently, we designed a novel VTS-aware hypervisor scheduler and showed significant performance gains in VTS execution over VM platforms. Prior to our work, the performance concern of VTS over VMs was largely unaddressed because the mismatch between the execution policies of VMs and VTS applications was not understood: VTS follows virtual-time-ordered execution, whereas conventional VM execution follows a fair-share policy. Our research quantitatively uncovered the exact cause of the poor performance of VTS on VM platforms. Moreover, we proposed and implemented a novel virtual time-aware execution methodology that alleviates this degradation and provides over an order-of-magnitude faster execution than traditional virtual time-unaware execution.
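As an illustration of the ordering issue (a toy sketch; the VM names, unit quanta, and work amounts are invented, not taken from the dissertation), compare fair-share round-robin dispatch with least-virtual-time-first dispatch:

```python
import heapq

def fair_share_order(vms, quantum=1):
    """Fair-share multiplexing: each runnable VM gets a quantum in turn,
    regardless of how far its local virtual clock has advanced."""
    order = []
    pending = dict(vms)                      # name -> remaining work units
    while pending:
        for name in list(pending):
            order.append(name)
            pending[name] -= quantum
            if pending[name] <= 0:
                del pending[name]
    return order

def virtual_time_order(vms):
    """Virtual time-ordered dispatch: always run the VM whose local
    virtual clock is least advanced (a least-timestamp-first policy)."""
    heap = [(0.0, name, work) for name, work in vms]
    heapq.heapify(heap)
    trace = []
    while heap:
        clock, name, work = heapq.heappop(heap)
        trace.append((name, clock))          # dispatch at this virtual time
        if work > 1:
            heapq.heappush(heap, (clock + 1.0, name, work - 1))
    return trace
```

Under fair-share dispatch a VM can run arbitrarily far ahead of a slower peer's virtual clock, which is precisely the kind of ordering violation the dissertation measures; least-timestamp-first dispatch keeps the dispatched virtual clocks non-decreasing.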
|
62 |
A Preliminary Exploration of Memory Controller Policies on Smartphone Workloads. Narancic, Goran. 26 November 2012 (has links)
This thesis explores memory performance for smartphone workloads. We design a Video Conference Workload (VCW) to model typical smartphone usage. We describe a trace-based methodology which uses a software implementation to mimic the behaviour of specialised hardware accelerators. Our methodology stores dataflow information from the original application to maintain the relationships between requests.
We first study seven address mapping schemes with our VCW, using a first-ready, first-come-first-served (FR-FCFS) memory scheduler. Our results show the best-performing scheme is up to 82% faster than the worst. The VCW is memory intensive, with up to 86.8% bandwidth utilisation under the best-performing scheme. We also test a Web Browsing workload and a set of computer vision workloads; most are not memory intensive, with utilisation under 15%.
Finally, we compare four schedulers and find that the FR-FCFS scheduler using the Write Drain mode [8] performed the best, outperforming the worst scheduler by 6.3%.
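The FR-FCFS policy compared above can be sketched in a few lines (a simplified single-bank model; real controllers track banks, ranks, and read/write state, and the request format here is invented for illustration):

```python
def fr_fcfs_pick(queue, open_row):
    """FR-FCFS arbitration sketch: prefer the oldest request that hits
    the currently open DRAM row (no precharge/activate needed); if no
    request hits, fall back to the oldest request overall.
    Each request is an (arrival_order, row) pair."""
    row_hits = [req for req in queue if req[1] == open_row]
    candidates = row_hits if row_hits else queue
    return min(candidates)  # min on (arrival_order, row) -> oldest first
```

With row 7 open, the request `(1, 7)` is served before the older miss `(0, 5)`, which is the "first-ready" reordering that raises row-buffer hit rates.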
|
63 |
A virtualized quality of service packet scheduler accelerator. Chuang, Kangtao Kendall. 25 August 2008 (has links)
Resource virtualization is emerging as a technology to enable the management and sharing of hardware resources including multiple core processors and accelerators such as Digital Signal Processors (DSP), Graphics Processing Units (GPU), and Field Programmable Gate Arrays (FPGA). Accelerators present unique problems for virtualization and sharing due to their specialized architectures and interaction modes. This thesis explores and proposes solutions for the virtualized operation of high performance, quality of service (QoS) packet scheduling accelerators. It specifically concentrates on challenges to meet 10Gbps Ethernet wire speeds.
The packet scheduling accelerator is realized in an FPGA and implements the ShareStreams-V architecture. ShareStreams-V implements the Dynamic Window-Constrained Scheduler (DWCS) algorithm and virtualizes the previous ShareStreams architecture. The original ShareStreams architecture, implemented on Xilinx Virtex and Virtex-II FPGAs, was able to schedule 128 streams at 10Gbps Ethernet throughput for 1500-byte packets. ShareStreams-V provides both hardware and software extensions that enable a single implementation to host isolated, independent virtual schedulers.
Four methods for virtualization of the packet scheduler accelerator are presented: coarse- and fine-grained temporal partitioning, spatial partitioning, and dynamic spatial partitioning. In addition to increasing the utilization of the scheduler, the decision throughput of the physical scheduler can be increased when sharing the physical scheduler across multiple virtual schedulers among multiple processes. This leads to the hypothesis for this work:
Virtualization of a quality of service packet scheduler accelerator through dynamic spatial partitioning is an effective and efficient approach to the accelerator virtualization supporting scalable decision throughput across multiple processes.
ShareStreams-V was synthesized targeting a Xilinx Virtex-4 FPGA. When shared among four processes, designs supporting up to 16, 32, and 64 total streams were able to reach 10Gbps Ethernet scheduling throughput for 64-byte packets; when shared among 32 processes, a scheduler supporting 64 total streams reached the same throughput. An access API presents the virtual scheduler abstraction to individual processes in order to allocate, deallocate, update, and control the virtual scheduler allocated to a process. In practice, the bottleneck of the test system is the software-to-hardware interface; effective future implementations are expected to use a tightly coupled host-CPU-to-accelerator interconnect.
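A heavily simplified sketch of DWCS-style stream selection (the real algorithm uses a table of precedence and tie-breaking rules involving deadlines, window-constraints x/y, and loss state, most of which are omitted here; the stream tuples are invented):

```python
def dwcs_pick(streams):
    """Pick the next stream to service: earliest deadline first, ties
    broken in favour of the stream with the tighter (lower) window
    constraint x/y, i.e. the one that can tolerate fewer late packets
    per window. streams: list of (name, deadline, x, y)."""
    def key(stream):
        name, deadline, x, y = stream
        return (deadline, x / y)
    return min(streams, key=key)[0]
```

Streams "a" and "b" share a deadline of 10, but "b" tolerates only 1 late packet in 4 rather than 1 in 2, so it wins the tie.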
|
64 |
The virtual time function and rate-based schedulers for real-time communications over packet networks. Devadason, Tarith Navendran. January 2007
[Truncated abstract] The accelerating convergence of communications from disparate application types onto common packet networks has made quality of service an increasingly important and problematic issue. Applications of different classes have diverse service requirements at distinct levels of importance. These applications also offer traffic to the network with widely varying characteristics. Yet a common network is expected at all times to meet the individual communication requirements of each flow from all of these application types. One group of applications with particularly critical service requirements is the class of real-time applications, such as packet telephony. They require both the reproduction of a specified timing sequence at the destination and nearly instantaneous interaction between the users at the endpoints. The associated delay limits (in terms of upper bound and variation) must be consistently met; wherever they are violated, the network transfer becomes worthless, as the data cannot be used at all. In contrast, other types of applications may suffer appreciable deterioration in quality of service as a result of slower transfer, but the goal of the transfer can still largely be met. The goal of this thesis is to evaluate the potential effectiveness of a class of packet scheduling algorithms in meeting the specific service requirements of real-time applications in a converged network environment. Since the proposal of Weighted Fair Queueing, several schedulers have been suggested as capable of meeting the divergent service requirements of both real-time and other data applications. ... This simulation study also sheds light on false assumptions that can be made, on the basis of the deterministic bounds obtained, about the isolation produced by start-time and finish-time schedulers. The key contributions of this work are as follows.
We clearly show how the definition of the virtual time function affects both delay bounds and delay distributions for a real-time flow in a converged network, and how optimality is achieved. Despite apparent indications to the contrary from delay bounds, the simulation analysis demonstrates that start-time rate-based schedulers possess useful characteristics for real-time flows that the traditional finish-time schedulers do not. Finally, it is shown that all the virtual time rate-based schedulers considered can produce isolation problems over multiple hops in networks with high loading. It becomes apparent that the benchmark First-Come-First-Served scheduler, with spacing and call admission control at the network ingresses, is a preferred arrangement for real-time flows (although lower priority levels would also need to be implemented for dealing with other data flows).
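The virtual start and finish tags on which these rate-based schedulers operate can be sketched generically: S_k = max(V(a_k), F_{k-1}) and F_k = S_k + L_k/w for a flow of weight w. The identity virtual time function below is purely for illustration and is not any of the thesis's specific schedulers:

```python
def tag_packets(packets, weight, v_of_arrival):
    """Per-flow virtual start/finish tags for a rate-based scheduler.
    packets: list of (arrival, length) in arrival order.
    v_of_arrival: the system virtual time function V(t)."""
    tags, f_prev = [], 0.0
    for arrival, length in packets:
        s = max(v_of_arrival(arrival), f_prev)   # start tag
        f = s + length / weight                  # finish tag
        tags.append((s, f))
        f_prev = f
    return tags
```

Start-time schedulers serve packets in order of the first tag and finish-time schedulers in order of the second; the thesis's point is that this choice, together with the definition of V(t), changes the delay distribution seen by a real-time flow.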
|
65 |
Téléchargement de Contenus dans les réseaux véhiculaires / Content Download in Vehicular Networks. Astudillo Salinas, Darwin Fabián. 27 September 2013 (has links)
The evolution of wireless communication systems has made it possible to envision a wide range of applications for Intelligent Transportation Systems (ITS). These applications may or may not rely on an infrastructure, and range from road safety to driver comfort and networked games. Map updating is, in our view, a representative application: it is not a safety application as such, but it can help reduce congestion by improving the efficiency of drivers' decisions. It has easily identifiable characteristics: a high data volume, a weak delay constraint, and the possibility of being implemented through infrastructure-to-vehicle, vehicle-to-vehicle, or hybrid communications. The objective is for the content to be downloaded in full by all vehicles in minimal time, using as few resources as possible and at the lowest cost. The most suitable solutions proved to be those based on 802.11p, with or without infrastructure. In the infrastructure case, a number of access points broadcast information over coverage areas that are usually disjoint. Given the sizes of these areas and/or the bandwidth devoted to this type of application, passing a single access point is not enough to download such maps, so information dissemination strategies must be defined. A first study compared a unicast strategy with a broadcast/multicast strategy; the latter proved largely better. Combining the two principles does not improve system performance, because the bandwidth devoted to unicast transmission does not compensate for the bandwidth taken away from the broadcast. The problem stems from the duplicate chunks received by vehicles as they pass several consecutive access points.
To mitigate duplication, we used pseudo-random linear network coding: the access point broadcasts linear combinations of file chunks, and the large number of possible combinations significantly reduces the phenomenon. In a complementary manner, we studied the use of ad hoc communications to fill in the missing chunks, particularly in the absence of infrastructure. We verified that good results can be achieved in this context, depending on the diversity of the chunks held by the vehicles encountered.
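The chunk-coding idea can be sketched with random linear combinations over GF(2), coefficient vectors packed as bit masks (the chunk count and seed are arbitrary illustration, not the thesis's parameters):

```python
import random

def gf2_rank(vectors, nbits):
    """Rank over GF(2) of coefficient vectors packed as integers,
    via Gaussian elimination indexed by leading bit."""
    basis = [0] * nbits          # basis[i]: a vector whose top set bit is i
    rank = 0
    for v in vectors:
        for i in reversed(range(nbits)):
            if not (v >> i) & 1:
                continue
            if basis[i] == 0:
                basis[i] = v
                rank += 1
                break
            v ^= basis[i]        # eliminate bit i and keep reducing
    return rank

def transmissions_to_decode(n_chunks, rng):
    """Broadcast random GF(2) combinations of n_chunks file chunks and
    count packets until the receiver's coefficient matrix reaches full
    rank, i.e. until the file is decodable."""
    received, sent = [], 0
    while gf2_rank(received, n_chunks) < n_chunks:
        received.append(rng.randrange(1, 1 << n_chunks))  # random nonzero combo
        sent += 1
    return sent
```

A plain repetition scheme wastes every duplicate chunk, whereas a random combination is useful whenever it is linearly independent of what the vehicle already holds; this is why coded broadcast needs barely more than n transmissions to deliver n chunks.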
|
66 |
Politiques polyvalentes et efficientes d'allocation de ressources pour les systèmes parallèles / Multi-Purpose Efficient Resource Allocation for Parallel Systems. Mendonca, Fernando. 23 May 2017 (has links)
The field of parallel supercomputing has changed rapidly in recent years. The falling cost of the parts needed to build machines with multicore CPUs and accelerators such as GPUs is of particular interest to us. This has allowed large parallel systems to spread, with machines far apart from each other, sometimes located on different continents. The crucial problem is then how to use these resources efficiently.
In this work, we first consider the efficient allocation of tasks suited to CPUs and GPUs on heterogeneous platforms. To that end, we implement a tool called SWDUAL, which executes the Smith-Waterman algorithm simultaneously on CPUs and GPUs, choosing which tasks are better suited to one or the other. Experiments show that SWDUAL gives better results than similar approaches in the literature.
Second, we study a new online method for scheduling independent tasks of different sizes. We propose a technique that optimizes the stretch metric by detecting when a substantial number of small jobs is waiting while a big job executes; the big job is then redirected to a separate set of machines dedicated to running redirected big jobs. Our experiments show that this method outperforms the standard policy and in many cases approaches the performance of the preemptive policy, which can be considered a lower bound.
Next, we study constraints applied to the backfilling algorithm combined with the FCFS policy: contiguity, which tries to keep jobs close together and reduce fragmentation in the schedule, and basic locality, which aims to keep jobs as much as possible inside groups of processors called clusters. Experiments show that the benefits of these constraints outweigh the possible decrease in the number of backfilled jobs due to reduced fragmentation.
Finally, we propose an additional constraint called full locality, in which the scheduler models the platform's topology as a fat tree and uses this model to assign jobs to regions of the platform where communication costs between processors are minimized. Our experimental campaign shows that full locality outperforms basic backfilling and all the previously proposed constraints.
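The stretch metric driving the online-scheduling contribution can be sketched as follows (stretch = flow time divided by job size; the job lists below are invented):

```python
def max_stretch(jobs):
    """Max stretch of a non-preemptive FCFS schedule on one machine.
    jobs: list of (arrival, size) in arrival order; the stretch of a job
    is (completion - arrival) / size, so small jobs stuck behind a big
    one dominate the metric."""
    t, worst = 0.0, 0.0
    for arrival, size in jobs:
        t = max(t, arrival) + size           # completion time under FCFS
        worst = max(worst, (t - arrival) / size)
    return worst
```

In the test a 100-unit job ahead of two unit jobs pushes the max stretch to 100; redirecting the big job to a dedicated machine returns the small jobs' stretch to 1, which is the effect the proposed policy exploits.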
|
67 |
Placement de tâches dynamique et flexible sur processeur multicoeur asymétrique en fonctionnalités / Dynamic and flexible task mapping on functionally asymmetric multi-core processors. Aminot, Alexandre. 01 October 2015 (has links)
To meet the increasingly heterogeneous needs of applications (in terms of power and energy efficiency), this thesis focuses on emerging functionally asymmetric multi-core processor (FAMP) architectures. These architectures are characterized by a non-uniform implementation of hardware extensions across cores (e.g., the floating-point unit, FPU). The area savings are apparent, but what is the impact on software, energy, and performance?
To answer these questions, the thesis investigates how state-of-the-art applications use these extensions and compares various existing methods. To optimize task mapping and increase efficiency, it proposes a dynamic solution at the scheduler level, called the relaxed scheduler. Hardware extensions are valuable because they speed up portions of code where parallelization across a multi-core is not efficient; however, they are under-exploited by applications, and their cost in area and power consumption is significant. Based on these observations, the following contributions are made:
We present a detailed study of the use of vector and FPU extensions in state-of-the-art applications.
We compare several extension-management solutions at different levels of temporal granularity of action, to understand their limitations and thus define at which level one must act. Few studies address the granularity of action for managing extensions.
We propose a solution for estimating, online, the performance degradation of running a task on a core that lacks a given extension. To allow multi-cores to scale, the operating system needs flexibility in task placement, yet placing a task on a core without an extension can have serious consequences for energy and performance, and to date there has been no way to estimate this potential degradation.
We propose a relaxed scheduler, based on our degradation-estimation model, which maps tasks onto a set of heterogeneous cores effectively. We study the flexibility gained and the consequences for performance and energy. Existing solutions propose methods to map tasks onto a heterogeneous set of cores, but they do not study the trade-off between quality of service and consumption gains for FAMP architectures.
Our experiments in simulation show that the scheduler can achieve significant mapping flexibility with a performance degradation of under 2%. Compared to a symmetric multi-core, our solution yields an average core-level energy gain of 11%. These results are very encouraging and contribute to the development of a complete FAMP platform. This thesis has led to a patent application and three international scientific communications (plus one under submission), and has contributed to two European projects.
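A minimal sketch of the relaxed-placement idea (all numbers are invented: `emulation_slowdown` stands in for the cost of trapping and emulating FP instructions on an FPU-less core, and the thesis's degradation model is an online estimator, not this closed formula):

```python
def pick_core(task_fpu_fraction, cores, emulation_slowdown=20.0):
    """Estimate the slowdown of running a task on a core lacking the FPU
    extension, then pick the core with the lowest estimated cost.
    cores: list of (name, has_fpu, load); load scales the cost."""
    def cost(core):
        name, has_fpu, load = core
        # Amdahl-style estimate: only the FP fraction pays the penalty.
        slowdown = 1.0 if has_fpu else (
            1.0 - task_fpu_fraction + task_fpu_fraction * emulation_slowdown)
        return slowdown * (1.0 + load)
    return min(cores, key=cost)[0]
```

An FP-heavy task stays on the loaded FPU-capable core, while an integer-only task is relaxed onto the idle FPU-less core: that placement flexibility is what the relaxed scheduler trades against the estimated degradation.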
|
68 |
Incorporação de informações secundárias para gerenciar o risco no planejamento de lavra de curto prazo / Incorporation of secondary information to manage risk in short-term mine planning. Carlos Carrasco Arbieto. 30 November 2006 (has links)
Short-term mine planning is normally carried out using a limited number of drillhole samples. To improve the management of geological risk in short-term planning, a larger population of samples is required; however, it is normally impractical to obtain additional drillhole samples once mining is under way. This research addresses that issue by proposing a methodology that incorporates additional information from blasthole drill cuttings (secondary information) alongside the original drillhole samples (primary information). A co-estimation technique combines P2O5 grades from drillholes (primary variable) with P2O5 grades from blastholes (secondary variable) based on the Markov estimation model MM2, through which both sources of information can be combined when estimating a block model. This process allowed more detailed modeling of geological attributes and a better interface between short-term mine planning and mine operations. The proposed methodology also gave access to a larger population of geological information, which supports operational plans that better adhere to weekly and monthly production targets while respecting the sequencing imported from long- and medium-term planning. As a result, it was demonstrated that more precise operational plans can be created from properly estimated models of areas close to the active mining faces, even when only a small amount of primary information (drillhole samples) is available.
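The Markov screening hypothesis MM2 that underlies the co-estimation can be written down directly: the cross-covariance between the primary (drillhole) and secondary (blasthole) variables is the secondary covariance scaled by the collocated correlation. A sketch (the exponential covariance model and the parameter values are illustrative only):

```python
import math

def exp_cov(sill, practical_range):
    """Exponential covariance model C(h) = sill * exp(-3h / range)."""
    return lambda h: sill * math.exp(-3.0 * h / practical_range)

def mm2_cross_cov(rho12_0, c2):
    """Markov model 2: C12(h) = rho12(0) * C2(h), i.e. the drillhole/
    blasthole cross-covariance inherits the spatial structure of the
    densely sampled secondary variable."""
    return lambda h: rho12_0 * c2(h)
```

This relation is what lets the dense blasthole data carry the spatial structure, so only the collocated correlation coefficient rho12(0) needs to be inferred from the sparse drillhole data.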
|
69 |
HCLogP: um modelo computacional para clusters heterogêneos / HCLogP: a computational model for heterogeneous clusters. Soares, Thiago Marques. 09 March 2017
Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The LogP model was proposed in 1993 to measure the effects of communication latency, processor occupancy, and bandwidth in distributed-memory multiprocessors. The idea was to characterize distributed-memory multiprocessors using these key parameters and study their impact on performance. This work proposes a new model, based on LogP, that describes the performance of regular applications executing on a heterogeneous cluster. The model considers that a heterogeneous cluster is composed of distinct types of processors, accelerators, and network controllers. The results show that the worst error in the model's estimates of parallel execution time was 19.2%, and in many cases the estimated time was equal or very close to the measured one. In addition, a scheduler was developed on top of this model: based on the characteristics of the application and the computational environment, it chooses the subset of processors, accelerators, and networks that minimizes the parallel execution time. The scheduler successfully chose the best configuration for applications with different behaviors.
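For reference, a LogP-style cost estimate can be sketched in a few lines (this is the textbook LogP accounting for a one-to-all send, not HCLogP's heterogeneous extension; the parameter values in the test are invented):

```python
def logp_scatter_time(P, L, o, g):
    """LogP estimate for a root sending one distinct message to each of
    the other P-1 processors: consecutive sends are spaced by max(g, o)
    (gap or send overhead, whichever dominates), and each message costs
    o (send) + L (wire latency) + o (receive)."""
    if P <= 1:
        return 0
    last_send_start = (P - 2) * max(g, o)
    return last_send_start + o + L + o
```

HCLogP generalizes this kind of accounting by letting L, o, and g differ per processor type, accelerator, and network controller.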
|
70 |
Comparative Analysis of Static Recovery Schemes for Distributed Computing / Komparativ analys av statisk Recovery Program för Distributed Computing. Husain, Rashid; Kazmi, Syed Muhammad Husnain. January 2008 (links)
The primary objective of this thesis is to evaluate how grid computing works with its infrastructure. It also studies the cases in which a dynamic scheduler is preferable, how static recovery schemes can play an effective role in large distributed systems where load balancing is a key element, and how optimality can be achieved for the maximum number of crashed computers using dynamic or static recovery schemes. The thesis consists of two parts: the construction of Golomb and Trapezium modules, and a performance comparison of the Golomb and Trapezium recovery schemes against a dynamic recovery scheme. In the first part we construct two modules that generate the recovery list of n computers, one for Golomb and one for Trapezium. In the second part we build three schedulers, two for the static recovery schemes and one for the dynamic recovery scheme, and compare the performance of the Golomb and Trapezium schemes with the dynamic scheme using GridSim.
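A sketch of what a Golomb-ruler-based recovery list might look like (the ruler {0, 1, 4, 9, 11} is a known optimal order-5 Golomb ruler, but the mapping from ruler marks to recovery candidates is our illustrative assumption, not necessarily the thesis's construction):

```python
from itertools import combinations

def is_golomb(marks):
    """A Golomb ruler: all pairwise differences between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))

def recovery_list(node, n_nodes, marks=(0, 1, 4, 9, 11)):
    """Hypothetical static recovery list: node i fails over to the nodes
    at Golomb-ruler offsets from itself, so the distinct differences
    spread the recovery load of multiple simultaneous crashes."""
    return [(node + m) % n_nodes for m in marks if m % n_nodes != 0]
```

Because all pairwise differences of the marks are distinct, the recovery lists of different nodes overlap as little as possible, which is the load-balancing property a static scheme needs when several computers crash at once.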
|