11

Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources / Ordonnancement de noyaux d'algèbre linéaire dense sur ressources hétérogènes

Kumar, Suraj 12 April 2017 (has links)
Due to the massive computational power of accelerators such as GPUs and Xeon Phis, multicore machines equipped with accelerators have become common in High Performance Computing (HPC). The added complexity has led to the development of task-based runtime systems, in which computations are expressed as graphs of tasks and the runtime schedules those tasks among all resources of the platform. The real challenge is to design efficient schedulers for such runtimes that make effective use of all resources; developing good schedulers, even for a single hybrid node, and analyzing them can thus have a strong impact on the performance of current HPC systems. We consider the problem of scheduling dense linear algebra applications on fully hybrid platforms made of CPUs and GPUs. The relative performance of CPUs and GPUs depends strongly on the kernel: for instance, GPUs are much more efficient at matrix-matrix multiplications than at matrix factorizations. In this thesis, we analyze the performance of static and dynamic scheduling strategies and propose a set of intermediate strategies, obtained by adding static (resp. dynamic) features to dynamic (resp. static) strategies. A resource-centric dynamic scheduler, HeteroPrio, based on the affinity between tasks and resources, was recently proposed for a set of small independent tasks on two types of resources.
We extend and analyze this scheduler for general task graphs, first on two types of resources and then on more than two types. Additionally, we provide approximation ratios and worst-case examples for HeteroPrio on a set of independent tasks across different platform sizes.
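As a concrete illustration of the affinity principle behind HeteroPrio, the sketch below schedules independent tasks on CPUs and GPUs by sorting them by acceleration factor, so that idle GPUs pick the most GPU-friendly remaining task and idle CPUs the least GPU-friendly one. This is a simplified, hypothetical reconstruction in Python: the function names, the task format, and the omission of HeteroPrio's task-stealing (spoliation) mechanism are all assumptions, not the thesis's actual code.

```python
"""Minimal sketch of affinity-based (HeteroPrio-style) scheduling of
independent tasks on two resource types. Names and input format are
illustrative assumptions; spoliation/work-stealing is omitted."""

import heapq

def heteroprio_schedule(tasks, n_cpus, n_gpus):
    """tasks: list of (name, cpu_time, gpu_time) tuples.
    Returns the makespan and the per-task assignment."""
    # Rank tasks by acceleration factor (cpu_time / gpu_time):
    # GPUs pull from the most-accelerated end, CPUs from the least.
    queue = sorted(tasks, key=lambda t: t[1] / t[2])  # ascending affinity
    # Each worker is (time_when_free, kind, worker_id) in a min-heap.
    workers = [(0.0, "cpu", i) for i in range(n_cpus)] + \
              [(0.0, "gpu", i) for i in range(n_gpus)]
    heapq.heapify(workers)
    assignment, makespan = [], 0.0
    while queue:
        free_at, kind, wid = heapq.heappop(workers)   # earliest idle worker
        if kind == "gpu":
            name, cpu_t, gpu_t = queue.pop()          # highest acceleration
            done = free_at + gpu_t
        else:
            name, cpu_t, gpu_t = queue.pop(0)         # lowest acceleration
            done = free_at + cpu_t
        assignment.append((name, kind, wid, free_at, done))
        makespan = max(makespan, done)
        heapq.heappush(workers, (done, kind, wid))
    return makespan, assignment

# Example: matrix products accelerate well on GPU, factorizations less so.
tasks = [("gemm", 10.0, 1.0), ("gemm2", 10.0, 1.0), ("potrf", 4.0, 2.0)]
print(heteroprio_schedule(tasks, n_cpus=2, n_gpus=1)[0])
```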
12

A method of evaluation of high-performance computing batch schedulers

Futral, Jeremy Stephen 01 January 2019 (has links)
According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provides a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads submitted to a batch scheduler. While previous research exists, this work differs in that more complex jobs were devised to fully exercise the scheduler with established benchmarks. The research is important because even a minuscule reduction in latency can lead to large long-term savings of electricity, time, and money, which is especially relevant in the era of green computing [Reuther18]. The methodology used to assess the schedulers relied on custom automation scripts, developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. Multiple experiments were conducted over the course of the research; they were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. The measurements taken during the experiments were wall time, RAM usage, and CPU usage, capturing the utilization of system resources by each scheduler. The custom scripts were executed on 1, 2, and 4 servers to determine how well each scheduler scales with network growth. The experiments were conducted on local school resources; all hardware was similar and co-located within the same data center. While the schedulers investigated are agnostic to whether the system is a grid, cluster, or supercomputer, the investigation was limited to a cluster architecture.
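The following minimal Python sketch shows the flavor of such an automation script: it submits a benchmark job through a blocking scheduler call, records wall time, and appends the result to a CSV. The `sbatch --wait` invocation is one real example of a blocking submission (Slurm); the thesis's actual scripts, the four schedulers it studied, and its RAM/CPU sampling are not reproduced here.

```python
"""A minimal sketch of a batch-scheduler timing harness: submit a job,
block until completion, log the wall time. Scheduler CLI is illustrative."""

import csv
import subprocess
import time

def time_submission(submit_cmd, label, out_csv="results.csv"):
    """Run one blocking job submission and append its wall time to a CSV."""
    start = time.perf_counter()
    subprocess.run(submit_cmd, check=True)   # returns when the job finishes
    wall = time.perf_counter() - start
    with open(out_csv, "a", newline="") as f:
        csv.writer(f).writerow([label, f"{wall:.3f}"])
    return wall

# Repeat the same workload on 1, 2, and 4 nodes to observe scaling.
for nodes in (1, 2, 4):
    time_submission(["sbatch", "--wait", f"--nodes={nodes}", "bench.sh"],
                    label=f"{nodes}-nodes")
```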
13

Estudo de desempenho do sistema 3G 1xEV-DO através de modelos reais de tráfego / Performance study of the 3G 1xEV-DO system through real traffic models

Marques, Leandro Bento Sena 05 August 2018 (has links)
Advisor: Shusaburo Motoyama / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: A performance study of the forward link of the 1xEV-DO system is presented in this work. The study emphasizes the impact of adopting real traffic models such as HTTP, FTP, and WAP, and of several different schedulers. The average packet delay, the throughput, and the loss percentage of each traffic type are analyzed through simulation as a function of the traffic load. FIFO, priority-based, WFQ, and DRR schedulers are adopted in this study. The results show that the standardized 1xEV-DO system allows data transmission at high bit rates. It is also shown that the system performance can be further enhanced by introducing other schedulers. / Master's in Electrical Engineering, Telecommunications and Telematics
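Of the schedulers compared above, DRR (Deficit Round Robin) is the easiest to state compactly. The sketch below is a generic textbook DRR, not the thesis's simulator: per-class queues accumulate a quantum of credit each round and transmit head-of-line packets while the credit lasts. The queue names and packet sizes are illustrative.

```python
"""Textbook Deficit Round Robin over per-traffic-class queues
(e.g. HTTP, FTP, WAP). Quanta and packet sizes are made-up values."""

from collections import deque

def drr(queues, quantum):
    """queues: dict name -> deque of packet sizes (bytes).
    Yields (flow, packet_size) in service order."""
    deficit = {name: 0 for name in queues}
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                continue
            deficit[name] += quantum
            # Serve head-of-line packets while the deficit allows it.
            while q and q[0] <= deficit[name]:
                deficit[name] -= q[0]
                yield name, q.popleft()
            if not q:
                deficit[name] = 0   # an empty queue keeps no credit

queues = {"HTTP": deque([1500, 400]), "FTP": deque([1500, 1500]),
          "WAP": deque([128, 128])}
for flow, size in drr(queues, quantum=1500):
    print(flow, size)
```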
14

Optimizations In Storage Area Networks And Direct Attached Storage

Dharmadeep, M C 02 1900 (has links)
The thesis consists of three parts.

In the first part, we introduce the notion of device-cache-aware schedulers. Modern disk subsystems have many megabytes of memory for purposes such as prefetching and caching, yet current disk scheduling algorithms make decisions oblivious of the underlying device cache algorithms. In this thesis, we propose a scheduler architecture that is aware of the underlying device cache, and we describe how the device cache parameters can be automatically deduced and incorporated into the scheduling algorithm. We have only considered adaptive caching algorithms, as modern high-end disk subsystems are by default configured to use such algorithms. We implemented a prototype for the Linux anticipatory scheduler, where we observed, compared with the anticipatory scheduler, up to 3 times improvement in query execution times with the Benchw benchmark and up to 10 percent improvement with the Postmark benchmark.

The second part deals with implementing cooperative caching for the Redhat Global File System (GFS), a clustered shared disk file system. Coordination between multiple accesses is through a lock manager. On a read, a lock on the inode is acquired in shared mode and the data is read from the disk. For a write, an exclusive lock on the inode is acquired and data is written to the disk; this requires all nodes holding the lock to write their dirty buffers/pages to disk and invalidate all the related buffers/pages. A DLM (Distributed Lock Manager) is a module that implements the functions of a lock manager. GFS's DLM has some support for range locks, although it is not being used by GFS. While it is clear that data sourced from a memory copy is likely to have lower latency, GFS currently reads from the shared disk after acquiring a lock (just as in other designs such as IBM's GPFS) rather than from remote memory that recently had the correct contents. The difficulties are mainly due to the circular relationships that can result between GFS and the generic DLM architecture when integrating the DLM locking framework with cooperative caching: for example, the page/buffer cache should be accessible from the DLM, and yet the DLM's generality has to be preserved. The symmetric nature of the DLM (including the SMP concurrency model) makes it even more difficult to understand and integrate cooperative caching into it (note that GPFS has an asymmetrical design). In this thesis, we describe the design of a cooperative caching scheme in GFS. To make it more effective, we also introduced changes to the locking protocol and the DLM to handle range locks more efficiently. Experiments with micro-benchmarks on our prototype implementation reveal that reading from a remote node over gigabit Ethernet can be up to 8 times faster than reading from an enterprise-class SCSI disk for random disk reads. Our contributions are an integrated design for cooperative caching and the lock manager for GFS, a novel method for interval searches, and a determination of when sequential reads from remote memory perform better than sequential reads from a disk.

The third part deals with selecting a primary network partition in a clustered shared disk system when node/network failures occur. Clustered shared disk file systems like GFS and GPFS use methods that can fail in the case of multiple network partitions, and also in the case of a two-node cluster.
In this thesis, we give an algorithm for fault-tolerant proactive leader election in asynchronous shared memory systems, together with its formal verification. Roughly speaking, a leader election algorithm is proactive if it can tolerate failures of nodes even after a leader is elected, and (stable) leader election happens periodically. This is needed in systems where a leader is required after every failure to ensure the availability of the system, and where there might be no explicit events, such as messages, in the (shared memory) system. Previous algorithms such as Disk Paxos are not proactive. In our model, individual nodes can fail and reincarnate at any point in time. Each node has a counter which is incremented every period, and the period is the same across all nodes (modulo a maximum drift); different nodes can nevertheless be in different epochs at the same time. Our algorithm ensures that per epoch there can be at most one leader, so if the counter values of some set of nodes match, there can be at most one leader among them. If the nodes satisfy certain timeliness constraints, then the leader for the epoch with the highest counter also becomes the leader for the next epoch (the stability property). Our algorithm uses shared memory proportional to the number of processes, which is the best possible. We also show how our protocol can be used in clustered shared disk systems to select a primary network partition. We have used the state machine approach to represent our protocol in the Isabelle HOL logic system and have proved the safety property of the protocol.
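The safety property stated above ("per epoch there can be at most one leader") can be illustrated with a toy model. The Python sketch below is emphatically not the thesis's protocol: it decides each epoch with a single atomic compare-and-swap-like operation, whereas the actual algorithm uses only shared memory proportional to the number of processes, tolerates drifting counters, and re-elects proactively. The sketch only demonstrates the invariant being proved.

```python
"""Toy illustration (not the thesis's protocol) of the safety property:
per epoch, at most one node can become leader."""

import threading

class EpochRegistry:
    """One shared cell per epoch; the first writer wins that epoch."""
    def __init__(self):
        self._owner = {}
        self._lock = threading.Lock()   # stands in for an atomic CAS

    def try_lead(self, epoch, node_id):
        with self._lock:
            # setdefault: claim the epoch if unclaimed, else see the winner.
            return self._owner.setdefault(epoch, node_id) == node_id

registry = EpochRegistry()
leaders = []

def node(node_id, epochs):
    for e in range(epochs):              # the periodically incremented counter
        if registry.try_lead(e, node_id):
            leaders.append((e, node_id))

threads = [threading.Thread(target=node, args=(i, 5)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Safety check: no epoch appears twice among elected leaders.
assert len({e for e, _ in leaders}) == len(leaders)
print(sorted(leaders))
```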
