Global ETD Search

1	Dynamic scheduling in multicore processors Rosas Ham, Demian January 2012 (has links) The advent of multi-core processors, particularly with projections that numbers of cores will continue to increase, has focused attention on parallel programming. It is widely recognized that current programming techniques, including those that are used for scientific parallel programming, will not allow the easy formulation of general purpose applications. An area which is receiving interest is the use of programming styles which do not have side-effects. Previous work on parallel functional programming demonstrated the potential of this to permit the easy exploitation of parallelism. This thesis investigates a dynamic load balancing system for shared memory Chip Multiprocessors. This system is based on a parallel computing model called SLAM (Spreading Load with Active Messages), which makes use of functional language evaluation techniques. A novel hardware/software mechanism for exploiting fine grain parallelism is presented. This mechanism comprises a runtime system which performs dynamic scheduling and synchronization automatically when executing parallel applications. Additionally the interface for using this mechanism is provided in the form of an API. The proposed system is evaluated using cycle-level models and multithreaded applications running in a full system simulation environment. 004.1
2	Dynamic Load Balancing of Virtual Machines Hosted on Xen Wilcox, Terry Clyde 10 December 2008 (has links) (PDF) Currently systems of virtual machines are load balanced statically which can create load imbalances for systems where the load changes dynamically over time. For throughput and response time of a system to be maximized it is necessary for load to be evenly distributed among each part of the system. We implement a prototype policy engine for the Xen virtual machine monitor which can dynamically load balance virtual machines. We compare the throughput and response time of our system using the cpu2000 and the WEB2005 benchmarks from SPEC. Under the loads we tested, dynamic load balancing had 5%-8% higher throughput than static load balancing. Virtual Machines Dynamic Load Balancing Computer Sciences
3	A Framework for Performance Optimization of TensorContraction Expressions Lai, Pai-Wei January 2014 (has links) No description available. Computer Science tensor contraction operation minimization dynamic load-balancing caching
4	Biophysically Accurate Brain Modeling and Simulation using Hybrid MPI/OpenMP Parallel Processing Hu, Jingzhen 2012 May 1900 (has links) In order to better understand the behavior of the human brain, it is very important to perform large scale neural network simulation which may reveal the relationship between the whole network activity and the biophysical dynamics of individual neurons. However, considering the complexity of the network and the large amount of variables, researchers choose to either simulate smaller neural networks or use simple spiking neuron models. Recently, supercomputing platforms have been employed to greatly speedup the simulation of large brain models. However, there are still limitations of these works such as the simplicity of the modeled network structures and lack of biophysical details in the neuron models. In this work, we propose a parallel simulator using biophysically realistic neural models for the simulation of large scale neural networks. In order to improve the performance of the simulator, we adopt several techniques such as merging linear synaptic receptors mathematically and using two level time steps, which significantly accelerate the simulation. In addition, we exploit the efficiency of parallel simulation through three parallel implementation strategies: MPI parallelization, MPI parallelization with dynamic load balancing schemes and Hybrid MPI/OpenMP parallelization. Through experimental studies, we illustrate the limitation of MPI implementation due to the imbalanced workload among processors. It is shown that the two developed MPI load balancing schemes are not able to improve the simulation efficiency on the targeted parallel platform. Using 32 processors, the proposed hybrid approach, on the other hand, is more efficient than the MPI implementation and is about 31X faster than a serial implementation of the simulator for a network consisting of more than 100,000 neurons. Finally, it is shown that for large neural networks, the presented approach is able to simulate the transition from the 3Hz delta oscillation to epileptic behaviors due to the alterations of underlying cellular mechanisms. Biophysially model Large scale network Parallel MPI OpenMP Dynamic load balancing
5	GRAPHICAL MODELING AND SIMULATION OF A HYBRID HETEROGENEOUS AND DYNAMIC SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE Zheng, Chunfang 01 January 2004 (has links) A single-chip, hybrid, heterogeneous, and dynamic shared memory multiprocessor architecture is being developed which may be used for real-time and non-real-time applications. This architecture can execute any application described by a dataflow (process flow) graph of any topology; it can also dynamically reconfigure its structure at the node and processor architecture levels and reallocate its resources to maximize performance and to increase reliability and fault tolerance. Dynamic change in the architecture is triggered by changes in parameters such as application input data rates, process execution times, and process request rates. The architecture is a Hybrid Data/Command Driven Architecture (HDCA). It operates as a dataflow architecture, but at the process level rather than the instruction level. This thesis focuses on the development, testing and evaluation of a new graphic software (hdca) developed to first do a static resource allocation for the architecture to meet timing requirements of an application and then hdca simulates the architecture executing the application using statically assigned resources and parameters. While simulating the architecture executing an application, the software graphically and dynamically displays parameters and mechanisms important to the architectures operation and performance. The new graphical software is able to show system and node level dynamic capability of the HDCA. The newly developed software can model a fixed or varying input data rate. The model also allows fault tolerance analysis of the architecture.
6	Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design Krasteva, Denitza Tchavdarova Jr. 10 October 1998 (has links) Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance. / Master of Science dynamic load balancing multidisciplinary design optimization parallel computation random polling global round robin
7	Redistribution dynamique parallèle efficace de la charge pour les problèmes numériques de très grande taille / Efficient parallel dynamic load balancing for very large numerical problems Fourestier, Sébastien 20 June 2013 (has links) Cette thèse traite du problème de la redistribution dynamique parallèle efficace de la charge pour les problèmes numériques de très grande taille. Nous présentons tout d'abord un état de l'art des algorithmes permettant de résoudre les problèmes du partitionnement, du repartitionnement, du placement statique et du re-placement. Notre première contribution vise à étudier, dans un cadre séquentiel, les caractéristiques algorithmiques souhaitables pour les méthodes parallèles de repartitionnement. Nous y présentons notre contribution à la conception d'un schéma multi-niveaux k-aire pour le calcul sequentiel de repartitionnements. La partie la plus exigeante de cette adaptation concerne la phase d'expansion. L'une de nos contributions majeures a été de nous inspirer des méthodes d'influence afin d'adapter un algorithme de raffinement par diffusion au problème du repartitionnement.Notre deuxième contribution porte sur la mise en oeuvre de ces méthodes sur machines parallèles. L'adaptation du schéma multi-niveaux parallèle a nécessité une évolution des algorithmes et des structures de données mises en oeuvre pour le partitionnement. Ce travail est accompagné d'une analyse expérimentale, qui est rendue possible grâce à la mise en oeuvre des algorithmes considérés au sein de la bibliothèque Scotch. / This thesis concerns efficient parallel dynamic load balancing for large scale numerical problems. First, we present a state of the art of the algorithms used to solve the partitioning, repartitioning, mapping and remapping problems. Our first contribution, in the context of sequential processing, is to define the desirable features that parallel repartitioning tools need to possess. We present our contribution to the conception of a k-way multilevel framework for sequential repartitioning. The most challenging part of this work regards the uncoarsening phase. One of our main contributions is the adaptation of influence methods to a global diffusion-based heuristic for the repartitioning problem. Our second contribution is the parallelization of these methods. The adaptation of the aforementioned algorithms required some modification of the algorithms and data structure used by existing parallel partitioning routines. This work is backed by a thorough experimental analysis, which is made possible thanks to the implementation of our algorithms into the Scotch library. Parallélisme Repartitionnement Redistribution dynamique Graphe Multi-niveaux Heuristiques Petascale Parallelism Repartitioning Dynamic load balancing Graph Multilevel Heuristics Petascale
8	Equilibrage de charges dynamique avec un nombre variable de processeurs basé sur des méthodes de partitionnement de graphe / Dynamic Load-Balancing with Variable Number of Processors based on Graph Partitioning Vuchener, Clement 07 February 2014 (has links) L'équilibrage de charge est une étape importante conditionnant les performances des applications parallèles. Dans le cas où la charge varie au cours de la simulation, il est important de redistribuer régulièrement la charge entre les différents processeurs. Dans ce contexte, il peut s'avérer pertinent d'adapter le nombre de processeurs au cours d'une simulation afin d'obtenir une meilleure efficacité, ou de continuer l'exécution quand toute la mémoire des ressources courantes est utilisée. Contrairement au cas où le nombre de processeurs ne varie pas, le rééquilibrage dynamique avec un nombre variable de processeurs est un problème peu étudié que nous abordons ici.Cette thèse propose différentes méthodes basées sur le repartitionnement de graphe pour rééquilibrer la charge tout en changeant le nombre de processeurs. Nous appelons ce problème « repartitionnement M x N ». Ces méthodes se décomposent en deux grandes étapes. Dans un premier temps, nous étudions la phase de migration et nous construisons une « bonne » matrice de migration minimisant plusieurs critères objectifs comme le volume total de migration et le nombre total de messages échangés. Puis, dans un second temps, nous utilisons des heuristiques de partitionnement de graphe pour calculer une nouvelle distribution optimisant la migration en s'appuyant sur les résultats de l'étape précédente. En outre, nous proposons un algorithme de partitionnement k-aire direct permettant d'améliorer le partitionnement biaisé. Finalement, nous validons cette thèse par une étude expérimentale en comparant nos méthodes aux partitionneursactuels. / Load balancing is an important step conditioning the performance of parallel programs. If the workload varies drastically during the simulation, the load must be redistributed regularly among the processors. Dynamic load balancing is a well studied subject but most studies are limited to an initially fixed number of processors. Adjusting the number of processors at runtime allows to preserve the parallel code efficiency or to keep running the simulation when the memory of the current resources is exceeded.In this thesis, we propose some methods based on graph repartitioning in order to rebalance the load while changing the number of processors. We call this problem \M x N repartitioning". These methods are split in two main steps. Firstly, we study the migration phase and we build a \good" migration matrix minimizing several metrics like the migration volume or the number of exchanged messages. Secondly, we use graph partitioning heuristics to compute a new distribution optimizing the migration according to the previous step results. Besides, we propose a direct k-way partitioning algorithm that allows us to improve our biased partitioning. Finally, an experimental study validates our algorithms against state-of-the-art partitioning tools. Simulation numérique Repartitionnement Partitionnement de graphe Redistribution Equilibrage de charge dynamique Parallélisme Numerical simulation Repartitioning. Graph partitioning Redistribution Dynamic load-balancing Parallelism
9	Parallélisme et équilibrage de charges dans le traitement de la jointure sur des architectures distribuées / Parallelism and load balancing in the treatment of the join on distributed architectures Al Hajj Hassan, Mohamad 16 December 2009 (has links) L’émergence des applications de bases de données dans les domaines tels que le data warehousing,le data mining et l’aide à la décision qui font généralement appel à de très grands volumes de donnéesrend la parallélisation des algorithmes des jointures nécessaire pour avoir un temps de réponse acceptable.Une accélération linéaire est l’objectif principal des algorithmes parallèles, cependant dans les applicationsréelles, elle est difficilement atteignable : ceci est dû généralement d’une part aux coûts de communicationsinhérents aux systèmes multi-processeurs et d’autre part au déséquilibre des charges des différents processeurs.En plus, dans un environnement hétérogène multi-utilisateur, la charge des différents processeurspeut varier de manière dynamique et imprévisible.Dans le cadre de cette thèse, nous nous intéressons au traitement de la jointure et de la multi-jointure surles architectures distribuées hétérogènes, les grilles de calcul et les systèmes de fichiers distribués. Nousavons proposé une variété d’algorithmes, basés sur l’utilisation des histogrammes distribués, pour traiterde manière efficace le déséquilibre des données, tout en garantissant un équilibrage presque parfait dela charge des différents processeurs même dans un environnement hétérogène et multi-utilisateur. Cesalgorithmes sont basés sur une approche dynamique de redistribution des données permettant de réduire lescoûts de communication à un minimum tout en traitant de manière très efficace le problème de déséquilibredes valeurs de l’attribut de jointure.L’analyse de complexité de nos algorithmes et les résultats expérimentaux obtenus montrent que cesalgorithmes possèdent une accélération presque linéaire. / The appeal of parallel processing becomes very strong in applications which require ever higher performanceand particularly in applications such as : data-warehousing, decision support, On-Line Analytical Processing(OLAP) and more generally DBMS. A linear speed-up is the main objective of parallel algorithms. However,in real applications, it’s not obvious to reach this objective due to the high communication cost in parallel anddistributed systems and to the possible skew in the charge of different processors. In addition, on heterogeneousmulti-user architectures, the load of each processor may highly vary in a dynamic and unpredictableway.In this thesis, we are interested in treating the join and multi-join queries on distributed multi-user heteregeneoussystems, grid systems and distributed file systems. We have proposed several algorithms based onusing distributed histograms. These algorithms are based on a dynamic data distribution and task allocationwhich makes them insensitive to data skew and ensure perfect balancing properties during all stages of joincomputation even on heteregeneous multi-user environment. The complexity analysis of our algorithms andthe experimental results show that they have a near-linear speedup. Jointures parallèles Multi-jointure Déséquilibre des données Equilibrage dynamique de charges Parallel joins Multi-join Data skew Dynamic load balancing
10	Dynamic load-balancing : a new strategy for weather forecast models Rodrigues, Eduardo Rocha January 2011 (has links) Weather forecasting models are computationally intensive applications and traditionally they are executed in parallel machines. However, some issues prevent these models from fully exploiting the available computing power. One of such issues is load imbalance, i.e., the uneven distribution of load across the processors of the parallel machine. Since weather models are typically synchronous applications, that is, all tasks synchronize at every time-step, the execution time is determined by the slowest task. The causes of such imbalance are either static (e.g. topography) or dynamic (e.g. shortwave radiation, moving thunderstorms). Various techniques, often embedded in the application’s source code, have been used to address both sources. However, these techniques are inflexible and hard to use in legacy codes. In this thesis, we explore the concept of processor virtualization for dynamically balancing the load in weather models. This means that the domain is over-decomposed in more tasks than the available processors. Assuming that many tasks can be safely executed in a single processor, each processor is put in charge of a set of tasks. In addition, the system can migrate some of them from overloaded processors to underloaded ones when it detects load imbalance. This approach has the advantage of decoupling the application from the load balancing strategy. Our objective is to show that processor virtualization can be applied to weather models as long as an appropriate strategy for migrations is used. Our proposal takes into account the communication pattern of the application in addition to the load of each processor. In this text, we present the techniques used to minimize the amount of change needed in order to apply processor virtualization to a real-world application. Furthermore, we analyze the effects caused by the frequency at which the load balancer is invoked and a threshold that activates rebalancing. We propose an automatic strategy to find an optimal threshold to trigger load balancing. These strategies are centralized and work well for moderately large machines. For larger machines, we present a fully distributed algorithm and analyze its performance. As a study case, we demonstrate the effectiveness of our approach for dynamically balancing the load in Brams, a mesoscale weather forecasting model based on MPI parallelization. We choose this model because it presents a considerable load imbalance caused by localized thunderstorms. In addition, we analyze how other effects of processor virtualization can improve performance. Processamento paralelo Metereologia Processamento : Alto desempenho High performance computing Dynamic load balancing Weather forecast models Processor virtualization

Search results