Global ETD Search

181	aIOLi : Contrôle, Ordonnancement et Régulation des Accès aux Données Persistantes dans les Environnements Multi-applicatifs Haute Performance [aIOLi: Control, Scheduling, and Regulation of Persistent Data Accesses in High-Performance Multi-Application Environments] Lèbre, Adrien 15 October 2006 Many scientific applications use and generate enormous amounts of data. These applications, which rely on specific parallel access patterns (mainly disjoint accesses), are often penalized by ill-suited storage systems. To avoid performance degradation, parallel I/O libraries such as ROMIO are generally used to aggregate small separate requests into larger contiguous ones, which usually perform better. However, the optimizations made for one program do not take into account all the interactions with other applications running concurrently on the cluster. As a result, these specialized routines, which aim to optimize the accesses of a single application, turn out to be useless because their effect is disrupted by the other applications. This document describes a new approach, called aIOLi, that controls, reorders, and regulates all the interactions generated by the different applications running simultaneously on a cluster, relying solely on the POSIX interface. In such a context, performance, interactivity, and fairness are criteria among which it is important to find a good trade-off. To achieve this, a global scheduling strategy that also takes into account the parallel I/O issues local to each application has been defined. The aIOLi service consists of a generic scheduling layer that can be attached to different parts of a file system. Concurrent runs of IOR benchmarks on a traditional NFS server showed particularly significant improvements for read accesses compared with the performance achievable with the POSIX or MPI I/O routines. parallel I/O high-performance computing MPI I/O cluster parallel file systems
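The request aggregation mentioned in the abstract above can be illustrated with a minimal sketch. It is not aIOLi's scheduler or ROMIO's implementation: the `(offset, length)` request representation, the `coalesce` function, and the two sample applications are hypothetical, chosen only to show why merging across applications can matter.

```python
from typing import List, Tuple

# Hypothetical request representation: (offset, length) in bytes.
Request = Tuple[int, int]


def coalesce(requests: List[Request]) -> List[Request]:
    """Merge requests that touch or overlap into larger contiguous ones."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The request starts inside or right at the end of the last one.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Two applications issue small disjoint reads on the same file.
app_a = [(0, 4), (8, 4), (16, 4)]
app_b = [(4, 4), (12, 4), (20, 4)]

print(coalesce(app_a))          # nothing can be merged within one application
print(coalesce(app_a + app_b))  # seen together, the reads form one request
```

Seen one application at a time, none of these requests can be merged; seen together at a shared point such as the server, they collapse into a single contiguous access.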
182	Contribution à la conception d'une plate-forme haute performance d'intégration d'exécutifs communicants pour la programmation des grilles de calcul [Contribution to the Design of a High-Performance Platform for Integrating Communicating Runtimes for Programming Computational Grids] Denis, Alexandre 12 December 2003 This thesis studies a communication platform model for programming computational grids. Our goal is to extend the reach of grids by allowing parallel and/or distributed applications to run without imposing any particular programming or runtime constraint. The proposed model allows the use of a variety of runtimes, suited to the application rather than dictated by the available networks. Our approach is based on: arbitration of resource accesses, to allow several runtimes to operate simultaneously; abstraction adaptation, which presents the resources according to the paradigm chosen by the user; and resource virtualization, which allows existing runtimes to be used without modification. We implemented this model in the PadicoTM platform and ported various runtimes to it, such as MPI, CORBA, and SOAP. The usable networks range from SANs to WANs. The performance obtained is excellent. computational grid parallelism distributed systems middleware CORBA MPI communication platform
183	Biophysically Accurate Brain Modeling and Simulation using Hybrid MPI/OpenMP Parallel Processing Hu, Jingzhen May 2012 To better understand the behavior of the human brain, it is important to perform large-scale neural network simulations, which may reveal the relationship between whole-network activity and the biophysical dynamics of individual neurons. However, given the complexity of the network and the large number of variables, researchers choose either to simulate smaller neural networks or to use simple spiking neuron models. Recently, supercomputing platforms have been employed to greatly speed up the simulation of large brain models. However, these works still have limitations, such as the simplicity of the modeled network structures and the lack of biophysical detail in the neuron models. In this work, we propose a parallel simulator that uses biophysically realistic neural models to simulate large-scale neural networks. To improve the performance of the simulator, we adopt several techniques, such as merging linear synaptic receptors mathematically and using two-level time steps, which significantly accelerate the simulation. In addition, we exploit the efficiency of parallel simulation through three parallel implementation strategies: MPI parallelization, MPI parallelization with dynamic load balancing schemes, and hybrid MPI/OpenMP parallelization. Through experimental studies, we illustrate the limitation of the MPI implementation caused by the imbalanced workload among processors. It is shown that the two developed MPI load balancing schemes are not able to improve the simulation efficiency on the targeted parallel platform. Using 32 processors, the proposed hybrid approach, on the other hand, is more efficient than the MPI implementation and is about 31X faster than a serial implementation of the simulator for a network consisting of more than 100,000 neurons. Finally, it is shown that for large neural networks, the presented approach is able to simulate the transition from the 3 Hz delta oscillation to epileptic behaviors due to alterations of the underlying cellular mechanisms. Biophysically model Large scale network Parallel MPI OpenMP Dynamic load balancing
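The reported figure of about 31X on 32 processors can be put in context with the standard definitions of speedup and parallel efficiency. The sketch below is a worked calculation only; the function names are illustrative, and the only numbers taken from the abstract are 31 and 32.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup: how many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(speedup_value: float, processors: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors used."""
    return speedup_value / processors


# Figures quoted in the abstract: roughly 31X on 32 processors.
print(efficiency(31.0, 32))  # 0.96875, i.e. about 97% of ideal linear scaling
```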
184	E-AMOM: An Energy-Aware Modeling and Optimization Methodology for Scientific Applications on Multicore Systems Lively, Charles May 2012 Power consumption is an important constraint in achieving efficient execution on High Performance Computing multicore systems. As the number of cores available on a chip continues to increase, the importance of power consumption will continue to grow. To achieve improved performance on multicore systems, scientific applications must make use of efficient methods for reducing power consumption and must be further refined to reduce execution time. In this dissertation, we introduce a performance modeling framework, E-AMOM, to enable improved execution of scientific applications on parallel multicore systems with regard to a limited power budget. We develop models for each application based upon hardware performance counters. Our models use different performance counters for each application and for each performance component (runtime, system power consumption, CPU power consumption, and memory power consumption); the counters are selected via our performance-tuned principal component analysis method. Models developed through E-AMOM provide insight into the characteristics of each application that affect performance for each component on a parallel multicore system. Our models are more than 92% accurate across both hybrid (MPI/OpenMP) and MPI implementations for six scientific applications. E-AMOM includes an optimization component that uses our models to apply run-time Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Concurrency Throttling to reduce the power consumption of the scientific applications. Further, we optimize our applications based upon insights provided by the performance models to reduce their runtime. Our methods and techniques save up to 18% in energy consumption for hybrid (MPI/OpenMP) and MPI scientific applications and reduce the runtime of the applications by up to 11% on parallel multicore systems. Performance Modeling Power consumption Multicore Parallel Programming MPI Hybrid Power prediction
185	Scientific High Performance Computing (HPC) Applications On The Azure Cloud Platform Agarwal, Dinesh 10 May 2013 Cloud computing is emerging as a promising platform for compute- and data-intensive scientific applications. Thanks to its on-demand elastic provisioning capabilities, cloud computing has attracted the curiosity of researchers from a wide range of disciplines. However, even though many vendors have rolled out their commercial cloud infrastructures, the service offerings are usually only best-effort, without any performance guarantees. The usefulness of these resources is questionable if they cannot meet the performance expectations of deployed applications. Additionally, the lack of familiar development tools hampers the productivity of eScience developers writing robust scientific high performance computing (HPC) applications. There are no standard frameworks currently supported by any large set of vendors offering cloud computing services. Consequently, porting scientific applications among different cloud platforms is hard. Among all clouds, the emerging Azure cloud from Microsoft in particular remains a challenge for HPC program development, both because it lacks support for traditional parallel programming models such as Message Passing Interface (MPI) and map-reduce and because its application programming interfaces (APIs) are still evolving. We have designed new frameworks and runtime environments to help HPC application developers by providing them with easy-to-use tools, similar to those known from traditional parallel and distributed computing environments, such as MPI, for scientific application development on the Azure cloud platform. It is challenging to create an efficient framework for any cloud platform, including the Windows Azure platform, as they are mostly offered to users as a black box with a set of APIs to access various service components. The primary contributions of this Ph.D. thesis are (i) creating a generic framework for bag-of-tasks HPC applications to serve as the basic building block for application development on the Azure cloud platform, (ii) creating a set of APIs for HPC application development over the Azure cloud platform, similar to MPI in traditional parallel and distributed settings, and (iii) implementing Crayons using the proposed APIs as the first end-to-end parallel scientific application to parallelize the fundamental GIS operations. Cloud Computing GIS computations using Cloud Windows Azure MPI on cloud Bag of tasks Cloud frameworks
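The bag-of-tasks pattern named in contribution (i) can be sketched in a few lines. This is not the thesis's framework and uses no Azure API: an in-process `queue.Queue` and local threads stand in for whatever cloud storage and compute services a real deployment would use, and `run_bag_of_tasks` is a hypothetical name.

```python
import queue
import threading
from typing import Callable, List, Sequence


def run_bag_of_tasks(tasks: Sequence, worker_count: int,
                     work: Callable) -> List:
    """Run independent tasks with a pool of workers pulling from a shared bag."""
    bag: queue.Queue = queue.Queue()
    for index, task in enumerate(tasks):
        bag.put((index, task))
    results: List = [None] * len(tasks)

    def worker() -> None:
        # Each worker keeps taking tasks until the bag is empty, so faster
        # workers naturally end up processing more tasks.
        while True:
            try:
                index, task = bag.get_nowait()
            except queue.Empty:
                return
            results[index] = work(task)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_bag_of_tasks([1, 2, 3, 4], worker_count=3, work=lambda x: x * x))
```

Because tasks are independent and pulled on demand, no worker needs to communicate with another, which is what makes the pattern a natural building block on platforms without MPI-style messaging.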
186	Design, Implementation, and Formal Verification of On-demand Connection Establishment Scheme for TCP Module of MPICH2 Library Muthukrishnan, Sankara Subbiah August 2012 Message Passing Interface (MPI) is a standard library interface for writing parallel programs. The MPI specification is broadly used for solving engineering and scientific problems on parallel computers, and MPICH2 is a popular MPI implementation developed at Argonne National Laboratory. The scalability of MPI implementations is very important for building high performance parallel computing applications. The initial TCP (Transmission Control Protocol) network module developed for the Nemesis communication subsystem in the MPICH2 library, however, was not scalable in how it established connections: pairwise connections between all of an application's processes were established during the initialization of the application (the library call to MPI_Init), regardless of whether the connections were eventually needed. In this work, we have developed a new TCP network module for Nemesis that establishes connections on demand. The on-demand connection establishment scheme is designed to improve the scalability of the TCP network module in the MPICH2 library, aiming to reduce the initialization time and the use of operating system resources of MPI applications. Our performance benchmark results show that MPI_Init in the on-demand connection establishment scheme becomes a fast constant-time operation, and the additional cost of establishing connections later is negligible. On-demand connection establishment between two processes, especially when the two processes attempt to connect to each other simultaneously, is a complex task because of race conditions and is thus prone to hard-to-reproduce defects. To assure ourselves of the correctness of the TCP network module, we modeled its design using the SPIN model checker and verified safety and liveness properties stated as Linear Temporal Logic claims. MPI MPICH2 on-demand connection establishment Nemesis model checking formal verification SPIN LTL abstraction
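The difference between eager and on-demand connection establishment can be shown with a small model. This is only a sketch under stated assumptions: the `LazyConnections` class and its string "handles" are hypothetical, no real sockets are opened, and it does not reproduce the Nemesis module or its handling of simultaneous connection attempts.

```python
from typing import Dict, Tuple


class LazyConnections:
    """Hypothetical per-process connection table that connects on first use."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        # Initialization does no per-peer work, so its cost does not grow
        # with the number of processes.
        self.connections: Dict[int, str] = {}

    def _connect(self, peer: int) -> str:
        # Stand-in for opening a TCP connection to the peer.
        return f"conn({self.rank}->{peer})"

    def send(self, peer: int, payload: bytes) -> Tuple[str, bytes]:
        # The connection is created only when the first message needs it.
        if peer not in self.connections:
            self.connections[peer] = self._connect(peer)
        return self.connections[peer], payload


table = LazyConnections(rank=0, world_size=1000)
table.send(1, b"halo")
table.send(999, b"halo")
table.send(1, b"more")
print(len(table.connections))  # only the peers actually contacted
```

An eager scheme would open `world_size - 1` connections per process during initialization; here a process that talks to two neighbours holds two.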
187	Image Annotation With Semi-supervised Clustering Sayar, Ahmet 01 December 2009 Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low-level visual features. These codebooks are then matched, in various ways, with the words of the text document related to the image. In this thesis, we propose a new image annotation technique that improves the representation and quantization of the visual information by employing available but unused information, called side information, which is hidden in the system. This side information is used to semi-supervise the clustering process that creates the visual terms (visterms). The selection of side information depends on the visual image content, the annotation words, and the relationship between them. Although there may be many different ways of defining and selecting side information, three types are proposed in this thesis. The first is the hidden topic probability information obtained automatically from the text document associated with the image. The second is the orientation information and the third is the color information around interest points that correspond to critical locations in the image. The side information provides a set of constraints in a semi-supervised K-means region clustering algorithm. Consequently, when the visterms are generated from the regions, not only are the low-level features clustered, but side information is also used to complement the visual information. This complementary information is expected to close the semantic gap between the low-level features extracted from each region and the high-level textual information. Therefore, a better match between the visual codebook and the annotation words is obtained. Moreover, the modified K-means algorithm runs faster because of the constraints brought by the side information. The proposed algorithm is implemented in a high performance parallel computation environment. QA General 15707
188	Performance analysis and modeling of GYRO Lively, Charles Wesley, III 30 October 2006 Efficient execution of scientific applications requires an understanding of how system features impact the performance of the application. Performance models provide significant insight into the performance relationships between an application and the system used for execution. In particular, models can be used to predict the relative performance of different systems used to execute an application. Recently, a significant effort has been devoted to gaining a more detailed understanding of the performance characteristics of a fusion reaction application, GYRO. GYRO is a plasma-physics application used to gain a better understanding of the interaction of ions and electrons in fusion reactions. In this thesis, we use the well-known Prophesy system to analyze and model the performance of GYRO across various supercomputer platforms. Using processor partitioning, we determine that using the smallest number of processors per node is the most effective processor configuration for executing the application. Further, we explore trends in kernel coupling values across platforms to understand how the kernels of GYRO interact. In this work, experiments are conducted on the supercomputers Seaborg and Jacquard at the DOE National Energy Research Scientific Computing Center and the supercomputers DataStar P655 and P690 at the San Diego Supercomputing Center. Across all four platforms, our results show that using one processor per node (ppn) yields better performance than full or half ppn usage. Our experimental results also show that using kernel coupling to model and predict the performance of GYRO is more accurate than summation. On average, kernel coupling yields predictions with less than 7% error. The performance relationship between kernel coupling values and the sharing of information throughout the GYRO application is explored by examining the global communication within the application and its data locality. performance supercomputing GYRO kernel coupling processor partitioning performance prediction MPI modeling
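The abstract gives no formula for kernel coupling, so the sketch below assumes the pairwise definition usually associated with this style of modeling: the time of two kernels run together divided by the sum of their times run in isolation. The function names and all timings are hypothetical and are not taken from the thesis.

```python
def coupling(together: float, isolated_i: float, isolated_j: float) -> float:
    """Assumed pairwise coupling: joint time over the sum of isolated times."""
    return together / (isolated_i + isolated_j)


def describe(value: float, tolerance: float = 1e-9) -> str:
    """Interpret a coupling value relative to 1 (plain summation)."""
    if abs(value - 1.0) <= tolerance:
        return "no interaction"
    # Below 1 the kernels help each other (for example by sharing data);
    # above 1 they interfere with each other.
    return "constructive" if value < 1.0 else "destructive"


# Hypothetical timings in seconds.
c = coupling(together=4.5, isolated_i=2.0, isolated_j=3.0)
print(round(c, 3), describe(c))
```

Under this assumption, a summation model implicitly treats every coupling value as 1, which is why measured coupling values can improve a prediction when kernels share data or compete for resources.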
189	Ablaufszenarien fuer Client-Server Anwendungen mit CORBA 2.0 [Execution Scenarios for Client-Server Applications with CORBA 2.0] Falk, Edelmann 12 November 1997 The Common Object Request Broker Architecture (CORBA) of the Object Management Group (OMG) offers the opportunity not only to serve as a platform for new distributed applications, but also to integrate existing applications and legacy software across vendors and systems. This property sets CORBA apart from other programming platforms and gives it the potential to be a promising basis for future application systems. The aim of this student research project is to investigate the feasibility of various interaction types in CORBA and to try them out in practice using examples. The first part of this work systematically summarizes possible execution forms drawn from the literature, from the DCE and MPI systems, and from the author's own considerations. This is followed by a detailed treatment of the CORBA architecture and of the execution forms and interaction scenarios possible in it. Finally, eight different versions of a simple distributed dictionary are presented to illustrate some of the concepts realized in CORBA with a practical example. The CORBA platform available was Orbix-MT 2.0.1 (multi-threaded) from IONA Technologies Ltd. under Solaris 2.x. CORBA OMA DCE MPI distributed systems parallel systems client-server applications execution forms interaction scenarios ddc:004
190	Blocking vs. Non-blocking Communication under MPI on a Master-Worker Problem Fachat, André; Hoffmann, Karl Heinz 30 October 1998 In this report we describe the conversion of a simple Master-Worker parallel program from global blocking communications to non-blocking communications. The program is MPI-based and has been run on different computer architectures. By moving the communication to the background, the processors can use the former waiting time for computation. However, we find that, with the MPICH implementation used on a cluster of workstations, the computing time increases by as much as the communication time decreases. Also, using non-global communication instead of global communication slows the algorithm down on computers with optimized global communication routines, such as the Cray T3D. MPICH blocking communication non-blocking communication MSC 68-04 ddc:530 ddc:004 MPI
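The idea of moving communication to the background can be sketched without MPI. In the hypothetical example below, a background thread stands in for a non-blocking receive: submitting the transfer plays the role of posting `MPI_Irecv`, and waiting on the future plays the role of `MPI_Wait`. The `transfer` and `compute` functions and the sleep-based delay are illustrative only, not the report's program.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def transfer(chunk: Sequence[int]) -> List[int]:
    """Stand-in for receiving one message; the sleep models network latency."""
    time.sleep(0.01)
    return list(chunk)


def compute(chunk: Sequence[int]) -> int:
    """Stand-in for the worker's computation on one message."""
    return sum(x * x for x in chunk)


def blocking_worker(chunks: Sequence[Sequence[int]]) -> List[int]:
    """Receive, then compute, one chunk at a time: the worker idles during transfers."""
    return [compute(transfer(chunk)) for chunk in chunks]


def nonblocking_worker(chunks: Sequence[Sequence[int]]) -> List[int]:
    """Start the next transfer before computing on the chunk already received."""
    if not chunks:
        return []
    results: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, chunks[0])    # post the first receive
        for next_chunk in chunks[1:]:
            received = pending.result()               # wait for the current message
            pending = pool.submit(transfer, next_chunk)  # next receive runs in background
            results.append(compute(received))         # overlaps with that transfer
        results.append(compute(pending.result()))
    return results


work = [[1, 2, 3], [4, 5], [6]]
print(blocking_worker(work) == nonblocking_worker(work))  # same results either way
```

Both versions produce the same results; only the schedule differs. Whether the overlap saves wall-clock time depends on the communication layer making progress in the background, which is consistent with the report's observation that the gain did not materialize with the MPICH implementation used.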
