91 |
Förbättring av mjukvarubibliotek för parallellberäkningar med programmeringsmodellen Chunks and Tasks El Harbiti, Deeb January 2015 (has links)
Chunks and Tasks is a programming model based on the C++ programming language. This programming model is used for electronic structure calculations, among other things. The purpose of this project is to improve the CHT-MPI software library for Chunks and Tasks so that matrix-matrix multiplications are computed more efficiently than with the existing software library. The software library is based on the work stealing method, which it uses to distribute the calculation work. The intended way to improve the software library is to modify the work stealing method so that the calculation work is distributed more efficiently, which in turn should make the calculations finish faster than before. Two different modifications of the work stealing method were tested, yielding two new methods, Method 1 and Method 2, which distributed the calculation work differently. Method 1 did not give results compatible with the theory, since the calculation time with this method was much longer than with the previous method. The results for Method 2 were compatible with the theory for the method: Method 2 distributed the calculation work more efficiently than before, which decreased the amount of data sent during the calculations and thereby shortened the calculation time compared with the previous method. This method thus constitutes an improvement of the software library for the programming model Chunks and Tasks. / Chunks and Tasks is a programming model based on the C++ programming language. It is used, among other things, in methods for solving the Schrödinger equation for the electrons in molecules. The purpose of this project is to improve the software library for Chunks and Tasks so that matrix-matrix multiplications are computed more efficiently than with the existing software library. The software library uses the work stealing method to distribute the computational work. The idea is to improve the software library by modifying the work stealing method so that the work distribution proceeds more smoothly, which in turn should make the computations finish in a shorter time than before. Two different modifications of the work stealing method were tested, yielding two new methods, Method 1 and Method 2, which distributed the computational work in different ways. What was sought was a method that could reduce the amount of data sent during the computation of various matrix-matrix multiplications, since a reduced data volume meant a shorter computation time. Method 1 gave a deterioration, as the computation times became much longer than before. Method 2 gave a better result: with this method the work was distributed more efficiently, which reduced the amount of data sent and thereby shortened the computation times. With this method an improvement of the software library for the programming model Chunks and Tasks was obtained.
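A minimal, illustrative sketch of the work stealing pattern the library builds on is given below (the names and structure are assumptions for illustration, not CHT-MPI code): each worker pops tasks from its own deque and, when idle, steals from a victim, and it is precisely this stealing step that the two tested modifications change.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <random>
#include <vector>

// Illustrative work-stealing skeleton (not CHT-MPI itself).
using Task = std::function<void()>;

struct WorkerQueue {
    std::deque<Task> tasks;
    std::mutex m;

    void push(Task t) {                      // owner adds work at the back
        std::lock_guard<std::mutex> lock(m);
        tasks.push_back(std::move(t));
    }
    std::optional<Task> pop_local() {        // owner works LIFO from the back
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return std::nullopt;
        Task t = std::move(tasks.back());
        tasks.pop_back();
        return t;
    }
    std::optional<Task> steal() {            // thief takes FIFO from the front
        std::lock_guard<std::mutex> lock(m);
        if (tasks.empty()) return std::nullopt;
        Task t = std::move(tasks.front());
        tasks.pop_front();
        return t;
    }
};

// An idle worker picks a victim and tries to steal one task. Changing how the
// victim and the stolen task are chosen (for example, preferring steals that
// move little data) is the kind of modification the two tested methods make.
std::optional<Task> try_steal(std::vector<WorkerQueue>& queues,
                              std::size_t self, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, queues.size() - 1);
    for (std::size_t attempt = 0; attempt < queues.size(); ++attempt) {
        std::size_t victim = pick(rng);
        if (victim == self) continue;
        if (auto t = queues[victim].steal()) return t;
    }
    return std::nullopt;
}
```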
|
92 |
Computational kinetics of a large scale biological process on GPU workstations : DNA bending Ruymgaart, Arnold Peter 30 October 2013 (has links)
It has only recently become possible to study the dynamics of large-timescale biological processes computationally in explicit solvent and atomic detail. This required a combination of advances in computer hardware, utilization of parallel and special-purpose hardware, as well as numerical and theoretical approaches. In this work we report advances in these areas that make a study of this scope feasible in a reasonable time. We then make use of them to study an interesting model system, the action of the DNA bending protein IHF, and demonstrate that such an effort can now be performed on GPU-equipped PC workstations. Many cellular processes require DNA bending. In the crowded compartment of the cell, DNA must be efficiently stored, but this is just one example where bending is observed; other examples include the effects of DNA structural features involved in transcription, gene regulation and recombination. IHF is a bacterial protein that binds and kinks DNA at sequence-specific sites. IHF binding to DNA is the cause or the effect of bending of the double helix by almost 180 degrees. Most sequence-specific DNA binding proteins bind in the major groove of the DNA, and sequence specificity results from direct readout. IHF is an exception; it binds in the minor groove. The final structure of the binding/bending reaction was crystallized and shows the protein's arm-like features "latched" in place, wrapping the DNA in the minor grooves and intercalating the tips between base pairs at the kink sites. This sequence-specific, mostly indirect-readout protein-DNA binding/bending interaction is therefore an interesting test case for studying the mechanism of protein-DNA binding and bending in general. Kinetic schemes have been proposed and numerous experimental studies have been carried out to validate these schemes. Experiments have included rapid-kinetics laser T-jump studies providing unprecedented temporal resolution and time-resolved (quench-flow) DNA footprinting. Here we complement and add to those studies by investigating the mechanism and dynamics of the final latching/initial unlatching at an atomic level. This is accomplished with the computational tools of molecular dynamics and the theory of Milestoning. Our investigation begins by generating a reaction coordinate from the crystal structure of the DNA-protein complex and other images generated through modelling based on biochemical intuition. The initial path is generated by steepest descent minimization, providing us with over 100 anchor images along the Steepest Descent Path (SDP) reaction coordinate. We then use the tools of Milestoning to sample hypersurfaces (milestones) between reaction coordinate anchors. Launching multiple trajectories from each milestone allowed us to accumulate average passage times to adjacent milestones and obtain transition probabilities. A complete set of rates was obtained this way, allowing us to draw important conclusions about the mechanism of DNA bending. We uncover two possible metastable intermediates in the dissociation/unkinking process. The first is an unexpected stable intermediate formed by initial unlatching of the IHF arms accompanied by a complete "psi-0" to "psi+140" conformational change of the IHF arm tip prolines. This unlatching (de-intercalation of the IHF tips from the kink sites) is required for any unkinking to occur. The second intermediate is formed by the IHF protein arms sliding over the DNA phosphate backbone and refolding in the next groove. The formation of this intermediate occurs on the millisecond timescale, which is consistent with experimental unkinking rates. We show that our code optimization and parallelization enhancements allow the entire computation of these millisecond-timescale events to be completed in about one month on 10 or fewer GPU-equipped workstations/cluster nodes, bringing these studies within reach of researchers who do not have access to supercomputer clusters.
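As a rough illustration of the bookkeeping behind such a Milestoning analysis (a sketch under assumed data structures, not the software used in the study), transition probabilities and average passage times between adjacent milestones can be accumulated from per-trajectory records as follows.

```cpp
#include <cstddef>
#include <vector>

// One record per trajectory: launched at milestone `from`, first hit
// milestone `to` after `time` (in whatever unit the trajectories use).
struct Transition {
    std::size_t from;
    std::size_t to;
    double time;
};

struct MilestoningStats {
    std::vector<std::vector<double>> prob;       // prob[i][j]      = P(i -> j)
    std::vector<std::vector<double>> mean_time;  // mean_time[i][j] = <t(i -> j)>
};

// Turn raw trajectory records into transition probabilities and
// average passage times between milestones.
MilestoningStats analyze(const std::vector<Transition>& data,
                         std::size_t n_milestones) {
    std::vector<std::vector<double>> count(
        n_milestones, std::vector<double>(n_milestones, 0.0));
    std::vector<std::vector<double>> time_sum = count;

    for (const Transition& t : data) {
        count[t.from][t.to] += 1.0;
        time_sum[t.from][t.to] += t.time;
    }

    MilestoningStats s{count, count};
    for (std::size_t i = 0; i < n_milestones; ++i) {
        double total = 0.0;
        for (std::size_t j = 0; j < n_milestones; ++j) total += count[i][j];
        for (std::size_t j = 0; j < n_milestones; ++j) {
            s.prob[i][j] = (total > 0.0) ? count[i][j] / total : 0.0;
            s.mean_time[i][j] =
                (count[i][j] > 0.0) ? time_sum[i][j] / count[i][j] : 0.0;
        }
    }
    return s;
}
```

From such per-milestone statistics the complete set of rates described above can then be assembled.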
|
93 |
Overlapping Computation and Communication through Offloading in MPI over InfiniBand Inozemtsev, Grigori 30 May 2014 (has links)
As the demands of computational science and engineering simulations increase, the size and capabilities of High Performance Computing (HPC) clusters are also expected to grow. Consequently, the software providing the application programming abstractions for the clusters must adapt to meet these demands. Specifically, the increased cost of interprocessor synchronization and communication in larger systems must be accommodated. Non-blocking operations that allow communication latency to be hidden by overlapping it with computation have been proposed to mitigate this problem.
In this work, we investigate offloading a portion of the communication processing to dedicated hardware in order to support communication/computation overlap efficiently. We work with the Message Passing Interface (MPI), the de facto standard for parallel programming in HPC environments. We investigate both point-to-point non-blocking communication and collective operations; our work with collectives focuses on the allgather operation. We develop designs for both flat and hierarchical cluster topologies and examine both eager and rendezvous communication protocols.
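The overlap pattern these designs aim to support efficiently can be sketched with the standard host-progressed non-blocking allgather; this shows only the generic pattern, not the offloaded CORE-Direct implementation developed in the thesis.

```cpp
#include <cstddef>
#include <mpi.h>
#include <vector>

// Start a non-blocking allgather, perform independent local computation
// while it is in flight, then wait before using the gathered data.
void allgather_with_overlap(const std::vector<double>& local,
                            std::vector<double>& gathered,
                            std::vector<double>& work,
                            MPI_Comm comm) {
    int size = 0;
    MPI_Comm_size(comm, &size);
    gathered.resize(local.size() * static_cast<std::size_t>(size));

    MPI_Request req;
    MPI_Iallgather(local.data(), static_cast<int>(local.size()), MPI_DOUBLE,
                   gathered.data(), static_cast<int>(local.size()), MPI_DOUBLE,
                   comm, &req);

    // Independent computation proceeds while the collective is in flight;
    // with host-based progress the achievable overlap is limited, which is
    // exactly what offloading the communication to the adapter addresses.
    for (double& x : work) x = x * x + 1.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE);  // gathered data is valid after this
}
```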
We also develop a generalized primitive operation with the aim of simplifying further research into non-blocking collectives. We propose a new algorithm for the non-blocking allgather collective and implement it using this primitive. The algorithm has constant resource usage even when executing multiple operations simultaneously.
We implemented these designs using CORE-Direct offloading support in Mellanox InfiniBand adapters. We present an evaluation of the designs using microbenchmarks and an application kernel that shows that offloaded non-blocking communication operations can provide latency that is comparable to that of their blocking counterparts while allowing most of the duration of the communication to be overlapped with computation and remaining resilient to process arrival and scheduling variations. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2014-05-29 11:55:53.87
|
94 |
Towards an MPI-like Framework for Azure Cloud Platform Karamati, Sara 12 August 2014 (has links)
The Message Passing Interface (MPI) has been widely used for implementing parallel and distributed applications. The emergence of cloud computing offers a scalable, fault-tolerant, on-demand alternative to traditional on-premise clusters. In this thesis, we investigate the possibility of adopting the cloud platform as an alternative to conventional MPI-based solutions. We show that the cloud platform can exhibit competitive performance and can benefit its users through its fault-tolerant architecture and on-demand access. Extensive research is done to identify the difficulties of designing and implementing an MPI-like framework for the Azure cloud platform. We present the details of the key components required for implementing such a framework, along with experimental results benchmarking several basic operations of the MPI standard implemented in the cloud and their practical application to solving well-known large-scale algorithmic problems.
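As a purely hypothetical sketch of the kind of interface such a framework might expose (the class names and the queue-backed transport are assumptions for illustration, not the API built in this thesis), ranks address each other by ID while the transport behind send and receive would be a cloud messaging service rather than a cluster interconnect.

```cpp
#include <vector>

// Hypothetical MPI-like abstraction over a cloud messaging backend.
// The names Transport and Communicator are illustrative only.
class Transport {
public:
    virtual ~Transport() = default;
    // Deliver a message addressed to `dest`; in a cloud setting this would
    // enqueue to a per-destination queue instead of using an interconnect.
    virtual void send(int dest, const std::vector<char>& payload) = 0;
    // Block until a message from `source` arrives and return its payload.
    virtual std::vector<char> recv(int source) = 0;
};

class Communicator {
public:
    Communicator(int rank, int size, Transport& transport)
        : rank_(rank), size_(size), transport_(transport) {}

    int rank() const { return rank_; }
    int size() const { return size_; }

    void send(int dest, const std::vector<char>& data) { transport_.send(dest, data); }
    std::vector<char> recv(int source) { return transport_.recv(source); }

    // A naive barrier: every rank reports to rank 0, which then releases all.
    void barrier() {
        const std::vector<char> token(1, 0);
        if (rank_ == 0) {
            for (int r = 1; r < size_; ++r) recv(r);
            for (int r = 1; r < size_; ++r) send(r, token);
        } else {
            send(0, token);
            recv(0);
        }
    }

private:
    int rank_;
    int size_;
    Transport& transport_;
};
```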
|
95 |
Objektorientierte parallele Ein-, Ausgabe auf Höchstleistungsrechnern / Pinkenburg, Simon. January 1900 (has links)
Zugl.: Tübingen, Universität, Diss., 2006.
|
96 |
Analyse und Optimierung der Softwareschichten von wissenschaftlichen Anwendungen für Metacomputing Keller, Rainer, January 2008 (has links)
Stuttgart, Univ., Diss., 2008.
|
97 |
Verfahren und Werkzeuge zur Leistungsmessung, -analyse und -bewertung der Ein-, Ausgabeeinheiten von Rechensystemen Versick, Daniel January 2009 (has links)
Zugl.: Rostock, Univ., Diss., 2009
|
98 |
Evaluation of publicly available barrier-algorithms and improvement of the barrier-operation for large-scale cluster-systems with special attention on InfiniBand networks Höfler, Torsten. January 2005 (has links)
Chemnitz, Techn. Univ., Diplomarb., 2005.
|
99 |
A Case Study of Semi-Automatic Parallelization of Divide and Conquer Algorithms Using Invasive Interactive Parallelization Hansson, Erik January 2009 (has links)
Since computers supporting parallel execution have become more and more common in recent years, especially on the consumer market, the need for methods and tools for parallelizing existing sequential programs has increased greatly. Today there exist different methods of achieving this, in more or less user-friendly ways. We have looked at one method, Invasive Interactive Parallelization (IIP), applied to a specific problem area, divide and conquer algorithms, and performed a case study. This case study shows that by using IIP, sequential programs can be parallelized both for shared and distributed memory machines. We have focused on parallelizing Quick Sort for OpenMP and MPI environments using a tool, Reuseware, which is based on the concepts of Invasive Software Composition.
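A minimal sketch of what a task-parallel Quick Sort for the OpenMP case can look like is shown below (illustrative only, not the output of the Reuseware-based tool): the two recursive calls become tasks, with a cutoff below which the sort stays sequential.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include <omp.h>

// Task-parallel quicksort; small partitions are sorted sequentially.
void quicksort(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    if (hi - lo < 1000) {                      // cutoff: avoid tiny tasks
        std::sort(a.begin() + lo, a.begin() + hi + 1);
        return;
    }
    const int pivot = a[lo + (hi - lo) / 2];
    std::ptrdiff_t i = lo, j = hi;
    while (i <= j) {                           // Hoare-style partition
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
    }
    #pragma omp task shared(a)
    quicksort(a, lo, j);
    #pragma omp task shared(a)
    quicksort(a, i, hi);
    #pragma omp taskwait
}

void parallel_sort(std::vector<int>& a) {
    #pragma omp parallel
    #pragma omp single nowait
    quicksort(a, 0, static_cast<std::ptrdiff_t>(a.size()) - 1);
}
```

An MPI variant would follow the same divide and conquer shape, distributing partitions across processes rather than tasks.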
|
100 |
Um cluster de PCs usando nós baseados em módulos aceleradores de hardware (FPGA) como co-processadores Wanderley Pimentel Araujo, Rodrigo 31 January 2010 (has links)
Previous issue date: 2010 / Conselho Nacional de Desenvolvimento Científico e Tecnológico / Creating new solutions to increase application performance is growing in importance, as conventional processing is becoming obsolete. Different approaches have been studied and used, but several problems have been encountered. One example is multi-core processors, which, despite dissipating little power, offer low transmission speed and limited bandwidth. ASIC circuits offer high performance and low power dissipation, but carry a high engineering cost.

In an attempt to reach higher levels of acceleration, platforms that combine clusters of conventional computers with FPGAs have been studied. This type of platform requires high-performance buses to minimize the communication bottleneck between the PC and the FPGA, and efficient communication between the nodes of the system.

In this work, the main characteristics of several architectures that use clusters of PCs are reviewed. Based on this, an architecture is proposed that uses an FPGA as a co-processor in each node of the system, using the MPI interface for communication between the nodes and a Linux device driver that allows burst data transfer over the PCIe bus.

As a case study, used to validate the architecture, dense matrix multiplication is implemented; this functionality is based on level three of the BLAS library.
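A hedged sketch of the node-level pattern in such an architecture is given below (illustrative only; the FPGA offload is represented by a placeholder function standing in for the BLAS level-3 call that the proposed design routes to the co-processor over PCIe): rows of A are scattered over the MPI ranks, B is broadcast, each node multiplies its block locally, and the result rows are gathered back.

```cpp
#include <cstddef>
#include <mpi.h>
#include <vector>

// Placeholder for the per-node multiplication. In the proposed architecture
// this is where the block would be handed to the FPGA co-processor through
// the PCIe device driver; here it is a plain CPU stand-in.
void local_gemm(const double* a, const double* b, double* c, int rows, int n) {
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k) acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

// C = A * B with A distributed by row blocks over the ranks
// (n divisible by the number of ranks, for simplicity).
void distributed_gemm(const std::vector<double>& a,  // n*n, valid on rank 0
                      std::vector<double>& b,        // n*n, valid on rank 0
                      std::vector<double>& c,        // n*n, filled on rank 0
                      int n, MPI_Comm comm) {
    int rank = 0, size = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    const int rows = n / size;

    std::vector<double> a_block(static_cast<std::size_t>(rows) * n);
    std::vector<double> c_block(static_cast<std::size_t>(rows) * n);
    if (rank != 0) b.resize(static_cast<std::size_t>(n) * n);
    if (rank == 0) c.resize(static_cast<std::size_t>(n) * n);

    MPI_Scatter(a.data(), rows * n, MPI_DOUBLE,
                a_block.data(), rows * n, MPI_DOUBLE, 0, comm);
    MPI_Bcast(b.data(), n * n, MPI_DOUBLE, 0, comm);

    local_gemm(a_block.data(), b.data(), c_block.data(), rows, n);

    MPI_Gather(c_block.data(), rows * n, MPI_DOUBLE,
               c.data(), rows * n, MPI_DOUBLE, 0, comm);
}
```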
|