About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Razvoj serijskog i paralelnog algoritma za računanje elektronske strukture materijala metodom sklapanja naelektrisanja / Development of Serial and Parallel Algorithms for Computing the Electronic Structure of Materials Using the Charge Patching Method

Bodroški Žarko 04 November 2020
We present an implementation of the density functional theory (DFT) based charge patching method (CPM) using a basis of Gaussian functions. The method is based on the assumption that the electronic charge density of a large system is the sum of contributions of individual atoms, so-called charge density motifs, that are obtained from calculations of small prototype systems. In our implementation, wave functions and the electronic charge density are represented in a basis of Gaussian functions, while charge density motifs are represented on a real-space grid. A constrained minimization procedure is used to obtain the Gaussian basis representation of the charge density from the real-space representation of the motifs. The code based on this implementation performs significantly better than the previous implementation of the charge patching method, which used a plane-wave basis. It enables electronic structure calculations for systems of around 1000 atoms on a single CPU core in just several hours. The parallel implementation enables calculations for systems with more than ten thousand atoms. The largest system tested has around 20000 atoms and was computed on 256 parallel processes.
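The central assumption of the abstract above, that the total charge density is a sum of per-atom motifs, can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the Gaussian shape of each motif, the atom positions, electron counts and widths, and the helper names `gaussian_motif`, `patched_density` and `integrate`. It is not the thesis code, which works in three dimensions and fits the patched density to a Gaussian basis by constrained minimization.

```python
import math

def gaussian_motif(x, center, weight, width):
    """Normalized 1D Gaussian carrying `weight` electrons, centred on an atom."""
    norm = weight / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((x - center) / width) ** 2)

def patched_density(grid, atoms):
    """Total density on `grid` as the sum of per-atom motifs.

    `atoms` is a list of (position, electrons, width) tuples; all values are
    illustrative and not taken from the thesis.
    """
    return [sum(gaussian_motif(x, c, w, s) for (c, w, s) in atoms) for x in grid]

def integrate(grid, values):
    """Trapezoidal rule on a uniform grid."""
    h = grid[1] - grid[0]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

# Hypothetical three-atom chain on a uniform real-space grid.
atoms = [(-2.0, 1.0, 0.5), (0.0, 4.0, 0.6), (2.0, 1.0, 0.5)]
grid = [-10.0 + 0.01 * i for i in range(2001)]
density = patched_density(grid, atoms)
total_electrons = integrate(grid, density)
```

Because each motif integrates to its own electron count, the patched density integrates to the total number of electrons (6 in this toy chain), which is the kind of constraint a fitting step would need to preserve.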
142

PFFT - An Extension of FFTW to Massively Parallel Architectures

Pippig, Michael January 2012
We present an MPI-based software library for computing fast Fourier transforms (FFTs) on massively parallel, distributed-memory architectures. Similar to established transpose FFT algorithms, we propose a parallel FFT framework that is based on a combination of local FFTs, local data permutations and global data transpositions. This framework can be generalized to arbitrary multi-dimensional data and process meshes. All performance-relevant building blocks can be implemented with the help of the FFTW software library. Therefore, our library offers great flexibility and portable performance. Like FFTW, we are able to compute FFTs of complex data, real data and even- or odd-symmetric real data. All the transforms can be performed completely in place. Furthermore, we propose an algorithm to calculate pruned FFTs more efficiently on distributed-memory architectures. As an example, we provide performance measurements of FFTs of size 512^3 and 1024^3 on up to 262144 cores of a BlueGene/P architecture.
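The transpose idea mentioned in the abstract above (local transforms, a data transposition, then local transforms again) can be sketched serially in a few lines. This is only an illustration under stated assumptions: a naive O(n^2) DFT stands in for the local FFTs, an in-memory `transpose` stands in for the global data transposition between processes, and all function names are hypothetical. It does not use or reproduce the PFFT or FFTW interfaces.

```python
import cmath

def dft(values):
    """Naive O(n^2) 1D discrete Fourier transform (stand-in for a local FFT)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def transpose(matrix):
    """Stand-in for the global data transposition between processes."""
    return [list(column) for column in zip(*matrix)]

def fft2_by_transpose(matrix):
    """2D transform as: row transforms, transpose, row transforms, transpose back."""
    step1 = [dft(row) for row in matrix]   # local 1D transforms along rows
    step2 = transpose(step1)               # columns become rows
    step3 = [dft(row) for row in step2]    # local 1D transforms again
    return transpose(step3)                # restore the original layout

def dft2_direct(matrix):
    """Direct 2D DFT from the definition, used only as a reference."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[r][c]
                * cmath.exp(-2j * cmath.pi * (r * u / rows + c * v / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]

data = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 2.0], [3.0, 1.0, 0.0, -2.0]]
fast = fft2_by_transpose(data)
reference = dft2_direct(data)
max_error = max(abs(fast[u][v] - reference[u][v]) for u in range(3) for v in range(4))
```

In a distributed setting each process would own a block of rows, so only the transposition step requires communication; the row transforms stay local.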
143

Runtime MPI Correctness Checking with a Scalable Tools Infrastructure

Hilbrich, Tobias 24 February 2016
The increasing computational demand of simulations motivates the use of parallel computing systems. At the same time, this parallelism poses challenges to application developers. The Message Passing Interface (MPI) is a de-facto standard for distributed memory programming in high performance computing. However, its use also enables complex parallel programming errors such as races, communication errors, and deadlocks. Automatic tools can assist application developers in the detection and removal of such errors. This thesis considers tools that detect such errors during an application run and advances them towards a combination of both precise checks (neither false positives nor false negatives) and scalability. This includes novel hierarchical checks that provide scalability, as well as a formal basis for a distributed deadlock detection approach. At the same time, the development of parallel runtime tools is challenging and time consuming, especially if scalability and portability are key design goals. Current tool development projects often create similar tool components, while component reuse remains low. To provide a perspective towards more efficient tool development, which simplifies scalable implementations, component reuse, and tool integration, this thesis proposes an abstraction for a parallel tools infrastructure along with a prototype implementation. This abstraction overcomes the use of multiple interfaces for different types of tool functionality, which limits flexible component reuse. Thus, this thesis advances runtime error detection tools and uses their redesign and their increased scalability requirements to apply and evaluate a novel tool infrastructure abstraction. The new abstraction ultimately allows developers to focus on their tool functionality, rather than on developing or integrating common tool components. The use of such an abstraction in a wide range of parallel runtime tool development projects could greatly increase component reuse, thus decreasing tool development time and cost. An application study with up to 16,384 application processes demonstrates the applicability of both the proposed runtime correctness concepts and of the proposed tools infrastructure.
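As a rough illustration of what runtime deadlock detection involves, the sketch below searches a wait-for graph for a cycle. It is a deliberately simplified, centralized model and an assumption on my part, not the distributed approach of the thesis: each blocked rank is assumed to need all of the ranks it waits on (AND semantics), wildcard receives and collectives are ignored, and the function name `find_deadlock` and the example graphs are hypothetical.

```python
def find_deadlock(wait_for):
    """Return one cycle in a wait-for graph, or None if there is none.

    `wait_for` maps a blocked rank to the list of ranks it is waiting on.
    Under AND semantics, a cycle means every rank on it waits for another
    rank on the same cycle, so none of them can make progress.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    stack = []

    def visit(rank):
        colour[rank] = GREY
        stack.append(rank)
        for target in wait_for.get(rank, []):
            state = colour.get(target, WHITE)
            if state == GREY:                  # back edge closes a cycle
                return stack[stack.index(target):]
            if state == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        stack.pop()
        colour[rank] = BLACK
        return None

    for rank in list(wait_for):
        if colour.get(rank, WHITE) == WHITE:
            cycle = visit(rank)
            if cycle:
                return cycle
    return None

# Two ranks that each block in a receive from the other.
head_to_head = {0: [1], 1: [0]}
# A chain of waits that ends in a rank that is not blocked.
pipeline = {0: [1], 1: [2]}
# Rank 0 waits on a cycle it is not part of.
ring = {0: [1], 1: [2], 2: [3], 3: [1]}
```

Here `find_deadlock(head_to_head)` reports the cycle `[0, 1]`, `find_deadlock(pipeline)` returns `None`, and `find_deadlock(ring)` reports `[1, 2, 3]`. A precise tool also has to handle OR semantics (for example wildcard receives), which a plain cycle search does not cover.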
144

Une approche dynamique pour l'optimisation des communications concurrentes sur réseaux hautes performance

Brunet, Elisabeth 08 December 2008
The aim of this thesis is to optimize the communications of high performance applications in the context of cluster computing. Given the massive use of multicore architectures, it is now crucial to handle a large number of concurrent communication flows. We highlighted and analyzed the shortcomings of existing solutions in this setting. We therefore designed a communication architecture centred on arbitrating access to the network hardware. Its novelty consists in decoupling the activity of the application from that of the network cards. Our model takes advantage of the delay between the submission of communication requests and the moment when the network cards become available in order to apply optimizations opportunistically. NewMadeleine implements this model and is able to exploit the latest generation of high-speed networks. The approach of NewMadeleine is validated not only by synthetic tests but also by real applications.
145

Exploitation efficace des architectures parallèles de type grappes de NUMA à l’aide de modèles hybrides de programmation

Clet-Ortega, Jérôme 18 April 2012
Modern computing servers usually consist of clusters of computers with several multi-core CPUs featuring a highly hierarchical hardware design. The major challenge for implementations of programming models such as MPI or OpenMP is to exploit these servers efficiently. Combining the two models is a common way to benefit from the advantages of each. However, these programming models were not designed to work together, which leads to performance issues. In this thesis, we propose to assist the programmer who develops hybrid applications. We rely on an analysis of the architectural hierarchy of the computing system to set the number of processes and threads. Rather than the classical hybrid approach of creating one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, that are better suited to modern computing systems.
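To make the idea of sizing processes and threads from the hardware hierarchy concrete, here is a small sketch that enumerates candidate layouts for one node. It is a toy model under assumptions not found in the abstract: the node is described only by a socket count and cores per socket, processes are assumed to be packed contiguously, and the function name `hybrid_layouts` is hypothetical. The thesis evaluates such alternatives automatically; how it ranks them is not reproduced here.

```python
def hybrid_layouts(sockets, cores_per_socket):
    """List (processes_per_node, threads_per_process) pairs that fill a node.

    A layout is kept only if each process either fits inside one socket or
    spans a whole number of sockets, so that no process partially straddles
    a socket boundary.
    """
    total = sockets * cores_per_socket
    layouts = []
    for threads in range(1, total + 1):
        if total % threads:
            continue                           # must use every core exactly once
        fits_in_socket = cores_per_socket % threads == 0
        spans_sockets = threads % cores_per_socket == 0
        if fits_in_socket or spans_sockets:
            layouts.append((total // threads, threads))
    return layouts

# Hypothetical node: 4 sockets with 6 cores each.
layouts = hybrid_layouts(sockets=4, cores_per_socket=6)
```

For this hypothetical node the classical hybrid layout is the last candidate, one process with 24 threads, while the alternatives with several multithreaded processes per node include, for example, four processes of six threads (one per socket).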
146

Adsorption dans un milieu carboné lamellaire nanoporeux : simulation Monte Carlo Grand Canonique, synthèse et caractérisation / Adsorption in a slit nanoporous carbon medium : Grand Canonical Monte Carlo simulation, synthesis and characterization

Nguemalieu Kouetcha, Daniella 21 December 2017
Disordered nanoporous carbons are effective materials for capturing pollutants, including trace amounts, in wastewater. The adsorption phenomenon behind the retention of molecules is, however, complex, because it depends on a multitude of factors: the structure, morphology and charge of the carbonaceous surface on the one hand, and the size/shape and polarity of the molecule on the other, all of which depend on pH and concentration. For a better understanding of the phenomenon, it is important to be able to study some parameters separately. In order to study adsorption in an aqueous medium on nanoporous carbons with model structure and morphology, nanoporous slit structures of the turbostratic carbon type were generated numerically in C++, together with the calculation of the radial (pair) distribution function. The gas adsorption of a nonpolar or polar molecule, and then of two polar molecules (H2O/CO2) and (H2O/C6H6O), was simulated by the Grand Canonical Monte Carlo method on this model support (adsorption isotherm, heat of adsorption, density of adsorbed molecules) as a function of temperature. The runtime was drastically reduced by developing optimized parallel codes using MPI in C++. The influence of the shape and size distribution of the pores was demonstrated by simulating adsorption on the structure of an activated carbon previously obtained by 3D reconstruction of the RMC type. Finally, from an experimental point of view, the electrochemical intercalation of tetraalkylammonium ions in lamellar carbons (HOPG and graphite) was explored in order to obtain nanoporous lamellar carbons (≈1 nm). The structure was characterized by X-ray diffraction.
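For readers unfamiliar with Grand Canonical Monte Carlo, the particle insertion and deletion moves at the heart of the method are accepted with the standard textbook probabilities sketched below. This is a generic sketch, not the thesis code: the function names, the reduced units and the numerical values are assumptions, `wavelength` denotes the thermal de Broglie wavelength, and the energy change `delta_u` would in practice come from the adsorbate-carbon and adsorbate-adsorbate interaction model.

```python
import math

def accept_insertion(n, volume, beta, mu, delta_u, wavelength):
    """Acceptance probability for N -> N+1, with delta_u = U(N+1) - U(N)."""
    factor = volume / (wavelength ** 3 * (n + 1))
    return min(1.0, factor * math.exp(beta * (mu - delta_u)))

def accept_deletion(n, volume, beta, mu, delta_u, wavelength):
    """Acceptance probability for N -> N-1, with delta_u = U(N-1) - U(N)."""
    factor = wavelength ** 3 * n / volume
    return min(1.0, factor * math.exp(-beta * (mu + delta_u)))

# Illustrative values in reduced units (not from the thesis).
p_insert = accept_insertion(10, 1000.0, 1.0, -2.0, 0.5, 1.0)
# The reverse move starts from 11 particles and undoes the energy change.
p_delete = accept_deletion(11, 1000.0, 1.0, -2.0, -0.5, 1.0)
# Detailed balance requires p_insert / p_delete to equal this ratio.
balance = (1000.0 / 11) * math.exp(1.0 * (-2.0 - 0.5))
```

The ratio of the forward and reverse acceptance probabilities matches the ratio of the grand canonical weights of the two states, which is what lets the particle number fluctuate towards the value set by the chemical potential `mu` and temperature.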
147

Measurement of the Underlying Event using track-based event shapes in Z -> ℓ+ℓ− events with ATLAS

Schulz, Holger 13 January 2015
This thesis describes a measurement of hadron-collider event shapes in proton-proton collisions at a centre-of-mass energy of 7 TeV at the Large Hadron Collider (LHC) at CERN (Conseil Européen pour la Recherche Nucléaire), located near Geneva (Switzerland). The analysed data (integrated luminosity: 1.1 inverse fb) were recorded in 2011 with the ATLAS experiment. Events in which a Z boson was produced in the hard sub-process and subsequently decayed into an electron-positron or muon-antimuon pair were selected for this analysis. The observables are calculated using all reconstructed tracks of charged particles within the acceptance of the inner detector of ATLAS except those of the leptons of the Z decay. Thus, this is the first measurement of its kind. The observables were corrected for background processes using data-driven methods. For the correction of so-called pile-up (multiple overlapping proton-proton collisions), a novel technique was developed and successfully applied. The data were further unfolded to correct for remaining detector effects. The obtained distributions are especially sensitive to the so-called Underlying Event and can be compared with predictions of Monte Carlo event generators directly, i.e. without the need to run time-consuming simulations of the ATLAS detector. Finally, an attempt was made to improve the predictions of the event generators Pythia8 and Sherpa by finding an optimised setting of the relevant model parameters, a technique called tuning. It became apparent, however, that the underlying Sjostrand-Zijl model is unable to give a good description of the measured event-shape distributions.
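As an illustration of what a track-based event shape is, the sketch below computes transverse thrust from a list of track momenta in the transverse plane. The abstract does not say which event shapes were measured, so the choice of transverse thrust is an assumption, as are the function name `transverse_thrust`, the brute-force angular scan and the toy events; an analysis would pass in the selected charged-particle tracks with the Z-decay leptons already removed.

```python
import math

def transverse_thrust(tracks, steps=3600):
    """Approximate transverse thrust from (px, py) track momenta.

    Scans candidate axes over [0, pi) and keeps the largest value of
    sum |pT . n| / sum |pT|. The result is about 1 for a pencil-like event
    and approaches 2/pi for an isotropic one.
    """
    total = sum(math.hypot(px, py) for px, py in tracks)
    if total == 0.0:
        raise ValueError("need at least one track with non-zero pT")
    best = 0.0
    for i in range(steps):
        phi = math.pi * i / steps
        nx, ny = math.cos(phi), math.sin(phi)
        projected = sum(abs(px * nx + py * ny) for px, py in tracks)
        best = max(best, projected / total)
    return best

# Toy events (arbitrary units): back-to-back tracks and a four-fold cross.
pencil = transverse_thrust([(1.0, 0.0), (-1.0, 0.0)])
cross = transverse_thrust([(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)])
```

The angular scan is a simple approximation chosen for readability; because the projected sum is smooth around its maximum, a fine scan is accurate enough for a sketch, though an exact search over sign assignments would be preferable in production code.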
148

Hybrid parallel algorithms for solving nonlinear Schrödinger equation / Hibridni paralelni algoritmi za rešavanje nelinearne Šredingerove jednačine

Lončar Vladimir 17 October 2017
Numerical methods and algorithms for solving partial differential equations, especially parallel algorithms, are an important research topic, given their very broad applicability in all areas of science. Rapid advances in computer technology open up new possibilities for the development of faster algorithms and numerical simulations of higher resolution. This is achieved through parallelization at the different levels that practically all current computers support. In this thesis we develop parallel algorithms for solving a class of partial differential equations known as the nonlinear Schrödinger equation (NLSE) with a convolution integral kernel. Equations of this type arise in many fields of physics such as nonlinear optics, plasma physics and physics of ultracold atoms, as well as in economics and quantitative finance. We focus on a special type of NLSE, the dipolar Gross-Pitaevskii equation (GPE), which characterizes the behavior of ultracold atoms in the state of Bose-Einstein condensation. We present novel parallel algorithms for numerically solving the GPE for a wide range of modern parallel computing platforms, from shared memory systems and dedicated hardware accelerators in the form of graphics processing units (GPUs), to heterogeneous computer clusters. For shared memory systems, we provide an algorithm and implementation targeting multi-core processors using OpenMP. We also extend the algorithm to GPUs using the CUDA toolkit and combine the OpenMP and CUDA approaches into a hybrid, heterogeneous algorithm that is capable of utilizing all available resources on a single computer. Given the inherent memory limitation of a single computer, we develop a distributed memory algorithm based on the Message Passing Interface (MPI) and the previous shared memory approaches. To maximize the performance of hybrid implementations, we optimize the parameters governing the distribution of data and workload using a genetic algorithm. Visualization of the increased volume of output data, enabled by the efficiency of the newly developed algorithms, represents a challenge in itself. To address this, we integrate the implementations with a state-of-the-art visualization tool (VisIt), and use it to study two use cases which demonstrate how the developed programs can be applied to simulate real-world systems.
149

Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme

Bordage, Cyril 20 December 2013
The Fast Multipole Method speeds up the matrix-vector products used by iterative solvers to compute the electromagnetic response of an object subject to an incident wave. Our work aims to adapt this method to make it efficient on heterogeneous architectures with GPUs. For this purpose, we use a dynamic scheduler named StarPU, which distributes the computational tasks within a node. For the parallelization in distributed memory, we schedule the boxes statically and the near interactions dynamically.
150

Avaliação de arquitetura paralela para smartphones e tablets utilizando GNU/Linux e MPI para processamento de dados / Evaluation of parallel architecture for smartphones and tablets using GNU/Linux and MPI for data processing

Felipe Sanches Zanoni 06 September 2013
The versatility and, above all, the processing power of today's smartphones and tablets encourage running complex algorithms on these devices. Each cellphone or tablet can be part of a large computer network capable of processing a huge amount of data with very low power consumption compared to conventional computers. Another strong motivation is that many of these devices run the Android operating system, which is based on the Linux kernel. This eases the migration of projects already developed on computers to this new mobile platform, since a complete GNU/Linux operating system can run on these devices. The goal of this project is to evaluate and compare the performance of a hybrid cluster formed by mobile devices and computers, showing that it is possible to develop software using MPI for parallel processing on smartphones and tablets in the same way as for GNU/Linux computers. The results show the difference in processing power between x86 architecture computers and mobile devices using the ARM architecture.
