111 |
LX-MCAPI : biblioteca de comunicação para suporte a programação paralela em sistemas multi-core / LX-MCAPI: a communication library to support parallel programming on multi-core systemsIdeguchi, Antonio Diogo Hidee 12 May 2016 (has links)
Multi-core processors represent the industry's response to the physical barriers encountered in processor development over the last decades, and they brought new advances in computing system performance. Complex superscalar single-core processors with high clock frequencies gave way to processing units with two or more cores in a single package, generally with lower clock frequencies, allowing one or more execution threads per core. In this context, the existing programming models based on the sequential and concurrent paradigms do not allow exploiting the real potential of the newly incorporated hardware, creating a need for new programming methodologies that can exploit the parallelism offered by multi-core processors. This work presents LX-MCAPI, a library based on modern IPC (Inter-Process Communication) and memory-sharing mechanisms, developed under the hypothesis that message passing is a viable, flexible and scalable abstraction compared to conventional shared-memory programming on multi-core systems. LX-MCAPI offers message passing, a zero-copy memory-sharing mechanism between processes, and ready-to-use scalability patterns that ease the abstraction and construction of applications. It performed well in terms of transmission latency and transfer rate in x86-64 and ARM environments.
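The zero-copy idea that such a library builds on can be illustrated with a minimal sketch using POSIX-style shared memory from the Python standard library. This is not LX-MCAPI's actual API, only the underlying mechanism: two processes map the same physical pages and exchange data without copying it through a pipe or socket.

```python
from multiprocessing import shared_memory

# One process creates and names a segment; a cooperating process
# would attach to it by name. Here a second handle in the same
# process stands in for the other side, to keep the sketch
# self-contained and deterministic.
shm = shared_memory.SharedMemory(create=True, size=64)
peer = shared_memory.SharedMemory(name=shm.name)

peer.buf[:5] = b"hello"     # "producer" writes in place, no copy
msg = bytes(shm.buf[:5])    # "consumer" reads the same physical pages

peer.close()
shm.close()
shm.unlink()
```

In the real library, a message-passing layer coordinates which process writes and reads the segment, so the payload itself never needs to be serialized or copied.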
|
112 |
Développements du modèle adjoint de la différentiation algorithmique destinés aux applications intensives en calcul / Extensions of algorithmic differentiation by source transformation inspired by modern scientific computingTaftaf, Ala 17 January 2017 (has links)
The adjoint mode of Algorithmic Differentiation (AD) is particularly attractive for computing gradients. However, this mode needs to use the intermediate values of the original simulation in reverse order, at a cost that increases with the length of the simulation. AD research looks for strategies to reduce this cost, for instance by taking advantage of the structure of the given program. In this work, we consider on one hand the frequent case of fixed-point loops, for which several authors have proposed adapted adjoint strategies. Among these strategies, we select the one introduced by B. Christianson. We specify the selected method further and describe the way we implemented it inside the AD tool Tapenade. Experiments on a medium-size application show a major reduction of the memory needed to store trajectories. On the other hand, we study checkpointing in the case of MPI parallel programs with point-to-point communications. We propose techniques to apply checkpointing to these programs. We provide elements of proof of correctness of our techniques and we test them on representative CFD codes. This work was carried out in the framework of the European project ``AboutFlow''.
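The adjoint strategy for fixed-point loops can be sketched on a toy problem. The sketch below is a hand-written illustration in the spirit of Christianson's two-phase approach, not Tapenade's generated code: the forward phase iterates x = f(x, p) to convergence, and the adjoint phase then iterates its own fixed point for the adjoint variable, using only the converged state rather than the whole forward trajectory.

```python
def f(x, p):
    # One fixed-point sweep: Babylonian iteration for sqrt(p).
    return 0.5 * (x + p / x)

def fwd(p, tol=1e-12):
    # Phase 1: iterate to the fixed point x* = f(x*, p) = sqrt(p).
    x = 1.0
    while abs(f(x, p) - x) >= tol:
        x = f(x, p)
    return f(x, p)

def adjoint_gradient(p):
    # Phase 2: adjoint fixed point, evaluated at the converged state.
    x = fwd(p)
    f_x = 0.5 * (1.0 - p / x ** 2)   # df/dx at x*
    f_p = 0.5 / x                    # df/dp at x*
    J_x = 1.0                        # objective J(x) = x, so dJ/dx = 1
    lam = 0.0
    for _ in range(100):             # converges because |f_x| < 1
        lam = J_x + f_x * lam        # adjoint fixed-point iteration
    return lam * f_p                 # dJ/dp
```

For p = 4 the forward loop converges to x* = 2, and the adjoint gradient equals d(sqrt(p))/dp = 0.25; the memory saving comes from never storing the forward iterates.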
|
113 |
Portierbare numerische Simulation auf parallelen ArchitekturenRehm, W. 30 October 1998 (has links)
The workshop "Portierbare numerische Simulationen auf parallelen Architekturen" ("Portable numerical simulations on parallel architectures") was organized by the Faculty of Informatics/Professorship Computer Architecture on 18 April 1996 and held in the framework of the Sonderforschungsbereich (Joint Research Initiative) "Numerische Simulationen auf massiv parallelen Rechnern" (SFB 393) ("Numerical simulations on massively parallel computers") (http://www.tu-chemnitz.de/~pester/sfb/sfb393.html). The SFB 393 is funded by the German Research Foundation (DFG).
The purpose of the workshop was to bring together scientists using parallel computing for integrated discussions on portability issues, requirements and future developments in implementing parallel software efficiently as well as portably on clusters of symmetric multiprocessor systems.
I hope that the present paper gives the reader some helpful hints for further discussions in this field.
|
114 |
Optimizing MPI Collective Communication by Orthogonal StructuresKühnemann, Matthias, Rauber, Thomas, Rünger, Gudula 28 June 2007 (has links)
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. A significant improvement can also be obtained on a Cray T3E by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data parallel implementations of complex application programs without any other change of the computation and communication structure. Furthermore, we investigate how the execution time of the orthogonal realizations can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
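The effect of the orthogonal restructuring can be sketched with a simple runtime-function model. The model below is illustrative and uses assumed parameters (a flat broadcast with cost linear in the group size, latency alpha and per-byte cost beta); it is not one of the fitted runtime functions from the article, but it shows why splitting p processors into a p1 x p2 grid and broadcasting in two phases can pay off.

```python
def t_bcast_linear(p, m, alpha=1e-5, beta=1e-9):
    # Flat broadcast with cost linear in the group size: the root
    # sends the m-byte message to each of the other p - 1 processes.
    return (p - 1) * (alpha + beta * m)

def t_bcast_orthogonal(p1, p2, m, alpha=1e-5, beta=1e-9):
    # Two-phase broadcast on a p1 x p2 grid: first along the root's
    # row (group size p1), then concurrently along all columns
    # (group size p2).
    return t_bcast_linear(p1, m, alpha, beta) + t_bcast_linear(p2, m, alpha, beta)
```

For 64 processors arranged as an 8 x 8 grid, the two-phase variant performs 14 sends on the critical path instead of 63; for tree-based broadcast implementations the gain is smaller, which is why the article measures each MPI implementation separately.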
|
115 |
Automatic Log Analysis System Integration : Message Bus Integration in a Machine Learning EnvironmentSvensson, Carl January 2015 (has links)
Ericsson is one of the world's largest providers of communications technology and services. Reliable networks are important to deliver services that live up to customers' expectations. Tests are frequently run on Ericsson's systems in order to identify stability problems in their networks, but these tests are not always completely reliable. The logs produced by these tests are therefore gathered and analyzed to identify abnormal system behavior, especially behavior that the tests might not have caught. To automate this analysis process, a machine learning system called the Awesome Automatic Log Analysis Application (AALAA) is used at Ericsson's Continuous Integration Infrastructure (CII) department to identify problems within the large logs produced by automated Radio Base Station test loops and processes. AALAA is currently operable in two versions using different distributed cluster computing platforms, Apache Spark and Apache Hadoop, but it needs improvements in its machine-to-machine communication to make this process more convenient to use. In this thesis, message communication has successfully been implemented in the AALAA system. The result is a message bus, deployed in RabbitMQ, that can initiate model training and abnormal-log identification through requests and handle a continuous flow of result updates from AALAA.
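The request/response flow over a message bus of this kind can be sketched in a few lines. The request types and fields below are assumptions for illustration, not AALAA's actual protocol, and an in-memory queue stands in for the RabbitMQ broker.

```python
import json
import queue

bus = queue.Queue()   # in-memory stand-in for a RabbitMQ queue

# Hypothetical request types; the real system's protocol may differ.
HANDLERS = {
    "train_model": lambda req: {"status": "training_started",
                                "model": req["model"]},
    "identify_abnormal": lambda req: {"status": "analysis_started",
                                      "log": req["log"]},
}

def publish(message):
    # Messages travel over the bus as JSON strings.
    bus.put(json.dumps(message))

def consume_one():
    # A consumer pops one request and dispatches it to its handler.
    req = json.loads(bus.get())
    return HANDLERS[req["type"]](req)
```

With RabbitMQ, publish and consume_one would become a basic_publish call and a consumer callback on a channel, but the JSON envelope pattern stays the same.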
|
116 |
Deep Neural Networks for dictionary-based 5G channel estimation with no ground truth in mixed SNR scenarios / Djupa neurala nätverk för ordboksbaserad 5G-kanaluppskattning utan sanning i blandade SNR-scenarierFerrini, Matteo January 2022 (has links)
Channel estimation is a fundamental task for exploiting the advantages of massive Multiple-Input Multiple-Output (MIMO) systems in fifth generation (5G) wireless technology. Channel estimation requires solving sparse linear inverse problems, which is usually done with the Least Squares method; this brings low complexity but high mean squared error, so other methods are usually needed on top of Least Squares to obtain better results. Approximate Message Passing (AMP) is an efficient method for solving sparse linear inverse problems, and recently a deep neural network approach to solving such problems quickly has been proposed, called Learned Approximate Message Passing (LAMP) [1], which approximates AMP with a fixed number of iterations and learnable parameters. We formalize the channel estimation problem as a dictionary-based sparse linear inverse problem and investigate the applicability of LAMP to the task. We build upon the work of Borgerding et al. [1], providing a new loss function to minimize for our dictionary-based problem, and we empirically investigate LAMP's capabilities under various conditions: varying the dataset size, the number of subcarriers, the depth of the network, and the signal-to-noise ratio (SNR). We also propose a new network, called Adaptive-LAMP, which differs from LAMP in that it introduces a small neural network in each layer to estimate certain parameters instead of learning them. Experiments show that LAMP performs significantly better than AMP in terms of NMSE at low SNR levels and worse at high SNR levels. Interestingly, both proposed networks perform well at discovering active paths in cellular networks, paving the way for new approaches to the channel estimation problem.
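The AMP baseline that LAMP unrolls can be sketched for the generic sparse recovery problem y = Ax. The sketch below uses a soft-threshold denoiser with an empirical threshold rule (alpha times the residual's RMS, an assumption for illustration) rather than tuned thresholds, and it is generic AMP, not the dictionary-based variant developed in the thesis.

```python
import numpy as np

def soft_threshold(u, t):
    # Elementwise shrinkage denoiser for sparse signals.
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def amp(y, A, n_iter=50, alpha=1.1):
    # Generic AMP for y = A @ x with sparse x and i.i.d. Gaussian A.
    M, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.sqrt(np.mean(z ** 2))   # assumed threshold rule
        x = soft_threshold(x + A.T @ z, theta)
        # Residual with the Onsager correction term, which is what
        # distinguishes AMP from plain iterative soft thresholding.
        z = y - A @ x + z * (np.count_nonzero(x) / M)
    return x
```

LAMP replaces the fixed matrix A.T and the threshold rule with learned per-layer parameters, trained by backpropagation through a fixed number of these iterations.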
|
117 |
MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI LibrarySrivastava, Siddhartha January 2021 (has links)
No description available.
|
118 |
High Performance and Scalable Cooperative Communication Middleware for Next Generation ArchitecturesChakraborty, Sourav 10 October 2019 (has links)
No description available.
|
119 |
Using GPU-aware message passing to accelerate high-fidelity fluid simulations / Användning av grafikprocessormedveten meddelandeförmedling för att accelerera nogranna strömningsmekaniska datorsimuleringarWahlgren, Jacob January 2022 (has links)
Motivated by the end of Moore's law, graphics processing units (GPUs) are replacing general-purpose processors as the main source of computational power in emerging supercomputing architectures. A challenge in systems with GPU accelerators is the cost of transferring data between the host memory and the GPU device memory. On supercomputers, the standard for communication between compute nodes is the Message Passing Interface (MPI). Recently, many MPI implementations have added support for using GPU device memory directly as communication buffers, known as GPU-aware MPI. One of the most computationally demanding applications on supercomputers is high-fidelity simulation of turbulent fluid flow. Improved performance in high-fidelity fluid simulations can enable cases that are intractable today, such as a complete aircraft in flight. In this thesis, we compare MPI performance with host memory and GPU device memory, and demonstrate how GPU-aware MPI can be used to accelerate high-fidelity incompressible fluid simulations in the spectral element code Neko. On a test system with NVIDIA A100 GPUs, we find that MPI performance is similar using host memory and device memory, except for intra-node messages in the range of 1-64 KB, which are significantly slower using device memory, and messages above 1 MB, which are faster using device memory. We also find that the performance of high-fidelity simulations in Neko can be improved by up to 2.59 times by using GPU-aware MPI in the gather–scatter operation, which avoids several transfers between host and device memory.
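The gather–scatter operation at the heart of a spectral element code can be sketched on a CPU with index arrays. This is an illustrative sketch of direct stiffness summation, not Neko's implementation; in the GPU-aware version the arrays live in device memory and the exchange between ranks passes device pointers directly to MPI, avoiding the staging copies through host memory.

```python
import numpy as np

def gather_scatter(u_local, global_ids, n_global):
    # "Gather": sum the element-local copies of every shared global node.
    g = np.zeros(n_global)
    np.add.at(g, global_ids, u_local)   # unbuffered accumulation
    # "Scatter": write the summed value back to each local copy.
    return g[global_ids]
```

For two 1D linear elements sharing node 1 (global ids [0, 1, 1, 2]), local values [1, 2, 3, 4] produce [1, 5, 5, 4]: the shared node's contributions are summed and written back to both local copies.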
|
120 |
Approximate Message Passing Algorithms for Generalized Bilinear InferenceParker, Jason Terry 14 October 2014 (has links)
No description available.
|