21 |
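Estudos de algumas ferramentas de coleta e visualização de dados e desempenho de aplicações paralelas no ambiente MPI / Studies of some tools for collecting and visualizing data and performance of parallel applications in the MPI environment
Fernandes, Cláudio Antônio Costa, 23 September 2003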
Previous issue date: 2003-09-23 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Recent years have seen growing acceptance and adoption of parallel processing, both for high-performance scientific computing and for general-purpose applications. This acceptance has been driven mainly by the development of massively parallel processing (MPP) environments and of distributed computing. A common point between distributed systems and MPP architectures is the notion of message passing, which allows communication between processes. A message-passing environment consists essentially of a communication library that, acting as an extension of programming languages such as C, C++ and Fortran, allows parallel applications to be written. In the development of parallel applications, a fundamental aspect is the analysis of their performance. Several metrics can be used in this analysis: execution time, efficiency in the use of the processing elements, and scalability of the application with respect to an increase in the number of processors or in the size of the problem instance. Establishing models or mechanisms that allow this analysis can be quite a complicated task, given the parameters and degrees of freedom involved in implementing a parallel application. One alternative has been the use of tools for collecting and visualizing performance data, which allow the user to identify bottlenecks and sources of inefficiency in an application. Efficient visualization requires identifying and collecting data about the execution of the application, a stage called instrumentation. This work first presents a study of the main techniques used to collect performance data, and then a detailed analysis of the main available tools that can be used on parallel architectures of the Beowulf cluster type, running Linux on the x86 platform, with communication libraries based on MPI (Message Passing Interface), such as LAM and MPICH. This analysis is validated on parallel applications that address the problem of training perceptron-type neural networks using backpropagation. The conclusions show the potential and ease of use of the analyzed tools.
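The metrics named in this abstract (execution time, efficiency, scalability) are commonly summarized as speedup and parallel efficiency. The sketch below is only an illustration of those standard definitions; the function names and the timings are hypothetical and do not come from the thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E(p) = S(p) / p, where 1.0 is ideal."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by processor count.
    timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}
    t1 = timings[1]
    for p, tp in sorted(timings.items()):
        print(f"p={p}  time={tp:6.1f}s  "
              f"speedup={speedup(t1, tp):.2f}  "
              f"efficiency={efficiency(t1, tp, p):.2f}")
```

Efficiency that drops as processors are added is the kind of symptom that collection and visualization tools are then used to trace back to a specific bottleneck.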
|
22 |
Effective Bayesian inference for sparse factor analysis models
Sharp, Kevin John, January 2011
We study how to perform effective Bayesian inference in high-dimensional sparse Factor Analysis models with a zero-norm, sparsity-inducing prior on the model parameters. Such priors represent a methodological ideal, but Bayesian inference in such models is usually regarded as impractical. We test this view. After empirically characterising the properties of existing algorithmic approaches, we use techniques from statistical mechanics to derive a theory of optimal learning in the restricted setting of sparse PCA with a single factor. Finally, we describe a novel 'Dense Message Passing' algorithm (DMP) which achieves near-optimal performance on synthetic data generated from this model. DMP exploits properties of high-dimensional problems to operate successfully on a densely connected graphical model. Similar algorithms have been developed in the statistical physics community and previously applied to inference problems in coding and sparse classification. We demonstrate that DMP outperforms both a newly proposed variational hybrid algorithm and two other recently published algorithms (SPCA and emPCA) on synthetic data, while it explains at least the same amount of variance, for a given level of sparsity, in two gene expression datasets used in previous studies of sparse PCA. A significant potential advantage of DMP is that it provides an estimate of the marginal likelihood which can be used for hyperparameter optimisation. We show that, for the single factor case, this estimate exhibits good qualitative agreement both with theoretical predictions and with the hyperparameter posterior inferred by a collapsed Gibbs sampler. Preliminary work on an extension to inference of multiple factors indicates its potential for selecting an optimal model from amongst candidates which differ both in numbers of factors and their levels of sparsity.
|
23 |
Návrh komunikačního protokolu pro generické simulátory mikroprocesorů / Design of Communication Protocol for Generic Simulators of Microprocessors
Moskovčák, Jiří, Unknown Date
This work concerns the design of a communication protocol for a generic processor simulator. The main objective was to design a communication protocol that allows a multiprocessor system to be simulated on a cluster of computers.
|
24 |
A Message Oriented Middleware Library
Kuhlman, Christopher James, 01 January 2007
A message oriented middleware inter-process communication library called Nora has been designed, constructed, and validated. The library is written in C++. The middleware is designed to bridge two of the main messaging standards, the Message Passing Interface (MPI) and the Data Distribution Service (DDS), by enabling communications for (1) computationally intensive distributed systems that typically follow a master-slave design and (2) general data distribution. The design is original and does not borrow from either specification. The library can be statically linked to application code so that the library is part of each application in a distributed system. The implementation for master-slave messaging has not yet been completed, but the great majority of the work is done; the general data distribution model has been fully implemented. The design is critically evaluated. A key aspect of the library is configurability. Various characteristics of the messaging library, such as the number of message producer and consumer threads, the message types serviced by each thread, the types of communication mechanisms, and others are specified through a configuration file. Consequently, the library has only to be built once for all applications in a distributed system and communications for each application are tailored through a unique configuration file. The library application programmer interface (API) is structured so that communications details can be isolated from the application code and therefore applications are not affected by changes to the IPC configuration. Beyond its use for the two classifications of problems listed above, it is also suited for use by system architects who are investigating resource requirements and designs for new systems, because applications can be reconfigured quickly for different communications behavior on different platforms through the configuration file. Thus, it is useful for prototyping and performance evaluation.
|
25 |
Resource Optimization of MPSoC for Industrial Use-cases
Kågesson, Filip, Cederbom, Simon, January 2019
Today's embedded systems demand ever more performance while still having to meet power constraints. Single-processor systems can deliver high performance, but at the cost of high power consumption. One solution is a multiprocessor system, which can provide high performance and still meet the power constraints because it can run at a lower clock frequency than a comparable single-processor system. The focus of the project is to explore the options available when developing new multiprocessor systems. The project compares asymmetric multiprocessing (AMP) and symmetric multiprocessing (SMP) systems in terms of task management and communication between the processors. It also compares the Advanced High-performance Bus (AHB) interface with the Advanced eXtensible Interface (AXI), and the fixed-priority arbitration algorithm with the round-robin one. The project also contains a practical part in which a demo was developed to show that inter-processor communication using exclusive access to the shared bus can be implemented. The comparisons in the theoretical part give an overview of what to use when developing new Multiprocessor System on Chip (MPSoC) designs. The demo failed to meet the requirement of a fully functional spinlock; this problem could be solved in the future if new hardware is developed.
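The difference between the two arbitration algorithms compared in this abstract can be illustrated with a small simulation. The sketch below is not the project's implementation; the function names, the choice of three bus masters, and the convention that master 0 has the highest priority are assumptions made for illustration.

```python
def fixed_priority(requests):
    """Grant the requesting master with the lowest index (0 = highest priority)."""
    return min(requests) if requests else None


def round_robin(requests, last_granted, n_masters):
    """Grant the first requesting master after the one that was granted last."""
    for offset in range(1, n_masters + 1):
        candidate = (last_granted + offset) % n_masters
        if candidate in requests:
            return candidate
    return None


def simulate(n_masters=3, cycles=6):
    """Every master requests the bus on every cycle; return both grant sequences."""
    requests = set(range(n_masters))
    fixed = [fixed_priority(requests) for _ in range(cycles)]
    rr, last = [], n_masters - 1  # start so that master 0 is first in line
    for _ in range(cycles):
        last = round_robin(requests, last, n_masters)
        rr.append(last)
    return fixed, rr


if __name__ == "__main__":
    fixed, rr = simulate()
    print("fixed priority:", fixed)
    print("round robin:   ", rr)
```

Under constant contention, fixed priority grants master 0 on every cycle and starves the others, while round robin rotates the grant among all requesters.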
|
26 |
Application and Further Development of TrueSkill™ Ranking in Sports
Ibstedt, Julia, Rådahl, Elsa, Turesson, Erik, vande Voorde, Magdalena, January 2019
The aim of this study was to explore the ranking model TrueSkill™ developed by Microsoft, applying it to various sports and constructing extensions to the model. Two different inference methods for TrueSkill were constructed, using Gibbs sampling and message passing. Additionally, the sequential method using Gibbs sampling was successfully extended into a batch method, in order to eliminate game order dependency and create a fairer, although computationally heavier, ranking system. All methods were further implemented with extensions that take into consideration home team advantage, score difference, and finally a combination of the two. The methods were applied to football (Premier League), ice hockey (NHL), and tennis (ATP Tour) and evaluated on the accuracy of their predictions before each game. On football, the extensions improved the prediction accuracy from 55.79% to 58.95% for the sequential methods, while the vanilla Gibbs batch method reached an accuracy of 57.37%. Altogether, the extensions improved the performance of the vanilla methods when applied to all data sets. The home team advantage extension performed better than the score difference extension on both football and ice hockey, while the combination of the two reached the highest accuracy. The Gibbs batch method had the highest prediction accuracy on the vanilla model for all sports. The results of this study imply that TrueSkill could be considered a useful ranking model for other sports as well, especially if tuned and implemented with extensions suitable for the particular sport.
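For a single game between two players with no draws, message passing on the TrueSkill factor graph reduces to a closed-form Gaussian update. The sketch below shows that standard update as an illustration only; it is not the implementation from this study, it omits the skill-dynamics term and all of the study's extensions, and the default values (mean 25, standard deviation 25/3, performance noise 25/6) are the commonly used TrueSkill defaults, assumed here.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(a, b, beta=25.0 / 6.0):
    """Predicted probability that player a beats player b; each is (mu, sigma)."""
    c = math.sqrt(2.0 * beta ** 2 + a[1] ** 2 + b[1] ** 2)
    return cdf((a[0] - b[0]) / c)


def update(winner, loser, beta=25.0 / 6.0):
    """Return the updated (mu, sigma) pairs for the winner and the loser."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c2 = 2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # mean correction from the truncated Gaussian
    w = v * (v + t)       # variance correction, between 0 and 1
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c2 * w))
    return new_winner, new_loser


if __name__ == "__main__":
    prior = (25.0, 25.0 / 3.0)
    print("win probability before:", win_probability(prior, prior))
    print("after one game:", update(prior, prior))
```

The prediction accuracy reported in the abstract corresponds to checking, before each game, whether the side with the higher predicted win probability actually won.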
|
27 |
Paralelização de um programa para cálculo de propriedades físicas de impurezas magnéticas em metais. / Parallelization of a program that calculates physical properties of magnetic impurities in metals.
Sonoda, Eloiza Helena, 10 August 2001
This dissertation discusses the parallelization of a program that calculates physical properties of dilute magnetic alloys. The renormalization group method applied to the two-impurity Anderson model proved particularly well suited to parallel processing, because a large part of the calculations, as well as the variations of the data sets required by the method, can be performed simultaneously. To this end, we rewrote the sequential program previously used by the Theoretical Physics Group of the IFSC and implemented three parallel versions, which differ from each other in their approach to parallelization. Computer clusters proved to be a convenient option because the limiting factor on performance is calculation time rather than communication time. The results show a large reduction in total execution time, but speedup and scalability fall short because of load-balancing problems. We analyze these problems and suggest alternatives for solving them.
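The effect of load imbalance on speedup can be seen with a toy calculation: when communication is negligible, the parallel run takes as long as the most heavily loaded worker. The sketch below is a generic illustration, not one of the dissertation's three parallel versions or its proposed remedies; the task costs and both scheduling functions are hypothetical.

```python
import heapq


def block_partition_time(task_costs, n_workers):
    """Split tasks into contiguous chunks; the slowest worker sets the run time."""
    chunk = -(-len(task_costs) // n_workers)  # ceiling division
    loads = [sum(task_costs[i:i + chunk])
             for i in range(0, len(task_costs), chunk)]
    return max(loads)


def greedy_partition_time(task_costs, n_workers):
    """Give each task, longest first, to the currently least-loaded worker."""
    loads = [0] * n_workers
    heapq.heapify(loads)
    for cost in sorted(task_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)


if __name__ == "__main__":
    tasks = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical per-task compute costs
    serial = sum(tasks)
    for name, fn in (("block", block_partition_time),
                     ("greedy", greedy_partition_time)):
        t = fn(tasks, 4)
        print(f"{name:6s} parallel time={t:2d}  speedup={serial / t:.2f}")
```

With these numbers the contiguous split reaches a speedup of only 2.4 on four workers, while the balanced assignment reaches 4.0 with the same total work.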
|
28 |
MPI Performance Engineering with the MPI Tools Information Interface
Ramesh, Srinivasan, 06 September 2018
The desire for high performance on scalable parallel systems is increasing the complexity and the need to tune MPI implementations. The MPI Tools Information Interface (MPI_T) introduced in the MPI 3.0 standard provides an opportunity for performance tools and external software to introspect and understand MPI runtime behavior at a deeper level to detect scalability issues. The interface also provides a mechanism to fine-tune the performance of the MPI library dynamically at runtime.
This thesis describes the motivation, design, and challenges involved in developing an MPI performance engineering infrastructure using MPI_T for two performance toolkits: the TAU Performance System and Caliper. I validate the design of the infrastructure for TAU by developing optimizations for production and synthetic applications. I show that the MPI_T runtime introspection mechanism in Caliper enables a meaningful analysis of performance data.
This thesis includes previously published co-authored material.
|
29 |
An Analyzer for Message Passing Programs
Huang, Yu, 01 May 2016
Asynchronous message passing systems are fast becoming a common means of communication between devices. Two problems in message passing programs are difficult to solve. The first, whether intended or not, is the message race, where a receive may match more than one send in the runtime system. This non-determinism often leads to intermittent and unexpected behavior, depending on how the race is resolved. The second is deadlock, a situation in which each member process of a group is waiting for some other member to communicate with it, but no member is attempting to do so. Detecting whether a message race exists in a message passing program is NP-complete, and so is detecting whether a deadlock exists. The difficulty of the two problems also comes from three factors that complicate the semantics: asynchronous communication, synchronous barriers, and buffering settings, including infinite buffering (the system can buffer messages) and zero buffering (the system has no internal buffering). To solve these problems in the presence of the complicating factors, this research provides a novel predictive analysis that starts from a concrete execution and then predicts the behavior of other executions that can arise from it. The research begins with model checking based on Satisfiability Modulo Theories (SMT), which provides a precise analysis of program behavior. Unfortunately, a precise analysis using SMT does not scale to large programs. The SMT-based model checking is therefore combined with a heuristic search for witnessing program properties. The heuristic search is efficient in identifying how sends may match receives at runtime, because it initially looks for match relations in a small search space; the space is enlarged only if the program property is not witnessed, until all possible match relations between sends and receives that reflect message non-determinism are found.
This research also gives a static analysis approach that is scalable because it does not need to analyze the full set of program behaviors; instead, the static analysis uses only polynomial-time algorithms to identify all potential deadlocks in a send-receive template, given a set of pre-defined deadlock patterns. With a predictive analysis consisting of SMT-based model checking with heuristic search and static analysis, this research is able to solve the two problems above. The work in this dissertation also demonstrates that the predictive analysis is more efficient than the existing tools for verifying message passing programs.
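The notion of a message race, a receive that may match more than one send, can be made concrete with a very coarse check. The sketch below is not the dissertation's SMT encoding or heuristic search: it ignores program order, barriers, and buffering, so it only over-approximates the set of racy receives. The `Send` and `Recv` records and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Send(NamedTuple):
    name: str
    src: int  # sending process
    dst: int  # destination process


class Recv(NamedTuple):
    name: str
    dst: int            # process that posts the receive
    src: Optional[int]  # expected sender, or None for a wildcard receive


def candidate_sends(recv, sends):
    """All sends whose endpoints are compatible with the given receive."""
    return [s for s in sends
            if s.dst == recv.dst and (recv.src is None or recv.src == s.src)]


def racy_receives(recvs, sends):
    """Map each receive that may match sends from several processes to them."""
    result = {}
    for r in recvs:
        matches = candidate_sends(r, sends)
        if len({s.src for s in matches}) > 1:
            result[r.name] = [s.name for s in matches]
    return result


if __name__ == "__main__":
    sends = [Send("s0", src=0, dst=2), Send("s1", src=1, dst=2)]
    recvs = [Recv("r0", dst=2, src=None), Recv("r1", dst=2, src=0)]
    print(racy_receives(recvs, sends))
```

Here only the wildcard receive `r0` is flagged, since either process 0 or process 1 could supply its message; a precise analysis would go on to decide which of the flagged matches are feasible in some actual execution.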
|
30 |
Experience with Acore: Implementing GHC with Actors
Palmucci, Jeff, Waldsburger, Carl, Duis, David, Krause, Paul, 01 August 1990
This paper presents a concurrent interpreter for a general-purpose concurrent logic programming language, Guarded Horn Clauses (GHC). Unlike typical implementations of GHC in logic programming languages, the interpreter is implemented in the Actor language Acore. The primary motivation for this work was to probe the strengths and weaknesses of Acore as a platform for developing sophisticated programs. The GHC interpreter provided a rich testbed for exploring Actor programming methodology. The interpreter is a pedagogical investigation of the mapping of GHC constructs onto the Actor model. Since we opted for simplicity over optimization, the interpreter is somewhat inefficient.
|