11

AutoPilot: A Message-Passing Parallel Programming Library for the IMAPCAR2

Kelly, Benjamin 14 March 2013 (has links)
The IMAPCAR2 from Renesas Electronics is an embedded real-time image processor, combining a single core with a 128-way SIMD array. At runtime, sections of the SIMD array can be retasked as additional CPU cores, interconnected via a message-passing ring. Using these cores effectively, however, is made difficult by the low-level nature of the message-passing API and the lack of cache coherency between processors, which makes developing and debugging software for this platform a challenging task. The AutoPilot library addresses this by providing a high-level, message-oriented parallel programming model for the IMAPCAR2. AutoPilot's API is closely based on that of Pilot, a wrapper around the Message Passing Interface (MPI) for cluster computing. By reimplementing the Pilot API for the IMAPCAR2, AutoPilot shows that its processes-and-channels architecture is a viable choice for parallel programming on cache-incoherent multicore architectures. At the same time, it provides a simpler API for programmers, with built-in safety checks that eliminate some common sources of error.
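As a rough illustration of the processes-and-channels model that Pilot exposes (and that AutoPilot reimplements), the sketch below connects one worker process to the main process through a single channel. The function names (PI_Configure, PI_CreateProcess, PI_CreateChannel, PI_Write, PI_Read, PI_StartAll, PI_StopMain, PI_MAIN) and the header name follow the published Pilot C API as best recalled and may differ in detail; AutoPilot's IMAPCAR2-specific variant is not shown.

    #include <stdio.h>
    #include "pilot.h"                           /* assumed header name */

    static PI_CHANNEL *to_main;                  /* worker -> main channel */

    /* Worker process body: do some work, write the result to the channel. */
    static int worker(int index, void *arg) {
        int result = index + 42;                 /* stand-in for real work */
        PI_Write(to_main, "%d", result);         /* typed, printf-style write */
        return 0;
    }

    int main(int argc, char *argv[]) {
        int value;
        PI_Configure(&argc, &argv);              /* configuration phase begins */
        PI_PROCESS *w = PI_CreateProcess(worker, 0, NULL);
        to_main = PI_CreateChannel(w, PI_MAIN);  /* channel from worker to main */
        PI_StartAll();                           /* processes start running */

        PI_Read(to_main, "%d", &value);          /* blocking, type-checked read */
        printf("worker sent %d\n", value);
        PI_StopMain(0);
        return 0;
    }

The appeal of the model is that channel endpoints are typed and declared up front, so many mismatched-send/receive errors can be caught by the library rather than debugged on the target hardware.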
12

IMPROVING MESSAGE-PASSING PERFORMANCE AND SCALABILITY IN HIGH-PERFORMANCE CLUSTERS

RASHTI, Mohammad Javad 26 January 2011 (has links)
High Performance Computing (HPC) is the key to solving many scientific, financial, and engineering problems. Computer clusters are now the dominant architecture for HPC. The scale of clusters, both in terms of processor per node and the number of nodes, is increasing rapidly, reaching petascales these days and soon to exascales. Inter-process communication plays a significant role in the overall performance of HPC applications. With the continuous enhancements in interconnection technologies and node architectures, the Message Passing Interface (MPI) needs to be improved to effectively utilize the modern technologies for higher performance. After providing a background, I present a deep analysis of the user level and MPI libraries over modern cluster interconnects: InfiniBand, iWARP Ethernet, and Myrinet. Using novel techniques, I assess characteristics such as overlap and communication progress ability, buffer reuse effect on latency, and multiple-connection scalability. The outcome highlights some of the inefficiencies that exist in the communication libraries. To improve communication progress and overlap in large message transfers, a method is proposed which uses speculative communication to overlap communication with computation in the MPI Rendezvous protocol. The results show up to 100% communication progress and more than 80% overlap ability over iWARP Ethernet. An adaptation mechanism is employed to avoid overhead on applications that do not benefit from the method due to their timing specifications. To reduce MPI communication latency, I have proposed a technique that exploits the application buffer reuse characteristics for small messages and eliminates the sender-side copy in both two-sided and one-sided MPI small message transfer protocols. The implementation over InfiniBand improves small message latency up to 20%. The implementation adaptively falls back to the current method if the application does not benefit from the proposed technique. Finally, to improve scalability of MPI applications on ultra-scale clusters, I have proposed an extension to the current iWARP standard. The extension improves performance and memory usage for large-scale clusters. The extension equips Ethernet with an efficient zero-copy, connection-less datagram transport. The software-level evaluation shows more than 40% performance benefits and 30% memory usage reduction for MPI applications on a 64-core cluster. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2010-10-16 12:25:18.388
13

Analyse und Optimierung der Softwareschichten von wissenschaftlichen Anwendungen für Metacomputing / Analysis and Optimization of the Software Layers of Scientific Applications for Metacomputing

Keller, Rainer, January 2008 (has links)
Dissertation, University of Stuttgart, 2008.
14

COMPRESSIVE PARAMETER ESTIMATION VIA APPROXIMATE MESSAGE PASSING

Hamzehei, Shermin 08 April 2020 (has links)
The literature on compressive parameter estimation has been mostly focused on the use of sparsity dictionaries that encode a discretized sampling of the parameter space; these dictionaries, however, suffer from coherence issues that must be controlled for successful estimation. To bypass such issues with discretization, we propose the use of statistical parameter estimation methods within the Approximate Message Passing (AMP) algorithm for signal recovery. Our method leverages the recently proposed use of custom denoisers in place of the usual thresholding steps (which act as denoisers for sparse signals) in AMP. We introduce the design of analog denoisers that are based on statistical parameter estimation algorithms, and we focus on two commonly used examples: frequency estimation and bearing estimation, coupled with the Root MUSIC estimation algorithm. We first analyze the performance of the proposed analog denoiser for signal recovery, and then link the performance in signal estimation to that of parameter estimation. Numerical experiments show significant improvements in estimation performance versus previously proposed approaches for compressive parameter estimation.
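For background, the standard AMP recursion with a generic denoiser \eta_t — the slot into which the proposed statistical (Root MUSIC-based) denoisers are substituted in place of thresholding — can be written as follows; this is the textbook form of AMP, not notation taken from the thesis:

    x^{t+1} = \eta_t\left( x^{t} + A^{*} z^{t} \right),
    z^{t}   = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\,
              \left\langle \eta_{t-1}'\left( x^{t-1} + A^{*} z^{t-1} \right) \right\rangle,

where y = Ax + w is the compressive measurement vector, \delta = m/n is the undersampling ratio, \langle\cdot\rangle denotes the empirical average, and the last term is the Onsager correction that keeps the effective noise seen by the denoiser approximately Gaussian.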
15

A Language-Recognition Approach to Unit Testing Message-Passing Systems

Ubah, Ifeanyi January 2017 (has links)
This thesis addresses the problem of unit testing components in message-passing systems. A message-passing system is one that comprises components communicating with each other solely via the exchange of messages. Testing aids developers in detecting and fixing potential errors; with unit testing in particular, the focus is on independently verifying the correctness of single components, such as functions and methods, in a system whose behavior is well understood. With the aid of unit testing frameworks such as those of the xUnit family, this process can not only be automated and done iteratively, but easily interleaved with the development process, facilitating rapid feedback and early detection of errors in the system. However, such frameworks work in an imperative manner and, as such, are unsuitable for verifying message-passing systems, where the behavior of a component is encoded in its stream of exchanged messages. In this work, we recognize that, similar to streams of symbols in the field of formal languages and abstract machines, one can specify properties of a component's message stream such that they form a language. Unit testing a component thus becomes the description of an automaton that recognizes such a specified language. We propose a platform-independent, language-recognition approach to creating unit testing frameworks for describing and verifying the behavior of message-passing components, and use this approach to create a prototype implementation for the Kompics component model. We show that this approach can be used to perform both black-box and white-box testing of components, and that it is easy to work with while preventing common mistakes in practice.
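To make the language-recognition idea concrete, the hypothetical sketch below checks that a component's observed message stream belongs to the language INIT DATA* DONE by running it through a small finite automaton; it only illustrates the approach and is not the Kompics-based prototype developed in the thesis.

    #include <stdio.h>

    /* Message types a component under test might emit. */
    typedef enum { MSG_INIT, MSG_DATA, MSG_DONE } msg_t;

    /* States of an automaton recognizing the language INIT DATA* DONE. */
    typedef enum { S_START, S_RUNNING, S_ACCEPT, S_REJECT } state_t;

    static state_t step(state_t s, msg_t m) {
        switch (s) {
        case S_START:   return m == MSG_INIT ? S_RUNNING : S_REJECT;
        case S_RUNNING: return m == MSG_DATA ? S_RUNNING
                             : m == MSG_DONE ? S_ACCEPT : S_REJECT;
        default:        return S_REJECT;   /* nothing may follow accept/reject */
        }
    }

    /* "Unit test": feed the observed stream to the automaton, check acceptance. */
    int main(void) {
        msg_t observed[] = { MSG_INIT, MSG_DATA, MSG_DATA, MSG_DONE };
        state_t s = S_START;
        for (unsigned i = 0; i < sizeof observed / sizeof *observed; i++)
            s = step(s, observed[i]);
        printf(s == S_ACCEPT ? "PASS: stream matches INIT DATA* DONE\n"
                             : "FAIL: unexpected message sequence\n");
        return s == S_ACCEPT ? 0 : 1;
    }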
16

Vcluster: A Portable Virtual Computing Library For Cluster Computing

Zhang, Hua 01 January 2008 (has links)
Message passing has been the dominant parallel programming model in cluster computing, and libraries like the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their effectiveness and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message-passing models must be adapted accordingly to support this new kind of cluster efficiently. In addition, the Java programming language, with features like an object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an attractive alternative for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, utilizing the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes a mobile thread possible. Equipped with virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usage with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI, while its performance is comparable to that of MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP. In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proved useful in process-based libraries.
17

Zero-Sided Communication Challenges in Implementing Time-Based Channels using the MPI/RT Specification

Neelamegam, Jothi P 11 May 2002 (has links)
Distributed real-time applications require support from the underlying middleware to meet the strict requirements for jitter, latency, and bandwidth. While most existing middleware standards such as MPI do not support Quality of Service (QoS), the MPI/RT standard supports QoS in addition to striving for high performance. This thesis presents HARE, the first known implementation of a subset of the MPI/RT 1.1 standard with time-driven QoS support. This thesis proves the following hypothesis: It is possible to achieve zero-sided communication (a model of communication characterized by the absence of any explicit per-message transfer calls by any of the participating sides) in a real-time environment using a QoS contract between an application and message-passing middleware. Furthermore, it is shown that the performance and predictability of a time-driven task using zero-sided communication is better than that of a best-effort task. The hypothesis is validated through compact MPI/RT application programs that achieve zero-sided communication.
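To illustrate the zero-sided idea, the toy simulation below moves data purely on a timer derived from a QoS contract: the application only updates a registered buffer, and the "middleware" transfers it once per period, with no per-message send or receive calls on either side. All names are hypothetical; this is not the MPI/RT or HARE API.

    #include <stdio.h>
    #include <string.h>

    /* A hypothetical time-driven channel: the middleware transfers the
     * registered buffer once per period, as agreed in the QoS contract. */
    typedef struct {
        double buf[4];      /* application-registered buffer (sender side) */
        double remote[4];   /* what the receiving side sees after a transfer */
        int    period;      /* transfer period in ticks, from the contract */
    } channel_t;

    static void middleware_tick(channel_t *ch, int tick) {
        if (tick % ch->period == 0)                 /* time-driven, not call-driven */
            memcpy(ch->remote, ch->buf, sizeof ch->buf);
    }

    int main(void) {
        channel_t ch = { .period = 3 };
        for (int tick = 1; tick <= 9; tick++) {
            /* Application work: update the registered buffer; no send call. */
            for (int i = 0; i < 4; i++) ch.buf[i] = tick + 0.1 * i;
            middleware_tick(&ch, tick);             /* would run inside the middleware */
            printf("tick %d: receiver sees %.1f\n", tick, ch.remote[0]);
        }
        return 0;
    }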
18

DME/P critical area determination and its implementation on message-passing processor

Rajendran, Jaikishan January 1992 (has links)
No description available.
19

High-Performance Multi-Transport MPI Design for Ultra-Scale InfiniBand Clusters

Koop, Matthew J. 03 September 2009 (has links)
No description available.
20

Calcul haute performance pour la simulation d'interactions fluide-structure / High performance computing for the simulation of fluid-structure interactions

Partimbene, Vincent 25 April 2018 (has links)
This thesis deals with the solution of fluid-structure interaction problems by an algorithm consisting of a coupling between two solvers: one for the fluid and one for the structure. To ensure consistency between the fluid and structure meshes, we also consider a discretization of each domain by finite volumes. Due to the difficulty of decomposing the domain into sub-domains, we consider for each environment a parallel multi-splitting algorithm, which corresponds to a unified presentation of sub-domain methods with or without overlapping. This method combines several contracting fixed-point mappings, and we show that, under appropriate assumptions, each fixed-point mapping is contracting in finite-dimensional spaces normed by Hilbertian and non-Hilbertian norms. In addition, we show that such a study is valid for the synchronous and, more generally, asynchronous parallel solution of the large linear systems arising from the discretization of fluid-structure interaction problems, and that it can be extended to the case where the displacement of the structure is subject to constraints.
Moreover, we can also consider the analysis of the convergence of these asynchronous parallel multi-splitting methods by partial-ordering techniques linked to the discrete maximum principle, both in the linear setting and in the one obtained when the structure's displacements are subject to constraints. We carry out parallel simulations for various fluid-structure test cases on different clusters, considering blocking and non-blocking communications. In the latter case we had to resolve an implementation difficulty, as an unrecoverable error occurred during execution; this issue was overcome by introducing a method that ensures the termination of all non-blocking communications before the mesh update. The performance of the parallel simulations is presented and analyzed. Finally, we apply the methodology presented above to various industrial fluid-structure interaction cases on unstructured meshes, which represents an additional difficulty.
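For reference, the classical linear multi-splitting fixed-point iteration that the parallel sub-domain algorithm generalizes can be sketched as follows (generic notation, not taken from the thesis): given Ax = b and L splittings,

    A = M_\ell - N_\ell, \qquad \sum_{\ell=1}^{L} E_\ell = I, \qquad
    x^{k+1} = \sum_{\ell=1}^{L} E_\ell\, M_\ell^{-1}\left( N_\ell\, x^{k} + b \right),

where the E_\ell are nonnegative diagonal weighting matrices and each term M_\ell^{-1}(N_\ell x^k + b) can be computed by a different process; asynchronous variants let each process use the most recently received components of x instead of waiting for the complete iterate x^k, which is the setting whose convergence is analyzed above.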
