11 |
System Support for Improving the Reliability of MPI Applications and Libraries
Chen, Zhezhe. 19 December 2013
No description available.
|
12 |
Robust Online Trajectory Prediction for Non-cooperative Small Unmanned Aerial Vehicles
Badve, Prathamesh Mahesh. 21 January 2022
In recent years, unmanned aerial vehicles (UAVs) have seen growing use in civilian applications such as aerial photography, agriculture, and communication. An increasing research effort is being devoted to developing sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. Existing techniques suffer from problems such as inadequate uncertainty quantification of predicted trajectories. This work adopts particle filters together with the Löwner-John ellipsoid to approximate the highest posterior density region for trajectory prediction and uncertainty quantification. The particle filter is tuned and tested on real-world and simulated data sets and compared with the Kalman filter. A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications. / Master of Science / In recent years, unmanned aerial vehicles (UAVs) have seen growing use in civilian applications such as aerial photography, agriculture, and communication. Over the coming years, the number of UAVs will increase rapidly. As a result, the risk of mid-air collisions grows, leading to property damage and possible loss of life if a UAV collides with a manned aircraft. An increasing research effort has been made to develop sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. Existing techniques suffer from problems such as inadequate uncertainty quantification of predicted trajectories. This work adopts particle filters, a Bayesian inference technique, for trajectory prediction. The use of the minimum-volume enclosing ellipsoid to approximate the highest posterior density region for prediction uncertainty quantification is also investigated. The particle filter is tuned and tested on real-world and simulated data sets and compared with the Kalman filter.
A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications.
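For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of one cycle of a bootstrap particle filter. It is not the thesis's implementation: the 1-D constant-velocity motion model, the noise levels, and the names `particle_filter_step` and `estimate_position` are all assumptions chosen for illustration, and the ellipsoid-based uncertainty region is not covered.

```python
import math
import random


def particle_filter_step(particles, measurement, dt, rng,
                         process_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle of a bootstrap particle filter.

    Each particle is a (position, velocity) pair for a hypothetical 1-D
    constant-velocity model; the noise levels are illustrative assumptions.
    """
    # Predict: perturb each particle's velocity, then advance its position.
    predicted = []
    for pos, vel in particles:
        vel = vel + rng.gauss(0.0, process_std) * dt
        predicted.append((pos + vel * dt, vel))

    # Update: weight particles by the Gaussian likelihood of the measurement.
    weights = [math.exp(-0.5 * ((measurement - pos) / meas_std) ** 2)
               for pos, _ in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Resample: multinomial resampling back to an equally weighted set.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights


def estimate_position(particles):
    """Mean position of an equally weighted particle set."""
    return sum(pos for pos, _ in particles) / len(particles)
```

The per-particle loops in the predict and update stages are independent of one another, which is the property a parallel implementation would typically exploit; the resampling stage needs the normalized weights of all particles and is usually the harder part to parallelize.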
|
13 |
Adjusting Process Count on Demand for Petascale Global Optimization
Radcliffe, Nicholas Ryan. 16 January 2012
There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This thesis describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, the modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed. / Master of Science
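The decision described in this abstract (monitor memory, spawn processes when it is insufficient) can be illustrated with a small sketch. The policy below is hypothetical and is not taken from pVTdirect: the function name `processes_to_spawn`, its parameters, and the assumption that every new process contributes a fixed amount of usable memory are all illustrative. In an MPI program the resulting count could then be passed to a dynamic process management call such as `MPI_Comm_spawn`; whether the thesis uses that call is not stated in the abstract.

```python
def processes_to_spawn(required_bytes, available_bytes, bytes_per_process,
                       max_new_processes):
    """Return how many extra processes to request so that memory suffices.

    Hypothetical policy: each new process contributes `bytes_per_process`
    of usable memory, and the request is capped at `max_new_processes`.
    """
    if bytes_per_process <= 0:
        raise ValueError("bytes_per_process must be positive")
    shortfall = required_bytes - available_bytes
    if shortfall <= 0:
        return 0  # enough memory already, so nothing to spawn
    # Integer ceiling division avoids floating-point rounding on large sizes.
    needed = -(-shortfall // bytes_per_process)
    return min(needed, max_new_processes)
```

For example, a computation that needs 10 GiB with 4 GiB available and 2 GiB contributed per new process would request three additional processes under this policy.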
|
14 |
Programming High-Performance Clusters with Heterogeneous Computing Devices
Aji, Ashwin M. 19 May 2015
Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned.
MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization. / Ph. D.
|
15 |
Implementation of a Hardware-Optimized MPI Library for the SCMP Multiprocessor
Poole, Jeffrey Hyatt. 16 August 2004
As time progresses, computer architects continue to create faster and more complex microprocessors using techniques such as out-of-order execution, branch prediction, dynamic scheduling, and predication. While these techniques enable greater performance, they also increase the complexity and silicon area of the design, which lengthens development and testing times. The shrinking feature sizes associated with newer technology increase wire resistance and signal propagation delays, further complicating large designs. One potential solution is the Single-Chip Message-Passing (SCMP) Parallel Computer, developed at Virginia Tech. SCMP makes use of an architecture where a number of simple processors are tiled across a single chip and connected by a fast interconnection network. The system is designed to take advantage of thread-level parallelism and to keep wire traces short in preparation for even smaller integrated circuit feature sizes.
This thesis presents the implementation of the MPI (Message-Passing Interface) communications library on top of SCMP's hardware communication support. Emphasis is placed on the specific needs of this system with regard to MPI. For example, MPI is designed to operate between heterogeneous systems; however, in the SCMP environment such support is unnecessary and wastes resources. The SCMP network is also designed such that messages can be sent with very low latency, but with cooperative multitasking it is difficult to ensure a timely response to messages. Finally, the low-level network primitives have no support for send operations that occur before the receiver is prepared, and that functionality is necessary for MPI support. / Master of Science
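The last point in this abstract, a send that arrives before the receiver is prepared, is commonly handled in message-passing libraries by buffering such "unexpected" messages until a matching receive is posted. The sketch below illustrates that general idea only; it is not the SCMP library's design, and the `Mailbox` class, its method names, and tag-only matching are assumptions made for illustration.

```python
class Mailbox:
    """Matches arriving messages to receives by tag, buffering whichever comes first."""

    def __init__(self):
        self.unexpected = []  # (tag, payload) pairs that arrived before a receive
        self.posted = []      # (tag, callback) receives still waiting for a message

    def deliver(self, tag, payload):
        """Handle a message arriving from the network."""
        for i, (wanted, callback) in enumerate(self.posted):
            if wanted == tag:
                del self.posted[i]
                callback(payload)  # a receive was already waiting
                return
        # No receive posted yet: keep the message until one is.
        self.unexpected.append((tag, payload))

    def post_recv(self, tag, callback):
        """Register a receive; complete it at once if the message is already here."""
        for i, (have, payload) in enumerate(self.unexpected):
            if have == tag:
                del self.unexpected[i]
                callback(payload)
                return
        self.posted.append((tag, callback))
```

Both queues are scanned in arrival order, so two messages with the same tag are matched in the order they were delivered.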
|
16 |
Molecular Dynamics for Exascale Supercomputers / La dynamique moléculaire pour les machines exascale
Cieren, Emmanuel. 09 October 2015
In the exascale race, supercomputer architectures are evolving towards massively multicore nodes with hierarchical memory structures and larger vectorization registers. These trends tend to make MPI-only applications less effective, and now require programmers to explicitly manage low-level elements to get decent performance.
In the context of Molecular Dynamics (MD) applied to condensed matter physics, the need for a better understanding of materials behaviour under extreme conditions involves simulations of ever larger systems, on tens of thousands of cores. This puts molecular dynamics codes among the software very likely to meet serious difficulties in fully exploiting the performance of next-generation processors.
This thesis proposes the design and implementation of a high-performance, flexible and scalable framework dedicated to the simulation of large-scale MD systems on future supercomputers. We managed to separate numerical modules from the different expressions of parallelism, allowing developers to obtain high levels of performance without having to care about optimizations. Our architecture is organized in three levels of parallelism: domain decomposition using MPI, thread parallelization within each domain, and explicit vectorization. We also included a dynamic load balancing capability in order to share the workload equally among domains.
Results on simple tests show excellent sequential performance and a quasi-linear speedup on several thousands of cores on various architectures. When applied to production simulations, we report an acceleration of up to a factor of 30 compared to the code previously used by CEA’s researchers.
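To make the combination of domain decomposition and load balancing concrete, here is a minimal sketch, assuming a 1-D slab decomposition in which slab boundaries are placed at quantiles of the particle coordinates. This is an illustrative stand-in, not the framework's algorithm; the function names `balanced_boundaries` and `owner` are hypothetical.

```python
import bisect


def balanced_boundaries(positions, n_domains):
    """Pick slab boundaries along one axis so domains hold similar particle counts."""
    ordered = sorted(positions)
    n = len(ordered)
    # Boundary d sits at the particle that starts the d-th equal-sized share.
    return [ordered[(d * n) // n_domains] for d in range(1, n_domains)]


def owner(x, boundaries):
    """Index of the domain whose slab contains coordinate x."""
    return bisect.bisect_right(boundaries, x)
```

With a clustered distribution, equal-width slabs would leave most domains nearly empty, whereas quantile boundaries give each domain about the same number of particles. Recomputing the boundaries as particles move is one simple form of dynamic load balancing; many particles sharing exactly the same coordinate can still unbalance this scheme.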
|
17 |
A Java Founded LOIS-framework and the Message Passing Interface? : An Exploratory Case Study
Strand, Christian. January 2006
In this thesis project we have successfully added an MPI extension layer to the LOIS framework. The framework defines an infrastructure for executing and connecting continuous stream processing applications. The MPI extension provides the same amount of stream based data as the framework’s original transport. We assert that an MPI-2 compatible implementation can be a candidate to extend the given framework with an adaptive and flexible communication sub-system. Adaptability is required since the communication subsystem has to be resilient to changes, either due to optimizations or system requirements.
|
19 |
Large-Message Nonblocking Allgather and Broadcast Offload via BlueField-2 DPU
Sarkauskas, Nicholas Robert. 09 August 2022
No description available.
|
20 |
Development and application of an enhanced sampling molecular dynamics method to the conformational exploration of biologically relevant molecules
Alibay, Irfan. January 2017
This thesis describes the development of a new swarm-enhanced sampling methodology and its application to the exploration of biologically relevant molecules. First, the development of a new multi-dimensional swarm-enhanced sampling molecular dynamics (msesMD) approach is detailed. Relative to the original swarm-enhanced sampling molecular dynamics (sesMD) methodology, the msesMD method demonstrates improved parameter transferability, resulting in more extensive sampling when scaling to larger systems such as alanine heptapeptide. The implementation and optimisation of the swarm-enhanced sampling algorithms in the AMBER software suite are also described. Through the use of the newer pmemd molecular dynamics (MD) engine and asynchronous MPI routines, speedups of up to three times the original sesMD implementation were achieved. The msesMD method is then applied to the investigation of carbohydrates, first looking at rare conformational changes in Lewis oligosaccharides. Validating against multi-microsecond unbiased MD trajectories and other enhanced sampling methods, the msesMD simulations identified rare conformational changes leading to the adoption of non-canonical unstacked core trisaccharide structures. Next, the use of msesMD as a tool to probe pyranose ring pucker events is explored. Evaluating against four benchmark monosaccharide systems, msesMD simulations accurately recover puckering details not easily obtained via multi-microsecond unbiased MD. This was followed by an exploration of the impact of ring substituents on conformation in glycosaminoglycan monosaccharides: through msesMD simulations, the influence of specific sulfation patterns was explored, finding that in some cases, such as 4-O-sulfation in N-acetyl-galactosamine, large changes in the relative stability of ring conformers can arise. Finally, the msesMD method was coupled with a thermodynamic integration scheme and used to evaluate solvation free energies for small molecule systems.
Comparing against independent trajectory TI simulations, it was found that although the correct solvation free energies were obtained, the msesMD based method did not offer an advantage over unbiased MD for these small molecule systems. However, interesting discrepancies in free energy estimates arising from the use of hydrogen mass repartitioning were found.
|