101

Nutzung von MPI für parallele FEM-Systeme (Using MPI for Parallel FEM Systems)

Grabowsky, L., Ermer, Th., Werner, J. 30 October 1998 (has links) (PDF)
The Message Passing Interface (MPI) standard provides the developer of parallel applications with a powerful tool for designing software efficiently and largely independently of the details of the parallel system. As part of a student project, the communication library of an existing FEM program was ported to the MPI mechanism. The results are summarized in the description of the Cubecom implementation given here. The second part of this work investigates how the functionality available in MPI can also be used to carry out the coupling-boundary communication with a uniform and efficient scheme. The efficiency of both the basic implementation and the MPI-based coupling-boundary communication is evaluated, and an outlook on further possible applications is given.
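The coupling-boundary communication described in the second part is essentially a halo exchange between neighboring subdomains. The following is a minimal sketch of such an exchange using standard MPI point-to-point calls; it illustrates the pattern only and is not the Cubecom implementation, and the neighbor ranks and buffer layout are assumptions.

    /* Minimal sketch of a coupling-boundary (halo) exchange between
       neighboring FEM subdomains, assuming a 1D chain of processes.
       MPI_Sendrecv pairs the send and receive in one call, avoiding the
       deadlock a naive ordering of blocking Send/Recv can cause. */
    #include <mpi.h>

    void exchange_halo(const double *left_edge, const double *right_edge,
                       double *left_halo, double *right_halo, int n,
                       int left, int right, MPI_Comm comm)
    {
        /* ship our right edge to the right neighbor while receiving
           the left neighbor's right edge into our left halo */
        MPI_Sendrecv(right_edge, n, MPI_DOUBLE, right, 0,
                     left_halo,  n, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
        /* mirror direction; neighbors set to MPI_PROC_NULL at the
           domain ends make these calls harmless no-ops */
        MPI_Sendrecv(left_edge,  n, MPI_DOUBLE, left,  1,
                     right_halo, n, MPI_DOUBLE, right, 1,
                     comm, MPI_STATUS_IGNORE);
    }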
102

Scalable applications in a distributed environment

Andersson, Filip, Norberg, Simon January 2011 (has links)
As the number of simultaneous users of distributed systems increases, scalability is becoming an important factor to consider during software development. Without sufficient scalability, systems may struggle to manage high loads and may not be able to support a large number of users. We have determined how scalability can best be implemented and what extra costs this leads to. Our research is based both on a literature review, in which we examined what others in the field of computer engineering think about scalability, and on implementing a highly scalable system of our own. In the end we arrived at a couple of general pointers that can help developers determine whether they should focus on scalable development, and what they should consider if they choose to do so.
103

FPGA acceleration of high performance computing communication middleware

Xiong, Qingqing 29 September 2019 (has links)
High-Performance Computing (HPC) necessarily requires computing with a large number of nodes. As computing technology progresses, internode communication becomes an ever more critical performance bottleneck. The execution time of software communication support is generally critical, often accounting for hundreds of times the latency of the actual time-of-flight. This software support comes in two types. The first is support for core functions as defined in middleware such as the ubiquitous Message Passing Interface (MPI). Over the last few decades this software overhead has been addressed through a number of advances, such as eliminating data copies, improving drivers, and bypassing the operating system. However, an essential core still remains, including message matching, data marshaling, and handling collective operations. The second type of communication support is for new services not inherently part of the middleware. The most prominent of these is compression; it brings huge savings in transmission time, but much of this benefit is offset by a new level of software overhead. In this dissertation, we address the software overhead in internode communication with elements of the emerging node architectures, which include FPGAs in multiple configurations: closely coupled hardware support, programmable Network Interface Cards (NICs), and routers with programmable accelerators. While there has been substantial work in offloading communication software into hardware, we advance the state of the art in three ways. The first is to use an emerging hardware model that is, for the first time, both realistic and supportive of very high performance gains. Previous studies (and some products) have relied on hardware models that are either of limited benefit (a NIC processor) or not sustainable (a NIC augmented with ASICs). Our hardware model is based on the various emerging CPU-FPGA computing architectures. The second is to improve on previous work. We have found this to be possible through a number of means: taking advantage of configurable hardware, taking advantage of close coupling, and coming up with novel improvements. The third is looking at problems that have so far been nearly completely unexplored, one of which is hardware acceleration of application-aware, in-line, lossy compression. In this dissertation, we propose offload approaches and hardware designs for integrated FPGAs to bring communication latency down to ultra-low levels unachievable by today's software and hardware. We focus on improving performance in three respects: 1) accelerating middleware semantics within communication routines, such as message matching and derived datatypes; 2) optimizing complex communication routines, namely collective operations; and 3) accelerating operations vital in new communication services independent of the middleware, such as data compression. The last aspect is somewhat broader than the others: it applies not only to HPC communication but is also vital to broader system functions such as I/O.
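One of the middleware semantics named above, derived datatypes, can be made concrete in a few lines of standard MPI. The sketch below describes a strided matrix column once as a datatype and sends it without manual packing; this is the kind of marshaling step the dissertation proposes to accelerate in hardware, and the matrix layout, tag, and peer rank here are illustrative assumptions.

    /* Sketch: sending a strided column of an n x m row-major matrix
       with an MPI derived datatype instead of packing it by hand.
       The receiver can receive it as n contiguous doubles. */
    #include <mpi.h>

    void send_column(const double *a, int n, int m, int col,
                     int dest, MPI_Comm comm)
    {
        MPI_Datatype column;
        /* n blocks of 1 double, stride m: one column of the matrix */
        MPI_Type_vector(n, 1, m, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);
        MPI_Send(a + col, 1, column, dest, /* tag = */ 42, comm);
        MPI_Type_free(&column);
    }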
104

Parallel Sparse Matrix-Matrix Multiplication: A Scalable Solution With 1D Algorithm

Hoque, Mohammad Asadul, Raju, Md Rezaul Karim, Tymczak, Christopher John, Vrinceanu, Daniel, Chilakamarri, Kiran 01 January 2015 (has links)
This paper presents a novel implementation of parallel sparse matrix-matrix multiplication using distributed memory systems on heterogeneous hardware architectures. The proposed algorithm is expected to be linearly scalable up to several thousand processors for matrices with dimensions over 10^6 (a million). Our approach to parallelism is based on a 1D decomposition and works for both structured and unstructured sparse matrices. The storage mechanism is based on distributed hash lists, which reduces the latency of accessing and modifying an element of the product matrix while also reducing the overall time to merge the partial results computed by the processors. Theoretically, the time and space complexity of our algorithm is linearly proportional to the total number of non-zero elements in the product matrix C. The results of the performance evaluation show that the algorithm scales much better for sparse matrices with larger dimensions. The speedup achieved by our algorithm is much better than that of other existing 1D algorithms; we have been able to achieve a speedup of about 500 with only 672 processors. We also identified the impact of hardware architecture on scalability.
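As a point of reference for the 1D approach, the sketch below shows the row-block ownership arithmetic such a decomposition typically uses: each process owns a contiguous block of rows of A and computes the corresponding rows of C = A * B. This is an assumed, simplified illustration; the paper's distributed hash-list storage and merging machinery are omitted.

    /* Sketch of 1D (row-wise) ownership for C = A * B across P processes. */
    #include <mpi.h>

    /* first row owned by process p when n rows are split across P processes */
    static long first_row(long n, int P, int p)
    {
        long base = n / P, rem = n % P;
        return p * base + (p < rem ? p : rem); /* spread the remainder evenly */
    }

    void my_row_range(long n, MPI_Comm comm, long *lo, long *hi)
    {
        int P, p;
        MPI_Comm_size(comm, &P);
        MPI_Comm_rank(comm, &p);
        *lo = first_row(n, P, p);     /* inclusive */
        *hi = first_row(n, P, p + 1); /* exclusive */
    }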
105

Partitioning of Urban Transportation Networks Utilizing Real-World Traffic Parameters for Distributed Simulation in SUMO

Ahmed, Md Salman, Hoque, Mohammad A. 27 January 2017 (has links)
This paper describes a partitioning algorithm for real-world transportation networks that incorporates previously unaccounted-for parameters such as signalized traffic intersections, road segment length, traffic density, number of lanes, and the inter-partition communication overhead due to the migration of vehicles from one partition to another. We also describe our hypothetical framework for distributed simulation of the partitioned road network in SUMO, in which a master controller, currently under development using the TraCI APIs and an MPI library, coordinates the parallel simulation and synchronization of the sub-networks generated by our proposed algorithm.
106

Profiling MPI Primitives in Real-time Using OSU INAM

Sankarapandian Dayala Ganesh R, Kamal Raj 07 October 2020 (has links)
No description available.
107

Vcluster: A Portable Virtual Computing Library For Cluster Computing

Zhang, Hua 01 January 2008 (has links)
Message passing has been the dominant parallel programming model in cluster computing, and libraries like the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their utility and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with features such as an object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an attractive alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads in order to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, exploiting the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation state, and communication channels across multiple separate JVMs, which makes mobile threads possible. Equipped with virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usability with those of other libraries. The experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI, while its performance is comparable to that of MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP. In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. Our experiments show that load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proven useful in process-based libraries.
108

Using MPI One-Sided Communication for Parallel Sudoku Solving

Aili, Henrik January 2023 (has links)
This thesis investigates the scalability of parallel Sudoku solving with Donald Knuth's Dancing Links and Algorithm X under two different MPI communication methods: MPI One-Sided Communication and MPI Send-Receive. The study compares the performance of the two approaches and finds that MPI One-Sided Communication exhibits better scalability in terms of both speedup and efficiency. The research contributes to the understanding of parallel Sudoku solving, provides insight into the suitability of MPI One-Sided Communication for this task, and lays a foundation for future investigations in distributed computing environments and for advances in parallel Sudoku solving algorithms.
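For readers unfamiliar with the two models compared here, the sketch below shows the one-sided pattern in standard MPI: workers deposit results into a memory window exposed by rank 0 with MPI_Put, with no matching receive call on the target. It is a generic illustration under assumed names, not the thesis's solver.

    /* Sketch of MPI one-sided communication: rank 0 exposes one int
       slot per rank in a window; each worker deposits its value with
       MPI_Put. Fences open and close the access epoch. */
    #include <mpi.h>

    void publish_results(int my_value, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int *slots = NULL;
        MPI_Win win;
        if (rank == 0)
            MPI_Alloc_mem(size * sizeof(int), MPI_INFO_NULL, &slots);
        /* rank 0 exposes size slots; every other rank exposes nothing */
        MPI_Win_create(rank == 0 ? slots : NULL,
                       rank == 0 ? size * (MPI_Aint)sizeof(int) : 0,
                       sizeof(int), MPI_INFO_NULL, comm, &win);

        MPI_Win_fence(0, win);
        if (rank != 0) /* deposit directly; no receive posted on rank 0 */
            MPI_Put(&my_value, 1, MPI_INT, /* target = */ 0,
                    /* disp = */ rank, 1, MPI_INT, win);
        MPI_Win_fence(0, win); /* all Puts complete; rank 0 may read slots */

        MPI_Win_free(&win);
        if (rank == 0) MPI_Free_mem(slots);
    }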
109

Zero-Sided Communication Challenges in Implementing Time-Based Channels using the MPI/RT Specification

Neelamegam, Jothi P 11 May 2002 (has links)
Distributed real-time applications require support from the underlying middleware to meet the strict requirements for jitter, latency, and bandwidth. While most existing middleware standards such as MPI do not support Quality of Service (QoS), the MPI/RT standard supports QoS in addition to striving for high performance. This thesis presents HARE, the first known implementation of a subset of the MPI/RT 1.1 standard with time-driven QoS support. This thesis proves the following hypothesis: It is possible to achieve zero-sided communication (a model of communication characterized by the absence of any explicit per-message transfer calls by any of the participating sides) in a real-time environment using a QoS contract between an application and message-passing middleware. Furthermore, it is shown that the performance and predictability of a time-driven task using zero-sided communication is better than that of a best-effort task. The hypothesis is validated through compact MPI/RT application programs that achieve zero-sided communication.
110

Enhancing MPI with modern networking mechanisms in cluster interconnects

Yu, Weikuan 12 September 2006 (has links)
No description available.
