  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Best Effort MPI/RT as an Alternative to MPI: Design and Performance Comparison

Angadi, Raghavendra 13 December 2002
The Real-Time Message Passing Interface (MPI/RT) is an emerging real-time communications middleware standard for distributed real-time applications. The Message Passing Interface (MPI) is the de facto standard for high performance parallel application development. In this thesis, we describe how MPI/RT with best effort quality of service can be used as an alternative to MPI. Mercury Computer Systems' RACE embedded parallel computer is used as the platform for comparing the design and performance of these two standards. The main advantages MPI/RT has over MPI are its explicit support for communication channels and its emphasis on early binding. The design and implementation of best effort MPI/RT on Mercury are described, and its performance is compared with that of MPI in order to illustrate how MPI/RT features allow implementations to exploit the underlying platform more effectively. The benchmark results show that MPI/RT outperforms MPI in almost all cases examined.
2

Scalability-Driven Approaches to Key Aspects of the Message Passing Interface for Next Generation Supercomputing

Zounmevo, Ayi Judicael 23 May 2014
The Message Passing Interface (MPI), which dominates the supercomputing programming environment, is used to orchestrate and fulfill communication in High Performance Computing (HPC). How far HPC programs can scale depends in large part on the ability to achieve fast communication and to overlap communication with computation or communication with communication. This dissertation proposes a new asynchronous solution to the nonblocking Rendezvous protocol used between pairs of processes to transfer large payloads. Besides enforcing communication/computation overlapping in a comprehensive way, the proposal improves on existing network device-agnostic asynchronous solutions by being memory-scalable and by avoiding brute-force strategies. Achieving overlap between communication and computation is important, but each communication is also expected to incur minimal latency. In that respect, the processing of the queues that hold messages pending reception inside the MPI middleware is expected to be fast. Currently, though, that processing slows down as program scale grows. This research presents a novel scalability-driven message queue whose processing skips altogether large portions of queue items that are deterministically guaranteed to lead to unfruitful searches. Because it has little sensitivity to program size, the proposed message queue maintains very good performance while displaying a low and flattening memory footprint growth pattern. Due to the blocking nature of its required synchronizations, the one-sided communication model of MPI creates both communication/computation and communication/communication serializations. This research fixes these issues and the latency-related inefficiencies documented for MPI one-sided communications by proposing completely nonblocking and non-serializing versions of those synchronizations.
The improvements, meant for consideration in a future MPI standard, also allow new classes of programs to be more efficiently expressed in MPI. Finally, a persistent distributed service is designed over MPI to show its impacts at large scales beyond communication-only activities. MPI is analyzed in situations of resource exhaustion, partial failure and heavy use of internal objects for communicating and non-communicating routines. Important scalability issues are revealed and solution approaches are put forth. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2014-05-23 15:08:58.56
3

A PARALLEL APPROACH TOWARDS CORRELATION MEASUREMENT GENE PAIRS WITH TIME-LAGGING EXPRESSION BEHAVIORS

Cao, Xiaopeng 01 December 2009
Current similarity measures include Normal Euclidean Distance, Pearson Product-Moment Correlation Coefficient, Spearman's rank correlation coefficient, Z-score (standard score), Spearman's footrule distance, Kendall tau rank coefficient, Jaccard similarity coefficient, Cayley's distance, and Hamming distance. Because none of them can capture the similarity between genes with arbitrary time-delay and time-gap behavior, a novel algorithm that enables time-delay alignment and time-gap alignment is proposed and integrated with those existing approaches that are local comparisons, so as to fit the underlying biological context. Time-delay behavior occurs when a gene's expression triggers a delayed expression in its co-regulated or anti-co-regulated peers. In addition, an arbitrary time lag may appear because of experimental or measurement error. If gene data exhibit one or both of these conditions, similarity measurement with traditional methods will either underestimate the similarity or miss the relationship completely. To align the gene data, an alignment algorithm was developed that can align time delays as well as remove time gaps. Because both Normal Euclidean Distance and Pearson Product-Moment Correlation Coefficient are local comparisons, the algorithm could be integrated into those two approaches to accommodate time-delay and time-gap behavior. All implementations use parallel programming with the Message Passing Interface in C, splitting the workload dynamically from a master server to many slave servers to speed up the computation. Synthetic and real microarray data are used to demonstrate the superiority of the proposed method. The experimental results show that Normal Euclidean Distance and Pearson Product-Moment Correlation Coefficient combined with the alignment algorithm capture the similarity of more co-regulated or anti-regulated gene pairs.
Some improvements, such as isolation of experimental conditions, weighted averages, and statistical analysis for threshold setting, are proposed. Because such time-delay behavior in gene expression patterns is not unusual and usually plays an important role in the cell system, the new approach will help scientists discover important knowledge that would otherwise not be revealed. The approach is sensitive to a wide spectrum of expression patterns that tend to be ignored by traditional methods. Global comparison algorithms usually have a preliminary step that normalizes the entire dataset to achieve better results, so the implemented time-delay and time-gap alignment algorithm will not be effective on the normalized dataset. To cope with the intensive computing needs of large-scale microarray data, parallel code using the Message Passing Interface in C was developed with a dynamic workload-balancing strategy and executed on a Linux cluster.
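As a rough illustration of the time-delay idea in the abstract above, the following Python sketch scans a range of candidate lags and keeps the one with the strongest Pearson correlation. This is not the thesis's alignment algorithm, which the abstract does not spell out; the names `pearson` and `best_lag` and the `max_lag` parameter are hypothetical, and time-gap removal is not modeled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r|, where y trails x by `lag` steps.

    Illustrative assumption: only non-negative lags are tried, and the
    overlapping part of the two series is compared at each lag.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if max_lag > len(x) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        r = pearson(x[:-lag], y[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

Calling `best_lag(gene_a, gene_b, max_lag=3)` on two expression profiles would report the delay at which they agree most strongly; a negative `r` at the best lag would suggest an anti-co-regulated pair.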
4

An Investigation of I/O Strategies for MPI Workloads

Attari, Sanya 19 January 2011
Different techniques can be used to improve application performance in parallel systems. Studies have shown that I/O communication delay is the main reason I/O-intensive applications behave differently and have specific requirements for performance optimization. Consequently, common strategies that are generally defined for, and effective on, computationally intensive applications may not improve performance to the same degree for these applications. Moreover, the background system configuration affects the behavior of the application and its performance. The growing use of parallel multi-core systems is an important factor in increasing performance and speeding up applications. Since changing multi-core hardware is not an efficient way to satisfy the different expectations of each unique application, it is the application developer's responsibility to design flexible and scalable code that is compatible with different environments. On the other hand, predicting application behavior and I/O requirements for I/O-intensive applications with irregular communication patterns is a complicated and time-consuming task, which pushes the problem to runtime. To address this issue, we provide an overview of the different techniques used to solve this problem. We have studied I/O-bound parallel applications that use MPI as the communication method in order to define a general perspective for optimizing the cost-performance ratio. Our experiments cover different setups for these applications in order to define various criteria that should be considered at the design stage as well as at runtime. Moreover, targeting one popular I/O-intensive application, we discuss some possible solutions to speed it up on a multi-core system. / Master of Science
5

Lazy Fault Recovery for Redundant Mpi

Saliba, Elie 01 June 2019
Distributed Systems (DS), where multiple computers share a workload across a network, are used everywhere, from data-intensive computations to storage and machine learning. DS provide a relatively cheap and efficient solution that offers stability with improved performance for computationally intensive applications. In a DS, faults and failures are the norm, not the exception. Data corruption can occur at any moment, especially since a DS usually consists of hundreds to thousands of units of commodity hardware. The large number and quality of components guarantee, by probability, that at any given time some of the components will not be working and some of them will not recover from failure. DS can experience problems caused by application bugs, operating system bugs, and failures of disks, memory, connectors, networking, power supply, and other components; therefore, constant monitoring and failure detection are fundamental. Automatic recovery must be integral to the system. One of the most commonly used programming models for DS is the Message Passing Interface (MPI). Unfortunately, MPI does not support fault detection or recovery. In this thesis, we build a recovery mechanism based on replicas that works on top of the asynchronous fault detection implemented in previous work. The results show that our recovery implementation is successful and that the overhead in execution time is minimal.
6

Erweiterung einer MPI-Umgebung zur Interoperabilität verteilter MPP-Systeme

Gabriel, Edgar January 1996
University of Stuttgart, student research project (Studienarbeit), 1996.
7

Lygiagrečių programų efektyvumo tyrimas / Efficiency analysis of parallel programs

Šeinauskas, Vytenis 11 August 2008
This master's thesis analyzes the efficiency of parallel programs using purpose-built software for studying parallel program efficiency. The main goal of the work is to create, study, and apply educational software intended for the analysis of parallel programs. To that end, the capabilities of the developed program were investigated, and improvements to the software were planned and carried out. Sample parallel programs were also studied with the developed software in order to demonstrate ways of analyzing parallel program efficiency and the capabilities of the developed software. / Parallel program execution is often used to overcome the constraints of processing speed and memory size when executing complex and time-consuming algorithms. The downside to this approach is the increased overall complexity of programs and their implementations. Parallel execution introduces a new class of software bugs and performance shortcomings that are usually difficult to trace using traditional methods and tools. Hence, new tools and methods need to be introduced that deal specifically with problems encountered in parallel programs. The goal of this project is the development of an MPI-based parallel program performance monitoring tool and research into the ways this tool can be used for measuring, comparing, and improving the performance of target programs.
8

An Efficient Platform for Large-Scale MapReduce Processing

Wang, Liqiang 15 May 2009
In this thesis we proposed and implemented MMR, a new open-source MapReduce model with MPI for parallel and distributed programming. MMR combines Pthreads, MPI, and Google's MapReduce processing model to support multi-threaded as well as distributed parallelism. Experiments show that our model significantly outperforms the leading open-source solution, Hadoop. It demonstrates linear scaling for CPU-intensive processing and even super-linear scaling for indexing-related workloads. In addition, we designed an MMR live DVD that facilitates the automatic installation and configuration of a Linux cluster with an integrated MMR library, which enables the development and execution of MMR applications.
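For readers unfamiliar with the MapReduce processing model mentioned above, here is a minimal single-process Python sketch of its three phases (map, group by key, reduce). It is only an illustration of the general model: it does not use MPI or Pthreads and is not the MMR API, and the names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase (in-process here): collect values under their key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)
```

In a distributed setting the map and reduce calls would run on different threads or nodes and the grouping step would involve communication between them; this sketch keeps everything in one dictionary to show only the data flow.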
9

Uso de auto-tuning para otimização de decomposição de domínios paralela / Optimizing parallel domain decomposition using auto-tuning

Almeida, Alexandre Vinicius January 2011
Developing applications that reach performance levels close to the theoretical levels of a given platform is a task that requires technical knowledge of the hardware environment, since the software must exploit specific details of the platform in question. Because the software is platform-specific, if the platform evolves or changes, the optimizations made may not exploit the new architecture efficiently. Auto-tuners are systems that emerged as an automated means of adapting a given piece of software to a target architecture. This adaptation occurs through an empirical search for optimal values of specific application parameters, in order to fit them to the hardware characteristics, or through the generation of source code optimized for the platform. This work proposes an auto-tuner module oriented toward the parameterized adaptation of a parallel application, which works by varying the dimensions of the two-dimensional domain, the number of processes, and the extent of the overlap regions. For each variation of the factors, the auto-tuner tests the application on the parallel architecture in order to find the best-performing combination of parameters. To make auto-tuning possible, a C++ class named Mesh was developed, based on the MPI standard. The class seeks to abstract the domain decomposition of a parallel application through object orientation, and makes it easy to vary the extent of the overlap regions between subdomains. The experimental results showed that the auto-tuner exploits the performance gain from varying the number of processes of the application, which is also handled by the auto-tuner module. The parallel architecture used in the validation did not prove ideal for optimization by increasing the extent of the overlapping regions between subdomains.
/ Achieving the peak performance level of a particular platform requires technical knowledge of the hardware environment involved, since the software must exploit specific details inherent to the hardware. Once the software is optimized for a target platform, if the hardware evolves or is changed, the software will probably not be as efficient in the new environment. This performance portability problem is addressed by software auto-tuning, which emerged in the past decade as an automated technique for adapting a particular piece of software to the underlying hardware. The software adaptation is performed by an auto-tuner. The auto-tuner is an entity that empirically adjusts specific application parameters in order to improve the overall application performance, or even generates source code optimized for the target platform. This dissertation proposes an auto-tuner to optimize the domain decomposition of a parallel application that performs stencil computations. The proposed auto-tuner works in a parameterized adaptation fashion, and varies the dimensions of a 2D domain, the number of parallel processes, and the extent of the overlapping zones between subdomains. For each combination of parameter values, the auto-tuner probes the application on the parallel architecture in order to find the best combination of values. To make auto-tuning possible, a C++ class called Mesh is proposed, based on the Message Passing Interface (MPI) standard. The role of this class is to abstract the domain decomposition from the application using the object orientation facilities provided by C++, and also to allow the overlapping zones between subdomains to be extended. The experimental results showed that the performance gains were mainly due to the variation of the number of processes, which was one of the application factors handled by the auto-tuner.
The parallel architecture used in the experiments proved inadequate for optimizing the domain decomposition by increasing the extent of the overlapping zones.
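The parameterized search described in the abstract above can be pictured as an exhaustive sweep over a small parameter grid. The Python sketch below is a generic illustration under stated assumptions, not the dissertation's auto-tuner or its Mesh class: `autotune` and `measure` are hypothetical names, and `measure` stands in for launching the parallel application with the given parameters and returning its measured run time.

```python
from itertools import product


def autotune(space, measure):
    """Try every combination in `space` and return (best_params, best_cost).

    space   -- dict mapping a parameter name to a list of candidate values
    measure -- callable taking the parameters as keyword arguments and
               returning a cost to minimize (assumed here: run time)
    """
    names = list(space)
    best_params, best_cost = None, float("inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        cost = measure(**params)
        if cost < best_cost:  # keep the first strictly better combination
            best_params, best_cost = params, cost
    return best_params, best_cost
```

An exhaustive sweep is the simplest strategy and is only practical when the grid is small; larger spaces would need a smarter search, which is outside what the abstract describes.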