• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 337
  • 189
  • 134
  • 56
  • 45
  • 44
  • 4
  • 4
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 921
  • 921
  • 921
  • 404
  • 394
  • 351
  • 351
  • 329
  • 325
  • 320
  • 319
  • 316
  • 314
  • 313
  • 313
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

High performance bioinformatics and computational biology on general-purpose graphics processing units

Ling, Cheng January 2012 (has links)
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes these more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion of the size of biological data at a rate which outpaces the rate of increases in the computational power of mainstream computer technologies, namely general purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high performance and efficient implementation of BCB applications in order to meet the demands of biological data increases at affordable cost. The thesis presents detailed design and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to determine the potential information about a newly discovered biological sequence from other well-known sequences through similarity comparison. On the other hand, phylogenetic analysis is concerned with the investigation of the evolution and relationships among organisms, and has many uses in the fields of system biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes and then phylogenetic trees are constructed to illustrate evolutionary relationships among genes and organisms. However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis firstly presents a multi-threaded parallel design of the Smith- Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to solve the restriction on the length of the query sequence in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between two main task parallelization approaches (Inter-task and Intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible length of sequences in real world applications. It also outperforms an equivalent GPPbased implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to 3x speed-up improvements compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves 8x-20x speed up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method however only gives one possible tree which strongly depends on the evolutionary model used. A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA) technology.
112

Dissecting genetic interactions in complex traits

Hemani, Gibran January 2012 (has links)
Of central importance in the dissection of the components that govern complex traits is understanding the architecture of natural genetic variation. Genetic interaction, or epistasis, constitutes one aspect of this, but epistatic analysis has been largely avoided in genome wide association studies because of statistical and computational difficulties. This thesis explores both issues in the context of two-locus interactions. Initially, through simulation and deterministic calculations it was demonstrated that not only can epistasis maintain deleterious mutations at intermediate frequencies when under selection, but that it may also have a role in the maintenance of additive variance. Based on the epistatic patterns that are evolutionarily persistent, and the frequencies at which they are maintained, it was shown that exhaustive two dimensional search strategies are the most powerful approaches for uncovering both additive variance and the other genetic variance components that are co-precipitated. However, while these simulations demonstrate encouraging statistical benefits, two dimensional searches are often computationally prohibitive, particularly with the marker densities and sample sizes that are typical of genome wide association studies. To address this issue different software implementations were developed to parallelise the two dimensional triangular search grid across various types of high performance computing hardware. Of these, particularly effective was using the massively-multi-core architecture of consumer level graphics cards. While the performance will continue to improve as hardware improves, at the time of testing the speed was 2-3 orders of magnitude faster than CPU based software solutions that are in current use. Not only does this software enable epistatic scans to be performed routinely at minimal cost, but it is now feasible to empirically explore the false discovery rates introduced by the high dimensionality of multiple testing. Through permutation analysis it was shown that the significance threshold for epistatic searches is a function of both marker density and population sample size, and that because of the correlation structure that exists between tests the threshold estimates currently used are overly stringent. Although the relaxed threshold estimates constitute an improvement in the power of two dimensional searches, detection is still most likely limited to relatively large genetic effects. Through direct calculation it was shown that, in contrast to the additive case where the decay of estimated genetic variance was proportional to falling linkage disequilibrium between causal variants and observed markers, for epistasis this decay was exponential. One way to rescue poorly captured causal variants is to parameterise association tests using haplotypes rather than single markers. A novel statistical method that uses a regularised parameter selection procedure on two locus haplotypes was developed, and through extensive simulations it can be shown that it delivers a substantial gain in power over single marker based tests. Ultimately, this thesis seeks to demonstrate that many of the obstacles in epistatic analysis can be ameliorated, and with the current abundance of genomic data gathered by the scientific community direct search may be a viable method to qualify the importance of epistasis.
113

HPC scheduling in a brave new world

Gonzalo P., Rodrigo January 2017 (has links)
Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems’ memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems’ processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Supercomputing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows’ turnaround time without over-allocating resources. / <p>Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) and we used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility, supported by the Officece of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.</p>
114

Reactive transport modeling at hillslope scale with high performance computing methods

He, Wenkui 07 December 2016 (has links) (PDF)
Reactive transport modeling is an important approach to understand water dynamics, mass transport and biogeochemical processes from the hillslope to the catchment scale. It has a wide range of applications in the fields of e.g. water resource management, contaminanted site remediation and geotechnical engineering. To simulate reactive transport processes at a hillslope or larger scales is a challenging task, which involves interactions of complex physical and biogeochemical processes, huge computational expenses as well as difficulties in numerical precision and stability. The primary goal of the work is to develop a practical, accurate and efficient tool to facilitate the simulation techniques for reactive transport problems towards hillslope or larger scales. The first part of the work deals with the simulation of water flow in saturated and unsaturated porous media. The capability and accuracy of different numerical approaches were analyzed and compared by using benchmark tests. The second part of the work introduces the coupling of the scientific software packages OpenGeoSys and IPhreeqc by using a character-string-based interface. The accuracy and computational efficiency of the coupled tool were discussed based on three benchmarks. It shows that OGS#IPhreeqc provides sufficient numerical accuracy to simulate reactive transport problems for both equilibrium and kinetic reactions in variably saturated porous media. The third part of the work describes the algorithm of a parallelization scheme using MPI (Message Passing Interface) grouping concept, which enables a flexible allocation of computational resources for calculating geochemical reaction and the physical processes such as groundwater flow and transport. The parallel performance of the approach was tested by three examples. It shows that the new approach has more advantages than the conventional ones for the calculation of geochemically-dominated problems, especially when only limited benefit can be obtained through parallelization for solving flow or solute transport. The comparison between the character-string-based and the file-based coupling shows, that the former approach produces less computational overhead in a distributed-memory system such as a computing cluster. The last part of the work shows the application of OGS#IPhreeqc for the simulation of the water dynamic and denitrification process in the groundwater aquifer of a study site in Northern Germany. It demonstrates that OGS#IPhreeqc is able to simulate heterogeneous reactive transport problems at a hillslope scale within an acceptable time span. The model results shows the importance of functional zones for natural attenuation process. / Modellierung des reaktiven Stofftranports ist ein wichtiger Ansatz um die Wasserströmung, den Stofftransport und die biogeochemischen Prozesse von der Hang- bis zur Einzugsgebietsskala zu verstehen. Es gibt umfangreiche Anwendungsgebiete, z.B. in der Wasserwirtschaft, Umweltsanierung und Geotechnik. Die Simulation der reaktiven Stofftransportprozesse auf der Hangskala oder auf größeren Maßstäbe ist eine anspruchsvolle Aufgabe, da es sich um die Wechselwirkungen komplexer physikalischer und biogeochemischen Prozesse handelt, die riesigen Berechnungsaufwand sowie numerischen Schwierigkeiten bezogen auf die Genauigkeit und die Stabilität nach sich ziehen. Das Hauptziel dieser Arbeit besteht darin, ein praktisches, genaues und effizientes Werkzeug zu entwickeln, um die Simulationstechnik für reaktiven Stofftransport auf der Hangskala und auf größeren Skalen zu verbessern. Der erste Teil der Arbeit behandelt die Simulation der Wasserströmung in gesättigten und ungesättigten porösen Medien. Das Anwendungspotential und die Genauigkeit verschiedener numerischer Ansätze wurden mittels einiger Benchmarks analysiert und miteinander verglichen. Der zweite Teil der Arbeit stellt die Kopplung der wissenschaftlichen Softwarepakete OpenGeoSys und IPhreeqc mit einer stringbasierten Schnittstelle dar. Die Genauigkeit und die Recheneffizienz des gekoppelten Tools OGS#IPhreeqc wurden basierend auf drei Benchmark-Tests diskutiert. Das Ergebnis zeigt, dass OGS#IPhreeqc die ausreichende numerische Genauigkeit für die Simulation reaktiven Stofftransports liefert, welcher sich sowohl auf die Gleichgewichtsreaktion als auch auf die kinetische Reaktion in variabel gesättigten porösen Medien beziehen. Der dritte Teil der Arbeit beschreibt zuerst den Algorithmus der Parallelisierung des OGS#IPhreeqc basierend auf dem MPI (Message Passing Interface) Gruppierungskonzept, welcher eine flexible Verteilung der Rechenressourcen für die Berechnung der geochemischen Reaktion und der physikalischen Prozesse wie z.B. Wasserströmung oder Stofftransport ermöglicht. Danach wurde die Leistungsfähigkeit des Algorithmus anhand von drei Beispielen getestet. Es zeigt sich, dass der neue Ansatz Vorteile gegenüber die konventionellen Ansätzen für die Berechnung von geochemisch dominierten Problemen bringt. Dies ist vor allem dann der Fall, wenn nur eingeschränkter Nutzen aus der Parallelisierung für die Berechnung der Wasserströmung oder des Stofftransportes gezogen werden kann. Der Vergleich zwischen der string- und der dateibasierten Kopplung zeigt, dass die erstere weniger Rechenoverhead in einem verteilten Rechnersystem, wie z.B. Cluster erzeugt. Der letzte Teil der Arbeit zeigt die Anwendung von OGS#IPhreeqc für die Simulation der Wasserdynamik und der Denitrifikation im Grundwasserleiter eines Untersuchungsgebietes in NordDeutschland. Es beweist, dass OGS#IPhreeqc in der Lage ist, reaktiven Stofftransport auf der Hangskala innerhalb akzeptabler Zeitspanne zu simulieren. Die Simulationsergebnisse zeigen die Bedeutung der funktionalen Zonen für die natürlichen Selbstreinigungsprozesse.
115

Scalable data-management systems for Big Data / Sur le passage à l'échelle des systèmes de gestion des grandes masses de données

Tran, Viet-Trung 21 January 2013 (has links)
La problématique "Big Data" peut être caractérisée par trois "V": - "Big Volume" se rapporte à l'augmentation sans précédent du volume des données. - "Big Velocity" se réfère à la croissance de la vitesse à laquelle ces données sont déplacées entre les systèmes qui les gèrent. - "Big Variety" correspond à la diversification des formats de ces données. Ces caractéristiques imposent des changements fondamentaux dans l'architecture des systèmes de gestion de données. Les systèmes de stockage doivent être adaptés à la croissance des données, et se doivent de passer à l'échelle tout en maintenant un accès à hautes performances. Cette thèse se concentre sur la construction des systèmes de gestion de grandes masses de données passant à l'échelle. Les deux premières contributions ont pour objectif de fournir un support efficace des "Big Volumes" pour les applications data-intensives dans les environnements de calcul à hautes performances (HPC). Nous abordons en particulier les limitations des approches existantes dans leur gestion des opérations d'entrées/sorties (E/S) non-contiguës atomiques à large échelle. Un mécanisme basé sur les versions est alors proposé, et qui peut être utilisé pour l'isolation des E/S non-contiguës sans le fardeau de synchronisations coûteuses. Dans le contexte du traitement parallèle de tableaux multi-dimensionels en HPC, nous présentons Pyramid, un système de stockage large-échelle optimisé pour ce type de données. Pyramid revoit l'organisation physique des données dans les systèmes de stockage distribués en vue d'un passage à l'échelle des performances. Pyramid favorise un partitionnement multi-dimensionel de données correspondant le plus possible aux accès générés par les applications. Il se base également sur une gestion distribuée des métadonnées et un mécanisme de versioning pour la résolution des accès concurrents, ce afin d'éliminer tout besoin de synchronisation. Notre troisième contribution aborde le problème "Big Volume" à l'échelle d'un environnement géographiquement distribué. Nous considérons BlobSeer, un service distribué de gestion de données orienté "versioning", et nous proposons BlobSeer-WAN, une extension de BlobSeer optimisée pour un tel environnement. BlobSeer-WAN prend en compte la hiérarchie de latence et favorise les accès aux méta-données locales. BlobSeer-WAN inclut la réplication asynchrone des méta-données et une résolution des collisions basée sur des "vector-clock". Afin de traîter le caractère "Big Velocity" de la problématique "Big Data", notre dernière contribution consiste en DStore, un système de stockage en mémoire orienté "documents" qui passe à l'échelle verticalement en exploitant les capacités mémoires des machines multi-coeurs. Nous montrons l'efficacité de DStore dans le cadre du traitement de requêtes d'écritures atomiques complexes tout en maintenant un haut débit d'accès en lecture. DStore suit un modèle d'exécution mono-thread qui met à jour les transactions séquentiellement, tout en se basant sur une gestion de la concurrence basée sur le versioning afin de permettre un grand nombre d'accès simultanés en lecture. / Big Data can be characterized by 3 V’s. • Big Volume refers to the unprecedented growth in the amount of data. • Big Velocity refers to the growth in the speed of moving data in and out management systems. • Big Variety refers to the growth in the number of different data formats. Managing Big Data requires fundamental changes in the architecture of data management systems. Data storage should continue being innovated in order to adapt to the growth of data. They need to be scalable while maintaining high performance regarding data accesses. This thesis focuses on building scalable data management systems for Big Data. Our first and second contributions address the challenge of providing efficient support for Big Volume of data in data-intensive high performance computing (HPC) environments. Particularly, we address the shortcoming of existing approaches to handle atomic, non-contiguous I/O operations in a scalable fashion. We propose and implement a versioning-based mechanism that can be leveraged to offer isolation for non-contiguous I/O without the need to perform expensive synchronizations. In the context of parallel array processing in HPC, we introduce Pyramid, a large-scale, array-oriented storage system. It revisits the physical organization of data in distributed storage systems for scalable performance. Pyramid favors multidimensional-aware data chunking, that closely matches the access patterns generated by applications. Pyramid also favors a distributed metadata management and a versioning concurrency control to eliminate synchronizations in concurrency. Our third contribution addresses Big Volume at the scale of the geographically distributed environments. We consider BlobSeer, a distributed versioning-oriented data management service, and we propose BlobSeer-WAN, an extension of BlobSeer optimized for such geographically distributed environments. BlobSeer-WAN takes into account the latency hierarchy by favoring locally metadata accesses. BlobSeer-WAN features asynchronous metadata replication and a vector-clock implementation for collision resolution. To cope with the Big Velocity characteristic of Big Data, our last contribution feautures DStore, an in-memory document-oriented store that scale vertically by leveraging large memory capability in multicore machines. DStore demonstrates fast and atomic complex transaction processing in data writing, while maintaining high throughput read access. DStore follows a single-threaded execution model to execute update transactions sequentially, while relying on a versioning concurrency control to enable a large number of simultaneous readers.
116

The effects of hardware acceleration on power usage in basic high-performance computing

Amsler, Christopher January 1900 (has links)
Master of Science / Department of Electrical Engineering / Dwight Day / Power consumption has become a large concern in many systems including portable electronics and supercomputers. Creating efficient hardware that can do more computation with less power is highly desirable. This project proposes a possible avenue to complete this goal by hardware accelerating a conjugate gradient solve using a Field Programmable Gate Array (FPGA). This method uses three basic operations frequently: dot product, weighted vector addition, and sparse matrix vector multiply. Each operation was accelerated on the FPGA. A power monitor was also implemented to measure the power consumption of the FPGA during each operation with several different implementations. Results showed that a decrease in time can be achieved with the dot product being hardware accelerated in relation to a software only approach. However, the more memory intensive operations were slowed using the current architecture for hardware acceleration.
117

Effect of memory access and caching on high performance computing

Groening, James January 1900 (has links)
Master of Science / Department of Electrical and Computer Engineering / Dwight Day / High-performance computing is often limited by memory access. As speeds increase, processors are often waiting on data transfers to and from memory. Classic memory controllers focus on delivering sequential memory as quickly as possible. This will increase the performance of instruction reads and sequential data reads and writes. However, many applications in high-performance computing often include random memory access which can limit the performance of the system. Techniques such as scatter/gather can improve performance by allowing nonsequential data to be written and read in a single operation. Caching can also improve performance by storing some of the data in memory local to the processor. In this project, we try to find the benefits of different cache configurations. The different configurations include different cache line sizes as well as total size of cache. Although a range of benchmarks are typically used to test performance, we focused on a conjugate gradient solver, HPCCG. The program HPCCG incorporates many of the elements of common benchmarks used in high-performance computing, and relates better to a real world problem. Results show that the performance of a cache configuration can depend on the size of the problem. Problems of smaller sizes can benefit more from a larger cache, while a smaller cache may be sufficient for larger problems.
118

A Parallel Implementation of an Agent-Based Brain Tumor Model

Skjerven, Brian M. 05 June 2007 (has links)
"The complex growth patterns of malignant brain tumors can present challenges in developing accurate models. In particular, the computational costs associated with modeling a realistically sized tumor can be prohibitive. The use of high-performance computing (HPC) and novel mathematical techniques can help to overcome this barrier. This paper presents a parallel implementation of a model for the growth of glioma, a form of brain cancer, and discusses how HPC is being used to take a first step toward realistically sized tumor models. Also, consideration is given to the visualization process involved with large-scale computing. Finally, simulation data is presented with a focus on scaling."
119

An integrated component selection framework for system level design

Unknown Date (has links)
The increasing system design complexity is negatively impacting the overall system design productivity by increasing the cost and time of product development. One key to overcoming these challenges is exploiting Component Based Engineering practices. However it is a challenge to select an optimum component from a component library that will satisfy all system functional and non-functional requirements, due to varying performance parameters and quality of service requirements. In this thesis we propose an integrated framework for component selection. The framework is a two phase approach that includes a system modeling and analysis phase and a component selection phase. Three component selection algorithms have been implemented for selecting components for a Network on Chip architecture. Two algorithms are based on a standard greedy method, with one being enhanced to produce more intelligent behavior. The third algorithm is based on simulated annealing. Further, a prototype was developed to evaluate the proposed framework and compare the performance of all the algorithms. / by Chad Calvert. / Thesis (M.S.C.S.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
120

Avaliação do impacto da comunicação intra e entre-nós em nuvens computacionais para aplicações de alto desempenho / Evaluation of impact from inter and intra-node communication in cloud computing for HPC applications

Okada, Thiago Kenji 07 November 2016 (has links)
Com o advento da computação em nuvem, não é mais necessário ao usuário investir grandes quantidades de recursos financeiros em equipamentos computacionais. Ao invés disto, é possível adquirir recursos de processamento, armazenamento ou mesmo sistemas completos por demanda, usando um dos diversos serviços disponibilizados por provedores de nuvem como a Amazon, o Google, a Microsoft, e a própria USP. Isso permite um controle maior dos gastos operacionais, reduzindo custos em diversos casos. Por exemplo, usuários de computação de alto desempenho podem se beneficiar desse modelo usando um grande número de recursos durante curtos períodos de tempo, ao invés de adquirir um aglomerado computacional de alto custo inicial. Nosso trabalho analisa a viabilidade de execução de aplicações de alto desempenho, comparando o desempenho de aplicações de alto desempenho em infraestruturas com comportamento conhecido com a nuvem pública oferecida pelo Google. Em especial, focamos em diferentes configurações de paralelismo com comunicação interna entre processos no mesmo nó, chamado de intra-nós, e comunicação externa entre processos em diferentes nós, chamado de entre-nós. Nosso caso de estudo para esse trabalho foi o NAS Parallel Benchmarks, um benchmark bastante popular para a análise de desempenho de sistemas paralelos e de alto desempenho. Utilizamos aplicações com implementações puramente MPI (para as comunicações intra e entre-nós) e implementações mistas onde as comunicações internas foram feitas utilizando OpenMP (comunicação intra-nós) e as comunicações externas foram feitas usando o MPI (comunicação entre-nós). / With the advent of cloud computing, it is no longer necessary to invest large amounts of money on computing resources. Instead, it is possible to obtain processing or storage resources, and even complete systems, on demand, using one of the several available services from cloud providers like Amazon, Google, Microsoft, and USP. Cloud computing allows greater control of operating expenses, reducing costs in many cases. For example, high-performance computing users can benefit from this model using a large number of resources for short periods of time, instead of acquiring a computer cluster with high initial cost. Our study examines the feasibility of running high-performance applications, comparing the performance of high-performance applications in a known infrastructure compared to the public cloud offering from Google. In particular, we focus on various parallel configurations with internal communication between processes on the same node, called intra-node, and external communication between processes on different nodes, called inter-nodes. Our case study for this work was the NAS Parallel Benchmarks, a popular benchmark for performance analysis of parallel systems and high performance computing. We tested applications with MPI-only implementations (for intra and inter-node communications) and mixed implementations where internal communications were made using OpenMP (intra-node communications) and external communications were made using the MPI (inter-node communications).

Page generated in 0.3659 seconds