Global ETD Search

61	Adjusting Process Count on Demand for Petascale Global Optimization Radcliffe, Nicholas Ryan 16 January 2012 There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This thesis describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, the modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed. / Master of Science Petascale computing Global optimization Message Passing Interface (MPI) Dynamic process count
62	Programming High-Performance Clusters with Heterogeneous Computing Devices Aji, Ashwin M. 19 May 2015 Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which are seamlessly leveraged by the application developers. MPI-ACC provides a familiar, flexible and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization. / Ph. D. Runtime Systems Programming Models Message Passing Interface (MPI) CUDA OpenCL
63	High Performance Applications for the Single-Chip Message-Passing Parallel Computer Dickenson, William Wesley 05 May 2004 Computer architects continue to push the limits of modern microprocessors. By using techniques such as out-of-order execution, branch prediction, and dynamic scheduling, designers have found ways to speed execution. However, growing architectural complexity has led to unsustainable development and testing times. Shrinking feature sizes are causing increased wire resistance and signal propagation delay, thereby limiting a design's scalability. Indeed, the method of exploiting instruction-level parallelism (ILP) within applications is reaching a point of diminishing returns. One approach to the aforementioned challenges is the Single-Chip Message-Passing (SCMP) Parallel Computer, developed at Virginia Tech. SCMP is a unique, tiled architecture aimed at thread-level parallelism (TLP). Identical cores are replicated across the chip, and global wire traces have been eliminated. The nodes are connected via a 2-D grid network and each contains a local memory bank. This thesis presents the design and analysis of three high-performance applications for SCMP. The results show that the architecture is a formidable competitor to several current systems. / Master of Science Chip Multiprocessors Message-Passing Systems Parallel Applications Parallel Architectures Single-Chip Systems Thread Level Parallelism
64	Robust Online Trajectory Prediction for Non-cooperative Small Unmanned Aerial Vehicles Badve, Prathamesh Mahesh 21 January 2022 In recent years, unmanned aerial vehicles (UAVs) have seen rapidly growing use in civilian areas such as aerial photography, agriculture, and communication. A growing research effort is devoted to developing sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. The existing techniques suffer from problems such as inadequate uncertainty quantification of predicted trajectories. This work adopts particle filters together with the Löwner-John ellipsoid to approximate the highest posterior density region for trajectory prediction and uncertainty quantification. The particle filter is tuned and tested on real-world and simulated data sets and compared with the Kalman filter. A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications. / Master of Science / In recent years, unmanned aerial vehicles (UAVs) have seen rapidly growing use in civilian areas such as aerial photography, agriculture, and communication. Over the coming years, the number of UAVs will increase rapidly. As a result, the risk of mid-air collisions grows, leading to property damage and possible loss of life if a UAV collides with a manned aircraft. A growing research effort has been devoted to developing sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. The existing techniques suffer from problems such as inadequate uncertainty quantification of predicted trajectories. This work adopts particle filters, a Bayesian inference technique, for trajectory prediction. The use of the minimum-volume enclosing ellipsoid to approximate the highest posterior density region for prediction uncertainty quantification is also investigated. 
The particle filter is tuned and tested on real-world and simulated data sets and compared with the Kalman filter. A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications. particle filters path planning Bayesian inference non-cooperative UAVs parallel computing message passing interface
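The predict, weight, and resample cycle behind the particle filter in entry 64 can be sketched as follows. This is a minimal bootstrap particle filter, not the thesis's implementation: the 1-D constant-velocity motion model, the Gaussian noise levels, and every name (`bootstrap_particle_filter`, `process_std`, `obs_std`) are assumptions made for illustration, and the Löwner-John ellipsoid step and the parallel implementation are left out.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, dt=1.0,
                              process_std=0.1, obs_std=0.5, seed=0):
    """Track a 1-D position with an assumed constant-velocity model.

    Returns the posterior-mean position estimate after each observation.
    """
    rng = random.Random(seed)
    # Each particle is a hypothetical (position, velocity) state.
    particles = [(rng.gauss(observations[0], obs_std), rng.gauss(0.0, 1.0))
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: push every particle through the motion model plus noise.
        particles = [(x + v * dt + rng.gauss(0.0, process_std),
                      v + rng.gauss(0.0, process_std))
                     for x, v in particles]
        # Update: weight particles by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x, _ in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, (x, _) in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The per-particle predict and weight steps are independent of one another, which is what makes this kind of filter a natural candidate for parallel execution; only the normalization and resampling steps need information from all particles.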
65	Balancing Performance, Area, and Power in an On-Chip Network Gold, Brian 06 August 2003 Several trends can be observed in modern microprocessor design. Architectures have become increasingly complex while design time continues to dwindle. As feature sizes shrink, wire resistance and delay increase, limiting architects from scaling designs centered around a single thread of execution. Where previous decades have focused on exploiting instruction-level parallelism, emerging applications such as streaming media and on-line transaction processing have shown greater thread-level parallelism. Finally, the increasing gap between processor and off-chip memory speeds has constrained performance of memory-intensive applications. The Single-Chip Message Passing (SCMP) parallel computer sits at the confluence of these trends. SCMP is a tiled architecture consisting of numerous thread-parallel processor and memory nodes connected through a structured interconnection network. Using an interconnection network removes global, ad-hoc wiring that limits scalability and introduces design complexity. However, routing data through general purpose interconnection networks can come at the cost of dedicated bandwidth, longer latency, increased area, and higher power consumption. Understanding the impact architectural decisions have on cost and performance will aid in the eventual adoption of general purpose interconnects. This thesis covers the design and analysis of the on-chip network and its integration with the SCMP system. The result of these efforts is a framework for analyzing on-chip interconnection networks that considers network performance, circuit area, and power consumption. / Master of Science area virtual channels SCMP power network router crossbar switch single chip computer message passing system on chip
66	Effiziente parallele Sortier- und Datenumverteilungsverfahren für Partikelsimulationen auf Parallelrechnern mit verteiltem Speicher / Efficient Parallel Sorting and Data Redistribution Methods for Particle Codes on Distributed Memory Systems Hofmann, Michael 16 April 2012 Particle simulations are a class of data- and compute-intensive simulation applications used in many areas of science and industrial research. The high computational cost of the solution methods employed and the large volumes of data needed to model realistic problems make parallel computing indispensable. Distributed-memory parallel computers are a widespread architecture in which a large number of compute nodes working in parallel exchange data over an interconnection network. Computing the interactions between particles often accounts for most of the cost of a particle simulation and is carried out with fast solution methods such as the Barnes-Hut algorithm or the Fast Multipole Method. Efficient parallel implementations of these algorithms require the particles to be sorted by their spatial positions. The sorting is necessary both to obtain efficient access to the particle data and as part of optimizations that increase the locality of memory accesses, minimize communication, and improve the load balance of parallel computations. This dissertation is concerned with the development of an efficient parallel sorting method and the communication operations it requires for data redistribution in particle simulations. To this end, a large number of existing parallel sorting methods for distributed memory are analyzed and compared with the requirements of particle simulation applications. 
Particular challenges arise with respect to the partitioning of the particle data across distributed memory, the weighting of the data to be sorted for improved load balancing, the handling of duplicate key values, and the availability and use of memory-efficient communication operations. To meet these requirements, a new parallel sorting method is developed and integrated into the application programs considered. In addition, a new in-place algorithm for the MPI_Alltoallv communication operation is presented, which significantly reduces the memory consumption of the data redistribution required within the parallel sorting. The behavior of all the methods developed is examined both in isolation and in practical use within various application programs, using different parallel computers, in particular highly scalable ones. Paralleles Sortieren Datenumverteilung Partikelsimulation Performance-Optimierung Verteilter Speicher Message-Passing-Programmierung parallel sorting data redistribution particle simulation performance optimization distributed memory message passing programming ddc:005 Sortierverfahren Parallelverarbeitung Computersimulation Verteilter Speicher Wissenschaftliches Rechnen
67	Molecular Dynamics for Exascale Supercomputers / La dynamique moléculaire pour les machines exascale Cieren, Emmanuel 09 October 2015 Dans la course vers l’exascale, les architectures des supercalculateurs évoluent vers des nœuds massivement multicœurs, sur lesquels les accès mémoire sont non-uniformes et les registres de vectorisation toujours plus grands. Ces évolutions entraînent une baisse de l’efficacité des applications homogènes (MPI simple), et imposent aux développeurs l’utilisation de fonctionnalités de bas-niveau afin d’obtenir de bonnes performances. Dans le contexte de la dynamique moléculaire (DM) appliquée à la physique de la matière condensée, les études du comportement des matériaux dans des conditions extrêmes requièrent la simulation de systèmes toujours plus grands avec une physique de plus en plus complexe. L’adaptation des codes de DM aux architectures exaflopiques est donc un enjeu essentiel. Cette thèse propose la conception et l’implémentation d’une plateforme dédiée à la simulation de très grands systèmes de DM sur les futurs supercalculateurs. Notre architecture s’organise autour de trois niveaux de parallélisme: décomposition de domaine avec MPI, du multithreading massif sur chaque domaine et un système de vectorisation explicite. Nous avons également inclus une capacité d’équilibrage dynamique de charge de calcul. La conception orientée objet a été particulièrement étudiée afin de préserver un niveau de programmation utilisable par des physiciens sans altérer les performances. Les premiers résultats montrent d’excellentes performances séquentielles, ainsi qu’une accélération quasi-linéaire sur plusieurs dizaines de milliers de cœurs. En production, nous constatons une accélération jusqu’à un facteur 30 par rapport au code utilisé actuellement par les chercheurs du CEA. 
/ In the exascale race, supercomputer architectures are evolving towards massively multicore nodes with hierarchical memory structures and equipped with larger vectorization registers. These trends tend to make MPI-only applications less effective, and now require programmers to explicitly manage low-level elements to get decent performance. In the context of Molecular Dynamics (MD) applied to condensed matter physics, the need for a better understanding of materials behaviour under extreme conditions involves simulations of ever larger systems, on tens of thousands of cores. This will put molecular dynamics codes among the software that is very likely to meet serious difficulties when it comes to fully exploiting the performance of next generation processors. This thesis proposes the design and implementation of a high-performance, flexible and scalable framework dedicated to the simulation of large scale MD systems on future supercomputers. We managed to separate numerical modules from different expressions of parallelism, allowing developers not to care about optimizations and still obtain high levels of performance. Our architecture is organized in three levels of parallelism: domain decomposition using MPI, thread parallelization within each domain, and explicit vectorization. We also included a dynamic load balancing capability in order to equally share the workload among domains. Results on simple tests show excellent sequential performance and a quasi-linear speedup on several thousands of cores on various architectures. When applied to production simulations, we report an acceleration of up to a factor of 30 compared to the code previously used by CEA’s researchers. Dynamique Moléculaire Calcul Intensif Multi-Cœurs Message Passing Interface Threads Tbb Vectorisation Équilibrage de charge C++ Xeon Phi Molecular Dynamics High Performance Computing Manycore Message Passing Interface Threads Tbb Vectorization Load-Balancing C++ Xeon Phi
68	Effiziente parallele Sortier- und Datenumverteilungsverfahren für Partikelsimulationen auf Parallelrechnern mit verteiltem Speicher Hofmann, Michael 09 March 2012 Particle simulations are a class of data- and compute-intensive simulation applications used in many areas of science and industrial research. The high computational cost of the solution methods employed and the large volumes of data needed to model realistic problems make parallel computing indispensable. Distributed-memory parallel computers are a widespread architecture in which a large number of compute nodes working in parallel exchange data over an interconnection network. Computing the interactions between particles often accounts for most of the cost of a particle simulation and is carried out with fast solution methods such as the Barnes-Hut algorithm or the Fast Multipole Method. Efficient parallel implementations of these algorithms require the particles to be sorted by their spatial positions. The sorting is necessary both to obtain efficient access to the particle data and as part of optimizations that increase the locality of memory accesses, minimize communication, and improve the load balance of parallel computations. This dissertation is concerned with the development of an efficient parallel sorting method and the communication operations it requires for data redistribution in particle simulations. To this end, a large number of existing parallel sorting methods for distributed memory are analyzed and compared with the requirements of particle simulation applications. 
Particular challenges arise with respect to the partitioning of the particle data across distributed memory, the weighting of the data to be sorted for improved load balancing, the handling of duplicate key values, and the availability and use of memory-efficient communication operations. To meet these requirements, a new parallel sorting method is developed and integrated into the application programs considered. In addition, a new in-place algorithm for the MPI_Alltoallv communication operation is presented, which significantly reduces the memory consumption of the data redistribution required within the parallel sorting. The behavior of all the methods developed is examined both in isolation and in practical use within various application programs, using different parallel computers, in particular highly scalable ones. info:eu-repo/classification/ddc/005 ddc:005
69	Deep Learning based Approximate Message Passing for MIMO Detection in 5G : Low complexity deep learning algorithms for solving MIMO Detection in real world scenarios / Deep Learning-baserat Ungefärligt meddelande som passerar för MIMO-detektion i 5G : Låg komplexitet djupinlärningsalgoritmer för att lösa MIMO-detektion i verkliga scenarier Pozzoli, Andrea January 2022 The Fifth Generation (5G) mobile communication system is the latest technology in wireless communications. This technology brings several advantages, in particular by using multiple receiver antennas that serve multiple transmitters. This configuration used in 5G is called Massive Multiple Input Multiple Output (MIMO), and it increases link reliability and information throughput. However, MIMO systems face two challenges at the link layer: channel estimation and MIMO detection. In this work, the focus is only on the MIMO detection problem. It consists of recovering, at the receiver side, the original messages sent by the transmitters when the received message is a noisy signal. The optimal technique to solve the problem is called Maximum Likelihood (ML), but it does not scale and therefore cannot be used with MIMO systems. Several sub-optimal techniques have been tested over the years to solve the MIMO detection problem, trying to balance the complexity-performance trade-off. In recent years, Approximate Message Passing (AMP) based techniques have brought interesting results. Moreover, deep learning (DL) is spreading across many different fields, and it has also been tested in MIMO detection with promising results. A neural network called MMNet brought the most interesting results, but new techniques have since been developed. These new techniques, although promising, have not been compared with MMNet. 
In this thesis, two new AMP- and DL-based techniques, called Orthogonal AMP Network Second (OAMP-Net2) and Learnable Vector AMP (LVAMP), have been tested and compared with the state of the art. The aim of the thesis is to determine whether one or both techniques can provide better results than MMNet, in order to find a valid alternative solution to the MIMO detection problem. OAMP-Net2 and LVAMP have been developed and tested on different channel models (i.i.d. Gaussian and Kronecker) and on MIMO systems of different sizes (small and medium-large). OAMP-Net2 proved to be a consistent technique that can be used to solve the MIMO detection problem. It provides interesting results on both i.i.d. Gaussian and Kronecker channel models and with matrices of different sizes. Moreover, OAMP-Net2 has good adaptability: it provides good results on Kronecker channel models even when it is trained with i.i.d. Gaussian matrices. LVAMP, in contrast, performs similarly to the minimum mean square error (MMSE) detector, but with lower complexity. Like OAMP-Net2, it adapts well to complex channels. / Femte generationens (5G) mobila kommunikationssystem är den senaste tekniken inom trådlös kommunikation. Denna teknik ger flera fördelar, i synnerhet genom att använda flera mottagarantenner som betjänar flera sändare. Denna konfiguration som används i 5G kallas Massive Multiple Input Multiple Output (MIMO), och den ökar länktillförlitligheten och informationsgenomströmningen. MIMO-system står dock inför två utmaningar i länkskiktet: kanaluppskattning och MIMO-detektering. I detta arbete ligger fokus endast på MIMO-detekteringsproblemet. Den består i att hämta de ursprungliga meddelandena, skickade av sändarna, på mottagarsidan när det mottagna meddelandet är en brusig signal. Den optimala tekniken för att lösa problemet kallas Maximum Likelihood (ML), men den skalas inte och därför kan den inte användas med MIMO-system. 
Flera suboptimala tekniker har testats under flera år för att lösa MIMO-detekteringsproblem och försöka balansera komplexitet-prestanda-avvägningen. Under de senaste åren har Approximate Message Passing (AMP)-baserade tekniker gett intressanta resultat. Dessutom sprids djupinlärning (DL) inom flera och olika områden, och även inom MIMO-detektering har det testats med lovande resultat. Ett neuralt nätverk kallat MMNet gav de mest intressanta resultaten, men nya tekniker har utvecklats. Dessa nya tekniker, trots att de är lovande, har inte jämförts med MMNet. I detta examensarbete har två nya tekniker AMP- och DL-baserade, kallade Orthogonal AMP Network Second (OAMP-Net2) och Learnable Vector AMP (LVAMP), testats och jämförts med den senaste tekniken. Syftet med avhandlingen är att ta reda på om en eller båda teknikerna kan ge bättre resultat än MMNet, för att upptäcka en giltig alternativ lösning samtidigt som man hanterar MIMO-detekteringsproblem. OAMP-Net2 och LVAMP har utvecklats och testats på olika kanalmodeller (i.i.d. Gaussian och Kronecker) och på MIMO-system av olika storlekar (small och medium-large). OAMP-Net2 visade sig vara en konsekvent teknik som kan användas för att lösa MIMO-detekteringsproblem. Det ger riktigt intressanta resultat på både i.i.d. Gaussian och Kronecker kanalmodeller och med matriser i olika storlekar. Dessutom har OAMP-Net2 god anpassningsförmåga, faktiskt ger den bra resultat på Kronecker kanalmodeller även när den tränas med i.i.d. Gaussiska matriser. LVAMP har istället prestanda som liknar MMSE, men med lägre komplexitet. Den anpassar sig väl till komplexa kanaler som OAMP-Net2. 5G MIMO detection Approximate Message Passing OAMP-Net2 LVAMP MMNet Deep Learning 5G MIMO-detektering Approximate Message Passing OAMP-Net2 LVAMP MMNet Deep Learning Computer and Information Sciences Data- och informationsvetenskap
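To make concrete why the Maximum Likelihood detector mentioned in entry 69 does not scale, here is a small exhaustive-search sketch. It is an illustration under stated assumptions rather than code from the thesis: the channel is taken to be real-valued, the symbols are assumed to come from a two-point constellation (-1, +1), and the name `ml_detect` is hypothetical.

```python
import itertools


def ml_detect(H, y, constellation=(-1.0, 1.0)):
    """Exhaustive ML detection: return the symbol vector x minimizing ||y - H x||^2.

    H is a list of rows (one per receive antenna); y is the received vector.
    """
    n_tx = len(H[0])
    best_x, best_cost = None, float("inf")
    # The loop visits len(constellation) ** n_tx candidates, so the cost
    # grows exponentially with the number of transmitters.
    for x in itertools.product(constellation, repeat=n_tx):
        residual = [y_i - sum(h * s for h, s in zip(row, x))
                    for y_i, row in zip(y, H)]
        cost = sum(r * r for r in residual)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

With two transmitters the search covers 4 candidates; with 64 transmitters and the same two-point constellation it would cover 2**64, which is the scaling problem that motivates sub-optimal detectors such as the AMP-based ones compared in the thesis.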
70	Equilibrium and Dynamics on Complex Networks Del Ferraro, Gino January 2016 Complex networks are an important class of models used to describe the behaviour of a very broad category of systems which appear in different fields of science ranging from physics, biology and statistics to computer science and other disciplines. This set of models includes spin systems on a graph, neural networks, decision networks, spreading disease, financial trade, social networks and all systems which can be represented as interacting agents on some sort of graph architecture. In this thesis, by using the theoretical framework of statistical mechanics, the equilibrium and the dynamical behaviour of such systems are studied. For the equilibrium case, after presenting the region graph free energy approximation, the Survey Propagation method, previously used to investigate the low temperature phase of complex systems on tree-like topologies, is extended to the case of loopy graph architectures. For time-dependent behaviour, both discrete-time and continuous-time dynamics are considered. It is shown how to extend the cavity method approach from a tool used to study equilibrium properties of complex systems to the discrete-time dynamical scenario. A closure scheme of the dynamic message-passing equation based on a Markovian approximation is presented. This allows the estimation of non-equilibrium marginals of spin models on a graph with reversible dynamics. As an alternative to this approach, an extension of region graph variational free energy approximations to the non-equilibrium case is also presented. Non-equilibrium functionals that, when minimized with constraints, lead to approximate equations for out-of-equilibrium marginals of general spin models are introduced and discussed. For the continuous-time dynamics a novel approach that extends the cavity method also to this case is discussed. 
The main result of this part is a Cavity Master Equation which, together with an approximate version of the Master Equation, constitutes a closure scheme to estimate non-equilibrium marginals of continuous-time spin models. The investigation of dynamics of spin systems is concluded by applying a quasi-equilibrium approach to a simple case. A way to test self-consistently the assumptions of the method as well as its limits is discussed. In the final part of the thesis, analogies and differences between the graphical model approaches discussed in the manuscript and causal analysis in statistics are presented. / QC 20160904 Statistical mechanics complex networks spin systems non equilibrium dynamics generalized belief propagation message passing cavity method variational approaches
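As background for the cavity method and message passing named in entry 70, the sketch below shows the simplest equilibrium case only: cavity fields passed along an open Ising chain, where the graph is a tree and the method is exact. It does not reproduce the thesis's extensions to loopy graphs or to dynamics. The energy convention E = -sum J_i s_i s_{i+1} - sum h_i s_i and all function names are assumptions made for illustration; the brute-force routine is included only as a cross-check.

```python
import itertools
import math


def chain_marginals_bp(J, h, beta=1.0):
    """Magnetizations <s_i> of an open Ising chain from cavity (message-passing) fields."""
    n = len(h)

    def cavity_field(coupling, field):
        # Effective field passed across one bond after summing out the sending spin.
        return math.atanh(math.tanh(beta * coupling) * math.tanh(beta * field)) / beta

    from_left = [0.0] * n   # from_left[i]: message into spin i from spins 0..i-1
    from_right = [0.0] * n  # from_right[i]: message into spin i from spins i+1..n-1
    for i in range(1, n):
        from_left[i] = cavity_field(J[i - 1], h[i - 1] + from_left[i - 1])
    for i in range(n - 2, -1, -1):
        from_right[i] = cavity_field(J[i], h[i + 1] + from_right[i + 1])
    return [math.tanh(beta * (h[i] + from_left[i] + from_right[i])) for i in range(n)]


def chain_marginals_exact(J, h, beta=1.0):
    """Reference magnetizations by brute-force enumeration of all 2**n configurations."""
    n = len(h)
    z = 0.0
    mags = [0.0] * n
    for s in itertools.product((-1, 1), repeat=n):
        energy = -sum(J[i] * s[i] * s[i + 1] for i in range(n - 1))
        energy -= sum(h[i] * s[i] for i in range(n))
        weight = math.exp(-beta * energy)
        z += weight
        for i in range(n):
            mags[i] += weight * s[i]
    return [m / z for m in mags]
```

The message-passing version needs one forward and one backward sweep, so its cost is linear in the number of spins, whereas the enumeration is exponential. On graphs with loops the same update rules are only approximate, which is the regime the thesis addresses with region-graph and generalized belief propagation approaches.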
