221 |
Modélisation et implémentation de simulations multi-agents sur architectures massivement parallèles / Modeling and implementing multi-agents based simulations on massively parallel architectures
Hermellin, Emmanuel 18 November 2016
Multi-agent based simulation (MABS) is a relevant approach for engineering and studying complex systems in numerous domains (artificial life, biology, economics, etc.). However, MABS sometimes requires a great deal of computational resources, which is a major technological barrier that restricts the possibilities for studying the models under consideration (scalability, expressiveness of the proposed models, real-time interaction, etc.). Among the technologies available for High Performance Computing (HPC), GPGPU (General-Purpose computing on Graphics Processing Units) uses the massively parallel architectures of graphics cards (GPUs) as computing accelerators. However, while many fields benefit from GPGPU performance (meteorology, aerodynamics, molecular dynamics, finance, etc.), Multi-Agent Systems (MAS), and MABS in particular, hardly use this technology, and only a few works address it. In fact, GPGPU comes with a very specific development context that requires a deep, non-trivial transformation of multi-agent models. So, despite pioneering works that demonstrate the value of GPGPU, this difficulty explains its low popularity in the MAS community.
In this thesis, we show that, among the works that aim to ease the use of GPGPU in an agent context, most do so through a transparent use of this technology. However, this approach requires abstracting some parts of the model, which greatly limits the scope of the proposed solutions. To address this issue, and in contrast to existing solutions, we propose a hybrid approach (the execution of the simulation is shared between the processor and the graphics card) that emphasizes accessibility and reusability through a modeling process that allows GPU programming to be used directly while simplifying its use. More specifically, this approach is based on a design principle called GPU delegation of agent perceptions, which makes a clear separation between agent behaviors, managed by the processor, and environmental dynamics, handled by the graphics card. One major idea underlying this principle is to identify agent computations that can be reified into new structures (e.g., in the environment) in order to distribute the complexity of the code and modularize its implementation. The study of this principle and the experiments conducted show the advantages of this approach from both a conceptual and a performance point of view. Therefore, we propose to generalize this approach into a comprehensive methodology for modeling and implementing MABS, relying on GPU delegation and specifically adapted to massively parallel architectures.
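To make the delegation principle concrete, here is a minimal sketch, assuming a grid world in which each agent needs to know how many other agents are nearby. The environment computes a neighbour-density field for every cell at once (the uniform, data-parallel part that, under GPU delegation, would become a GPU kernel), and each agent then perceives with a single lookup on the CPU. The names `Agent`, `compute_density_field`, and `perceive` are hypothetical, and the field is computed in plain Python here rather than on a GPU.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    x: int  # column of the grid cell the agent occupies
    y: int  # row of the grid cell the agent occupies


def compute_density_field(agents: List[Agent], width: int, height: int) -> List[List[int]]:
    """Environment-side step: for every cell, count the agents in its 3x3 neighbourhood.

    Each cell's value depends only on the agent counts around it, so all cells can be
    computed independently; under GPU delegation this loop nest is what would become
    a kernel with one thread per cell. The world wraps around at the edges.
    """
    counts = [[0] * width for _ in range(height)]
    for a in agents:
        counts[a.y][a.x] += 1
    field = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            field[y][x] = sum(
                counts[(y + dy) % height][(x + dx) % width]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return field


def perceive(agent: Agent, field: List[List[int]]) -> int:
    """Agent-side step (CPU): number of neighbours, read in O(1) instead of scanning all agents."""
    return field[agent.y][agent.x] - 1  # subtract the agent itself
```

With agents at (0, 0), (1, 0), and (5, 5) on an 8 × 8 grid, the first two each perceive one neighbour and the third perceives none. The point of the design is that the per-agent behaviour shrinks to a lookup, while the expensive, uniform computation moves into the environment.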
|
222 |
High-performance communication infrastructure design on FPGA-centric clusters
Yang, Chen 29 September 2019
FPGA-Centric Clusters (FCCs) with the FPGAs directly linked through their Multi-Gigabit Transceivers (MGTs) have a proven advantage over other commodity architectures for communication bound applications. To date, however, communication infrastructure for such clusters has generally only taken one of two simple approaches: nearest-neighbor-only, which is fast but of limited utility, and processor-based, which is general but slow. The overall problem addressed in this dissertation is the architecture, design, and implementation of communication networks for FCCs. These network designs should take advantage of the decades of design experience in networks for High-Performance Computing (HPC) clusters, but should also account for, and take advantage of, unique characteristics of FCCs, in particular, the configurability of the FPGAs themselves.
This dissertation has seven parts. We begin with in-depth implementations of two model applications, Directional Dark Matter (DM) Detection, and Molecular Dynamics (MD). These implementations expose the necessary characteristics of FCC networks from physical through application layers.
The second part is the systematic exploration of communication microarchitecture for FCCs, as has been done previously for HPC clusters and for Networks on Chips (NoCs) on both FPGAs and ASICs. One outcome of this part is the identification of the properties of FCCs that substantially influence the router design space. Another outcome is a selection of candidate routers, generalized so that it is parameterized by routing algorithm, arbitration policy, number of virtual channels (VCs), and other parameters.
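The candidate routers are parameterized by routing algorithm. As an illustration of one possible value of that parameter, here is a minimal sketch of dimension-order routing on a 3D torus, a common baseline for torus networks; whether it is among the dissertation's candidates is an assumption. The function names and coordinate convention are hypothetical, and deadlock avoidance (which on a torus typically requires virtual channels, e.g. a dateline scheme) is not modeled.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int, int]


def next_hop_dor(cur: Coord, dst: Coord, dims: Coord) -> Optional[Tuple[int, int]]:
    """Return (dimension, step) for the next hop, or None once cur == dst.

    Dimension-order routing: fully correct X first, then Y, then Z.
    Each torus dimension wraps around, so we step in whichever direction
    (+1 or -1) gives the shorter path; ties go to +1.
    """
    for d in range(3):
        diff = (dst[d] - cur[d]) % dims[d]
        if diff == 0:
            continue
        return (d, 1) if diff <= dims[d] - diff else (d, -1)
    return None


def route(src: Coord, dst: Coord, dims: Coord) -> List[Coord]:
    """Follow next_hop_dor from src to dst and return every node visited."""
    path = [src]
    cur = list(src)
    while (hop := next_hop_dor(tuple(cur), dst, dims)) is not None:
        d, step = hop
        cur[d] = (cur[d] + step) % dims[d]
        path.append(tuple(cur))
    return path
```

On a 4 × 4 × 4 torus, routing from (0, 0, 0) to (3, 1, 0) takes the wraparound link in X (one hop to (3, 0, 0)) rather than three hops forward, then one hop in Y.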
The third part is to use the proposed application-aware framework to evaluate the resulting design space with respect to a number of common communication patterns and packet sizes. The results from this part enable two sets of designs. One is the selection of an optimal router for a given resource budget that accounts for all the workloads. The other is to take advantage of FPGA reconfigurability to select the optimal router accounting for both resource budget and a particular workload.
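The two selection modes described above can be sketched as follows. Everything here is hypothetical: `RouterConfig`, the LUT costs, and the latency figures are made-up illustrations, not results from the dissertation. The budget-only mode picks the configuration with the best mean latency over all workloads; the application-aware mode exploits reconfigurability and picks the best configuration for the one workload about to run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RouterConfig:
    name: str
    luts: int                   # resource cost (hypothetical unit: FPGA LUTs)
    latency: Dict[str, float]   # mean packet latency per workload (illustrative numbers)


def _feasible(configs: List[RouterConfig], budget: int) -> List[RouterConfig]:
    return [c for c in configs if c.luts <= budget]


def best_for_budget(configs: List[RouterConfig], budget: int,
                    workloads: List[str]) -> Optional[RouterConfig]:
    """One router for all workloads: minimise mean latency within the budget."""
    feasible = _feasible(configs, budget)
    if not feasible:
        return None
    return min(feasible, key=lambda c: sum(c.latency[w] for w in workloads) / len(workloads))


def best_for_workload(configs: List[RouterConfig], budget: int,
                      workload: str) -> Optional[RouterConfig]:
    """Application-aware: minimise latency for one known workload within the budget."""
    feasible = _feasible(configs, budget)
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.latency[workload])
```

With the illustrative configurations used in the check below, the best average router under a 2000-LUT budget is not the best router for the "uniform" workload, which is the kind of gap that application-aware selection is meant to close.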
The fourth part is to evaluate the advantages of this approach of adapting the router design to the application. We find that the optimality of the router design varies significantly with workloads. We observe that compared with the router configuration with the best average performance, application-aware router selection can lead to substantial improvement in performance or reduction in resources required.
The fifth part is application-specific optimizations, in which we develop several modules and functional units that provide specific optimizations for certain types of communication workloads, depending on the application they are going to serve.
The sixth part explores topology emulation, e.g., when a three-dimensional network is used in the computation of an application that is logically two-dimensional. We propose a generalized fold-and-cut mechanism that both preserves locality in the logical mapping and makes use of the extra links provided by our 3D-torus fixture.
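A minimal sketch of the idea behind topology emulation, assuming a logical 2D grid of X × (Y·Z) nodes mapped onto a physical X × Y × Z torus. It uses a simple boustrophedon fold (cut the long logical axis into Z slabs of length Y and reverse every other slab), not the dissertation's generalized fold-and-cut mechanism, and it makes no attempt to use the extra torus links; it only shows how folding keeps logical neighbours one physical hop apart. Function names are hypothetical.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def fold_2d_to_3d(x: int, y: int, Y: int) -> Coord:
    """Map logical (x, y), with y in [0, Y*Z), to physical (x, y_phys, z).

    The long logical axis is cut into slabs of length Y stacked along Z.
    Odd slabs run backwards, so the last node of one slab and the first node
    of the next land on physically adjacent positions (same x and y, z +/- 1).
    """
    z, r = divmod(y, Y)
    y_phys = r if z % 2 == 0 else Y - 1 - r
    return (x, y_phys, z)


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes of a torus (wraparound in every dimension)."""
    return sum(min((a[d] - b[d]) % dims[d], (b[d] - a[d]) % dims[d]) for d in range(3))
```

For a 4 × 12 logical grid folded onto a 4 × 4 × 3 torus, every pair of logical neighbours (excluding the logical wraparound edge) ends up exactly one physical hop apart, and the mapping is one-to-one.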
The seventh part is a table-based static-scheduled router for applications with a static or persistent communication pattern. The router supports various cases, including unicast, multicast, and reduction. By making routing decisions a priori, we can bring better load-balance to network links and reduce congestion.
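A minimal sketch of a table-based, statically scheduled router, assuming each packet carries a flow identifier and the routing table is filled offline from the known communication pattern, so no routing decision is made at run time. `Entry`, `StaticRouter`, and the fan-in convention for reductions are hypothetical names chosen for illustration; a hardware router would also handle flow control and arbitration, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    out_ports: Tuple[int, ...]  # one port = unicast, several ports = multicast
    fan_in: int = 1             # > 1 means: sum this many packets before forwarding (reduction)


@dataclass
class StaticRouter:
    table: Dict[int, Entry]     # flow id -> decision computed a priori
    partial: Dict[int, List[float]] = field(default_factory=dict)

    def receive(self, flow_id: int, payload: float) -> List[Tuple[int, float]]:
        """Return the (out_port, payload) pairs to emit; empty while a reduction is pending."""
        entry = self.table[flow_id]  # a table lookup replaces run-time routing logic
        if entry.fan_in > 1:
            buf = self.partial.setdefault(flow_id, [])
            buf.append(payload)
            if len(buf) < entry.fan_in:
                return []
            payload = sum(self.partial.pop(flow_id))
        return [(port, payload) for port in entry.out_ports]
```

Because every decision is fixed in advance, the table can be computed offline to spread flows across links, which is where the load-balancing benefit described above comes from.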
|
223 |
Gene-Environment Interaction Analysis Using Graphic Cards / Analys av genmiljöinteraktion med använding av grafikkort
Berglund, Daniel January 2015
Genome-wide association studies (GWAS) are used to find associations between genetic markers and diseases. One part of GWAS is the study of interactions between markers, which can play an important role in the risk of disease. The search for interactions can be computationally intensive. The aim of this thesis was to improve the performance of software used for gene-environment interaction analysis by using parallel programming techniques on graphics processors. The new program's performance, speedup, and efficiency were studied using multiple simulated datasets. The program shows significantly better performance than the older program.
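The abstract does not say which interaction measure the software computes, so the following is only a sketch under stated assumptions: a binary carrier genotype, a binary environmental exposure, a binary disease status, and the relative excess risk due to interaction (RERI) computed from crude stratified odds ratios, RERI = OR11 − OR10 − OR01 + 1. Real analysis tools typically fit logistic regression models with covariates instead. What carries over is the structure: each marker is evaluated independently, which is what makes the scan a natural fit for one GPU thread or block per marker.

```python
from typing import List, Sequence


def stratum_odds(genotype: Sequence[int], exposure: Sequence[int],
                 status: Sequence[int], g: int, e: int) -> float:
    """Odds of disease (cases / controls) among individuals with genotype g and exposure e."""
    cases = controls = 0
    for gi, ei, di in zip(genotype, exposure, status):
        if gi == g and ei == e:
            if di:
                cases += 1
            else:
                controls += 1
    if controls == 0:
        raise ValueError(f"no controls in stratum G={g}, E={e}")
    return cases / controls


def reri(genotype: Sequence[int], exposure: Sequence[int], status: Sequence[int]) -> float:
    """Additive interaction measure RERI = OR11 - OR10 - OR01 + 1 (reference: G=0, E=0)."""
    base = stratum_odds(genotype, exposure, status, 0, 0)
    if base == 0:
        raise ValueError("no cases in the reference stratum")
    or10 = stratum_odds(genotype, exposure, status, 1, 0) / base
    or01 = stratum_odds(genotype, exposure, status, 0, 1) / base
    or11 = stratum_odds(genotype, exposure, status, 1, 1) / base
    return or11 - or10 - or01 + 1.0


def scan_markers(markers: List[Sequence[int]], exposure: Sequence[int],
                 status: Sequence[int]) -> List[float]:
    """Evaluate every marker independently (the embarrassingly parallel loop)."""
    return [reri(m, exposure, status) for m in markers]
```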
|
224 |
Automating telemetry- and trace-based analytics on large-scale distributed systems
Ateş, Emre 28 September 2020
Large-scale distributed systems---such as supercomputers, cloud computing platforms,
and distributed applications---routinely suffer from slowdowns and crashes due to
software and hardware problems, resulting in reduced efficiency and wasted
resources. These large-scale systems typically deploy monitoring or tracing
systems that gather a variety of statistics about the state of the hardware
and the software. State-of-the-art methods either analyze this data manually,
or design unique automated methods for each specific problem. This thesis
builds on the vision that generalized automated analytics methods on the data
sets collected from these complex computing systems provide critical
information about the causes of the problems, and this analysis can then enable
proactive management to improve performance, resilience, efficiency, or security
significantly beyond current limits.
This thesis seeks to design scalable, automated analytics methods and frameworks
for large-scale distributed systems that minimize dependency on expert
knowledge, automate parts of the solution process, and help make systems more
resilient. In addition to analyzing data that is already collected from systems,
our frameworks also identify what to collect from where in the system, such that
the collected data would be concise and useful for manual analytics. We focus on
two data sources for conducting analytics: numeric telemetry data, which is
typically collected from operating system or hardware counters, and end-to-end
traces collected from distributed applications.
This thesis makes the following contributions in large-scale distributed
systems: (1) Designing a framework for accurately diagnosing previously
encountered performance variations, (2) designing a technique for detecting
(unwanted) applications running on the systems, (3) developing a suite for
reproducing performance variations that can be used to systematically develop
analytics methods, (4) designing a method to explain predictions of black-box
machine learning frameworks, and (5) constructing an end-to-end tracing
framework that can dynamically adjust instrumentation for effective diagnosis of
performance problems. / 2021-09-28T00:00:00Z
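Contribution (1) diagnoses previously encountered performance variations from numeric telemetry. A minimal sketch of that general idea, assuming labeled runs of known anomalies are available: summarize each telemetry time series with a few statistics and assign a new run to the nearest class centroid. The nearest-centroid classifier and the choice of features are stand-ins chosen for brevity, not the thesis's method, and all function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def features(series: Sequence[float]) -> List[float]:
    """Summarise one telemetry time series (e.g. a hardware counter) by simple statistics."""
    return [statistics.fmean(series), statistics.pstdev(series), min(series), max(series)]


def train_centroids(labeled: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """labeled maps a diagnosis label to the runs observed under that known condition."""
    centroids = {}
    for label, runs in labeled.items():
        feats = [features(r) for r in runs]
        centroids[label] = [statistics.fmean(col) for col in zip(*feats)]
    return centroids


def diagnose(series: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the run's features."""
    f = features(series)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))
```

In practice, features would be normalized and computed per counter and per node; the sketch keeps a single counter to show the shape of the pipeline.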
|
225 |
HPCC based Platform for COPD Readmission Risk Analysis with implementation of Dimensionality reduction and balancing techniques
Unknown Date
Hospital readmission rates are considered to be an important indicator of quality of care because they may be a consequence of actions of commission or omission made during the initial hospitalization of the patient, or as a consequence of poorly managed transition of the patient back into the community. The negative impact on patient quality of life and huge burden on healthcare system have made reducing hospital readmissions a central goal of healthcare delivery and payment reform efforts.
In this study, we propose a framework for deploying readmission analysis and other healthcare models in the real world, together with a machine-learning-based solution that uses patients' discharge summaries as the dataset to train and test the model. Current systems do not take into account a very important aspect of the readmission problem: Big Data. This study therefore also addresses the Big Data aspect of solutions that can be deployed in the field for real-world use. We used the HPCC compute platform, which provides a distributed parallel programming environment to create, run, and manage applications that involve large amounts of data. We also propose feature engineering and data balancing techniques that were shown to greatly enhance the performance of the machine learning model. This was achieved by reducing the dimensionality of the data and correcting the imbalance in the dataset.
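The abstract does not name the balancing technique used. Random oversampling of the minority class is one common option (SMOTE, which synthesizes new minority samples, is another); the sketch below assumes random oversampling, and `random_oversample` is a hypothetical helper, not part of HPCC or any library.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def random_oversample(X: Sequence[Row], y: Sequence[int],
                      seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until all classes
    have as many rows as the largest class. Original rows are kept in order."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lbl in enumerate(y) if lbl == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

One design caution that applies to any such technique: balance only the training split, after the train/test split, so that duplicated patients cannot leak into the evaluation data.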
The system presented in this study provides real-world, machine-learning-based predictive modeling for reducing readmissions that could be templatized for other diseases. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
|
226 |
Statistical Techniques to Model and Optimize Performance of Scientific, Numerically Intensive Workloads
Steven Monteiro, Steena Dominica 01 December 2016
Projecting performance of applications and hardware is important to several market segments—hardware designers, software developers, supercomputing centers, and end users. Hardware designers estimate performance of current applications on future systems when designing new hardware. Software developers make performance estimates to evaluate performance of their code on different architectures and input datasets. Supercomputing centers try to optimize the process of matching computing resources to computing needs. End users requesting time on supercomputers must provide estimates of their application’s run time, and incorrect estimates can lead to wasted supercomputing resources and time. However, application performance is challenging to predict because it is affected by several factors in application code, specifications of system hardware, choice of compilers, compiler flags, and libraries.
This dissertation uses statistical techniques to model and optimize performance of scientific applications across different computer processors. The first study in this research offers statistical models that predict performance of an application across different input datasets prior to application execution. These models guide end users to select parameters that produce optimal application performance during execution. The second study offers a suite of statistical models that predict performance of a new application on a new processor. Both studies present statistical techniques that can be generalized to analyze, optimize, and predict performance of diverse computation- and data-intensive applications on different hardware.
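As a minimal illustration of predicting run time from an input parameter before execution, the sketch below assumes run time follows a power law in problem size, t ≈ a·n^b, and fits it by ordinary least squares on log t = log a + b·log n. This single-variable model is an assumption chosen for brevity; the dissertation's models draw on several factors (code, hardware, compilers, flags, libraries) and are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], runtimes: Sequence[float]) -> Tuple[float, float]:
    """Fit runtime ~= a * size**b by least squares in log-log space; return (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in runtimes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: how run time scales with size
    a = math.exp(my - b * mx)  # intercept, back-transformed from log space
    return a, b


def predict(a: float, b: float, size: float) -> float:
    """Predicted run time for an input size not yet executed."""
    return a * size ** b
```

Fitting in log space turns a multiplicative scaling law into a straight line, so a handful of small calibration runs can be extrapolated to a larger input; the usual caveat is that extrapolation fails once the workload crosses a hardware boundary, such as falling out of cache.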
|
227 |
Turbulence Modeling of Strongly Heated Internal Pipe Flow Using Large Eddy Simulation
Hradisky, Michal 01 May 2011
The main objective of this study was to evaluate the performance of three Large Eddy Simulation (LES) subgrid scale (SGS) models on a strongly heated, low Mach number upward gas flow in a vertical pipe with forced convection. The models chosen for this study were the Smagorinsky-Lilly Dynamic model (SLD), the Kinetic Energy Transport model (KET), and the Wall-Adaptive Local-Eddy viscosity model (WALE). The heating rate used was large enough to cause properties to vary significantly in both the radial and streamwise directions. All simulations were carried out using the commercial software FLUENT.
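For readers unfamiliar with SGS models, here is a sketch of the static Smagorinsky-Lilly eddy viscosity on which the SLD model builds, in a commonly used form: ν_t = L_s²|S|, with |S| = √(2 S_ij S_ij), S_ij = (∂u_i/∂x_j + ∂u_j/∂x_i)/2, and mixing length L_s = min(κd, C_s V^(1/3)), where d is the distance to the nearest wall and V the cell volume. The fixed C_s = 0.1 and κ = 0.41 are assumptions for illustration; the dynamic variant used in the study computes C_s locally from the resolved field, which is not shown, and the function names are hypothetical rather than FLUENT's API.

```python
import math
from typing import Sequence

KAPPA = 0.41  # von Karman constant (commonly quoted value, assumed here)


def strain_rate_magnitude(grad: Sequence[Sequence[float]]) -> float:
    """|S| = sqrt(2 S_ij S_ij), where grad[i][j] holds du_i/dx_j."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad[i][j] + grad[j][i])  # symmetric part of the gradient
            total += s_ij * s_ij
    return math.sqrt(2.0 * total)


def smagorinsky_nu_t(grad: Sequence[Sequence[float]], cell_volume: float,
                     wall_distance: float, c_s: float = 0.1) -> float:
    """Kinematic eddy viscosity nu_t = L_s**2 * |S|, with L_s = min(kappa*d, C_s*V**(1/3))."""
    delta = cell_volume ** (1.0 / 3.0)            # filter width from the cell volume
    l_s = min(KAPPA * wall_distance, c_s * delta)  # mixing length is damped near walls
    return l_s ** 2 * strain_rate_magnitude(grad)
```

For a simple shear du/dy = 1 in a unit cell far from the wall, |S| = 1 and ν_t = (0.1)² = 0.01; close to the wall, the κd term takes over and reduces ν_t, which is one reason near-wall behavior differs so much between SGS models.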
The effect of inlet turbulence generation techniques was considered as well. Three inlet turbulence generation techniques were compared, namely, the Spectral Synthesizer Method (SSM), the Vortex Method (VM), and the Generator (GEN) technique. A user-defined function (UDF) was written to implement the GEN technique in the solver; the SSM and VM techniques were already built in. All simulation and solver settings were validated by simulating isothermal, fully developed pipe flow and comparing the results with available experimental and Direct Numerical Simulation (DNS) data.
For isothermal boundary conditions, the GEN technique produced the results that best matched the experimental and DNS data among the three inlet turbulence generation techniques. All three LES SGS models performed equally well when coupled with the GEN technique for isothermal pipe flow. However, all models incorrectly predicted the behavior of the radial and circumferential velocity fluctuations near the wall, and the GEN technique proved to be the most computationally expensive. For simulations with longer computational domains, the effect of the inlet turbulence generation technique diminishes. However, the results suggest that, when coupled with the VM technique, the SLD and KET models need shorter computational domains than the WALE SGS model to recover proper LES behavior.
For the high heat flux simulations, all SGS models were coupled with the VM technique to reduce the computational effort required to obtain a statistically steady-state solution. For comparison, one simulation was carried out using the WALE model with the GEN technique. All simulations significantly, and to a similar extent, underpredicted the streamwise temperature distribution along the pipe wall, as well as the radial temperature distributions at various streamwise locations. These discrepancies are attributed to the overpredicted streamwise velocity components and the incorrect behavior of both the radial and circumferential velocity components in the near-wall region for all subgrid scale models.
|
228 |
Scalable event tracking on high-end parallel systems
Mohror, Kathryn Marie 01 January 2010
Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of many of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges of its own: the large volume of collected data increases tool overhead and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.
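A minimal sketch of the core idea of exploiting repeated program behavior: split a single process's event stream into segments at a boundary event (here, the start of each iteration), store each distinct sequence of event names only once, and keep a count and timing summary per pattern instead of every event. The boundary convention, `PatternStats`, and `trace_profile` are hypothetical simplifications, not the dissertation's algorithm; merging similar patterns across parallel processes would be a further step on top of this.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class PatternStats:
    count: int = 0
    total: float = 0.0
    min_t: float = float("inf")
    max_t: float = 0.0


def _record(profile: Dict[Tuple[str, ...], PatternStats],
            names: List[str], elapsed: float) -> None:
    """Fold one finished segment into the summary for its event-name pattern."""
    if not names:
        return
    st = profile.setdefault(tuple(names), PatternStats())
    st.count += 1
    st.total += elapsed
    st.min_t = min(st.min_t, elapsed)
    st.max_t = max(st.max_t, elapsed)


def trace_profile(events: Sequence[Tuple[str, float]],
                  boundary: str = "iter_begin") -> Dict[Tuple[str, ...], PatternStats]:
    """events: (event name, duration) pairs in program order for one process."""
    profile: Dict[Tuple[str, ...], PatternStats] = {}
    names: List[str] = []
    elapsed = 0.0
    for name, duration in events:
        if name == boundary:  # a new segment starts: close the previous one
            _record(profile, names, elapsed)
            names, elapsed = [], 0.0
        names.append(name)
        elapsed += duration
    _record(profile, names, elapsed)
    return profile
```

For an application that repeats the same communication pattern thousands of times, the stored data grows with the number of distinct patterns rather than the number of events, while outlier segments still appear as their own patterns.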
|
229 |
Power, Performance and Energy Models and Systems for Emergent Architectures
Song, Shuaiwen 10 April 2013
Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency.
A key limitation of past approaches is a lack of power-performance policies that allow users to quantitatively bound the effects of power management on the performance of their applications and systems. Existing controllers and predictors use policies fixed by a knowledgeable user to opportunistically save energy and minimize performance impact. While the qualitative effects are often good and the aggressiveness of a controller can be tuned to try to save more or less energy, the quantitative effects of tuning and setting opportunistic policies on performance and power are unknown. In other words, the controller will save energy and minimize performance loss in many cases, but we have little understanding of the quantitative effects of controller tuning. This makes setting power-performance policies a manual trial-and-error process for domain experts and a black art for practitioners. To improve upon past approaches to high-performance power management, we need to quantitatively understand the effects of power and performance at scale.
In this work, I have developed theories and techniques to quantitatively understand the relationship between power and performance for high performance systems at scale. For instance, our system-level, iso-energy-efficiency model analyzes, evaluates and predicts the performance and energy use of data intensive parallel applications on multi-core systems. This model allows users to study the effects of machine and application dependent characteristics on system energy efficiency. Furthermore, this model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I have also developed methodologies which can be extended and applied to model modern heterogeneous architectures such as GPU-based clusters to improve their efficiency at scale. / Ph. D.
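To illustrate the "iso" idea, here is a toy sketch, assuming energy efficiency is defined as EE = E₁/E_p (energy on one core divided by total energy on p cores), by analogy with parallel efficiency. The energy model itself is deliberately crude and is not the thesis's model: every core draws constant power, and parallel execution adds a per-core communication time of c·log₂p. Holding EE at a target value then tells us how fast the problem size must grow with p.

```python
import math


def energy_efficiency(work: float, p: int, rate: float = 1.0,
                      power: float = 1.0, comm_cost: float = 1.0) -> float:
    """Toy model: EE = E_1 / E_p.

    E_1 = power * work / rate                                  (one core, no communication)
    E_p = p * power * (work / (p * rate) + comm_cost * log2 p)  (p cores, all busy)
    """
    e1 = power * work / rate
    ep = p * power * (work / (p * rate) + comm_cost * math.log2(p))
    return e1 / ep


def work_for_target(target_ee: float, p: int, rate: float = 1.0,
                    comm_cost: float = 1.0) -> float:
    """Problem size that keeps EE at target_ee on p cores (the iso-efficiency condition).

    Solves (W/r) / (W/r + p*c*log2 p) = eta for W.
    """
    if not 0.0 < target_ee < 1.0:
        raise ValueError("target efficiency must be in (0, 1)")
    overhead = p * comm_cost * math.log2(p)
    return rate * target_ee / (1.0 - target_ee) * overhead
```

In this toy model, the work needed to hold EE constant grows as p·log₂p, so weak scaling with a fixed amount of work per core gradually loses energy efficiency. Models like the one described above replace the made-up constants with measured machine and application parameters.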
|
230 |
Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system
Shelton, Benjamin H. 31 May 2013
As manufacturers introduce new machines with more cores, more NUMA-like architectures, and more tightly integrated heterogeneous processors, the traditional abstraction of a monolithic OS running on an SMP system is encountering new challenges. One proposed path forward is the multikernel operating system. Previous efforts have shown promising results both in scalability and in support for heterogeneity. However, one effort's source code is not freely available (FOS), and the other effort is not self-hosting and does not support a majority of existing applications (Barrelfish).
In this thesis, we present Popcorn, a Linux-based multikernel operating system. While Popcorn was a group effort, the boot layer code and the memory partitioning code are the author's work, and we present them in detail here. To our knowledge, we are the first to support multiple instances of the Linux kernel on a 64-bit x86 machine and to support more than 4 kernels running simultaneously.
We demonstrate that existing subsystems within Linux can be leveraged to meet the design goals of a multikernel OS. Taking this approach, we developed a fast inter-kernel network driver and messaging layer. We demonstrate that the network driver can share a 1 Gbit/s link without degraded performance and that in combination with guest kernels, it meets or exceeds the performance of SMP Linux with an event-based web server. We evaluate the messaging layer with microbenchmarks and conclude that it performs well given the limitations of current x86-64 hardware. Finally, we use the messaging layer to provide live process migration between cores. / Master of Science
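The abstract does not describe the messaging layer's internals. A shared-memory ring buffer is one common building block for message passing between kernels that do not share a scheduler, so here is a minimal single-producer/single-consumer sketch under that assumption. `SpscRing` is a hypothetical name; in a real multikernel the buffer would live in memory mapped by both kernel instances, and the index updates would need memory barriers, which plain Python does not model.

```python
from typing import List, Optional


class SpscRing:
    """Single-producer/single-consumer ring buffer.

    Only the producer advances `tail` and only the consumer advances `head`,
    which is what allows real implementations to avoid locks. One slot is
    always left empty so that a full ring can be told apart from an empty one.
    """

    def __init__(self, capacity: int) -> None:
        self.buf: List[Optional[bytes]] = [None] * (capacity + 1)
        self.head = 0  # next slot to read (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def send(self, msg: bytes) -> bool:
        """Enqueue a message; return False if the ring is full (caller retries)."""
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False
        self.buf[self.tail] = msg  # write the slot first...
        self.tail = nxt            # ...then publish it by moving tail
        return True

    def recv(self) -> Optional[bytes]:
        """Dequeue the oldest message, or return None if the ring is empty."""
        if self.head == self.tail:
            return None
        msg = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        return msg
```

A bidirectional channel between two kernels would use one such ring per direction, so that each index still has exactly one writer.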
|