Spelling suggestions: "subject:"high performance computing"" "subject:"igh performance computing""
211 |
KernTune: self-tuning Linux kernel performance using support vector machinesYi, Long January 2006 (has links)
Magister Scientiae - MSc / Self-tuning has been an elusive goal for operating systems and is becoming a pressing issue for modern operating systems. Well-trained system administrators are able to tune an operating system to achieve better system performance for a specific system class. Unfortunately, the system class can change when the running applications change. The model for self-tuning operating system is based on a monitor-classify-adjust loop. The idea of this loop is to continuously monitor certain performance metrics, and whenever these change, the system determines the new system class and dynamically adjusts tuning parameters for this new class. This thesis described KernTune, a prototype tool that identifies the system class and improves system performance automatically. A key aspect of KernTune is the notion of Artificial Intelligence oriented performance tuning. Its uses a support vector machine to identify the system class, and tunes the operating system for that specific system class. This thesis presented design and implementation details for KernTune. It showed how KernTune identifies a system class and tunes the operating system for improved performance. / South Africa
|
212 |
Virtualisation en contexte HPC / Virtualisation in HPC contextCapra, Antoine 17 December 2015 (has links)
Afin de répondre aux besoins croissants de la simulation numérique et de rester à la pointe de la technologie, les supercalculateurs doivent d’être constamment améliorés. Ces améliorations peuvent être d’ordre matériel ou logiciel. Cela force les applications à s’adapter à un nouvel environnement de programmation au fil de son développement. Il devient alors nécessaire de se poser la question de la pérennité des applications et de leur portabilité d’une machine à une autre. L’utilisation de machines virtuelles peut être une première réponse à ce besoin de pérennisation en stabilisant les environnements de programmation. Grâce à la virtualisation, une application peut être développée au sein d’un environnement figé, sans être directement impactée par l’environnement présent sur une machine physique. Pour autant, l’abstraction supplémentaire induite par les machines virtuelles entraine en pratique une perte de performance. Nous proposons dans cette thèse un ensemble d’outils et de techniques afin de permettre l’utilisation de machines virtuelles en contexte HPC. Tout d’abord nous montrons qu’il est possible d’optimiser le fonctionnement d’un hyperviseur afin de répondre le plus fidèlement aux contraintes du HPC que sont : le placement des fils d’exécution et la localité mémoire des données. Puis en s’appuyant sur ce résultat, nous avons proposé un service de partitionnement des ressources d’un noeud de calcul par le biais des machines virtuelles. Enfin, pour étendre nos travaux à une utilisation pour des applications MPI, nous avons étudié les solutions et performances réseau d’une machine virtuelle. / To meet the growing needs of the digital simulation and remain at the forefront of technology, supercomputers must be constantly improved. These improvements can be hardware or software order. This forces the application to adapt to a new programming environment throughout its development. It then becomes necessary to raise the question of the sustainability of applications and portability from one machine to another. The use of virtual machines may be a first response to this need for sustaining stabilizing programming environments. With virtualization, applications can be developed in a fixed environment, without being directly impacted by the current environment on a physical machine. However, the additional abstraction induced by virtual machines in practice leads to a loss of performance. We propose in this thesis a set of tools and techniques to enable the use of virtual machines in HPC context. First we show that it is possible to optimize the operation of a hypervisor to respond accurately to the constraints of HPC that are : the placement of implementing son and memory data locality. Then, based on this, we have proposed a resource partitioning service from a compute node through virtual machines. Finally, to expand our work to use for MPI applications, we studied the network solutions and performance of a virtual machine.
|
213 |
Dynamic Load Balancing Schemes for Large-scale HLA-based SimulationsDe Grande, Robson E. January 2012 (has links)
Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run-time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of elements that compose distributed simulation applications, the computational load and interaction dependencies of each simulation entity change during run-time. These dynamic changes lead to an irregular load and communication distribution, which increases overhead of resources and execution delays. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Due to the relevance in dynamically balancing load for distributed simulations, many balancing approaches have been proposed in order to offer a sub-optimal balancing solution, but they are limited to certain simulation aspects, specific to determined applications, or unaware of HLA-based simulation characteristics. Therefore, schemes for balancing the communication and computational load during the execution of distributed simulations are devised, adopting a hierarchical architecture. First, in order to enable the development of such balancing schemes, a migration technique is also employed to perform reliable and low-latency simulation load transfers. Then, a centralized balancing scheme is designed; this scheme employs local and cluster monitoring mechanisms in order to observe the distributed load changes and identify imbalances, and it uses load reallocation policies to determine a distribution of load and minimize imbalances. As a measure to overcome the drawbacks of this scheme, such as bottlenecks, overheads, global synchronization, and single point of failure, a distributed redistribution algorithm is designed. Extensions of the distributed balancing scheme are also developed to improve the detection of and the reaction to load imbalances. These extensions introduce communication delay detection, migration latency awareness, self-adaptation, and load oscillation prediction in the load redistribution algorithm. Such developed balancing systems successfully improved the use of shared resources and increased distributed simulations' performance.
|
214 |
Code profiling and optimization in transactional memory systems / Profiling e otimização de código em sistemas de memória transacionalCordeiro, Silvio Ricardo January 2014 (has links)
Memória Transacional tem se demonstrado um paradigma promissor na implementação de aplicações concorrentes sob memória compartilhada que busquem evitar um modelo de sincronização baseado em locks. Em vez de sujeitar a execução a um acesso exclusivo com base no valor de um lock que é compartilhado por threads concorrentes, uma aplicação sob Memória Transacional tenta executar seções críticas de modo otimista, desfazendo as modificações no caso de um conflito de acesso à memória. Entretanto, apesar de a abordagem baseada em locks ter adquirido um número significativo de ferramentas automatizadas para a depuração, profiling e otimização automatizados (por ser uma das técnicas de sincronização mais antigas e mais bem pesquisadas), o campo da Memória Transacional ainda é comparativamente recente, e programadores frequentemente precisam adaptar manualmente suas aplicações transacionais ao encontrar problemas de eficiência. Este trabalho propõe um sistema no qual o profiling de código em uma implementação de Memória Transacional simulada é utilizado para caracterizar uma aplicação transacional, formando a base para uma parametrização automatizada do respectivo sistema especulativo para uma execução eficiente do código em questão. Também é proposta uma abordagem de escalonamento de threads guiado por profiling em uma implementação de Memória Transacional baseada em software, usando dados coletados pelo profiler para prever a probabilidade de conflitos e determinar que thread escalonar com base nesta previsão. São apresentados os resultados de experimentos sob ambas as abordagens. / Transactional Memory has shown itself to be a promising paradigm for the implementation of shared-memory concurrent applications that eschew a lock-based model of data synchronization. Rather than conditioning exclusive access on the value of a lock that is shared across concurrent threads, Transactional Memory attempts to execute critical sections optimistically, rolling back the modifications in the event of a data access conflict. However, while the lock-based approach has acquired a significant body of debugging, profiling and automated optimization tools (as one of the oldest and most researched synchronization techniques), the field of Transactional Memory is still comparably recent, and programmers are usually tasked with an unguided manual tuning of their transactional applications when facing efficiency problems. We propose a system in which code profiling in a simulated hardware implementation of Transactional Memory is used to characterize a transactional application, which forms the basis for the automated tuning of the underlying speculative system for the efficient execution of that particular application. We also propose a profile-guided approach to the scheduling of threads in a software-based implementation of Transactional Memory, using collected data to predict the likelihood of conflicts and determine what thread to schedule based on this prediction. We present the results achieved under both designs.
|
215 |
Avaliação do impacto da comunicação intra e entre-nós em nuvens computacionais para aplicações de alto desempenho / Evaluation of impact from inter and intra-node communication in cloud computing for HPC applicationsThiago Kenji Okada 07 November 2016 (has links)
Com o advento da computação em nuvem, não é mais necessário ao usuário investir grandes quantidades de recursos financeiros em equipamentos computacionais. Ao invés disto, é possível adquirir recursos de processamento, armazenamento ou mesmo sistemas completos por demanda, usando um dos diversos serviços disponibilizados por provedores de nuvem como a Amazon, o Google, a Microsoft, e a própria USP. Isso permite um controle maior dos gastos operacionais, reduzindo custos em diversos casos. Por exemplo, usuários de computação de alto desempenho podem se beneficiar desse modelo usando um grande número de recursos durante curtos períodos de tempo, ao invés de adquirir um aglomerado computacional de alto custo inicial. Nosso trabalho analisa a viabilidade de execução de aplicações de alto desempenho, comparando o desempenho de aplicações de alto desempenho em infraestruturas com comportamento conhecido com a nuvem pública oferecida pelo Google. Em especial, focamos em diferentes configurações de paralelismo com comunicação interna entre processos no mesmo nó, chamado de intra-nós, e comunicação externa entre processos em diferentes nós, chamado de entre-nós. Nosso caso de estudo para esse trabalho foi o NAS Parallel Benchmarks, um benchmark bastante popular para a análise de desempenho de sistemas paralelos e de alto desempenho. Utilizamos aplicações com implementações puramente MPI (para as comunicações intra e entre-nós) e implementações mistas onde as comunicações internas foram feitas utilizando OpenMP (comunicação intra-nós) e as comunicações externas foram feitas usando o MPI (comunicação entre-nós). / With the advent of cloud computing, it is no longer necessary to invest large amounts of money on computing resources. Instead, it is possible to obtain processing or storage resources, and even complete systems, on demand, using one of the several available services from cloud providers like Amazon, Google, Microsoft, and USP. Cloud computing allows greater control of operating expenses, reducing costs in many cases. For example, high-performance computing users can benefit from this model using a large number of resources for short periods of time, instead of acquiring a computer cluster with high initial cost. Our study examines the feasibility of running high-performance applications, comparing the performance of high-performance applications in a known infrastructure compared to the public cloud offering from Google. In particular, we focus on various parallel configurations with internal communication between processes on the same node, called intra-node, and external communication between processes on different nodes, called inter-nodes. Our case study for this work was the NAS Parallel Benchmarks, a popular benchmark for performance analysis of parallel systems and high performance computing. We tested applications with MPI-only implementations (for intra and inter-node communications) and mixed implementations where internal communications were made using OpenMP (intra-node communications) and external communications were made using the MPI (inter-node communications).
|
216 |
Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous SystemsErnstsson, August January 2020 (has links)
Today's society is increasingly software-driven and dependent on powerful computer technology. Therefore it is important that advancements in the low-level processor hardware are made available for exploitation by a growing number of programmers of differing skill level. However, as we are approaching the end of Moore's law, hardware designers are finding new and increasingly complex ways to increase the accessible processor performance. It is getting more and more difficult to effectively target these processing resources without expert knowledge in parallelization, heterogeneous computation, communication, synchronization, and so on. To ensure that the software side can keep up, advanced programming environments and frameworks are needed to bridge the widening gap between hardware and software. One such example is the pattern-centric skeleton programming model and in particular the SkePU project. The work presented in this thesis first redesigns the SkePU framework based on modern C++ variadic template metaprogramming and state-of-the-art compiler technology. It then explores new ways to improve performance: by providing new patterns, improving the data access locality of existing ones, and using both static and dynamic knowledge about program flow. The work combines novel ideas with practical evaluation of the approach on several applications. The advancements also include the first skeleton API that allows variadic skeletons, new data containers, and finally an approach to make skeleton programming more customizable without compromising universal portability. / <p>Ytterligare forskningsfinansiärer: EU H2020 project EXA2PRO (801015); SeRC.</p>
|
217 |
Design of an Optimized Supervisor Module for Tomographic Adaptive Optics Systems of Extremely Large TelescopesDoucet, Nicolas 08 January 2020 (has links)
The recent advent of next generation ground-based telescopes, code-named Extremely Large Telescopes (ELT), highlights the beginning of a forced march toward an era of deploying instruments capable of exploiting starlight captured by mirrors at an unprecedented scale. This confronts the astronomy community with both a daunting challenge and a unique opportunity. The challenge arises from the mismatch between the complexity of current instruments and their expected scaling with the square of the future telescope diameters, on which astronomy applications have relied to produce better science. To deliver on the promise of tomorrow’s ELT, astronomers must design new technologies that can effectively enhance the performance of the instrument at scale, while compensating for the atmospheric turbulence in real-time. This is an unsolved problem. This problem presents an opportunity because the astronomy community is now compelled to rethink essential components of the optical systems and their traditional hardware/software ecosystems in order to achieve high optical performance with a near real-time computational response. In order to realize the full potential of such instruments, we investigate a technique supporting Adaptive Optics (AO), i.e., a dedicated concept relying on turbulence tomography. In particular, a critical part of AO systems is the supervisor module, which is responsible for providing the system with a Tomographic Reconstructor (ToR) at a regular pace, as the atmospheric turbulence evolves over an observation window. In this thesis, we implement an optimized supervisor module and assess it under real configurations of the future European ELT (E-ELT) with a 39m diameter, the largest and most complex optical telescope ever conceived. This necessitates manipulating large matrix sizes (i.e., up to 100k × 100k) that contain measurements captured by multiple wavefront sensors. To address the complexity bottleneck, we employ high performance computing software solutions based on cutting-edge numerical algorithms using asynchronous, fine-grained computations as well as approximations techniques that leverage the resulting matrix data structure. Furthermore, GPU-based hardware accelerators are used in conjunction with the software solutions to ensure reasonable time-to-solution to cope with rapidly evolving atmospheric turbulence. The proposed software/hardware solution permits to reconstruct an image with high accuracy. We demonstrate the validity of the AO systems with a third-party testbed simulating at the E-ELT scale, which is intended to pave the way for a first prototype installed on-site
|
218 |
Instalace a konfigurace Octave výpočetního clusteru / Installation and configuration of Octave computation clusterMikulka, Zdeněk January 2014 (has links)
This diploma thesis contains detailed design of high-performance cluster, primarely focused for parallel computing in Octave application. Each of component of this cluster is described along with instructions for installation and configuration. Cluster is based on GNU/Linux operating system and Message Parsing Interface. Design alllows implementation of this cluster in computers of schoolroom with active lessons.
|
219 |
Low-rank Approximations in Quantum Transport SimulationsDaniel A. Lemus (5929940) 07 May 2020 (has links)
Quantum-mechanical effects play a major role in the performance of modern electronic devices. In order to predict the behavior of novel devices, quantum effects are often included using Non-Equilibrium Green's Function (NEGF) methods in atomistic device representations. These quantum effects may include realistic inelastic scattering caused by device impurities and phonons. With the inclusion of realistic physical phenomena, the computational load of predictive simulations increases greatly, and a manageable basis through low-rank approximations is desired.<br><br>In this work, low-rank approximations are used to reduce the computational load of atomistic simulations. The benefits of basis reductions on simulation time and peak memory are assessed.<br>The low-rank approximation method is then extended to include more realistic physical effects than those modeled today, including exact calculations of scattering phenomena. The inclusion of these exact calculations are then contrasted to current methods and approximations.
|
220 |
Modélisation et implémentation de simulations multi-agents sur architectures massivement parallèles / Modeling and implementing multi-agents based simulations on massively parallel architecturesHermellin, Emmanuel 18 November 2016 (has links)
La simulation multi-agent représente une solution pertinente pour l’ingénierie et l’étude des systèmes complexes dans de nombreux domaines (vie artificielle, biologie, économie, etc.). Cependant, elle requiert parfois énormément de ressources de calcul, ce qui représente un verrou technologique majeur qui restreint les possibilités d'étude des modèles envisagés (passage à l’échelle, expressivité des modèles proposés, interaction temps réel, etc.).Parmi les technologies disponibles pour faire du calcul intensif (High Performance Computing, HPC), le GPGPU (General-Purpose computing on Graphics Processing Units) consiste à utiliser les architectures massivement parallèles des cartes graphiques (GPU) comme accélérateur de calcul. Cependant, alors que de nombreux domaines bénéficient des performances du GPGPU (météorologie, calculs d’aérodynamique, modélisation moléculaire, finance, etc.), celui-ci est peu utilisé dans le cadre de la simulation multi-agent. En fait, le GPGPU s'accompagne d’un contexte de développement très spécifique qui nécessite une transformation profonde et non triviale des modèles multi-agents. Ainsi, malgré l'existence de travaux pionniers qui démontrent l'intérêt du GPGPU, cette difficulté explique le faible engouement de la communauté multi-agent pour le GPGPU.Dans cette thèse, nous montrons que, parmi les travaux qui visent à faciliter l'usage du GPGPU dans un contexte agent, la plupart le font au travers d’une utilisation transparente de cette technologie. Cependant, cette approche nécessite d’abstraire un certain nombre de parties du modèle, ce qui limite fortement le champ d’application des solutions proposées. Pour pallier ce problème, et au contraire des solutions existantes, nous proposons d'utiliser une approche hybride (l'exécution de la simulation est partagée entre le processeur et la carte graphique) qui met l'accent sur l'accessibilité et la réutilisabilité grâce à une modélisation qui permet une utilisation directe et facilitée de la programmation GPU. Plus précisément, cette approche se base sur un principe de conception, appelé délégation GPU des perceptions agents, qui consiste à réifier une partie des calculs effectués dans le comportement des agents dans de nouvelles structures (e.g. dans l’environnement). Ceci afin de répartir la complexité du code et de modulariser son implémentation. L'étude de ce principe ainsi que les différentes expérimentations réalisées montre l'intérêt de cette approche tant du point de vue conceptuel que du point de vue des performances. C'est pourquoi nous proposons de généraliser cette approche sous la forme d'une méthodologie de modélisation et d'implémentation de simulations multi-agents spécifiquement adaptée à l'utilisation des architectures massivement parallèles. / Multi-Agent Based Simulations (MABS) represents a relevant solution for the engineering and the study of complex systems in numerous domains (artificial life, biology, economy, etc.). However, MABS sometimes require a lot of computational resources, which is a major constraint that restricts the possibilities of study for the considered models (scalability, real-time interaction, etc.).Among the available technologies for HPC (High Performance Computing), the GPGPU (General-Purpose computing on Graphics Processing Units) proposes to use the massively parallel architectures of graphics cards as computing accelerator. However, while many areas benefit from GPGPU performances (meteorology, molecular dynamics, finance, etc.). Multi-Agent Systems (MAS) and especially MABS hardly enjoy the benefits of this technology: GPGPU is very little used and only few works are interested in it. In fact, the GPGPU comes along with a very specific development context which requires a deep and not trivial transformation process for multi-agents models. So, despite the existence of works that demonstrate the interest of GPGPU, this difficulty explains the low popularity of GPGPU in the MAS community.In this thesis, we show that among the works which aim to ease the use of GPGPU in an agent context, most of them do it through a transparent use of this technology. However, this approach requires to abstract some parts of the models, what greatly limits the scope of the proposed solutions. To handle this issue, and in contrast to existing solutions, we propose to use a nhybrid approach (the execution of the simulation is shared between both the processor and graphics card) that focuses on accessibility and reusability through a modeling process that allows to use directly GPU programming while simplifying its use. More specifically, this approach is based on a design principle, called GPU delegation of agent perceptions, consists in making a clear separation between the agent behaviors, managed by the processor, and environmental dynamics, handled by the graphics card. So, one major idea underlying this principle is to identify agent computations which can be transformed in new structures (e.g. in the environment) in order to distribute the complexity of the code and modulate its implementation. The study of this principle and the different experiments conducted show the advantages of this approach from both a conceptual and performances point of view. Therefore, we propose to generalize this approach and define a comprehensive methodology relying on GPU delegation specifically adapted to the use of massively parallel architectures for MABS.
|
Page generated in 0.1169 seconds