401
AutoElastic: exploring cloud computing resource elasticity for the execution of iterative high-performance applications. Rodrigues, Vinicius Facco, 29 February 2016 (has links)
Funding: CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior); PROSUP (Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares).
Elasticity is one of the key features of cloud computing: computational resources can be added to or removed from the environment at any time, enabling applications to scale dynamically and avoiding over- and under-provisioning. In high performance computing (HPC), bag-of-tasks and key-value initiatives use a load balancer and a loosely coupled set of virtual machine (VM) instances. In this scenario it is easy to add or remove VMs, because the load balancer is in charge of distributing tasks among the active processes. Iterative HPC applications, by contrast, are tightly coupled and have difficulty taking advantage of elasticity, because the number of processes is typically fixed throughout the application runtime. Simply adding new resources does not guarantee that the processes will use them, and removing a single process can compromise the entire execution, since each process plays a key role in the application's execution cycle. Iterative HPC applications are commonly implemented using MPI (Message Passing Interface), and in the joint field of MPI and tightly coupled HPC applications, exploiting elasticity is a challenge: the source code must be rewritten to handle resource reorganization, which requires prior knowledge of application behaviour and stop-reconfigure-and-go approaches. Even with MPI 2.0, where the number of processes can change at runtime, the developer must manage the communication topology by himself. Moreover, the sudden consolidation of a VM, together with its process, can compromise the whole execution. To address these issues, we propose a PaaS-based elasticity model named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning in cloud infrastructures without source-code modifications. AutoElastic offers elasticity automatically, so the user does not need to configure any resource management policy. The elasticity mechanism supports fixed thresholds as well as a new approach in which the thresholds self-adjust during the application execution. AutoElastic also introduces a concept denoted here as asynchronous elasticity: a framework that allows applications to increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. Results with this parallel iterative application showed performance gains between 28.4% and 59% when comparing executions with and without elasticity.
In addition, tests with different threshold parameterizations and workloads showed that, when using fixed thresholds, the upper threshold has a greater impact than the lower one on application performance and resource consumption.
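
The threshold rule and its self-adjusting variant lend themselves to a short sketch. The following Python fragment is illustrative only: the function names, the single averaged CPU-load metric, and the band re-centering rule are assumptions made for exposition, not AutoElastic's actual policy.

def decide_action(cpu_load, lower, upper):
    """Map the monitored CPU load of the VM pool to a scaling action."""
    if cpu_load > upper:
        return "allocate_vm"    # scale out; done asynchronously so the
                                # running application is never blocked
    if cpu_load < lower:
        return "deallocate_vm"  # scale in to avoid over-provisioning
    return "none"

def self_adjust(lower, upper, cpu_load, weight=0.1):
    """Hypothetical self-adjusting variant: re-center the threshold band
    on a moving estimate of the observed load, keeping its width fixed."""
    width = upper - lower
    center = (1 - weight) * (lower + width / 2) + weight * cpu_load
    return center - width / 2, center + width / 2

print(decide_action(0.92, lower=0.3, upper=0.8))  # -> allocate_vm
print(self_adjust(0.3, 0.8, cpu_load=0.9))        # band drifts upward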
402
Rheology and Structure Formation in Complex Polymer Melts. Schneider, Ludwig, 10 April 2019 (has links)
No description available.
403
Hardware and software co-design toward flexible terabits per second traffic processing. Cornevaux-Juignet, Franck, 04 July 2018 (has links)
The reliability and security of communication networks require efficient components to analyze data traffic finely. Service diversification and throughput increases force network operators to constantly improve analysis systems in order to handle throughputs of hundreds, even thousands, of gigabits per second. Commonly used software solutions offer a flexibility and accessibility welcome to network operators, but they can no longer meet these strong constraints in many critical cases. This thesis studies architectural solutions based on programmable chips such as Field-Programmable Gate Arrays (FPGAs), which combine computation power and processing flexibility. Boards equipped with such chips are integrated into a common software/hardware processing flow in order to balance the shortcomings of each element. Network components developed with this innovative approach ensure exhaustive processing of the packets transmitted on physical links while keeping the flexibility of usual software solutions, a combination not found in the previous state of the art. This approach is validated by the design and implementation of a flexible packet-processing architecture on FPGA. It can process any packet type at the cost of a slight over-consumption of resources, and it is fully customizable from the software side. With the proposed solution, network engineers can transparently use the processing power of a hardware accelerator without prior knowledge of digital circuit design.
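
The "fully customizable from the software side" property can be illustrated with a toy model: a host program computes a rule table of offset/mask/value checks of the kind it might write into hardware registers. The rule format below is an assumption for illustration, not the interface developed in the thesis, and the Python model captures the matching semantics only, not the FPGA implementation.

from dataclasses import dataclass

@dataclass
class Rule:
    offset: int   # byte offset into the packet
    mask: bytes   # bitmask applied before comparison
    value: bytes  # expected masked bytes

    def matches(self, packet: bytes) -> bool:
        window = packet[self.offset:self.offset + len(self.mask)]
        if len(window) < len(self.mask):
            return False
        return all(b & m == v for b, m, v in zip(window, self.mask, self.value))

# Example: match IPv4 frames (EtherType 0x0800 at offset 12 of an Ethernet header).
ipv4 = Rule(offset=12, mask=b"\xff\xff", value=b"\x08\x00")
frame = bytes(12) + b"\x08\x00" + bytes(20)
print(ipv4.matches(frame))  # -> True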
404
Development and application of an enhanced sampling molecular dynamics method to the conformational exploration of biologically relevant molecules. Alibay, Irfan, January 2017 (has links)
This thesis describes the development of a new swarm-enhanced sampling methodology and its application to the exploration of biologically relevant molecules. First, the development of a new multi-dimensional swarm-enhanced sampling molecular dynamics (msesMD) approach is detailed. Relative to the original swarm-enhanced sampling molecular dynamics (sesMD) methodology, the msesMD method demonstrates improved parameter transferability, resulting in more extensive sampling when scaling to larger systems such as alanine heptapeptide. The implementation and optimisation of the swarm-enhanced sampling algorithms in the AMBER software suite are also described: through the use of the newer pmemd molecular dynamics (MD) engine and asynchronous MPI routines, speedups of up to three times over the original sesMD implementation were achieved. The msesMD method is then applied to the investigation of carbohydrates, first looking at rare conformational changes in Lewis oligosaccharides. Validated against multi-microsecond unbiased MD trajectories and other enhanced sampling methods, the msesMD simulations identified rare conformational changes leading to the adoption of non-canonical unstacked core trisaccharide structures. Next, the use of msesMD as a tool to probe pyranose ring pucker events is explored: evaluated against four benchmark monosaccharide systems, msesMD simulations accurately recover puckering details not easily obtained via multi-microsecond unbiased MD. This is followed by an exploration of the impact of ring substituents on conformation in glycosaminoglycan monosaccharides: through msesMD simulations, the influence of specific sulfation patterns was explored, finding that in some cases, such as 4-O-sulfation in N-acetyl-galactosamine, large changes in the relative stability of ring conformers can arise. Finally, the msesMD method was coupled with a thermodynamic integration scheme and used to evaluate solvation free energies for small-molecule systems. Compared against independent-trajectory TI simulations, it was found that although the correct solvation free energies were obtained, the msesMD-based method did not offer an advantage over unbiased MD for these small systems. However, interesting discrepancies in free energy estimates arising from the use of hydrogen mass repartitioning were found.
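
The swarm coupling at the heart of such methods can be caricatured in one dimension. The toy below is emphatically not the sesMD/msesMD bias: it only illustrates the general idea of several replicas feeling a repulsive inter-replica term that discourages them from crowding one basin, under an assumed double-well potential and invented parameters.

import math, random

def force(x):
    """Force from the double-well potential U(x) = (x**2 - 1)**2."""
    return -4 * x * (x * x - 1)

def swarm_force(x_i, swarm, k=0.5, sigma=0.5):
    """Repulsion from a Gaussian inter-replica bias V = k*exp(-d**2/(2*sigma**2));
    the self term (d = 0) contributes zero force, so no exclusion is needed."""
    return sum(k * (x_i - x_j) / sigma**2
               * math.exp(-(x_i - x_j) ** 2 / (2 * sigma**2))
               for x_j in swarm)

def step(swarm, dt=1e-3, temp=0.5):
    """One overdamped Langevin step for every replica."""
    noise = math.sqrt(2 * temp * dt)
    return [x + dt * (force(x) + swarm_force(x, swarm)) + noise * random.gauss(0, 1)
            for x in swarm]

swarm = [-1.05, -1.0, -0.95, -0.9]   # all replicas start in the left well
for _ in range(20000):
    swarm = step(swarm)
print(sorted(round(x, 2) for x in swarm))  # repulsion disfavours crowding one well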
405
Distributed algorithm for multiple resource allocation in a distributed environment. Francisco Ribacionka, 07 June 2013 (has links)
When considering a distributed system composed of a set of servers, clients, and resources, characteristic of environments such as computational grids or clouds that offer a large number of distributed resources (CPUs or virtual machines) used jointly by different types of applications, a solution for allocating these resources is needed. Such support must satisfy all application resource requests, allocate resources efficiently, ensure fairness in the case of simultaneous requests from multiple clients, and answer every request in finite time. In this context of large-scale distributed systems, this work proposes a distributed algorithm for resource allocation. The algorithm applies fuzzy logic whenever a server is unable to meet a request made by a client, forwarding the request to a remote server. It uses the concept of logical clocks to ensure fairness in serving the requests made to all servers that share resources. The algorithm follows a distributed model: a copy runs on each server that shares resources with its clients, and all servers take part in the decisions regarding the allocation of these resources. The strategy aims to minimize the response time of resource allocation, functioning as a load balancer in a client-server environment with a high rate of client resource requests. The efficiency of the algorithm was demonstrated through its implementation and comparison with traditional algorithms, showing that a single request can use resources belonging to distinct servers, with the guarantee that the request will be served, and in finite time.
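
Two ingredients the abstract combines can be sketched compactly: a Lamport logical clock to order resource requests fairly across servers, and a fuzzy score used when a request must be forwarded to a remote server. The class and rule shapes below are illustrative assumptions, not the dissertation's algorithm.

import heapq

class LamportClock:
    def __init__(self):
        self.time = 0
    def tick(self):                # local event, e.g. issuing a request
        self.time += 1
        return self.time
    def update(self, received):    # merge a remote timestamp on receipt
        self.time = max(self.time, received) + 1
        return self.time

def fuzzy_forward_score(load, free_cpus):
    """Toy fuzzy rule: 'lightly loaded AND has free CPUs -> good target'.
    Memberships are piecewise-linear; min() plays the role of fuzzy AND."""
    lightly_loaded = max(0.0, min(1.0, (0.8 - load) / 0.8))
    has_free = max(0.0, min(1.0, free_cpus / 4.0))
    return min(lightly_loaded, has_free)

clock = LamportClock()
# Serving requests in (timestamp, server_id) order gives a total order
# across servers and hence fairness under simultaneous requests.
pending = []
heapq.heappush(pending, (clock.tick(), "server-A", "need 2 CPUs"))
heapq.heappush(pending, (clock.update(5), "server-B", "need 1 VM"))
print(heapq.heappop(pending))                      # earliest-timestamped request first
print(fuzzy_forward_score(load=0.3, free_cpus=3))  # -> 0.625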
406
Resilience in Distributed Workflow Systems for Numerical Optimization Applications: Design and Experiments / Collaborative platform for multidiscipline optimization. Trifan, Laurentiu, 21 October 2013 (has links)
This thesis aims at designing an environment for high-performance computing in a numerical optimization context. The design and optimization tools are distributed across several teams, both academic and industrial, collaborating within common projects, and they should be federated within a shared environment to ease access for researchers and engineers. The environment we propose consists of a workflow system and a distributed computing system: the former facilitates the application design task, while the latter executes the application on distributed computing resources. Communication services between the two systems must of course be developed. Computations must be performed efficiently, taking into account the internal parallelism of some codes, synchronous or asynchronous task execution, data transfers, and the available hardware and software resources (load balancing, for example). In addition, the environment should provide a good level of tolerance to faults and software failures, to minimize their influence on the final result or on the computation time. One important requirement in particular is to implement recovery mechanisms on error occurrence, so that the extra time spent handling errors remains well below the total re-execution time. For this work, our choice fell on the Yawl workflow engine, which has good characteristics in terms of i) hardware and software independence (a client-server system that can run on heterogeneous hardware) and ii) its error recovery mechanism. For the distributed computing part, our experiments were performed on the Grid5000 platform, using up to 64 different machines spread over five geographical sites. This document details the design of this environment as well as the extensions and changes we had to make to Yawl to enable it to run on a distributed platform.
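
The stated requirement, that error-handling time stay far below full re-execution time, is essentially a checkpointing argument, sketched below. Completed tasks persist their results, so after a failure only unfinished tasks re-execute. The file layout and task signature are illustrative assumptions; this is not Yawl's mechanism.

import pickle, pathlib

CKPT_DIR = pathlib.Path("checkpoints")
CKPT_DIR.mkdir(exist_ok=True)

def run_with_checkpoint(task_id, task_fn, *args):
    ckpt = CKPT_DIR / f"{task_id}.pkl"
    if ckpt.exists():                       # task already completed in a prior run
        return pickle.loads(ckpt.read_bytes())
    result = task_fn(*args)                 # may raise; nothing is recorded then
    ckpt.write_bytes(pickle.dumps(result))  # persist only on success
    return result

# A linear workflow resumes from the first task lacking a checkpoint:
for i, task in enumerate([lambda: 1, lambda: 2, lambda: 3]):
    print(run_with_checkpoint(f"task{i}", task))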
407
Surface Modified Capillaries in Capillary Electrophoresis Coupled to Mass Spectrometry: Method Development and Exploration of the Potential of Capillary Electrophoresis as a Proteomic Tool. Zuberovic, Aida, January 2009 (has links)
Increased knowledge of the complexity of physiological processes raises the demands on the analytical techniques employed to explore them. A comprehensive analysis of the entire sample content is today the most common approach to investigating the molecular interplay behind a physiological deviation. For this purpose, a method is required that offers a number of important properties: speed and simplicity, high resolution and sensitivity, minimal sample volume requirements, cost efficiency and robustness, the possibility of automation, high throughput, and a wide application range. Capillary electrophoresis (CE) coupled to mass spectrometry (MS) has great potential and fulfils many of these criteria. However, further developments and improvements of these techniques and their combination are required to meet the challenges of complex biological samples. Protein analysis using CE is a challenging task due to protein adsorption to the negatively charged fused-silica capillary wall, a problem that grows with the basicity and size of proteins and peptides. In this thesis, the adsorption problem was addressed by using an in-house developed, physically adsorbed polyamine coating named PolyE-323. The coating procedure is fast and simple and generates a coating that is stable over a wide pH range (2-11). By coupling PolyE-323-modified capillaries to MS, using either electrospray ionisation (ESI) or matrix-assisted laser desorption/ionisation (MALDI), successful analyses of peptides, proteins, and complex samples, such as protein digests and crude human body fluids, were obtained. The possibilities of using CE-MALDI-MS/MS as a proteomic tool, combined with proper sample preparation, are further demonstrated by applying high-abundant-protein depletion in combination with a peptide derivatisation step or isoelectric focusing (IEF). These approaches were applied to profiling the proteomes of human cerebrospinal fluid (CSF) and human follicular fluid (hFF), respectively. Finally, a multiplexed quantitative proteomic analysis was performed on a set of ventricular cerebrospinal fluid (vCSF) samples from a patient with traumatic brain injury (TBI) to follow relative changes in protein patterns during the recovery process. The results presented in this thesis confirm the potential of CE, in combination with MS, as a valuable choice for the analysis of complex biological samples and clinical applications.
408
A model of dynamic compilation for heterogeneous compute platforms. Kerr, Andrew, 10 December 2012 (has links)
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software experiences obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures presents an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world GPU computing workloads. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction-set extensions, and effectively exploit data locality via scheduling that attempts to maximize control locality.
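
The execution-model translation listed among the contributions can be pictured with a toy: a CUDA-style kernel written against a (blockIdx, threadIdx) hierarchy is run on a CPU by iterating that hierarchy. This is a deliberate simplification of what a dynamic compiler such as Ocelot does with real kernel binaries; the kernel and launcher below are illustrative assumptions.

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(x):                           # bounds guard, as in the GPU original
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Serialize the GPU thread hierarchy onto one CPU thread. A real
    translator would instead fuse threads into vectorized loops per core."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * 5
launch(saxpy_kernel, 2, 4, 2.0, x, y, out)   # 2 blocks of 4 threads cover 5 elements
print(out)  # -> [12.0, 24.0, 36.0, 48.0, 60.0]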
409
Execution of Distributed Database Queries on an HPC System. Onder, Ibrahim Seckin, 01 January 2010 (has links) (PDF)
Increasing computer performance and the ability to connect computers with high-speed communication networks make distributed database systems an attractive research area. In this study, we evaluate the communication and data processing capabilities of an HPC machine. We calculate accurate cost formulas for high-volume data communication between processing nodes and experimentally measure sorting times. A left-deep query plan executor has been implemented and used experimentally to run plans generated by two different genetic algorithms for a distributed database environment, using the message-passing paradigm, to show that a parallel system can provide scalable performance by increasing the number of nodes used for storing database relations and for processing. We compare the performance of plans generated by the genetic algorithms with optimal plans generated by an exhaustive search algorithm. Our results verify that the optimal plans are better than those of the genetic algorithms, as expected.
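
The plan search described above pairs naturally with a toy sketch: a genetic algorithm evolving left-deep join orders under a stand-in cost model. The relation sizes, selectivity rule, and GA parameters below are invented for illustration and do not reproduce the study's algorithms or its measured communication costs.

import random

SIZES = {"A": 1000, "B": 50, "C": 500, "D": 10}   # relation cardinalities

def cost(order):
    """Accumulate intermediate-result sizes of the left-deep plan."""
    inter, total = SIZES[order[0]], 0
    for rel in order[1:]:
        inter = inter * SIZES[rel] // 100          # crude join selectivity
        total += inter
    return total

def crossover(p1, p2):
    """Order crossover: keep a prefix of p1, fill the rest in p2's order."""
    cut = random.randrange(1, len(p1))
    head = p1[:cut]
    return head + [r for r in p2 if r not in head]

def mutate(order):
    i, j = random.sample(range(len(order)), 2)     # swap two relations
    order[i], order[j] = order[j], order[i]

population = [random.sample(list(SIZES), len(SIZES)) for _ in range(20)]
for _ in range(50):
    population.sort(key=cost)
    parents = population[:10]                      # truncation selection
    children = [crossover(random.choice(parents), random.choice(parents))
                for _ in range(10)]
    for child in children:
        if random.random() < 0.2:
            mutate(child)
    population = parents + children
print(min(population, key=cost))   # e.g. ['D', 'B', 'C', 'A']: small relations joined first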
410
Parallel Solution of Soil-Structure Interaction Problems on PC Clusters. Bahcecioglu, Tunc, 01 February 2011 (links) (PDF)
Numerical assessment of soil-structure interaction problems requires heavy computational effort because of the dynamic and iterative (nonlinear) nature of the problems. Furthermore, modeling soil-structure interaction may require