Global ETD Search

1	Compiler-assisted staggered checkpointing Norman, Alison Nicholas 23 November 2010 (has links) To make progress in the face of failures, long-running parallel applications need to save their state, known as a checkpoint. Unfortunately, current checkpointing techniques are becoming untenable on large-scale supercomputers. Many applications checkpoint all processes simultaneously--a technique that is easy to implement but often saturates the network and file system, causing a significant increase in checkpoint overhead. This thesis introduces compiler-assisted staggered checkpointing, where processes checkpoint at different places in the application text, thereby reducing contention for the network and file system. This checkpointing technique is algorithmically challenging since the number of possible solutions is enormous and the number of desirable solutions is small, but we have developed a compiler algorithm that both places staggered checkpoints in an application and ensures that the solution is desirable. This algorithm successfully places staggered checkpoints in parallel applications configured to use tens of thousands of processes. For our benchmarks, this algorithm successfully finds and places useful recovery lines that are up to 37% faster for all configurations than recovery lines where all processes write their data at approximately the same time. We also analyze the success of staggered checkpointing by investigating sets of application and system characteristics for which it reduces network and file system contention. We find that for many configurations, staggered checkpointing reduces both checkpointing time and overall execution time. To perform these analyses, we develop an event-driven simulator for large-scale systems that estimates the behavior of the network, global file system, and local hardware using predictive models. Our simulator allows us to accurately study applications that have thousands of processes; it on average predicts execution times as 83% of their measured value. / text Supercomputing Checkpointing Simulator Large-scale parallel applications
2	Uso das características computacionais de regiões paralelas OpenMP para redução do consumo de energia Moro, Gabriel Bronzatti January 2018 (has links) Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System. Processamento paralelo Consumo : Energia Energy Consumption OpenMP Parallel Applications
3	Performance Projections of HPC Applications on Chip Multiprocessor (CMP) Based Systems Shawky Sharkawi, Sameh Sh 2011 May 1900 (has links) Performance projections of High Performance Computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems and help HPC users with system procurement and application refinements. In this dissertation, we present an efficient method to project the performance of HPC applications onto Chip Multiprocessor (CMP) based systems using widely available standard benchmark data. The main advantage of this method is the use of published data about the target machine; the target machine need not be available. With the current trend in HPC platforms shifting towards cluster systems with chip multiprocessors (CMPs), efficient and accurate performance projection becomes a challenging task. Typically, CMP-based systems are configured hierarchically, which significantly impacts the performance of HPC applications. The goal of this research is to develop an efficient method to project the performance of HPC applications onto systems that utilize CMPs. To provide for efficiency, our projection methodology is automated (projections are done using a tool) and fast (with small overhead). Our method, called the surrogate-based workload application projection method, utilizes surrogate benchmarks to project an HPC application performance on target systems where computation component of an HPC application is projected separately from the communication component. Our methodology was validated on a variety of systems utilizing different processor and interconnect architectures with high accuracy and efficiency. The average projection error on three target systems was 11.22 percent with standard deviation of 1.18 percent for twelve HPC workloads. High Performance Computing Parallel Systems Parallel Applications Performance Projection Prediction
4	Definition of Framework-based Performance Models for Dynamic Performance Tuning Cesar Galobardes, Eduardo 07 April 2006 (has links) Parallel and distributed programming constitutes a highly promising approach to improving the performance of many applications. However, in comparison to sequential programming, many new problems arise in all phases of the development cycle of this kind of applications. For example, in the analysis phase of parallel/distributed programs, the programmer has to decompose the problem (data and/or code) to find the concurrency of the algorithm. In the design phase, the programmer has to be aware of the communication and synchronization conditions between tasks. In the implementation phase, the programmer has to learn how to use specific communication libraries and runtime environments but also to find a way of debugging programs. Finally, to obtain the best performance, the programmer has to tune the application by using monitoring tools, which collect information about the application's behavior. Tuning can be a very difficult task because it can be difficult to relate the information gathered by the monitor to the application's source code. Moreover, tuning can be even more difficult for those applications that change their behavior dynamically because, in this case, a problem might happen or not depending on the execution conditions.It can be seen that these issues require a high degree of expertise, which prevents the more widespread use of this kind of solution. One of the best ways to solve these problems would be to develop, as has been done in sequential programming, tools to support the analysis, design, coding, and tuning of parallel/distributed applications. In the particular case of performance analysis and/or tuning, it is important to note that the best way of analyzing and tuning parallel/distributed applications depends on some of their behavioral characteristics. If the application to be tuned behaves in a regular way then a static analysis (predictive or trace based) would be enough to find the application's performance bottlenecks and to indicate what should be done to overcome them. However, if the application changes its behavior from execution to execution or even dynamically changes its behavior in a single execution then the static analysis cannot offer efficient solutions for avoiding performance bottlenecks. In this case, dynamic monitoring and tuning techniques should be used instead. However, in dynamic monitoring and tuning, decisions must be taken efficiently, which means that the application's performance analysis outcome must be accurate and punctual in order to effectively tackle problems; at the same time, intrusion on the application must be minimized because the instrumentation inserted in the application in order to monitor and tune it alters its behavior and could introduce performance problems that were not there before the instrumentation. This is more difficult to achieve if there is no information about the structure and behavior of the application; therefore, blind automatic dynamic tuning approaches have limited success, whereas cooperative dynamic tuning approaches can cope with more complex problems at the cost of asking for user collaboration. We have proposed a third approach. If a programming tool, based on the use of skeletons or frameworks, has been used in the development of the application then much information about the structure and behavior of the application is available and a performance model associated to the structure of the application can be defined for use by the dynamic tuning tool. The resulting tuning tool should produce the outcome of a collaborative one while behaving like an automatic one from the point of view of the application developer. Performance models Performance tuning Parallel applications Tecnologies 519.1
5	Acceleration of Parallel Applications by Moving Code Instead of Data Farahaninia, Farzad January 2014 (has links) After the performance improvement rate in single-core processors decreased in 2000s, most CPU manufacturers have steered towards parallel computing. Parallel computing has been in the spotlight for a while now. Several hardware and software innovations are being examined and developed, in order to improve the efficiency of parallel computing. Signal processing is an important application area of parallel computing, and this makes parallel computing interesting for Ericsson AB, a company that among other business areas, is mainly focusing on communication technologies. The Ericsson baseband research team at Lindholmen, has been developing a small, experimental basic operating system (BOS) for research purposes within the area of parallel computing. One major overhead in parallel applications, which increases the latency in applications, is the communication overhead between the cores. It had been observed that in some signal processing applications, it is common for some tasks of the parallel application to have a large data size but a small code size. The question was risen then, could it be beneficial to move code instead of data in such cases, to reduce the communication overhead. In this thesis work the gain and practical difficulties of moving code are investigated through implementation. A method has been successfully developed and integrated into BOS to move the code between the cores on a multi-core architecture. While it can be a very specific class of applications in which it is useful to move code, it is shown that it is possible to move the code between the cores with zero extra overhead. Many Core Parallel Applications Epiphany LTE Signal Processing Embedded
6	Processamento térmico de grafeno e sua síntese pela técnica de epitaxia por feixes moleculares Rolim, Guilherme Koszeniewski January 2018 (has links) Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System. Microeletrônica Grafeno Feixes moleculares Energy Consumption OpenMP Parallel Applications
7	Uso das características computacionais de regiões paralelas OpenMP para redução do consumo de energia Moro, Gabriel Bronzatti January 2018 (has links) Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System. Processamento paralelo Consumo : Energia Energy Consumption OpenMP Parallel Applications
8	Processamento térmico de grafeno e sua síntese pela técnica de epitaxia por feixes moleculares Rolim, Guilherme Koszeniewski January 2018 (has links) Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System. Microeletrônica Grafeno Feixes moleculares Energy Consumption OpenMP Parallel Applications
9	Uso das características computacionais de regiões paralelas OpenMP para redução do consumo de energia Moro, Gabriel Bronzatti January 2018 (has links) Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System. Processamento paralelo Consumo : Energia Energy Consumption OpenMP Parallel Applications
10	A scheduling framework for dynamically resizable parallel applications Swaminathan, Gautam 18 February 2005 (has links) Applications in science and engineering require large parallel systems in order to solve computational problems within a reasonable timeframe. These applications can benefit from dynamic resizing during the course of their execution. Dynamic resizing enables fine-grained control over resource allocation to jobs and results in better system throughput and job turn around time. We have implemented a framework that enabled dynamic resizing of MPI applications. Our framework uses the recently released MPI-2 standard that enables dynamic resizing. The work described in this thesis is part of a larger effort to design and implement a system for supporting and leveraging dynamically resizable parallel applications. We provide a scheduling framework, an API for dynamic resizing and libraries to efficiently redistribute data to new processor topologies. / Master of Science Dynamic resizing parallel applications MPI-2 scheduling framework

Search results