  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Adaptive Brain-Computer Interface Systems For Communication in People with Severe Neuromuscular Disabilities

Mainsah, Boyla O. January 2016
Brain-computer interfaces (BCIs) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electrophysiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electrophysiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel.
This research focuses on P300-based BCIs, which rely predominantly on event-related potentials (ERPs) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially in individuals with ALS, who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs are relatively slow compared to other commercial assistive communication devices, and this limits BCI adoption by their target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication.
In this work, it is hypothesised that building adaptive capabilities into the BCI framework can give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed to maximise the information content presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed; this approach is independent of the stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and that the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing. / Dissertation
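The abstract's remark that repeated measurements are needed to raise the signal-to-noise ratio can be illustrated with a small simulation. This is a minimal sketch and is not taken from the dissertation: it assumes a constant ERP amplitude buried in independent Gaussian noise, and the function name `averaged_noise_std` and all numeric values are hypothetical. Under that assumption, averaging N repeats shrinks the noise standard deviation by about the square root of N.

```python
import random
import statistics


def averaged_noise_std(n_repeats, n_trials=2000, noise_std=1.0, seed=0):
    """Empirical spread of the mean of n_repeats noisy measurements.

    Assumption (not from the source): each measurement is a constant ERP
    amplitude plus independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    erp_amplitude = 0.5  # hypothetical amplitude, arbitrary units
    means = []
    for _ in range(n_trials):
        samples = [erp_amplitude + rng.gauss(0.0, noise_std)
                   for _ in range(n_repeats)]
        means.append(sum(samples) / n_repeats)
    return statistics.pstdev(means)


std_1 = averaged_noise_std(1)    # single measurement
std_16 = averaged_noise_std(16)  # average of 16 repeats
ratio = std_1 / std_16           # expected to be close to sqrt(16) = 4
print(f"noise std, 1 repeat: {std_1:.3f}; 16 repeats: {std_16:.3f}; ratio: {ratio:.2f}")
```

The trade-off described in the abstract follows directly: each extra repeat improves the estimate but costs stimulus-presentation time, which is why the amount of data to collect is treated as a parameter to adapt.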
52

Throughput-oriented analytical models for performance estimation on programmable hardware accelerators / Analyse de performance potentielle d'une simulation de QCD sur réseau sur processeur Cell et GPU

Lai, Junjie 15 February 2013
This thesis addresses two topics in GPU (Graphics Processing Unit) performance analysis. First, we developed an analytical method and a timing estimation tool (TEG) to predict the performance of CUDA applications on GT200-generation GPUs. TEG predicts GPU application performance at a cycle-approximate level, with accuracy approaching that of cycle-accurate tools. Second, we developed an approach to estimate the upper bound on a GPU application's performance, based on analysis of the application and benchmarking at the assembly-code level. Knowing this upper bound tells us how much room for optimization remains, so we can decide how much optimization effort to invest. The analysis also shows which parameters are critical to performance.
53

Avaliação e predição de desempenho de programas paralelos em redes de estações de trabalho. / Parallel program performance analysis and prediction on NOW systems.

Li, Kuan Ching 25 October 2001
Distributed processing has been widely used to improve the performance of applications with high computational demand. Different distributed architectures and topologies have been researched and used to deliver high performance, providing the resources needed to exploit the parallelism present in applications. The ease of building high-performance computer systems from workstations interconnected by high-speed networks, together with relatively low cost and continuing advances in integrated circuit technology, makes it possible to assemble low-cost computer networks for running parallel applications. As a result, several software systems for networks of workstations (NOW) have been developed, aiming to integrate distributed components and aggregate their processing power. However, application development is complex and difficult, since the parallelism in these applications must be identified and the necessary communication provided. The control of multiple processes and their interactions is the main reason for this complexity. This work proposes a methodology for the performance analysis and prediction of parallel programs implemented with the Message Passing Interface (MPI) on networks of workstations.
We define an extension of the T-graph class of timing graphs, named T-graph*, which represents, at a high level, parallel programs instrumented with MPI. Building a graph of this class reveals the program's execution flow from an algorithmic point of view. We also define another class of graphs, named DP*Graph, which represents parallel programs in a high degree of detail, for example by clearly showing the points where communication occurs between the processing nodes of the computer system. Alongside analytical modeling resources and techniques, strategies are defined for evaluating the performance of the computer systems involved. Once the graph representations of the parallel program have been obtained, together with the refined models, the necessary evaluations can be carried out to obtain performance predictions based on previously collected experimental data. Finally, the experimental results show the viability of the proposed methodology, both in its use and in the coherence of the strategies applied in this work.
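To give a feel for how a timing graph can yield a predicted execution time, the sketch below computes the longest (critical) path through a small directed acyclic graph whose nodes carry time costs. It is only an illustration of the general idea: it does not reproduce the T-graph* or DP*Graph definitions from the thesis, and the function `predicted_time`, the node names and the costs are all hypothetical.

```python
from functools import lru_cache


def predicted_time(graph, cost, start):
    """Critical-path time from `start` to the end of a DAG.

    graph: dict mapping a node to the list of its successor nodes.
    cost:  dict mapping a node to its estimated time in seconds.
    Assumption (not from the source): branches may run in parallel, so the
    slowest branch determines the time of everything that follows a node.
    """
    @lru_cache(maxsize=None)
    def longest(node):
        tail = max((longest(s) for s in graph.get(node, [])), default=0.0)
        return cost[node] + tail

    return longest(start)


# Hypothetical program: two computation branches joined by a message send.
graph = {
    "init": ["compute_a", "compute_b"],
    "compute_a": ["send"],
    "compute_b": ["send"],
    "send": ["finalize"],
    "finalize": [],
}
cost = {"init": 1.0, "compute_a": 4.0, "compute_b": 2.5,
        "send": 0.5, "finalize": 0.2}

total = predicted_time(graph, cost, "init")
print(f"predicted time along the critical path: {total:.2f} s")
```

In a real setting the node costs would come from measurements or analytical models of the target machines, which is the role the abstract assigns to the previously collected experimental data.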
54

Análise comparativa de modelos de previsão de desempenho de pavimentos flexíveis / Comparative analysis of performance prediction models for flexible pavements

Nascimento, Deise Menezes 01 June 2005
Pavement performance prediction models are important tools for pavement management systems, essential for planning maintenance and rehabilitation activities and for estimating the resources needed to preserve highways.
The aim of this work is to compare performance prediction models, developed through empirical and empirical-mechanistic analyses, that predict how the condition of flexible pavements evolves over time and/or with accumulated traffic. The models analyzed were developed by researchers and by Brazilian and international road agencies, and include the deterioration models used by HDM-4 (Highway Development and Management), the pavement management computer program developed by the World Bank. The research compares the actual performance of highway pavement sections, obtained from the database of the LTPP (Long-Term Pavement Performance) program of the FHWA (Federal Highway Administration), with the behavior predicted by the performance models developed by Queiroz (1981), Paterson (1987), Marcon (1996) and Yshiba (2003). The behavior of the LTPP-FHWA test sections is analyzed using a factorial design: analysis of variance (ANOVA) determines the significance level of pre-selected factors (independent variables: traffic, age and corrected structural number) and supports the modeling of pavement performance for these sections (dependent variables: roughness and rutting).
55

Performance prediction of application executed on GPUs using a simple analytical model and machine learning techniques / Predição de desempenho de aplicações executadas em GPUs usando um modelo analítico simples e técnicas de aprendizado de máquina

González, Marcos Tulio Amarís 25 June 2018
The parallel and distributed High Performance Computing platforms available today have become more and more heterogeneous (CPUs, GPUs, FPGAs, etc.). Graphics Processing Units (GPUs) are specialized co-processors that accelerate and improve the performance of parallel vector operations. GPUs have a high degree of parallelism, can execute thousands or millions of threads concurrently, and hide the latency of the scheduler. They have a deep memory hierarchy with memories of different types and in different configurations. Predicting the performance of applications executed on these devices is a great challenge and is essential for the efficient use of resources in machines with these co-processors. There are different approaches to these predictions, such as analytical modeling and machine learning techniques. In this thesis, we present an analysis and characterization of the performance of applications executed on GPUs. We propose a simple and intuitive BSP-based model for predicting the execution times of CUDA applications on different GPUs. The model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. We also compare three Machine Learning (ML) approaches, Linear Regression, Support Vector Machines and Random Forests, with the BSP-based analytical model. The comparison is made in two contexts: first, with the same input data (features) for the ML techniques as for the analytical model; and second, with features selected by a feature extraction process based on correlation analysis and hierarchical clustering. We show that GPU applications that scale regularly can be predicted with simple analytical models and an adjusting parameter. This parameter can be used to predict these applications on other GPUs.
We also demonstrate that ML approaches provide reasonable predictions for different cases, and that ML techniques require no detailed knowledge of application code, hardware characteristics or explicit modeling. Consequently, whenever a large data set with information about similar applications is available or can be created, ML techniques can be useful for deploying automated on-line performance prediction for scheduling applications on heterogeneous architectures with GPUs.
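The idea of a simple analytical model corrected by one adjusting parameter can be sketched as follows. This is a hypothetical illustration, not the equation from the thesis: it assumes the raw estimate is total cycles (computation plus memory accesses weighted by an assumed per-access cost) divided by aggregate clock throughput, and that a single factor `lam`, fitted on one GPU, carries over to another. All function names and numbers are invented for the example.

```python
def raw_estimate(comp_ops, mem_accesses, mem_cost_cycles, clock_hz, cores):
    """Uncorrected time estimate in seconds (assumed form, not the thesis's)."""
    cycles = comp_ops + mem_accesses * mem_cost_cycles
    return cycles / (clock_hz * cores)


def fit_lambda(raw_seconds, measured_seconds):
    """Adjusting parameter: ratio of the raw estimate to one measured run."""
    return raw_seconds / measured_seconds


def predict(raw_seconds, lam):
    """Corrected prediction using a previously fitted adjusting parameter."""
    return raw_seconds / lam


# Hypothetical kernel characteristics, e.g. obtained from profiling.
kernel = dict(comp_ops=2e9, mem_accesses=1e8, mem_cost_cycles=400)

# Fit lambda on a hypothetical GPU A where the kernel took 0.06 s.
raw_a = raw_estimate(**kernel, clock_hz=1.0e9, cores=1000)
lam = fit_lambda(raw_a, measured_seconds=0.06)

# Reuse lambda to predict the same kernel on a hypothetical GPU B.
raw_b = raw_estimate(**kernel, clock_hz=1.5e9, cores=2000)
predicted_b = predict(raw_b, lam)
print(f"lambda = {lam:.2f}, predicted time on GPU B = {predicted_b:.4f} s")
```

A single fitted factor is only plausible for kernels that scale regularly, which matches the condition stated in the abstract; irregular applications are where the machine learning approaches are presented as the alternative.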
56

Une méthode fondée sur les modèles pour gérer les propriétés temporelles des systèmes à composants logiciels / Design and implementation of a model driven design methodology for trusted realtime component

Nguyen, Viet Hoa 15 October 2013
This thesis proposes an approach to integrate the use of time-related stochastic properties into a continuous design process based on models at runtime. The time-related specification of services is an important aspect of component-based architectures, for instance in distributed, volatile networks of computer nodes.
The models@runtime approach eases the management of such architectures by maintaining abstract models of the architectures synchronized with the physical structure of the distributed execution platform. For self-adaptive systems, predicting the delays and throughput of a component assembly is of utmost importance for making adaptation decisions and accepting only those evolutions that conform to the time specifications. To this end, we define a metamodel extension based on stochastic Petri nets as an internal time model for prediction. We design a library of patterns to ease the specification and prediction of common time properties of models at runtime and to make the synchronization of behaviors and structural changes easier. Furthermore, we apply Aspect-Oriented Modeling to weave the internal time models into the timed behavior models of the component and the system. Our prediction engine is fast enough to perform prediction at runtime in a realistic setting and to validate models at runtime.
58

Energy Demand Response for High-Performance Computing Systems

Ahmed, Kishwar 22 March 2018
The growing computational demand of scientific applications has greatly motivated the development of large-scale high-performance computing (HPC) systems in the past decade. To accommodate the increasing demand of applications, HPC systems have been going through dramatic architectural changes (e.g., the introduction of many-core and multi-core systems and the rapid growth of complex interconnection networks for efficient communication between thousands of nodes), as well as significant increases in size (e.g., modern supercomputers consist of hundreds of thousands of nodes). With such changes in architecture and size, the energy consumption of these systems has increased significantly. With the advent of exascale supercomputers in the next few years, the power consumption of HPC systems will surely increase; some systems may even consume hundreds of megawatts of electricity. Demand response programs are designed to help energy service providers stabilize the power system by reducing the energy consumption of participating systems during periods of high power demand or temporary shortages in power supply. This dissertation focuses on developing energy-efficient demand-response models and algorithms to enable HPC systems to participate in demand response. In the first part, we present interconnection network models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: dragonfly, torus, and fat-tree. Our interconnect models are fully integrated with an implementation of the message-passing interface (MPI) that can mimic most of its functions with packet-level accuracy. Extensive experiments show that our integrated models predict network behavior with good accuracy while also allowing for good parallel scaling performance.
In the second part, we present an energy-efficient demand-response model to reduce HPC systems' energy consumption during demand response periods. We propose HPC job scheduling and resource provisioning schemes to enable HPC systems to participate in emergency demand response. In the final part, we propose an economic demand-response model that allows HPC operators and HPC users to jointly reduce an HPC system's energy cost. Our proposed model allows HPC systems to participate in economic demand-response programs through a contract-based rewarding scheme that can incentivize HPC users to participate in demand response.
60

Performance Evaluation and Prediction of Parallel Applications

Markomanolis, Georgios 20 January 2014
Analyzing and understanding the performance behavior of parallel applications on various compute infrastructures is a long-standing concern in the High Performance Computing community. When the targeted execution environments are not available, simulation is a reasonable approach to obtain objective performance indicators and explore various "what-if?" scenarios. In this work we present a framework for the off-line simulation of MPI applications. The main originality of our work with regard to the literature is that it relies on time-independent execution traces. This allows for extreme scalability, as heterogeneous and distributed resources can be used to acquire a trace. We propose a format in which, for each event that occurs during the execution of an application, we log the volume of instructions for a computation phase or the bytes and the type of a communication. To acquire time-independent traces of the execution of MPI applications, we have to instrument them to log the required data. Many profiling tools exist that can instrument an application. We propose a scoring system that corresponds to the specific requirements of our framework and use it to evaluate the best-known open-source profiling tools. Furthermore, we introduce an original tool called Minimal Instrumentation that was designed to fulfill the requirements of our framework. We study different instrumentation methods and also investigate several acquisition strategies. We detail the tools that extract the time-independent traces from the instrumentation traces of some well-known profiling tools. Finally, we evaluate the whole acquisition procedure and present the acquisition of large-scale instances. We describe in detail the procedure for providing a realistic simulated platform file to our trace replay tool, taking into consideration the topology of the real platform and the calibration procedure with regard to the application that is going to be simulated.
Moreover, we present the trace replay tools implemented and used during this work. We show that our simulator can predict the performance of some MPI benchmarks with less than 11% relative error between the real execution and the simulation in cases where there is no performance issue. Finally, we identify the reasons for the performance issues and propose solutions.
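The notion of a time-independent trace, which records volumes rather than timestamps, and its replay against a calibrated platform can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: the tuple-based trace format, the `replay` function and the platform parameters (`flop_rate`, `bandwidth`, `latency`) are hypothetical and do not reproduce the framework's actual trace format or simulator.

```python
def replay(trace, flop_rate, bandwidth, latency):
    """Replay a time-independent trace on a simple platform model.

    trace:     list of (kind, volume) events; volume is a number of
               instructions for "compute" and a number of bytes for
               "send"/"recv" (hypothetical format).
    flop_rate: instructions per second of the simulated host.
    bandwidth: bytes per second of the simulated link.
    latency:   fixed per-message cost in seconds.
    """
    elapsed = 0.0
    for kind, volume in trace:
        if kind == "compute":
            elapsed += volume / flop_rate
        elif kind in ("send", "recv"):
            elapsed += latency + volume / bandwidth
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return elapsed


def relative_error(simulated, real):
    """Relative error between simulated and real execution times."""
    return abs(simulated - real) / real


# The same trace can be replayed on any platform description, because it
# contains no timing information from the machine where it was acquired.
trace = [("compute", 2e9), ("send", 1e6), ("compute", 1e9), ("recv", 1e6)]
simulated = replay(trace, flop_rate=1e9, bandwidth=1e8, latency=1e-4)
print(f"simulated time: {simulated:.4f} s")
print(f"relative error vs. a hypothetical 3.2 s real run: "
      f"{relative_error(simulated, 3.2):.1%}")
```

Because the events carry only volumes, calibration of the platform parameters is what ties the replay to a particular machine, which is why the abstract devotes a separate step to building a realistic platform file.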
