Spelling suggestions: "subject:"openmp"" "subject:"openmpi""
21 |
De l’exécution structurée d’applications scientifiques OpenMP sur architectures hiérarchiquesBroquedis, François 09 December 2010 (has links)
Le domaine applicatif de la simulation numérique requiert toujours plus de puissance de calcul. La technologie multicœur aide à satisfaire ces besoins mais impose toutefois de nouvelles contraintes aux programmeurs d’applications scientifiques qu’ils devront respecter s’ils souhaitent en tirer la quintessence. En particulier, il devient plus que jamais nécessaire de structurer le parallélisme des applications pour s’adapter au relief imposé par la hiérarchie mémoire des architectures multicœurs. Les approches existantes pour les programmer ne tiennent pas compte de cette caractéristique, et le respect de la structure du parallélisme reste à la charge du programmeur. Il reste de ce fait très difficile de développer une application qui soit à la fois performante et portable.La contribution de cette thèse s’articule en trois axes. Il s’agit dans un premier temps de s’appuyer sur le langage OpenMP pour générer du parallélisme structuré, et de permettre au programmeur de transmettre cette structure au support exécutif ForestGOMP. L’exécution structurée de ces flots de calcul est ensuite laissée aux ordonnanceurs Cacheet Memory développés au cours de cette thèse, permettant respectivement de maximiser la réutilisation des caches partagés et de maximiser la bande passante mémoire accessible par les programmes OpenMP. Enfin, nous avons étudié la composition de ces ordonnanceurs, et plus généralement de bibliothèques parallèles, en considérant cette voie comme une piste sérieuse pour exploiter efficacement les multiples unités de calcul des architectures multicœurs.Les gains obtenus sur des applications scientifiques montrent l’intérêt d’une communication forte entre l’application et le support exécutif, permettant l’ordonnancement dynamique et portable de parallélisme structuré sur les architectures hiérarchiques. / Abstract
|
22 |
Uso das características computacionais de regiões paralelas OpenMP para redução do consumo de energiaMoro, Gabriel Bronzatti January 2018 (has links)
Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System.
|
23 |
A Case Study of Semi-Automatic Parallelization of Divide and Conquer Algorithms Using Invasive Interactive ParallelizationHansson, Erik January 2009 (has links)
<p>Since computers supporting parallel execution have become more and more common the last years, especially on the consumer market, the need for methods and tools for parallelizing existing sequential programs has highly increased. Today there exist different methods of achieving this, in a more or less user friendly way. We have looked at one method, Invasive Interactive Parallelization (IIP), on a special problem area, divide and conquer algorithms, and performed a case study. This case study shows that by using IIP, sequential programs can be parallelized both for shared and distributed memory machines. We have focused on parallelizing Quick Sort for OpenMP and MPI environment using a tool, Reuseware, which is based on the concepts of Invasive Software Composition.</p>
|
24 |
Exploring heterogeneous scheduling using the task-centric programming modelPodobas, Artur, Brorsson, Mats, Vlassov, Vladimir January 2012 (has links)
Computer architecture technology is moving towards more heteroge-neous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. How-ever, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a small piece of the heterogeneous puzzle; how can we exploit all system resources in a performance-effective and user-friendly way? Our proposed solution includes a run-time system capable of using a variety of different heterogeneous components while providing the user with the already familiar task-centric programming model interface. Furthermore, when dealing with non-uniform workloads, we show that traditional approaches based on centralized or work-stealing queue algorithms do not work well and propose a scheduling algorithm based on trend analysis to distribute work in a performance-effective way across resources. / <p>QC 20130429</p> / ENCORE
|
25 |
Architecture-aware Task-scheduling : A thermal approachPodobas, Artur, Brorsson, Mats January 2011 (has links)
Current task-centric many-core schedulers share a “naive” view of processor architecture; a view that does not care about its thermal, architectural or power consuming properties. Future processor will be more heterogeneous than what we see today, and following Moore’s law of transistor doubling, we foresee an increase in power consumption and thus temperature. Thermal stress can induce errors in processors, and so a common way to counter this is by slowing the processor down; something task-centric schedulers should strive to avoid. The Thermal-Task-Interleaving scheduling algorithm proposed in this paper takes both the application temperature behavior and architecture into account when making decisions. We show that for a mixed workload, our scheduler outperforms some of the standard, architecture-unaware scheduling solutions existing today. / QC 20120215
|
26 |
Parallel SVM with Application to Protein Structure PredictionPanaganti, Shilpa 20 December 2004 (has links)
A learning task with thousands of training examples in Support Vector Machine (SVM) demands large amounts of memory and time requirements. SVMlight by Dr. Thorsten Joachims has been implemented in C using a fast optimizing algorithm for handling thousands of such support vectors. SVMlight solves the problem of classification, pattern recognition, regression and learning ranking function. The C code also provides methods for XiAlpha estimation of error rate and precision. Implementing these two methods leads to generalized performance of Support Vector Machine even for computation intensive text classification functions. SVMlight code allows users to define their own kernel functions. The SVMlight software employs an efficient algorithm and minimizes the cost, but it still takes considerable amount of time for computing thousands of support vectors and training examples. This time can be still reduced by parallelizing the code. In our work we refined the SVMlight code by removing unnecessary iterations and rewriting it as cost efficient. Then we parallelized the code individually using two different types, OpenMP and POSIX Threads shared memory parallelism. The code is parallelized for these two methods on Intel’s C compiler for Linux 7.1 using hyper threading technology. The parallelized code is tested for protein structure prediction. Different types of Protein Sequences are tested on these methods by varying the number of training examples and support vectors. The time consumption and speedup are calculated for both OpenMP and Pthreads. Implementation of OpenMP and Pthreads together showed good increase in speedup.
|
27 |
Μελέτη σύνθετων εφαρμογών και ανάλυση δυνατοτήτων παραλληλοποίησης τους σε αρχιτεκτονικές κοινής και κοινής/κατανεμημένης μνήμηςΜουτσουρούφης, Γεώργιος 26 September 2007 (has links)
Η χρήση μετροπρογραμμάτων για την μέτρηση της επίδοσης και της απόδοσης συστημάτων αρχιτεκτονικής πολλαπλών επεξεργαστικών στοιχείων και συστημάτων λογισμικού για την υποστήριξη εκτέλεσης παράλληλων εφαρμογών πάνω σε πολυεπεξεργαστικές πλατφόρμες, είναι μία έγκυρη και ευρέως διαδεδομένη μέθοδος. Μάλιστα για την τυποποίηση τέτοιων προγραμμάτων, μεγάλες και παγκοσμίως αναγνωρισμένες ερευνητικές ομάδες, έχουν προτείνει και έχουν παραλληλοποιήσει συλλογή μετροπρογραμμάτων, ως αποτέλεσμα πολυετούς εμπειρίας και έρευνας. Οι πιο γνωστές συλλογές μετροπρογραμμάτων είναι τα SPEC, NAS και SPLASH. Όμως αν και οι κώδικες αυτών των συλλογών ετροπρογραμμάτων είναι διαθέσιμοι στην επιστημονική κοινότητα, δεν είναι δυνατόν να καλύπτουν πλήρως τις ανάγκες των ερευνητικών ομάδων παγκοσμίως που ασχολούνται με την έρευνα και την ανάπτυξη συστημάτων λογισμικού για την υποστήριξη παράλληλων εφαρμογών πάνω σε πολυεπεξεργαστικές πλατφόρμες.
Η παρούσα διπλωματική, επιχειρεί τη μελέτη, την ανάλυση και την παραλληλοποίηση δύο σύνθετων ακολουθιακών εφαρμογών που θα χρησιμοποιηθούν ως μετροπρογράμματα για την αξιολόγηση συστημάτων υποστήριξης παράλληλων εφαρμογών. Οι πλατφόρμες υποστήριξης παράλληλων εφαρμογών αναπτύσσονται στο Εργαστήριο Πληροφοριακών Συστημάτων Υψηλών Επιδόσεων (ΕΠΣΥΕ).
Η διπλωματική αποτελείται από την παρουσίαση και την ανάλυση μεθόδων βελτιστοποίησης εφαρμογών. Αναφέρεται σε βελτιώσεις μονοεπεξεργαστικών συστημάτων, που συνήθως συντελούνται με την αλλαγή κώδικα για την καλύτερη εκμετάλλευση των πόρων του συστήματος, και από την προσαρμογή ή και την αλλαγή των ακολουθιακών αλγορίθμων για παράλληλες αρχιτεκτονικές συστημάτων SMP. Στη συνέχεια επιχειρούμε την βελτιστοποίηση και την παραλληλοποίηση δύο εφαρμογών τις οποίες θα χρησιμοποιεί η ομάδα παρλλαλήλων συστημάτων του εργαστηρίου ΕΠΣΥΕ για την αξιολόγηση του λογισμικού που αναπτύσσει στα πλαίσια των ερευνητικών της δραστηριοτήτων. Η μία εφαρμογή είναι από το χώρο της Ιατρικής και αφορά τον υπολογισμό του απαιτούμενου ποσοστού ακτινοβολίας για την ακτινοθεραπεία όγκων. Η δεύτερη εφαρμογή είναι από το χώρο της μοριακής χημείας και αναφέρεται στον υπολογισμό της κίνησης των μορίων, αερίων εγκλωβισμένων σε μάζες στερεών σωμάτων. Τέλος παραθέτουμε τις μετρήσεις των βελτιστοποιημένων και παραλληλοποιημένων εφαρμογών και τις βελτιώσεις που επιτυγχάνουν.
Η προσπάθεια βελτιστοποίησης και προσαρμογής των συγκεκριμένων αλλά και επιπλέον εφαρμογών σε υπάρχουσες αλλά και σε νέες αρχιτεκτονικές θα συνεχιστεί, με στόχο την απόκτηση της απαιτούμενης τεχνογνωσίας για την βελτιστοποίηση και παραλληλοποίηση δικών μας εφαρμογών/μετροπρογραμμάτων, που θα χρησιμοποιούμε για την αξιολόγηση των πλατφορμών που αναπτύσσουμε για την υποστήριξη παράλληλης και ταυτόχρονης επεξεργασίας. / The use of benchmarks for the measurement of the effectiveness and performance of computer systems with multiple processors and of software systems that support parallel execution of applications on those platforms is a valid and widespread method.
In fact, towards the standardization of such programs large and worldwide acknowledged research teams have proposed and parallelized a collection of benchmarks which are the result of many years of experience and research. The most known collections of such benchmarks are these of SPEC, NASH and SPLASH. Although the codes of this collection of benchmarks are in the disposal of the scientific community, they could not possibly fully cover the needs of the scientific teams all over the world that work in the field of research and development of software systems that support parallel applications the multiprocessor platforms.
This thesis is an effort to study, analyze and parallelize two complex sequential applications that will be used as benchmarks for the evaluation of parallel application support systems. The platforms supporting parallel applications are developed in the High Performance Computer Laboratory in Patra (HPClab).
This thesis includes a presentation and analyses of the optimization methods for parallel applications. It refers to improvements which are usually the result of changes in the programming code, in order to achieve optimal utilization of the given system resources. These improvements may also derive from the adjustment or even the change of the sequential algorithms for parallel SMP system architectures.
What will follow, will be an attempt to optimize and parallelize two applications that will be used by the HPClab team in order to evaluate the software developed in the content of its research activity. The first application comes from the scientific field of medicine and regards the computation of the required amount of radiation applied for the cure of cancer.
The second one comes from the scientific field of engineer chemistry and regards the computation of molecule movement of gases enclosed in solid objects. Finally, the measurements for the optimized and parallelized applications and the improvements achieved will be presented.
The attempt to optimize and adjust these applications as well as others, will continue to be developed in the framework of existent and platforms, with the goal to attain the necessary know how for the optimization and parallelization of our own applications/benchmarks which will be used for the evaluation of the platforms developed and for the support of parallel applications.
|
28 |
Processamento térmico de grafeno e sua síntese pela técnica de epitaxia por feixes molecularesRolim, Guilherme Koszeniewski January 2018 (has links)
Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System.
|
29 |
Uso das características computacionais de regiões paralelas OpenMP para redução do consumo de energiaMoro, Gabriel Bronzatti January 2018 (has links)
Desempenho e consumo energético são requisitos fundamentais em sistemas de computação. Um desafio comumente encontrado é conciliar esses dois aspectos, buscando manter o mesmo desempenho, consumindo cada vez menos energia. Muitas técnicas possibilitam a redução do consumo de energia em aplicações paralelas, mas na maioria das vezes elas envolvem recursos encontrados apenas em processadores modernos ou um conhecimento amplo das características da aplicação e da plataforma alvo. Nesse trabalho propomos uma abordagem em formato de Workflow. Na primeira fase, o comportamento da aplicação paralela é investigado. A partir dessa investigação, a segunda fase realiza a execução da aplicação paralela com diferentes frequências (mínima e máxima) de processador, utilizando a caracterização das regiões, obtida na primeira fase da abordagem. Esse Workflow foi implementado em formato de biblioteca dinâmica, a fim de que ela possa ser utilizada em qualquer aplicação OpenMP. A biblioteca possui suporte as duas fases do Workflow, na primeira fase é gerado um arquivo que descreve as assinaturas comportamentais das regiões paralelas da aplicação. Esse arquivo é posteriormente utilizado na segunda fase, quando a biblioteca vai alterar dinamicamente a frequência de processador. O benchmark Lulesh é utilizado como cenário de testes da biblioteca, com isso o maior ganho obtido é a redução de 1,89% do consumo de energia. Esse ganho acarretou uma sobrecarga de 0,09% no tempo de execução. Ao comparar nossa técnica com a política de troca de frequência adotada pelo governor Ondemand do Sistema Operacional Linux, o ganho de 1,89% é significativo em relação ao benchmark utilizado, pois nele existem regiões paralelas de curta duração, o que impacta negativamente no overhead da operação de troca de frequência. / Performance and energy consumption are fundamental requirements in computer systems. A very frequent challenge is to combine both aspects, searching to keep the high performance computing while consuming less energy. There are a lot of techniques to reduce energy consumption, but in general, they use modern processors resources or they require specific knowledge about application and platform used. In this work, we propose a performance analysis workflow strategy divided into two steps. In the first step, we analyze the parallel application behavior through the use of hardware counters that reflect CPU and memory usage. The goal is to obtain a per-region computing signature. The result of this first step is a configuration file that describes the duration of each region, their hardware counters, and source code identification. The second step runs the parallel application with different frequencies (low or high) according to the characterization obtained in the previous step. The results show a reduction of 1,89% in energy consumption for the Lulesh benchmark with an increase of 0,09% in runtime when we compare our approach against the governor Ondemand of the Linux Operating System.
|
30 |
Uma metodologia de avaliação de desempenho para identificar as melhore regiões paralelas para reduzir o consumo de energia / A performance evaluation methodology to find the best parallel regions to reduce energy consumptionMillani, Luís Felipe Garlet January 2015 (has links)
Devido as limitações de consumo energético impostas a supercomputadores, métricas de eficiência energética estão sendo usadas para analisar aplicações paralelas desenvolvidas para computadores de alto desempenho. O objetivo é a redução do custo energético dessas aplicações. Algumas estratégias de redução de consumo energética consideram a aplicação como um todo, outras reduzem ajustam a frequência dos núcleos apenas em certas regiões do código paralelo. Fases de balanceamento de carga ou de comunicação bloqueante podem ser oportunas para redução do consumo energético. A análise de eficiência dessas estratégias é geralmente realizada com metodologias tradicionais derivadas do domínio de análise de desempenho. Uma metodologia de grão mais fino, onde a redução de energia é avaliada para cada região de código e frequência pode lever a um melhor entendimento de como o consumo energético pode ser minimizado para uma determinada implementação. Para tal, os principais desafios são: (a) a detecção de um número possivelmente grande de regiões paralelas; (b) qual frequência deve ser adotada para cada região de forma a limitar o impacto no tempo de execução; e (c) o custo do ajuste dinâmico da frequência dos núcleos. O trabalho descrito nesta dissertação apresenta uma metodologia de análise de desempenho para encontrar, dentre as regiões paralelas, os melhores candidatos a redução do consumo energético. (Cotninua0 Esta proposta consiste de: (a) um design inteligente de experimentos baseado em Plackett-Burman, especialmente importante quando um grande número de regiões paralelas é detectado na aplicação; (b) análise tradicional de energia e desempenho sobre as regiões consideradas candidatas a redução do consumo energético; e (c) análise baseada em eficiência de Pareto mostrando a dificuldade em otimizar o consumo energético. Em (c) também são mostrados os diferentes pontos de equilíbrio entre desempenho e eficiência energética que podem ser interessantes ao desenvolvedor. Nossa abordagem é validada por três aplicações: Graph500, busca em largura, e refinamento de Delaunay. / Due to energy limitations imposed to supercomputers, parallel applications developed for High Performance Computers (HPC) are currently being investigated with energy efficiency metrics. The idea is to reduce the energy footprint of these applications. While some energy reduction strategies consider the application as a whole, certain strategies adjust the core frequency only for certain regions of the parallel code. Load balancing or blocking communication phases could be used as opportunities for energy reduction, for instance. The efficiency analysis of such strategies is usually carried out with traditional methodologies derived from the performance analysis domain. It is clear that a finer grain methodology, where the energy reduction is evaluated per each code region and frequency configuration, could potentially lead to a better understanding of how energy consumption can be reduced for a particular algorithm implementation. To get this, the main challenges are: (a) the detection of such, possibly parallel, code regions and the large number of them; (b) which frequency should be adopted for that region (to reduce energy consumption without too much penalty for the runtime); and (c) the cost to dynamically adjust core frequency. The work described in this dissertation presents a performance analysis methodology to find the best parallel region candidates to reduce energy consumption. The proposal is three folded: (a) a clever design of experiments based on screening, especially important when a large number of parallel regions is detected in the applications; (b) a traditional energy and performance evaluation on the regions that were considered as good candidates for energy reduction; and (c) a Pareto-based analysis showing how hard is to obtain energy gains in optimized codes. In (c), we also show other trade-offs between performance loss and energy gains that might be of interest of the application developer. Our approach is validated against three HPC application codes: Graph500; Breadth-First Search, and Delaunay Refinement.
|
Page generated in 0.183 seconds