Global ETD Search

381	A runtime system for data-flow task programming on multicore architectures with accelerators / Uma ferramenta para programação com dependência de dados em arquiteturas multicore com aceleradores / Vers un support exécutif avec dépendance de données pour les architectures multicoeur avec des accélérateurs Lima, João Vicente Ferreira January 2014 (has links) Dans cette thèse , nous proposons d’étudier des questions sur le parallélism de tâche avec dépendance de données dans le cadre de machines multicoeur avec des accélérateurs. La solution proposée a été développée en utilisant l’interface de programmation haute niveau XKaapi du projet MOAIS de l’INRIA Rhône-Alpes. D’abord nous avons étudié des questions liés à une approche d’exécution totalement asyncrone et l’ordonnancement par vol de travail sur des architectures multi-GPU. Le vol de travail avec localité de données a montré des résultats significatifs, mais il ne prend pas en compte des différents ressources de calcul. Ensuite nous avons conçu une interface et une modèle de coût qui permettent d’écrire des politiques d’ordonnancement sur XKaapi. Finalement on a évalué XKaapi sur un coprocesseur Intel Xeon Phi en mode natif. Notre conclusion est double. D’abord nous avons montré que le modèle de programmation data-flow peut être efficace sur des accélérateurs tels que des GPUs ou des coprocesseurs Intel Xeon Phi. Ensuite, le support à des différents politiques d’ordonnancement est indispensable. Les modèles de coût permettent d’obtenir de performance significatifs sur des calculs très réguliers, tandis que le vol de travail permet de redistribuer la charge en cours d’exécution. / Esta tese investiga os desafios no uso de paralelismo de tarefas com dependências de dados em arquiteturas multi-CPU com aceleradores. Para tanto, o XKaapi, desenvolvido no grupo de pesquisa MOAIS (INRIA Rhône-Alpes), é a ferramenta de programação base deste trabalho. 
Em um primeiro momento, este trabalho propôs extensões ao XKaapi a fim de sobrepor transferência de dados com execução através de operações concorrentes em GPU, em conjunto com escalonamento por roubo de tarefas em multi-GPU. Os resultados experimentais sugerem que o suporte a asincronismo é importante à escalabilidade e desempenho em multi-GPU. Apesar da localidade de dados, o roubo de tarefas não pondera a capacidade de processamento das unidades de processamento disponíveis. Nós estudamos estratégias de escalonamento com predição de desempenho em tempo de execução através de modelos de custo de execução. Desenvolveu-se um framework sobre o XKaapi de escalonamento que proporciona a implementação de diferentes algoritmos de escalonamento. Esta tese também avaliou o XKaapi em coprocessodores Intel Xeon Phi para execução nativa. A conclusão desta tese é dupla. Primeiramente, nós concluímos que um modelo de programação com dependências de dados pode ser eficiente em aceleradores, tais como GPUs e coprocessadores Intel Xeon Phi. Não obstante, uma ferramenta de programação com suporte a diferentes estratégias de escalonamento é essencial. Modelos de custo podem ser usados no contexto de algoritmos paralelos regulares, enquanto que o roubo de tarefas poder reagir a desbalanceamentos em tempo de execução. / In this thesis, we propose to study the issues of task parallelism with data dependencies on multicore architectures with accelerators. We target those architectures with the XKaapi runtime system developed by the MOAIS team (INRIA Rhône-Alpes). We first studied the issues on multi-GPU architectures for asynchronous execution and scheduling. Work stealing with heuristics showed significant performance results, but did not consider the computing power of different resources. Next, we designed a scheduling framework and a performance model to support scheduling strategies over XKaapi runtime. 
Finally, we performed experimental evaluations over the Intel Xeon Phi coprocessor in native execution. Our conclusion is twofold. First we concluded that data-flow task programming can be efficient on accelerators, which may be GPUs or Intel Xeon Phi coprocessors. Second, the runtime support of different scheduling strategies is essential. Cost models provide significant performance results over very regular computations, while work stealing can react to imbalances at runtime. Programmation parallèle Accélérateur Parallélisme de tâche Dépendance de données Vol de travail Arquitetura : Computadores Processamento paralelo Parallel programming Accelerators Task parallelism Data flow dependencies Work stealing
382	Extensão da Ferramenta de Apoio à Programação Paralela (F.A.P.P.) para ambientes paralelos virtuais. / A parallel programming supporting tool extension for parallel virtual environments. Kalinka Regina Lucas Jaquie 30 March 1999 (has links) Os sistemas computacionais distribuídos aplicados à computação paralela permitem uma melhor relação custo/benefício para a computação paralela. Esses sistemas oferecem a potência computacional adequada às aplicações que não necessitam de uma máquina maciçamente paralela, porém necessitam de uma potência computacional maior que uma máquina seqüencial pode oferecer. P.V.M. (Parallel Virtual Machine) e M.P.I. (Message Passage Inteface) são exemplos de ambiente de paralelos virtuais amplamente discutido na literatura. Tendo em vista a grande utilização desses ambientes tanto em nível acadêmico quanto em níveis comerciais e industriais, torna-se interessante a criação de uma ferramenta que apoie o desenvolvimento de programas para esses ambientes. Poucas são as ferramentas desse tipo que aparecem na literatura; uma delas e que permite ser estendida para dar suporte a tais ambientes é a F.A.P.P. (Ferramenta de Apoio à Programação Paralela). Dentro desse contexto, este trabalho apresenta a modelagem dos ambientes paralelos virtuais segundo a abordagem proposta na definição da F.A.P.P., para que arcabouços de programas P.V.M. e M.P.I possam ser gerados. Essa ferramenta permite a utilização da computação paralela a um maior número de usuários, ou seja, auxiliando os iniciante na confecção dos programa e os experientes na manutenção, além de permitir maior produtividade. Foram realizados estudos visando a validação e a avaliação da ferramenta. Os resultados obtidos demonstram que a ferramenta possui comportamento estável e tem potencial para ser utilizada livremente em ambientes P.V.M. e M.P.I.. 
/ Distributed computing systems applied to parallel computing offer a better cost/benefit ratio for parallel programming. These systems provide adequate computing power for applications that do not require a massively parallel architecture but need more computing power than sequential computers can offer. P.V.M. (Parallel Virtual Machine) and M.P.I. (Message Passing Interface) are examples of parallel virtual environments widely discussed in the literature. Because these environments are broadly used in academic, commercial and industrial applications, tools that support program development for them are attractive. Few such tools are described in the literature. F.A.P.P. is one of them, and it can be extended to support parallel virtual environments. This work addresses the extension of F.A.P.P. to produce P.V.M. and M.P.I. source code. The extension can help a large number of users to develop parallel programs, either by supporting beginners or by increasing the productivity of experienced parallel programmers, and it also helps in the maintenance phase. The tool was tested on several examples, which show that it behaves stably and can be used easily in both P.V.M. and M.P.I. environments. ambientes virtuais ferramentas de apoio modelagem orientada a objetos programação paralela sistemas distribuídos distributed systems object-oriented modeling parallel programming supporting tools virtual environments
384	Adaptatividade hp em paralelo / Hp adaptive technique in parallel Rylo, Edimar Cesar 14 August 2018 (has links) Orientador: Philippe Remy Bernard Devloo / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Civil, Arquitetura e Urbanismo / Made available in DSpace on 2018-08-14T01:58:48Z (GMT). No. of bitstreams: 1 Rylo_EdimarCesar_D.pdf: 6164747 bytes, checksum: e5675a350bb563815a0ed3a28155e3ee (MD5) Previous issue date: 2007 / Resumo: Esse trabalho apresenta uma abordagem para a implementação de métodos auto-adaptativos hp em malhas de elementos finitos utilizando processamento paralelo para a seleção do padrão hp a ser utilizado em cada elemento da malha. Dois tópicos são destacados: análise da qualidade da aproximação e modo de melhoria do espaço de aproximação. O trabalho apresentado propõe uma estrutura para a implementação de métodos hp autoadaptativos no ambiente PZ. Essa estrutura é genérica e pode ser utilizada independentemente de: formulação fraca, tipo de elemento utilizado, método de resolução etc. A estrutura proposta define a interface requerida de um estimador de erros, bem como a interface para a seleção do padrão de refinamento. Tal interface contempla a possibilidade de análise de malhas com elementos contínuos ou descontínuos. A implementação apresentada contempla o processamento em máquinas paralelas, de modo que o tempo de obtenção de uma malha adaptada seja aceitável em aplicações práticas. O cálculo do erro bem como a definição dos padrões de refinamento pode ser feito utilizando processamento paralelo, em ambientes com memória compartilhada ou distribuída. Uma metodologia de refinamento h baseado em padrões de refinamento foi desenvolvida, implementada e validada. Essa metodologia facilita a implementação de padrões de refinamento. Em contrapartida, a geração de malhas com espaços de aproximação contínuos impõe restrições para a seleção do padrão de refinamento de um elemento. 
Assim, para a seleção de um padrão de refinamento de um elemento foi desenvolvida uma metodologia de análise de padrões admissíveis. A seleção do padrão de refinamento tendo por base uma análise de padrões admissíveis é um ponto que requer novas pesquisas, sendo considerado um dos desafios da auto-adaptatividade (ver Zienkiewicz [55]). / Abstract: This work presents a study of hp adaptive methods applied to finite element approximations. Two topics are emphasized: analysis of the quality of the approximation and methodology of refinement of the approximation space. The main objective of the work is to conceive a framework for developing hp-adaptive algorithms within the PZ environment. The framework is independent of the weak statement, type of element or resolution method. The framework uses separate interfaces to define the error estimation method and selection of refinement pattern. Secondly, the framework was ported to parallel processing using the object oriented framework OOPar. The intent of parallelizing the adaptive process is to reduce the time spent in error estimation and choice of the optimal refinement pattern and thus bring adaptivity to a level where it can be used as a routine analysis method. Both error estimation and choice of refinement pattern are implemented on a shared and/or distributed machine. Finally, a methodology was developed to extend the h-adaptive refinement process based on refinement patterns. Together with the implementation of refinement patterns, a procedure was developed to check on the compatibility of refinement patterns of two neighboring elements. The choice of the "best" refinement patterns is one of the main challenges of adaptive methods (Zienkiewicz [55]). The availability of different ways of refining elements increases the flexibility of the code, but also introduces the challenge of deciding which pattern is the "best" pattern. 
It is possible that the combination of optimized h-refinement together with choice of h and/or p refinement may lead to very efficient approximation spaces for a given problem. / Doutorado / Estruturas / Doutor em Engenharia Civil Programação paralela (Computação) Método dos elementos finitos Parallel programming (Computer science) Finite element method
385	Técnicas de computação paralela aplicadas ao método das características em sistemas hidráulicos = Parallel computing applied to method of characteristics in hydraulic systems. / Parallel computing applied to method of characteristics in hydraulic systems Nascimento Júnior, Orlando Saraiva, 1981- 22 August 2018 (has links) Orientadores: Vitor Rafael Coluci, Lubienska Cristina Lucas Jaquiê Ribeiro / Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Tecnologia / Made available in DSpace on 2018-08-22T12:49:14Z (GMT). No. of bitstreams: 1 NascimentoJunior_OrlandoSaraiva_M.pdf: 5339800 bytes, checksum: f37d5c4041d5404f5f45d33c5af054c5 (MD5) Previous issue date: 2013 / Resumo: Uma instalação hidráulica é um conjunto de dispositivos hidromecânicos e tubos com a função de transportar um fluido. O controle do escoamento deste fluido ocorre por meio de manobras nos dispositivos hidromecânicos. Uma investigação sobre o impacto das manobras destes dispositivos em uma instalação hidráulica pode evitar danos físicos ao sistema (como rompimento de tubos, por exemplo). Uma das formas de se investigar o efeito destas manobras é por meio da simulação. A simulação permite estudar um sistema hidráulico, que após uma manobra hidráulica sai de uma situação contínua (regime permanente inicial), entra em um estado transitório (regime transiente) para posteriormente entrar em uma nova situação contínua (regime permanente final). No regime de transiente hidráulico são formadas ondas de sobrepressão e subpressão internas na tubulação e que podem levar a danos. Um dos métodos mais aceitos para simulações de transiente hidráulico é o método das características, que permite transformar as equações diferenciais parciais que descrevem o fenômeno em um conjunto de equações diferenciais ordinárias. 
Dependendo do tamanho do sistema hidráulico (número e comprimento de tubos, número de dispositivos eletromecânicos, etc), o custo computacional pode ser elevado para se obter as informações sobre o comportamento do transiente. Neste trabalho aplicamos técnicas de computação paralela em placas de vídeos para processamento de propósito geral (GPU) e em multi-núcleos (OpenMP) para acelerar os cálculos do transiente hidráulico. Utilizamos um sistema hidráulico composto por um reservatório, uma válvula e um tubo e determinamos o ganho de desempenho em função do tamanho do tubo do sistema. A técnica OpenMP forneceu ganhos computacionais de até 3.3× enquanto a técnica envolvendo GPUs forneceu ganhos de 17×. Dessa forma, placas gráficas se mostraram muito interessantes para acelerar simulações de transientes hidráulicos com o método das características / Abstract: A hydraulic system is a set of hydromechanical devices and tubes designed to transport fluids through controlled operations. Investigating the impact of these operations on hydraulic systems can avoid physical damage to its parts (such as breakage of pipes, for example). One way to investigate these impacts is through computational simulations. The simulations allow to study a hydraulic system during initial and final steady states (after some device operation, for instance), and the transient state between them. During the hydraulic transient state, high and low pressure waves are formed in the tubes and are the main cause of tube damages. One of the most accepted methods for transient hydraulic simulations is the method of characteristics, which allows to transform the partial differential equations that describe the phenomenon in a set of ordinary differential equations. Depending on the size of the hydraulic system (number and length of tubes, number of electromechanical devices, etc), the computational cost to obtain information about the behavior of the transient can be large. 
In this work, we apply parallel computing techniques on graphics cards used for general-purpose processing (GPU) and on multi-core processors (OpenMP) to accelerate hydraulic transient calculations. We simulated a hydraulic system consisting of a reservoir, a valve and a pipe to determine the speedup as a function of pipe size. The OpenMP technique provided speedups of up to 3.3×, whereas the GPU technique provided speedups of 17×. Our results therefore indicate that GPUs are a very attractive option for accelerating hydraulic transient simulations based on the method of characteristics / Mestrado / Tecnologia e Inovação / Mestre em Tecnologia Programação paralela (Computação) Computadores paralelos Parallel programming (Computing) Parallel computers Simulation (computers) - Fluid dynamics
386	Simulação acelerada de baixo custo para aplicações em nanoengenharia de materiais / Low cost accelerated simulation for application in nanoengineering materials Turatti, Luiz Gustavo, 1977- 23 August 2018 (has links) Orientadores: Jacobus Willibrordus Swart, Stanislav Moshkalev / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Made available in DSpace on 2018-08-23T22:11:44Z (GMT). No. of bitstreams: 1 Turatti_LuizGustavo_D.pdf: 35255933 bytes, checksum: dbbe11c7c0f55012ba27274415c2494d (MD5) Previous issue date: 2013 / Resumo: Este é um trabalho multidisciplinar que aborda questões de química, física, engenharia elétrica (nanoengenharia) e principalmente avanços obtidos com simulações por computador. Os programas comumente utilizados para simulações de fótons/íons focalizados em outro material consomem recursos computacionais por diversas horas ou até dias, para concluir os cálculos de determinado experimento, como a simulação de um processo efetuado com o equipamento FIB/SEM (Focused Ion Beam/Scanning Electron Miscroscopy), por exemplo. Através do uso de ambientes computacionais virtualizados, associados a programação paralela em CPU (Central Processing Unit) e GPGPU (General Purpose Graphics Processing Unit) é possível reduzir significativamente o tempo da simulação de horas para minutos, em situações de interação de partículas, que envolvem aproximação de colisões binárias (BCA, Binary Collision Approximation) e o Método de Monte Carlo (MMC), principalmente. O uso de placas gráficas (comumente utilizadas para jogos) potencializou o poder de processamento numérico para uso acadêmico a baixo custo, reduzindo o tempo para obtenção de resultados que foram comprovados experimentalmente. 
A utilização de programas análogos que empregam BCA e MMC, tais como TRIM/SRIM (Transport of Ions in Matter, atualizado para Stopping and Range of Ions in Matter), MCML (Monte Carlo for Multi Layered media) e CUDAMCML (Compute Unified Device Architecture, MCML) auxiliam a comparação de ganho de desempenho entre CPU e GPGPU evidenciando o melhor desempenho desta última arquitetura, com CUDA. Em simulações equivalentes com matrizes esparsas executadas em CPU e GPGPU, a redução do tempo de processamento variou entre três e quinze mil vezes, respectivamente. Com o Método de Monte Carlo, a redução foi de até cento e quarenta e uma vezes para melhores resultados. As simulações de alto desempenho e baixo custo computacional permitem antever algumas situações experimentais, diminuindo a necessidade de explorar todas as possibilidades práticas e, dessa forma, reduzindo o custo com laboratório / Abstract: This is a multidisciplinary work that addresses issues in chemistry, physics and electrical engineering (nanoengineering) and, above all, advances obtained with computer simulations. Programs commonly used to simulate photons/ions focused onto another material consume computational resources for several hours or even days to complete the calculations for a given experiment, for example the simulation of a process performed with FIB/SEM (Focused Ion Beam/Scanning Electron Microscopy) equipment. By using virtualized computing environments together with parallel programming on CPU (Central Processing Unit) and GPGPU (General Purpose Graphics Processing Unit), it is possible to reduce total simulation time significantly, from hours to minutes, in particle interaction scenarios, mainly those involving the binary collision approximation (BCA) and the Monte Carlo method (MMC). The use of graphics cards (generally used for games) increased the numerical processing power available to academia at low cost and reduced the time needed to obtain results that were verified experimentally. 
The use of similar programs based on BCA and MMC, such as TRIM/SRIM (Transport of Ions in Matter, later updated to Stopping and Range of Ions in Matter), MCML (Monte Carlo for Multi Layered media) and CUDAMCML (Compute Unified Device Architecture, MCML), helped us compare CPU and GPGPU performance, showing the better performance of the latter architecture with CUDA. In equivalent simulations using sparse matrices on CPU and GPGPU, processing time was reduced by a factor ranging from three to fifteen thousand. With the Monte Carlo method, the reduction was up to one hundred and forty-one times in the best cases. High-performance simulations at low computational cost allow us to anticipate some experimental situations, reducing the need to explore every practical possibility and thus reducing laboratory costs / Doutorado / Eletrônica, Microeletrônica e Optoeletrônica / Doutor em Engenharia Elétrica Computação de alto desempenho Programação paralela (Computação) Simulação (Computadores) Monte Carlo, Método de Feixes de íons focalizados High performance computing Parallel programming Computer simulation Monte Carlo Method Focused ion beam
387	Enhanced SAR Image Processing Using A Heterogeneous Multiprocessor SHI, YU January 2008 Synthetic aperture radar (SAR) is a pulse-focusing airborne radar that can produce high-resolution radar images. A number of image processing algorithms have been developed for this kind of radar, but the computational burden remains heavy, so SAR image processing is normally performed "off-line". The Fast Factorized Back Projection (FFBP) algorithm is considered a computationally efficient algorithm for SAR image formation, and several implementations have tried to bring the process "on-line". The CELL Broadband Engine is one of the newest multi-core processors, jointly developed by Sony, Toshiba and IBM. CELL performs well at parallel computation and floating-point arithmetic, both of which SAR image formation demands. This thesis implements the FFBP algorithm on the CELL Broadband Engine and compares the results with those of earlier projects. The aim of the project is to make real-time SAR image formation possible. CELL Broadband Engine Synthetic antenna aperture C language parallel programming parallel computing Matlab Computer Engineering Datorteknik
388	Comparison of Shared memory based parallel programming models Ravela, Srikar Chowdary January 2010 Parallel programming models are a challenging and emerging topic in the era of parallel computing. These models allow a developer to port a sequential application to a platform with a larger number of processors so that the problem or application can be solved more easily. How an application is adapted with a parallel programming model is often influenced by the type of application, the type of platform and many other factors. Several parallel programming models have been developed; the two main classes are shared memory based and distributed memory based parallel programming models. Applications with immense computing requirements expose an obstacle: the need for efficient programming models that bridge the gap between the hardware's ability to perform computations and the software's ability to support that performance for those applications [25][9]. A better programming model is therefore needed, one that facilitates easy development while still delivering high performance. To answer this challenge, this thesis compares four shared memory based parallel programming models, weighing the development time of an application under each model against the performance that application achieves under the same model. The models are evaluated on data parallel applications, to verify their ability to support data parallelism with respect to the development time of those applications. The data parallel applications are taken from the Dense Matrix dwarfs; the dwarfs used are Matrix-Matrix multiplication, Jacobi Iteration and Laplace Heat Distribution. 
The experimental method consists of selecting three data parallel benchmarks and developing them under each of the four shared memory based parallel programming models considered for the evaluation. The performance of each application under each programming model is recorded, and the results are then used to compare the models analytically. The results show that, at the cost of development time, the best performance for the chosen data parallel applications is achieved with Pthreads. Conversely, at the cost of a little performance, data parallel applications are extremely easy to develop in task based parallel programming models. The directive models are moderate from both perspectives and rate between the tasking models and the threading models. / This study identifies the threading model Pthreads as the dominant programming model in performance, with high speedups for two of the three dwarfs. The tasking models dominate in development time and in reducing the number of errors; they show high growth in speedup for applications without communication and lower growth in self-relative speedup for applications that involve communication. Tasking models lose performance on communication-based problems because they are designed to execute tasks in parallel without interruption or preemption during computation; introducing communication defeats that purpose and results in lower performance. The directive model OpenMP is moderate in both respects and stands between the other models. In general, the directive and tasking models offer better speedup than the other models for task based problems built on the divide and conquer strategy.
For data parallelism, however, the speedup growth they achieve is low (i.e. they are less scalable for data parallel applications), although their execution times are comparable with those of the threading models. Development times for data parallel applications are also considerably lower, because these models are easy to develop in and require fewer functional routines to parallelize an application. This thesis is concerned with comparing shared memory based parallel programming models in terms of speedup. Work of this kind serves as a handy guide that programmers can consult when developing applications under shared memory based parallel programming models. We suggest that this work can be extended in two ways: from the developer's perspective, and as a cross-referential study of parallel programming models. The former can be done by having a different programmer carry out a similar study and comparing the two. The latter can be done by including multiple data points for the same programming model or by using a different set of parallel programming models. Parallel Programming models Distributed memory Shared memory Dwarfs Development time Speedup Data parallelism Dense Matrix dwarfs threading models Tasking models Directive models. Computer Sciences Datavetenskap (datalogi)
389	Modeling Intel® Cilk™ Plus Programs with Unified Modeling Languages Ata-Ul-Nasar, Mansoor January 2015 (has links) Multi-core processors have recently become very popular in computer systems. They allow multiple threads to execute simultaneously. The advantage of multi-core comes from parallelizing code to spread the work across the hardware. This can be done with a parallel environment that originated at M.I.T., called Intel Cilk Plus, which is designed to provide an easy and well-structured approach to parallel programming. Intel Cilk Plus is an extension of the C and C++ programming languages for expressing data parallelism. Compared with other languages in this field, the extension is helpful and easy to use. Its features include keywords, reducers, and array notations. In general, this article describes Intel Cilk Plus and its features. In addition, Unified Modelling Language (UML) activity diagrams are used to model Intel Cilk Plus programs graphically: they describe the process of a system, capture its dynamic behaviour, and represent the flow from one activity to another using control flow. Finally, Intel Cilk Plus keywords and UML diagramming tools are evaluated, and a comparison of different UML modelling tools is provided. Parallel Programming Intel Cilk Plus Unified Modelling Languages Activity Models Computer Sciences Datavetenskap (datalogi) Computer and Information Sciences Data- och informationsvetenskap Software Engineering Programvaruteknik
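The keywords and reducers mentioned in the abstract above follow a fork-join pattern: work is spawned as parallel tasks, the parent waits at a sync point, and a reducer merges per-task partial results without a shared lock. Intel Cilk Plus itself is a C/C++ extension (its keywords are `cilk_spawn`, `cilk_sync` and `cilk_for`), so the sketch below only imitates that pattern in Python with the standard `concurrent.futures` module. The function name `parallel_sum` and its `workers` and `grain` parameters are hypothetical, and Python threads will not give a real speedup for this CPU-bound sum.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4, grain=1000):
    """Fork-join sum: one task per chunk, then merge the partial results."""
    chunks = [values[i:i + grain] for i in range(0, len(values), grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # "Spawn": each chunk is summed by its own task into its own partial result.
        futures = [pool.submit(sum, chunk) for chunk in chunks]
        # "Sync": wait for every spawned task before continuing.
        partials = [future.result() for future in futures]
    # "Reduce": merge the per-task partial sums, as a sum reducer would.
    return sum(partials)


total = parallel_sum(list(range(10_000)))
print(total)  # 49995000
```

Each task writes only to its own partial result, which is the property a reducer provides: no two tasks ever update the same accumulator, so no locking is needed until the final merge.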
390	Passage à l'echelle d'un support d'exécution à base de tâches pour l'algèbre linéaire dense / Scalability of a task-based runtime system for dense linear algebra applications Sergent, Marc 08 December 2016 (has links) La complexification des architectures matérielles pousse vers l’utilisation de paradigmes de programmation de haut niveau pour concevoir des applications scientifiques efficaces, portables et qui passent à l’échelle. Parmi ces paradigmes, la programmation par tâches permet d’abstraire la complexité des machines en représentant les applications comme des graphes de tâches orientés acycliques (DAG). En particulier, le modèle de programmation par tâches soumises séquentiellement (STF) permet de découpler la phase de soumission des tâches, séquentielle, de la phase d’exécution parallèle des tâches. Même si ce modèle permet des optimisations supplémentaires sur le graphe de tâches au moment de la soumission, il y a une préoccupation majeure sur la limite que la soumission séquentielle des tâches peut imposer aux performances de l’application lors du passage à l’échelle. Cette thèse se concentre sur l’étude du passage à l’échelle du support d’exécution StarPU (développé à Inria Bordeaux dans l’équipe STORM), qui implémente le modèle STF, dans le but d’optimiser les performances d’un solveur d’algèbre linéaire dense utilisé par le CEA pour faire de grandes simulations 3D. Nous avons collaboré avec l’équipe HiePACS d’Inria Bordeaux sur le logiciel Chameleon, qui est une collection de solveurs d’algèbre linéaire portés sur supports d’exécution à base de tâches, afin de produire un solveur d’algèbre linéaire dense sur StarPU efficace et qui passe à l’échelle jusqu’à 3 000 coeurs de calcul et 288 accélérateurs de type GPU du supercalculateur TERA-100 du CEA-DAM. / The ever-increasing supercomputer architectural complexity emphasizes the need for high-level parallel programming paradigms to design efficient, scalable and portable scientific applications. 
Among such paradigms, the task-based programming model abstracts away much of the architectural complexity by representing an application as a Directed Acyclic Graph (DAG) of tasks. In particular, the Sequential-Task-Flow (STF) model decouples the sequential task submission step from the parallel task execution step. While this model allows for further optimizations on the DAG of tasks at submission time, a key concern is that sequential task submission may limit application performance at scale. This thesis studies the scalability of the STF-based StarPU runtime system (developed at Inria Bordeaux in the STORM team), with the aim of optimizing a dense linear algebra solver used by the CEA for large-scale 3D simulations. To that end, we collaborated with the HiePACS team of Inria Bordeaux on the Chameleon software, which is a collection of linear algebra solvers on top of task-based runtime systems, to produce an efficient dense linear algebra solver on top of StarPU that scales up to 3,000 cores and 288 GPUs of CEA-DAM’s TERA-100 cluster. Calcul haute performance Supports d’exécution Calcul distribué Programmation par tâches Modèles de programmation parallèle High performance computing Run-time systems Distributed computing Task-based programming Parallel programming models
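The Sequential-Task-Flow idea described above can be illustrated with a toy runtime: tasks are submitted one after another in program order, each declaring the data it reads and writes, and the runtime infers the DAG edges from those declarations while a thread pool executes independent tasks in parallel. This is a minimal sketch under stated assumptions, not StarPU's API or implementation; the class `ToySTF`, its methods, and the task names `t1` to `t4` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class ToySTF:
    """Toy STF runtime: sequential submission, dependencies inferred from accesses."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_writer = {}  # handle -> (task name, future) of the last writer
        self._readers = {}      # handle -> [(task name, future)] since the last write
        self._futures = []
        self.edges = set()      # inferred DAG edges (predecessor, successor)

    def submit(self, name, fn, reads=(), writes=()):
        deps = {}  # predecessor name -> future
        # Read-after-write and write-after-write: wait for the last writer.
        for handle in set(reads) | set(writes):
            if handle in self._last_writer:
                pred, fut = self._last_writer[handle]
                deps[pred] = fut
        # Write-after-read: wait for readers of the data about to be overwritten.
        for handle in writes:
            for pred, fut in self._readers.get(handle, []):
                deps[pred] = fut
        self.edges.update((pred, name) for pred in deps)

        def run():
            for fut in deps.values():
                fut.result()  # block until every predecessor has finished
            return fn()

        future = self._pool.submit(run)
        self._futures.append(future)
        for handle in writes:
            self._last_writer[handle] = (name, future)
            self._readers[handle] = []
        for handle in reads:
            if handle not in writes:
                self._readers.setdefault(handle, []).append((name, future))
        return future

    def wait_all(self):
        for future in self._futures:
            future.result()  # re-raises any exception from a task
        self._pool.shutdown(wait=True)


data = {"A": 0, "B": 0, "C": 0}


def t1(): data["A"] = 2                       # writes A
def t2(): data["B"] = data["A"] * 10          # reads A, writes B
def t3(): data["C"] = data["A"] + 5           # reads A, writes C
def t4(): data["A"] = data["B"] + data["C"]   # reads B and C, writes A


runtime = ToySTF()
runtime.submit("t1", t1, writes=["A"])
runtime.submit("t2", t2, reads=["A"], writes=["B"])
runtime.submit("t3", t3, reads=["A"], writes=["C"])
runtime.submit("t4", t4, reads=["B", "C"], writes=["A"])
runtime.wait_all()
print(data)
print(sorted(runtime.edges))
```

Here `t2` and `t3` depend only on `t1` and may run concurrently, while `t4` waits for both. Because tasks enter the pool in submission order and only ever wait on earlier tasks, the earliest unfinished task can always make progress, so the sketch cannot deadlock. The single submitting thread is also where the scalability concern raised in the abstract arises: every task passes through `submit` sequentially.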
