431 |
Otimização de algoritmo evolucionário multiobjetivo paralelo para a geração automática de projetos de iluminação de áreas externas / Optimization evolutionary algorithms multiobjective parallel to generate automated lighting outdoors designs
Rocha, Hugo Xavier, 20 November 2015
This work studies a parallel multiobjective evolutionary algorithm that
automates the design of exterior lighting and presents an optimized
version of that algorithm. The resulting algorithm works with variable-length
chromosomes, for which dedicated crossover and mutation operators were created.
The fitness function was selected with a statistical evaluation method (difference of
means), which made it possible to compare how different candidate fitness functions
affect the performance of the proposed parallel multiobjective evolutionary algorithm.
The chosen fitness function allows automated exterior lighting designs to be produced
more efficiently. In addition, an application was built on the proposed evolutionary
algorithm in which the user chooses the pole heights, lamps and fixtures
to use, as well as the layout of the area to be illuminated (which may be irregular). Within
this area, sub-areas can be defined where the placement of lighting
poles is restricted. The user must also set the target average illumination and its
tolerance range. As a case study, an airport parking lot in the city of Uberlândia-MG
(Brazil) is presented. The evolved designs show a low coefficient of variation over
30 runs, which indicates that the system converges on designs with similar metrics.
Comparing the worst and the best designs obtained in those runs with
the reference design shows savings in installed power:
37.5% for the worst evolved design and 50.0% for the best evolved
design. The evolved designs also have better lighting uniformity and energy efficiency,
and they use fewer lighting poles. / Este trabalho apresenta o estudo de um Algoritmo Evolucionário Multiobjetivo Paralelo
que viabiliza a criação de projetos de iluminação de áreas externas automatizadas
por computador e que resulta em uma versão otimizada desse algoritmo. O algoritmo
resultante, essencialmente, trabalha com cromossomos de tamanho variável e para os
quais foram criados operadores intrínsecos de cruzamento e mutação. A determinação
da função de aptidão ocorreu por meio do método de avaliação estatística (diferença de
médias), possibilitando, assim, a comparação de diferentes opções das funções de aptidão
no desempenho do algoritmo evolucionário multiobjetivo paralelo proposto. Com a função
escolhida, tornou-se possível construir projetos automatizados de iluminação externa
de forma mais eficiente. Além disso, por meio do algoritmo evolucionário proposto, foi
desenvolvida uma aplicação, pela qual o usuário escolhe quais as alturas dos postes, lâmpadas
e luminárias que deseja utilizar e também o layout de área a ser iluminada (mesmo
que irregular). Dentro dessa área, podem ser definidas subáreas onde existem restrições
quanto à colocação de postes de iluminação. O usuário deve definir a iluminação média
associada à sua respectiva tolerância, ou faixa de variação. Como estudo de caso, é apresentada
a área de um estacionamento do aeroporto da cidade de Uberlândia, MG. Os
projetos desenvolvidos, apresentam um baixo coeficiente de variação calculado a partir
de 30 execuções. Isso demonstra que o sistema está convergindo para projetos com métricas
similares. Ao identificar o pior e o melhor dos projetos apresentados como solução
pelo sistema para essas execuções, pode-se notar que apresentam economia nas potências
instaladas quando comparados ao projeto de referência: 37,5% no pior dos projetos e
50% no melhor projeto apresentado. Além disso, constataram-se melhores uniformidades para iluminação e maiores eficiências energéticas, bem como a diminuição das respectivas
quantidades de unidades de iluminação. / Doutor em Ciências
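The variable-length chromosomes mentioned in this abstract need crossover operators that tolerate parents of different lengths. The sketch below shows one common option, cut-and-splice crossover, under stated assumptions: the abstract does not describe the thesis's own operators, and the gene encoding `(x, y, height_m)` and the function names are hypothetical.

```python
import random


def cut_and_splice(parent_a, parent_b, cut_a, cut_b):
    """Cross two variable-length chromosomes at independent cut points.

    The children may differ in length from both parents, but the total
    number of genes is preserved. A child can be empty if the cuts fall
    at opposite ends, so a real operator would need to repair that case.
    """
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


def random_crossover(parent_a, parent_b, rng=random):
    """Pick one cut point per parent at random and apply cut-and-splice."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    return cut_and_splice(parent_a, parent_b, cut_a, cut_b)


# Hypothetical encoding: each gene is one lighting pole as (x, y, height_m).
design_a = [(0, 0, 10), (20, 0, 10), (40, 0, 12)]
design_b = [(10, 15, 8), (30, 15, 8)]
child_1, child_2 = cut_and_splice(design_a, design_b, cut_a=1, cut_b=1)
```

With these cut points, `child_1` keeps the first pole of `design_a` and the last pole of `design_b`, while `child_2` receives the remaining three poles.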
|
432 |
Athapascan-0 : exploitation de la multiprogrammation légère sur grappes de multiprocesseurs
Carissimi, Alexandre da Silva, January 1999
L'accroissement d'efficacite des réseaux d'interconnexion et la vulgarisation des machines multiprocesseurs permettent la réalisation de machines parallèles a mémoire distribuée de faible coût: les grappes de multiprocesseurs. Elles nécessitent l'exploitation à la fois du parallélismeà grain fin, interne à un multiprocesseur offert par la multiprogrammation légère, et du parallélisme à gros grain entre les différents multiprocesseurs. L'exploitation simultanée de ces deux types de parallélisme exige une méthode de communication entre les processus légers qui ne partagent pas le mêmme espace d'adressage. Le travail de cette thèse porte sur le problème de l'Intégration de la multiprogrammation légère et des communications sur grappes de multiprocesseurs symétriques (SMP). II porte plus précisément sur evaluation et le reglage du noyau exécutif ATHAPASCAN-0 sur ce type d'architecture. ATHAPASCAN-0 est un noyau exécutif, portable, développé au sein du projet APACHE (CNRS-INPG-INRIA-UJF), qui combine la multiprogrammation légère et la communication par échange de messages. La portabilité est assurée par une organisation en couches basée sur les standards POSIX threads et MPI largement répandus. ATHAPASCAN-0 étend le modèle de réseau statique de processus «lourds» communicants tel que MPI, PVM, etc,à celui d'un réseau dynamique de processus légers communicants. La technique de base est la multiprogrammation lègere des communications et des calculs. La progression des communications exige la scrutation de état du reseau et l'enchainement des opérations de transferts. L'efficacité repose sur la minimisation de ces opérations. De plus, l'emploi de multiprocesseurs ajoute des problèmes spécifiques dus à l'apparition d'un parallélisme réel entre calcul et communication. Ces problèmes sont présentés et des solutions sont proposées pour l'environnement ATHAPASCAN-0. Ces solutions sont évaluées sur des grappes de multiprocesseurs. 
/ The continuous price reduction of commodity PC multiprocessors and the availability of fast network interfaces have made clusters of multiprocessors an attractive low-cost way to build parallel systems. Multiprocessor clusters offer two levels of parallelism: fine-grain parallelism inside a single multiprocessor and coarse-grain parallelism among multiprocessors. A mechanism must be provided to exploit both levels simultaneously, which requires communication between threads belonging to different address spaces. This dissertation addresses the problem of integrating threads and communications in the ATHAPASCAN-0 runtime system. ATHAPASCAN-0 is a portable runtime for clusters of multiprocessors developed as part of the APACHE project (CNRS-INPG-INRIA-UJF). Portability is achieved by a layered organization based on standards such as POSIX threads and MPI. The ATHAPASCAN-0 runtime system extends the static network of heavyweight communicating processes found in message-passing libraries such as MPI and PVM into a dynamic network of lightweight communicating threads. The key technique is the multithreading of both communication and computation. Communication progress relies on polling the network to handle incoming messages and to deliver outgoing communication requests. Performance depends strongly on how these operations are implemented. Additionally, multiprocessors introduce specific problems such as the overhead of cache coherency mechanisms, the management of concurrent accesses, and efficient mutex locking that avoids unnecessary context switches. These problems are analyzed and solutions are implemented in the ATHAPASCAN-0 runtime system. These solutions are evaluated on a cluster of multiprocessors.
|
433 |
GPUHELP: um ambiente de apoio à execução de programas paralelos em arquiteturas de GPU / GPUHELP: an environment supporting to execution of parallel programs for GPU architectures
Borges, Douglas Pires, 07 March 2014
Faced with the complexity of scientific applications, researchers are looking
for new ways to optimize their processing, using new concepts and paradigms for parallel
and distributed programming. An emerging alternative is the use of GPUs
(Graphics Processing Units), owing to their high computational power. However, along with the benefits
of these techniques come diverse and complex issues in teaching
and learning them. Researchers have therefore devoted effort to improving how
these subjects are taught, and environments that support the teaching of parallel programming have
emerged. Such environments provide a set of tools for developing and testing applications,
thereby improving the educational experience. However, current research focuses
on teaching-support environments for parallel programming on CPU architectures; no such
environments exist for GPU architectures. Their absence has a negative impact on the
learning process, as shown in several scientific studies. In this context, this
work presents an environment for supporting parallel programming on GPUs, called GPUHelp.
GPUHelp gives users a complete solution for developing and testing code for the GPU
architectures CUDA and OpenCL, even for users who do not have graphics cards in
their computers, which was not possible before, given the need for a graphics card compatible with
such architectures. Evaluations have shown that GPUHelp is a feasible solution with different
application scenarios in education and training on parallel programming for GPUs. / Frente às complexas dificuldades que envolvem as aplicações científicas, pesquisadores
buscam novos meios de otimizar o processamento destas, utilizando-se de novos conceitos e
paradigmas em programação paralela e distribuída. Uma alternativa emergente a este cenário, é
a utilização de GPUs (Graphics Processing Unit) devido a seu alto poder computacional. Contudo,
juntamente com os benefícios advindos da utilização de tais técnicas, tem-se diversas e
complexas questões relacionadas ao ensino e aprendizado das mesmas. Desse modo, pesquisadores
passaram a dedicar esforços para obter um melhor resultado no ensino destas áreas.
Assim, surgiram os ambientes de apoio ao ensino de programação paralela. Tais ambientes provêem
um conjunto de ferramentas para o desenvolvimento e teste de aplicações, aprimorando
assim a experiência educacional. Entretanto, as pesquisas atuais focam em ambientes de apoio
a programação paralela para arquiteturas de CPU, não existindo assim, ambientes de apoio voltados
as arquiteturas de GPU. A inexistência de tais ambientes tem impacto negativo, durante o
processo de aprendizado, comprovado em diferentes pesquisas científicas. Neste contexto, este
trabalho apresenta um ambiente de apoio a programação paralela em GPU, intitulado GPUHelp.
O GPUHelp proporciona aos usuários uma solução completa para o desenvolvimento e teste
de códigos para arquiteturas de GPU, o CUDA e OpenCL, mesmo para aqueles usuários que
não possuem placas gráficas em seus computadores, o que não era possível até então, visto a
necessidade de uma placa gráfica compatível com tais arquiteturas. As avaliações realizadas
demonstraram que o GPUHelp é uma solução viável com aplicabilidades distintas nos cenários
de ensino e treinamento de programação paralela em GPU.
|
434 |
XFOR (Multifor) : A new programming structure to ease the formulation of efficient loop optimizations / XFOR (Multifor) : nouvelle structure de programmation pour faciliter la formulation des optimisations efficaces de boucles
Fassi, Imen, 27 November 2015
Nous proposons une nouvelle structure de programmation appelée XFOR (Multifor), dédiée à la programmation orientée réutilisation de données. XFOR permet de gérer simultanément plusieurs boucles "for" ainsi que d’appliquer/composer des transformations de boucles d’une façon intuitive. Les expérimentations ont montré des accélérations significatives des codes XFOR par rapport aux codes originaux, mais aussi par rapport aux codes générés automatiquement par l’optimiseur polyédrique de boucles Pluto. Nous avons mis en œuvre la structure XFOR par le développement de trois outils logiciels: (1) un compilateur source-à-source nommé IBB, qui traduit les codes XFOR en un code équivalent où les boucles XFOR ont été remplacées par des boucles for sémantiquement équivalentes. L’outil IBB bénéficie également des optimisations implémentées dans le générateur de code polyédrique CLooG qui est invoqué par IBB pour générer des boucles for à partir d’une description OpenScop; (2) un environnement de programmation XFOR nommé XFOR-WIZARD qui aide le programmeur dans la ré-écriture d’un programme utilisant des boucles for classiques en un programme équivalent, mais plus efficace, utilisant des boucles XFOR; (3) un outil appelé XFORGEN, qui génère automatiquement des boucles XFOR à partir de toute représentation OpenScop de nids de boucles transformées générées automatiquement par un optimiseur automatique. / We propose a new programming structure named XFOR (Multifor), dedicated to data-reuse-aware programming. It handles several for-loops simultaneously and maps their respective iteration domains onto each other. Additionally, XFOR eases the application and composition of loop transformations. Experiments show that XFOR codes provide significant speed-ups over the original code versions and also over the Pluto-optimized versions.
We implemented the XFOR structure through the development of three software tools: (1) a source-to-source compiler named IBB (Iterate-But-Better!), which automatically translates any C/C++ code containing XFOR-loops into equivalent code in which the XFOR-loops have been translated into for-loops. IBB also benefits from the optimizations implemented in the polyhedral code generator CLooG, which IBB invokes to generate for-loops from an OpenScop specification; (2) an XFOR programming environment named XFOR-WIZARD, which assists the programmer in rewriting a program with classical for-loops into an equivalent but more efficient program using XFOR-loops; (3) a tool named XFORGEN, which automatically generates XFOR-loops from any OpenScop representation of transformed loop nests produced by an automatic optimizer.
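The idea of running several loops together and mapping their iteration domains onto each other can be illustrated with an ordinary fused loop. The sketch below is not XFOR syntax and is not output from IBB; it is a hand-written Python analogy in which the second loop is delayed by an offset of one iteration so that each `b` value is consumed shortly after it is produced. All names are hypothetical.

```python
def separate_loops(a):
    """Baseline: two loops, the second rereads all of b after it is complete."""
    n = len(a)
    b = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        b[i] = 2 * a[i]
    for i in range(n - 1):
        c[i] = b[i] + b[i + 1]
    return b, c


def fused_with_offset(a):
    """Same computation with the second loop shifted by one iteration.

    Iteration i of the first loop runs together with iteration i - 1 of
    the second loop, because c[i - 1] needs b[i], which has just been
    produced. This keeps the reused values of b close in time.
    """
    n = len(a)
    b = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        b[i] = 2 * a[i]
        if i >= 1:
            c[i - 1] = b[i - 1] + b[i]
    return b, c
```

Both functions return the same arrays; only the order of the computations, and therefore the reuse distance of `b`, changes.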
|
435 |
Programmation des architectures hiérarchiques et hétérogènes / Programming hierarchical and heterogenous machines
Hamidouche, Khaled, 10 November 2011
Les architectures de calcul haute performance de nos jours sont des architectures hiérarchiques et hétérogènes: hiérarchiques car elles sont composées d’une hiérarchie de mémoire, une mémoire distribuée entre les noeuds et une mémoire partagée entre les coeurs d’un même noeud. Hétérogènes due à l’utilisation des processeurs spécifiques appelés Accélérateurs tel que le processeur CellBE d’IBM et les CPUs de NVIDIA. La complexité de maîtrise de ces architectures est double. D’une part, le problème de programmabilité: la programmation doit rester simple, la plus proche possible de la programmation séquentielle classique et indépendante de l’architecture cible. D’autre part, le problème d’efficacité: les performances doivent êtres proches de celles qu’obtiendrait un expert en écrivant le code à la main en utilisant des outils de bas niveau. Dans cette thèse, nous avons proposé une plateforme de développement pour répondre à ces problèmes. Pour cela, nous proposons deux outils : BSP++ est une bibliothèque générique utilisant des templates C++ et BSPGen est un framework permettant la génération automatique de code hybride à plusieurs niveaux de la hiérarchie (MPI+OpenMP ou MPI + Cell BE). Basée sur un modèle hiérarchique, la bibliothèque BSP++ prend les architectures hybrides comme cibles natives. Utilisant un ensemble réduit de primitives et de concepts intuitifs, BSP++ offre une simplicité d'utilisation et un haut niveau d' abstraction de la machine cible. Utilisant le modèle de coût de BSP++, BSPGen estime et génère le code hybride hiérarchique adéquat pour une application donnée sur une architecture cible. BSPGen génère un code hybride à partir d'une liste de fonctions séquentielles et d'une description de l'algorithme parallèle. Nos outils ont été validés sur différentes applications de différents domaines allant de la vérification et du calcul scientifique au traitement d'images en passant par la bioinformatique. 
En utilisant une large sélection d’architecture cible allant de simple machines à mémoire partagée au machines Petascale en passant par les architectures hétérogènes équipées d’accélérateurs de type Cell BE. / Today’s high-performance computing architectures are hierarchical and heterogeneous. They are hierarchical because they combine distributed memory between nodes with shared memory between the cores of the same node. They are heterogeneous because they use specialized processors called accelerators, such as the IBM CellBE processor and NVIDIA GPUs. The difficulty of programming these architectures is twofold. On the one hand, there is the problem of programmability: programming should remain simple, as close as possible to conventional sequential programming, and independent of the target architecture. On the other hand, there is the problem of efficiency: performance should be close to what an expert would obtain by writing the code by hand with low-level tools. In this thesis, we propose a development platform that addresses these problems, built on two tools: BSP++, a generic library using C++ templates, and BSPGen, a framework for automatically generating hybrid code across several levels of the hierarchy (MPI + OpenMP or MPI + Cell BE). Based on a hierarchical model, the BSP++ library takes hybrid architectures as native targets. Using a small set of primitives and intuitive concepts, BSP++ is simple to use and offers a high level of abstraction of the target machine. Using the cost model of BSP++, BSPGen estimates and generates the appropriate hierarchical hybrid code for a given application on a target architecture. BSPGen generates hybrid code from a list of sequential functions and a description of the parallel algorithm. Our tools have been validated on applications from different fields, ranging from verification and scientific computing to image processing and bioinformatics.
They were evaluated on a wide selection of target architectures, ranging from simple shared-memory machines to Petascale machines and heterogeneous architectures equipped with Cell BE accelerators.
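The abstract says BSPGen relies on a cost model to choose the code it generates. As a rough illustration, the sketch below uses the textbook BSP cost of a superstep, the maximum local work plus h times g plus l, to compare two candidate configurations. The abstract does not give the actual BSP++ cost model, so the formula choice, the `Machine` parameters and the sample numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    g: float  # assumed cost per word communicated
    l: float  # assumed cost of one barrier synchronization


def superstep_cost(work, h, machine):
    """Textbook BSP cost of one superstep: max local work + h * g + l.

    work is the list of local computation costs, one per process, and h
    is the largest number of words any process sends or receives.
    """
    return max(work) + h * machine.g + machine.l


def program_cost(supersteps, machine):
    """Sum the cost of a sequence of (work, h) supersteps."""
    return sum(superstep_cost(work, h, machine) for work, h in supersteps)


# Hypothetical comparison: the same program on two machine descriptions.
supersteps = [([100, 120], 5), ([50, 40], 0)]
flat_mpi = Machine(g=2.0, l=10.0)
hybrid = Machine(g=1.0, l=4.0)
best = min((flat_mpi, hybrid), key=lambda m: program_cost(supersteps, m))
```

A generator in this style would evaluate each candidate configuration and emit code for the one with the lowest predicted cost.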
|
436 |
Programmtransformationen für Vielteilchensimulationen auf Multicore-Rechnern
Schwind, Michael, 01 December 2010
This dissertation examines program transformations for the class of
regular-irregular loop complexes, which typically occur in complex
simulation codes for many-particle systems, and investigates the
efficiency of the resulting programs on modern multicore systems.
Regular loop complexes are characterized by fixed loop bounds and a
regular structure of the dependencies between computations; in
irregular computations, the dependencies between computations are
known only at run time and depend strongly on the input data. The
regular-irregular computations considered here couple both kinds of
computation tightly. The challenge in implementing regular-irregular
loop complexes efficiently on modern multicore systems lies in
combining transformation techniques that both allow a high degree of
parallelism and take the locality of the computations into account.
Modern multicore systems have a complex memory hierarchy of private
and shared caches and a shared memory connection. These new
architectural features make it necessary to revisit program
transformations and to re-evaluate the efficiency of the
computations. A series of transformations is considered that change
both the order of the computations and the order in which data are
stored in memory, in order to achieve better spatial and temporal
locality.
Parallelization and locality are closely linked and jointly influence
the efficiency of parallel programs. This work considers several
parallelization strategies for regular-irregular computations on
modern multicore systems.
A further part of the work considers purely irregular computations,
which are typical of a large number of many-particle simulation
codes. These simulation codes were also examined on multicore systems
to determine how well they scale on modern multicore CPUs. The novel
architecture of multicore systems, in particular the heavily shared
memory bandwidth, makes a fresh look at such purely irregular
computations necessary here as well. Techniques are considered that
reduce the amount of data to be loaded and thereby lower the demands
on the shared memory bandwidth.
|
437 |
[en] PARALLEL PROGRAMMING IN THE REDIS KEY-VALUE DATASTORE / [pt] PROGRAMAÇÃO PARALELA NO BANCO DE DADOS CHAVE-VALOR REDIS
JUAREZ DA SILVA BOCHI, 12 April 2016
[pt] Redis é um banco de dados chave-valor de código livre que dá suporte à
avaliação de scripts Lua, mas sua implementação utiliza apenas uma tarefa
de sistema operacional. Scripts longos são desencorajados porque a avaliação
do código é bloqueante, o que pode causar degradação de desempenho para
os demais usuários. Através da aplicação do modelo de concorrência M:N,
que combina tarefas de nível de sistema operacional com tarefas do nível
de usuário, adicionamos no Redis a capacidade de execução de scripts em
paralelo, permitindo que todos os núcleos do servidor sejam explorados.
Com a utilização de corotinas Lua, implementamos um escalonador capaz
de alocar e suspender a execução de tarefas de nível de usuário nos núcleos
disponíveis sem necessidade de alteração do código dos scripts. Este modelo
permitiu proteger o programador das complexidades naturais do paralelismo
como sincronização no acesso a recursos compartilhados e escalonamento
das tarefas. / [en] Redis is an open source key-value database that supports the evaluation of Lua
scripts, but its implementation is single-threaded. Long-running
scripts are discouraged because script evaluation is blocking, which may
degrade performance for other users. By applying the M:N threading model,
which combines user-level and operating system threads, we added to Redis the
ability to run scripts in parallel, leveraging all server cores. Using
Lua coroutines, we implemented a scheduler able to allocate and suspend
user-level tasks on the available cores without requiring changes to the scripts'
source code. The M:N model allowed us to shield the programmer from the
natural complexities of parallel programming, such as synchronizing access
to shared resources and scheduling tasks onto different
operating system threads.
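The M:N model described here multiplexes M user-level tasks over N operating system threads. The sketch below illustrates that scheduling idea only, with Python generators standing in for Lua coroutines; it is not the scheduler added to Redis, and every name in it is hypothetical. In CPython the global interpreter lock also prevents these threads from executing Python code truly in parallel, so the sketch shows the structure rather than the speed-up.

```python
import queue
import threading


def run_m_on_n(tasks, n_workers):
    """Run M generator-based tasks on N OS threads.

    Each worker takes a ready task, resumes it until its next yield,
    and puts it back on the ready queue. A task is held by at most one
    worker at a time, so it is never resumed concurrently.
    """
    ready = queue.Queue()
    for name, gen in tasks.items():
        ready.put((name, gen))
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, gen = ready.get_nowait()
            except queue.Empty:
                return  # nothing ready; tasks held by other workers stay with them
            try:
                next(gen)               # run until the next suspension point
                ready.put((name, gen))  # still alive: reschedule it
            except StopIteration as stop:
                with lock:
                    results[name] = stop.value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def count_to(limit):
    """Example user-level task that yields control after every step."""
    total = 0
    for i in range(1, limit + 1):
        total += i
        yield
    return total


results = run_m_on_n({"a": count_to(3), "b": count_to(4), "c": count_to(5)}, n_workers=2)
```

The task bodies contain no thread or lock code, which mirrors the abstract's point that script authors are shielded from synchronization and scheduling.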
|
438 |
Machine Vision Assisted In Situ Ichthyoplankton Imaging System
Iyer, Neeraj, 12 July 2013
Indiana University-Purdue University Indianapolis (IUPUI) / Recently there has been considerable effort to develop systems for sampling and automatically classifying plankton from the oceans. Existing methods assume the specimens have already been precisely segmented, or aim at analyzing images containing a single specimen (extracting its features and/or recognizing specimens as single in-focus targets in small images). The resolution of existing systems is limiting. Our goal is to develop automated, very high resolution image sensing of critically important, yet under-sampled, components of the planktonic community by addressing both the physical sensing system (e.g. camera, lighting, depth of field) and the crucial image extraction and recognition routines. The objective of this thesis is to develop a framework that aims to (i) detect and segment all organisms of interest automatically, directly from the raw data, while filtering out noise and out-of-focus instances, (ii) extract the best features from the images and (iii) identify and classify the plankton species. Our approach focuses on utilizing the full computational power of a multicore system by implementing a parallel programming approach that can process large volumes of high resolution plankton images obtained from our newly designed imaging system, the In Situ Ichthyoplankton Imaging System (ISIIS). We compare some of the widely used segmentation methods, with emphasis on accuracy and speed, to find the one that works best on our data. We design a robust, scalable, fully automated system for high-throughput processing of the ISIIS imagery.
|
439 |
Parallélisation de simulations interactives de champs ultrasonores pour le contrôle non destructif / Parallelization of ultrasonic field simulations for non destructive testing
Lambert, Jason, 03 July 2015
La simulation est de plus en plus utilisée dans le domaine industriel du Contrôle Non Destructif. Elle est employée tout au long du processus de contrôle, que ce soit pour en accélérer la mise au point ou en comprendre les résultats. Les travaux menés au cours de cette thèse présentent une méthode de calcul rapide de champ ultrasonore rayonné par un capteur multi-éléments dans une pièce isotrope, permettant un usage interactif des simulations. Afin de tirer parti des architectures parallèles communément disponibles, un modèle régulier (qui limite au maximum les branchements divergents) dérivé du modèle générique présent dans la plateforme logicielle CIVA a été mis au point. Une première implémentation de référence a permis de le valider par rapport aux résultats CIVA et d'analyser son comportement en termes de performances. Le code a ensuite été porté et optimisé sur trois classes d'architectures parallèles aujourd'hui disponibles dans les stations de calcul : le processeur généraliste central (GPP), le coprocesseur manycore (Intel MIC) et la carte graphique (nVidia GPU). Concernant le processeur généraliste et le coprocesseur manycore, l'algorithme a été réorganisé et le code implémenté afin de tirer parti des deux niveaux de parallélisme disponibles, le multithreading et les instructions vectorielles. Sur la carte graphique, les différentes étapes de simulation de champ ont été découpées en une série de noyaux CUDA. Enfin, des bibliothèques de calculs spécifiques à ces architectures, Intel MKL et nVidia cuFFT, ont été utilisées pour effectuer les opérations de Transformées de Fourier Rapides. Les performances et la bonne adéquation des codes produits ont été analysées en détail pour chaque architecture. Dans plusieurs cas, sur des configurations de contrôle réalistes, des performances autorisant l'interactivité ont été atteintes. Des perspectives pour traiter des configurations plus complexes sont dressées. 
Enfin la problématique de l'industrialisation de ce type de code dans la plateforme logicielle CIVA est étudiée. / Simulation is increasingly used in the field of Non Destructive Testing. It is used at every step of the inspection process of an industrial part, from speeding up the development of an inspection to helping experts understand its results. This thesis presents a simulation tool for the fast computation of the ultrasonic field radiated by a phased array probe in an isotropic specimen. Its performance enables interactive use. To benefit from commonly available parallel architectures, a regular model (designed to minimize divergent branching) derived from the generic CIVA model was developed. First, a reference implementation was developed to validate this model against CIVA results and to analyze its performance before optimization. The resulting code was then optimized for three kinds of parallel architectures commonly available in workstations: general purpose processors (GPP), manycore coprocessors (Intel MIC) and graphics processing units (nVidia GPU). On the GPP and the MIC, the algorithm was reorganized and implemented to benefit from both levels of parallelism, multithreading and vector instructions. On the GPU, the steps of the field computation were divided into a series of CUDA kernels. Moreover, libraries dedicated to each architecture were used to perform Fast Fourier Transforms: Intel MKL on the GPP and MIC, and nVidia cuFFT on the GPU. The performance of the resulting codes and their fit to the hardware were studied in detail for each architecture. On several realistic inspection configurations, interactive performance was reached. Perspectives for addressing more complex configurations are outlined. Finally, the integration and industrialization of this code in the commercial NDT platform CIVA is discussed.
|
440 |
Environnement d'exécution parallèle : conception et architecture
Costa, Celso Maciel da, January 1993
L'objectif de cette thèse est l'étude d'un environnement d'exécution pour machines parallèles sans mémoire commune. Elle comprend la définition d'un modèle de programme parallèle, basé sur l'échange de message offrant une forme restreinte de mémoire partagée. La communication est indirecte, via des portes; les processus utilisent les barrières pour la synchronisation. Les entités du système, processus, portes et barrières, sont créées dynamiquement, et placées sur un processeur quelconque du réseau de processeurs de façon explicite. Nous proposons une implantation de ce modèle comme la mise en oeuvre systématique d'une architecture client/serveur. Cette implantation a été effectuée sur une machine Supernode. La base est un Micro Noyau Parallèle, où le composant principal est un mécanisme d'appel de procédure à distance minimal. / This thesis studies an execution environment for parallel machines without shared memory. It defines a parallel programming model based on message passing that offers a restricted form of shared memory. In this model, process communication occurs indirectly, via ports, and processes use barriers for synchronization. All the entities of the system (processes, ports and barriers) are created dynamically and explicitly placed on any processor of the processor network. We propose an implementation of this model as a systematic realization of a client/server architecture. The implementation was carried out on a Supernode parallel machine. Its basis is a parallel micro kernel whose principal component is a minimal remote procedure call mechanism.
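The programming model in this abstract combines indirect communication through ports with barrier synchronization. The sketch below mimics that style with Python's standard `queue.Queue` objects as ports and `threading.Barrier` as the barrier; threads stand in for processes, and the ring exchange and all names are hypothetical rather than taken from the thesis.

```python
import queue
import threading


def run_ring(n_processes):
    """Each process sends its rank to its right neighbour's port.

    Processes never address each other directly: they only put to and
    get from ports. The barrier separates the send phase from the
    receive phase.
    """
    ports = [queue.Queue() for _ in range(n_processes)]  # one port per process
    barrier = threading.Barrier(n_processes)
    received = [None] * n_processes

    def process(rank):
        ports[(rank + 1) % n_processes].put(rank)  # send via the neighbour's port
        barrier.wait()                             # every send has been posted
        received[rank] = ports[rank].get()         # read from the process's own port

    threads = [threading.Thread(target=process, args=(rank,))
               for rank in range(n_processes)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received
```

Calling `run_ring(4)` leaves each process holding the rank of its left neighbour.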
|