51

Construção automática de mosaicos de imagens digitais aéreas agrícolas utilizando transformada SIFT e processamento paralelo / Automatic construction of mosaics from aerial digital images agricultural using SIFT transform and parallel processing

André de Souza Tarallo 26 August 2013 (has links)
The automatic construction of large mosaics from high-resolution digital images is an area of great importance, with applications in different fields such as agriculture, meteorology, remote sensing and biology. In agriculture, efficient decision making for the control of pests, diseases or fires depends on obtaining fast and accurate information. Until now this has been done semi-automatically: it is necessary to obtain the digital terrain model, orthorectify the images and manually insert markings on the ground so that software can build a mosaic in the conventional way. To automate this process, this project proposes three methodologies (1, 2, 3) based on algorithms already well established in the literature (SIFT, BBF and RANSAC) and on parallel processing (OpenMP), using high-resolution aerial agricultural images of pastures and plantations of several crops. The methodologies differ in how the calculations for the construction of the mosaics are performed. Building mosaics from images of this size is not a trivial task, as it requires a high computational effort. The methodologies include a pre-processing step to minimize distortions that arise during image acquisition, as well as algorithms for smoothing the seams between images in the mosaic. The first image database, called the non-resized image database, was built from images of 2336 x 3504 pixels (100 images divided into 10 groups of 10), obtained in the region of Santa Rita do Sapucaí - MG, and was used to validate the methodology. A second database, referred to as the resized image database, contains 200 images of 533 x 800 pixels (10 groups of 20 images) and was used for distortion evaluation against the free tools Autostitch and PTGui, which have predefined parameters for the 533 x 800 resolution. Sequential processing times showed methodology 3 to be the fastest, 63.5% faster than methodology 1 and 44.5% faster than methodology 2. Parallel processing was evaluated on a computer with 2, 4 and 8 threads (4 physical cores and 4 virtual cores), reducing the mosaic construction time for methodology 1 by 60%. A configuration with 4 threads (physical cores) proved the most adequate in execution time and speedup, since running 8 threads involves the virtual threads. Distortion tests showed that mosaics generated with methodology 1 present lower distortion for 7 of the 10 image groups. Distortions at the junctions of five mosaics consisting only of pairs of images were also evaluated using methodology 3, which presented lower distortion for 4 of the 5 mosaics. The contributions of this work are methodologies 2 and 3, the minimization of distortions at the mosaic junctions, the OpenMP parallelization and the evaluation of parallelism with MPI.
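The pipeline above relies on RANSAC to discard bad SIFT/BBF correspondences before stitching. As a hedged, minimal illustration of that consensus step (not the thesis code, which estimates a full homography rather than a pure translation), here is a pure-Python RANSAC that recovers a 2D shift from noisy keypoint matches:

```python
import random

def ransac_translation(matches, n_iter=200, tol=2.0, seed=0):
    """Estimate a 2D translation (dx, dy) from keypoint matches with outliers.

    `matches` is a list of ((x1, y1), (x2, y2)) pairs. Each iteration
    hypothesizes the translation implied by one random match and keeps the
    hypothesis supported by the most inliers; the winner is then refined
    by averaging over its inlier set.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) <= tol
                   and abs((m[1][1] - m[0][1]) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refinement: average the translation over all inliers of the best model.
    dx = sum(m[1][0] - m[0][0] for m in best_inliers) / len(best_inliers)
    dy = sum(m[1][1] - m[0][1] for m in best_inliers) / len(best_inliers)
    return (dx, dy), best_inliers

# 20 correct matches shifted by (50, -30), plus 5 gross outliers.
good = [((x, y), (x + 50, y - 30)) for x, y in zip(range(20), range(20, 40))]
bad = [((0, 0), (999, 999))] * 5
(dx, dy), inliers = ransac_translation(good + bad)
```

The same vote-then-refine loop generalizes to homographies by sampling four matches per hypothesis instead of one.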
52

Formalisation et automatisation de YAO, générateur de code pour l’assimilation variationnelle de données / Formalisation and automation of YAO, code generator for variational data assimilation

Nardi, Luigi 08 March 2011 (has links)
Variational data assimilation (4D-Var) is a well-known technique used in geophysics, in particular in meteorology and oceanography. It consists in estimating the control parameters of a direct numerical model by minimizing a cost function that measures the misfit between the forecast values and the actual observations. The minimization, which is based on a gradient method, requires the computation of the adjoint model (the product of the transpose of the Jacobian matrix with the derivative vector of the cost function at the observation points). Implementing 4D-Var raises complex programming problems, in particular concerning the adjoint model, the parallelization of the code and efficient memory management. To address these difficulties and facilitate the implementation of 4D-Var applications, LOCEAN is developing the YAO framework. YAO represents a direct model as a computation flow graph called a modular graph: modules depict computation units and edges between modules represent data transfers. Description directives specific to YAO allow a user to describe the direct model and then generate the modular graph associated with it. YAO contains two core algorithms: a forward propagation algorithm on the graph that computes the outputs of the direct model, and a backpropagation algorithm on the graph that computes its adjoint model. The main advantage of the framework is that the code of the direct model and of its adjoint is generated automatically once the modular graph has been designed by the user. Moreover, YAO supports many scenarios for running different data assimilation sessions. This thesis presents computer science research carried out on the YAO framework. We first formalized the existing YAO specifications in a more general way. We then proposed algorithms automating important tasks, such as the automatic generation of an "optimal" computation ordering and the automatic shared-memory parallelization of the generated code using OpenMP directives. The medium-term objective of these results is to lay the foundations for turning YAO into a general, operational platform for 4D-Var data assimilation, capable of handling real, large-scale applications.
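The forward and backward graph sweeps described above can be sketched in miniature. The following is an illustrative toy, not YAO's generated code: a modular graph as a list of operation nodes, a forward propagation computing module outputs, and a backpropagation accumulating the adjoint (reverse-mode derivative) along each edge:

```python
import math

# Each node is (operation, parent indices) over a growing value list; the
# first n_inputs slots hold the model inputs. Forward sweep = compute each
# module's output; backward sweep = accumulate adjoints edge by edge.
def forward(graph, inputs):
    values = list(inputs)
    for op, parents in graph:
        values.append(op(*[values[p] for p in parents]))
    return values

def adjoint(graph, grads_ops, values, n_inputs):
    adj = [0.0] * len(values)
    adj[-1] = 1.0  # seed: d(output)/d(output) = 1
    for i in range(len(graph) - 1, -1, -1):
        (_, parents), dop = graph[i], grads_ops[i]
        node = n_inputs + i
        local = dop(*[values[p] for p in parents])  # local partials per edge
        for p, g in zip(parents, local):
            adj[p] += adj[node] * g
    return adj[:n_inputs]

# f(a, b) = sin(a*b) + a, as three modules: m0 = a*b, m1 = sin(m0), m2 = m1 + a
graph = [(lambda a, b: a * b, (0, 1)),
         (math.sin, (2,)),
         (lambda s, a: s + a, (3, 0))]
grads = [lambda a, b: (b, a),
         lambda x: (math.cos(x),),
         lambda s, a: (1.0, 1.0)]

vals = forward(graph, [0.5, 2.0])
da, db = adjoint(graph, grads, vals, 2)
# Analytically: df/da = b*cos(a*b) + 1, df/db = a*cos(a*b)
```

YAO's contribution, per the abstract, is generating equivalent forward/adjoint code automatically from the user's module descriptions rather than interpreting a graph at run time.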
53

Simulações numéricas 3D em ambiente paralelo de hipertermia com nanopartículas magnéticas / 3D numerical simulations, in a parallel environment, of hyperthermia with magnetic nanoparticles

Reis, Ruy Freitas 05 November 2014 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This work deals with the numerical modeling of solid tumor treatments with hyperthermia using magnetic nanoparticles, considering the 3D bioheat transfer model proposed by Pennes (1948). Two different treatments of blood perfusion were compared: the first assumes a constant value, the second a temperature-dependent function. The living tissue is modeled with skin, fat and muscle layers, in addition to the tumor. The model solution was approximated with the finite difference method (FDM) in a heterogeneous medium. The two perfusion parameters lead, respectively, to a system of linear equations (constant perfusion) and a system of nonlinear equations (temperature-dependent perfusion). To discretize the time domain, two explicit numerical schemes were used: the classical Euler method, and a predictor-corrector algorithm derived from the generalized trapezoidal alpha-family of time integration methods. Since solving a three-dimensional model demands a high computational cost, two parallelization strategies were applied to the numerical method, the first based on the OpenMP parallel programming API and the second on the CUDA platform. The experimental results showed that the OpenMP parallelization runs up to 39 times faster than the sequential version, and the CUDA version was also efficient, with gains of up to 242 times over the sequential execution time. At that rate, the simulation result is obtained about twice as fast as the biological phenomenon itself.
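As a rough illustration of the discretization described above (not the thesis code, and with invented parameter values), a 1D explicit finite-difference step of the Pennes equation with constant perfusion looks like this:

```python
def pennes_step(T, Q, dt, dx, k, rho_c, perf, T_a):
    """One explicit forward-Euler step of the 1D Pennes bioheat equation:
    rho*c * dT/dt = k * d2T/dx2 + perf*(T_a - T) + Q.
    With constant perfusion `perf`, the update stays linear in T."""
    T_new = T[:]
    for i in range(1, len(T) - 1):
        lap = (T[i - 1] - 2 * T[i] + T[i + 1]) / dx ** 2
        T_new[i] = T[i] + dt / rho_c * (k * lap + perf * (T_a - T[i]) + Q[i])
    return T_new  # end nodes held at body temperature (Dirichlet boundary)

n = 21
T = [37.0] * n                 # start at body temperature everywhere
Q = [0.0] * n
Q[n // 2] = 5e5                # W/m^3 deposited in the central "tumor" node
for _ in range(2000):          # 100 s of simulated heating (dt = 0.05 s)
    T = pennes_step(T, Q, dt=0.05, dx=1e-3, k=0.5,
                    rho_c=4.0e6, perf=2000.0, T_a=37.0)
```

The explicit scheme requires the usual stability bound on dt; the values above satisfy it, and the tumor node warms a couple of degrees above 37 °C while the boundaries stay fixed. The thesis' 3D version parallelizes the loop over grid nodes with OpenMP or CUDA.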
54

PROGRAMAÇÃO PARALELA HÍBRIDA PARA CPU E GPU: UMA AVALIAÇÃO DO OPENACC FRENTE A OPENMP E CUDA / HYBRID PARALLEL PROGRAMMING FOR CPU AND GPU: AN EVALUATION OF OPENACC AS RELATED TO OPENMP AND CUDA

Sulzbach, Maurício 22 August 2014 (has links)
As a consequence of advances in CPU and GPU architectures, recent years have seen a growing number of parallel programming APIs for both devices. While OpenMP is used to write parallel programs for the CPU, CUDA and OpenACC target parallel processing on the GPU. For GPU programming, CUDA presents a function-based model that makes the source code long and error-prone, and leads to low development productivity. OpenACC emerged to address these problems and to offer an alternative to CUDA. Similar to OpenMP, this API provides directives that ease the development of parallel applications, but for execution on the GPU. To further increase performance and exploit the parallelism of both CPU and GPU, it is possible to develop hybrid algorithms that split the processing between the two devices. The main objective of this work is therefore to verify whether the advantages OpenACC introduces also carry over to hybrid programming with OpenMP, compared to the OpenMP + CUDA model. A second objective is to identify aspects of the two programming models that may limit performance or application development. To accomplish these goals, this work presents three hybrid parallel algorithms based on the Rodinia benchmark suite, namely RNG, Hotspot and SRAD, implemented with the hybrid models OpenMP + CUDA and OpenMP + OpenACC. In these algorithms, the CPU part of the code is programmed with OpenMP, while CUDA and OpenACC handle the parallel processing on the GPU. After running the hybrid algorithms, the performance, the efficiency and the split of the processing between the two devices were analyzed. The runs showed that both proposed programming models can outperform a parallel application written with a single API and running on only one of the devices. In addition, in the hybrid RNG and Hotspot algorithms CUDA outperformed OpenACC, while in the SRAD algorithm OpenACC was faster than CUDA.
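A hybrid algorithm of the kind evaluated above must decide how to split the work between CPU and GPU. As a hedged sketch with invented throughput numbers (not the thesis' scheme), a static split that makes both devices finish at the same time, plus the speedup and efficiency metrics such a study reports, can be computed as follows:

```python
# If the CPU processes r_cpu items/s and the GPU r_gpu items/s, assigning
# the GPU a fraction r_gpu / (r_cpu + r_gpu) of the n items equalizes the
# two devices' finish times. All throughput figures here are made up.
def split_work(n, r_cpu, r_gpu):
    gpu_share = r_gpu / (r_cpu + r_gpu)
    n_gpu = round(n * gpu_share)
    return n - n_gpu, n_gpu

def speedup_and_efficiency(t_serial, t_parallel, n_devices):
    s = t_serial / t_parallel
    return s, s / n_devices  # efficiency = speedup per device used

n_cpu, n_gpu = split_work(1_000_000, r_cpu=2e6, r_gpu=8e6)
t_hybrid = max(n_cpu / 2e6, n_gpu / 8e6)   # both sides run concurrently
t_cpu_only = 1_000_000 / 2e6
s, e = speedup_and_efficiency(t_cpu_only, t_hybrid, 2)
```

With a 4x faster GPU, the GPU gets 80% of the items, both devices finish in 0.1 s, and the hybrid run is 5x faster than the CPU alone; efficiency above 1 per device reflects the GPU's higher throughput rather than any superlinear effect.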
55

Loop parallelization in the cloud using OpenMP and MapReduce / Paralelização de laços na nuvem usando OpenMP e MapReduce

Wottrich, Rodolfo Guilherme, 1990- 04 September 2014 (has links)
Advisors: Guido Costa Souza de Araújo, Rodolfo Jardim de Azevedo / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The pursuit of parallelism has always been an important goal in the design of computer systems, driven mainly by the constant interest in reducing program execution time. Parallel programming is an active research area whose interest has grown with the emergence of multicore architectures. On the other hand, harnessing the large computing and storage capabilities of the cloud, together with its desirable flexibility and scaling features, offers a number of interesting opportunities to address relevant research problems in scientific computing. Unfortunately, implementing applications on the cloud often demands specific knowledge of parallel programming interfaces and APIs, which may become a burden when programming complex applications. To overcome such limitations, in this work we propose OpenMR, an execution model based on the syntax and principles of the OpenMP API that eases the task of programming distributed systems (i.e., local clusters or the remote cloud). Specifically, this work addresses the problem of loop parallelization, using OpenMR, in a distributed environment, by mapping loop iterations to MapReduce nodes. By doing so, the cloud programming interface becomes the programming language itself, freeing the developer from worrying about the details of distributing workload and data. To assess the validity of the proposal, we modified benchmarks from the SPEC OMP2012 suite to fit the proposed model, developed other I/O-bound toy benchmarks and executed them in two settings: (a) a computer cluster locally available through a standard LAN; and (b) clusters remotely available through the Amazon AWS services. We compared the results to execution using OpenMP on an SMP architecture and show that the proposed parallelization technique is feasible and exhibits good scalability.
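The core idea described above, mapping loop iterations to MapReduce nodes, can be sketched as follows. This is an illustrative serial stand-in, not the actual OpenMR implementation; in OpenMR the map tasks would run on remote cluster or cloud nodes:

```python
# A parallel-for with a reduction, expressed as MapReduce: each "map" task
# runs a contiguous chunk of loop iterations and the "reduce" phase merges
# the per-node partial results of the reduction variable.
from functools import reduce

def parallel_for_as_mapreduce(body, n_iters, combine, identity, n_nodes=4):
    chunk = (n_iters + n_nodes - 1) // n_nodes
    def map_task(node):
        acc = identity
        for i in range(node * chunk, min((node + 1) * chunk, n_iters)):
            acc = combine(acc, body(i))
        return acc
    # In a real deployment these tasks are distributed across MapReduce nodes.
    partials = [map_task(node) for node in range(n_nodes)]
    return reduce(combine, partials, identity)

# Equivalent of: #pragma omp parallel for reduction(+:total)
total = parallel_for_as_mapreduce(lambda i: i * i, 1000,
                                  lambda a, b: a + b, 0)
```

The chunking mirrors OpenMP's static schedule; only associative reductions can be merged this way, which is the same restriction OpenMP places on `reduction` operators.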
56

Traitement de données multi-spectrales par calcul intensif et applications chez l'homme en imagerie par résonnance magnétique nucléaire / Processing of multi-spectral data by high performance computing and its applications on human nuclear magnetic resonance imaging

Angeletti, Mélodie 21 February 2019 (has links)
As a non-invasive technique for studying the brain, functional magnetic resonance imaging (fMRI) has been employed to understand the cerebral mechanisms underlying food intake. However, using liquid stimuli to simulate food intake introduces difficulties that are absent from fMRI studies with the visual stimuli usually employed. The objective of this thesis was therefore to propose a robust data-analysis method that accounts for the specificity of food stimulation. To handle the motion caused by swallowing, we propose a censoring method based solely on the measured signal. We also improved the between-subject normalization step to reduce signal loss. The main contribution of this thesis is an implementation of Ward's algorithm that makes parcellating the whole brain feasible in a few hours, without reducing the data beforehand. Since computing the Euclidean distance between every pair of voxel signals accounts for a large part of Ward's algorithm, we propose a cache-aware distance computation algorithm together with three parallelizations, for shared-memory architectures, distributed-memory architectures and NVIDIA GPUs. Once Ward's algorithm has run, every scale of the parcellation can be explored. Several criteria are considered to evaluate the quality of the parcellation at a given scale. At a chosen scale, we propose either to compute connectivity maps between parcels or to identify the parcels responding to the stimulation using Pearson's correlation coefficient.
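The cache-aware pairwise-distance kernel mentioned above can be sketched by tiling the distance matrix, so that a small block of voxel time series is reused while it is hot in cache. This is an illustrative pure-Python version, not the thesis implementation:

```python
# Blocked (tiled) computation of the full Euclidean distance matrix between
# voxel signals. Iterating tile by tile over the upper triangle means each
# pair of signal blocks is loaded once and reused for all pairs inside the
# tile -- the access pattern that makes the real (C/OpenMP/GPU) kernel fast.
def pairwise_dists_blocked(signals, block=64):
    n = len(signals)
    d = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, block):
        for bj in range(bi, n, block):            # upper triangle of tiles
            for i in range(bi, min(bi + block, n)):
                for j in range(max(i + 1, bj), min(bj + block, n)):
                    s = sum((a - b) ** 2
                            for a, b in zip(signals[i], signals[j]))
                    d[i][j] = d[j][i] = s ** 0.5  # symmetry: fill both halves
    return d

# Tiny synthetic "voxel signals": 10 voxels, 16 time points each.
signals = [[(i * j) % 7 for j in range(16)] for i in range(10)]
d = pairwise_dists_blocked(signals, block=4)
```

In pure Python the tiling brings no speedup (the interpreter dominates); the point is the traversal order, which the compiled shared-memory, MPI and GPU versions exploit.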
57

Benchmarking of Vision-Based Prototyping and Testing Tools

Balasubramanian, ArunKumar 21 September 2017 (has links)
The demand for Advanced Driver Assistance System (ADAS) applications is increasing day by day, and their development requires efficient prototyping and real-time testing. ADTF (Automotive Data and Time Triggered Framework) is a software tool from Elektrobit used for the development, validation and visualization of vision-based applications, mainly for ADAS and autonomous driving. With the ADTF tool, image or video data can be recorded and visualized, and data can be tested both online and offline. The development of ADAS applications requires image and video processing, and the algorithms must be highly efficient and satisfy real-time requirements. The main objective of this research is to integrate the OpenCV library with the cross-platform ADTF. OpenCV provides efficient image processing algorithms that can be used with ADTF for quick benchmarking and testing. An ADTF filter framework has been developed in which OpenCV algorithms can be used directly, and the framework is tested with .DAT and image files following a modular approach. CMake is also covered in this thesis, to build the system with ease. The ADTF filters are developed in Microsoft Visual Studio 2010 in C++, and the OpenMP API is used for the parallel programming approach.
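OpenMP parallelization of an image-processing filter typically distributes rows across threads. As a hedged stand-in (a Python thread pool instead of OpenMP threads, and a hand-written 3x3 box blur instead of an OpenCV call), the pattern looks like this:

```python
# Row-parallel image filtering: each worker produces one output row, which
# is the same decomposition as `#pragma omp parallel for` over rows in the
# C++ filter. Illustrative only -- not the ADTF filter framework's code.
from concurrent.futures import ThreadPoolExecutor

def blur_row(img, y):
    """3x3 box blur of row y, clamping the window at image borders."""
    h, w = len(img), len(img[0])
    out_row = []
    for x in range(w):
        acc, cnt = 0, 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    acc += img[ny][nx]
                    cnt += 1
        out_row.append(acc // cnt)
    return out_row

def box_blur_parallel(img, workers=4):
    # Rows are independent, so they can be computed concurrently and
    # reassembled in order -- no synchronization on the output needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: blur_row(img, y), range(len(img))))

img = [[(x * y) % 256 for x in range(8)] for y in range(8)]
blurred = box_blur_parallel(img)
```

Because each output row depends only on the read-only input, the parallel result is identical to the serial one, which is what makes this loop safe to annotate with an OpenMP `parallel for` in the C++ version.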
58

Formalisation et automatisation de YAO, générateur de code pour l’assimilation variationnelle de données / Formalisation and automation of YAO, code generator for variational data assimilation

Nardi, Luigi 08 March 2011 (has links)
L’assimilation variationnelle de données 4D-Var est une technique très utilisée en géophysique, notamment en météorologie et océanographie. Elle consiste à estimer des paramètres d’un modèle numérique direct, en minimisant une fonction de coût mesurant l’écart entre les sorties du modèle et les mesures observées. La minimisation, qui est basée sur une méthode de gradient, nécessite le calcul du modèle adjoint (produit de la transposée de la matrice jacobienne avec le vecteur dérivé de la fonction de coût aux points d’observation). Lors de la mise en œuvre de l’AD 4D-Var, il faut faire face à des problèmes d’implémentation informatique complexes, notamment concernant le modèle adjoint, la parallélisation du code et la gestion efficace de la mémoire. Afin d’aider au développement d’applications d’AD 4D-Var, le logiciel YAO qui a été développé au LOCEAN, propose de modéliser le modèle direct sous la forme d’un graphe de flot de calcul appelé graphe modulaire. Les modules représentent des unités de calcul et les arcs décrivent les transferts des données entre ces modules. YAO est doté de directives de description qui permettent à un utilisateur de décrire son modèle direct, ce qui lui permet de générer ensuite le graphe modulaire associé à ce modèle. Deux algorithmes, le premier de type propagation sur le graphe et le second de type rétropropagation sur le graphe permettent, respectivement, de calculer les sorties du modèle direct ainsi que celles de son modèle adjoint. YAO génère alors le code du modèle direct et de son adjoint. En plus, il permet d’implémenter divers scénarios pour la mise en œuvre de sessions d’assimilation.Au cours de cette thèse, un travail de recherche en informatique a été entrepris dans le cadre du logiciel YAO. Nous avons d’abord formalisé d’une manière plus générale les spécifications deYAO. 
Par la suite, des algorithmes permettant l’automatisation de certaines tâches importantes ont été proposés, tels que la génération automatique d’un parcours “optimal” de l’ordre des calculs et la parallélisation automatique, en mémoire partagée, du code généré en utilisant des directives OpenMP. L’objectif à moyen terme des résultats de cette thèse est d’établir les bases permettant de faire évoluer YAO vers une plateforme générale et opérationnelle pour l’assimilation de données 4D-Var, capable de traiter des applications réelles et de grande taille. / Variational data assimilation 4D-Var is a well-known technique used in geophysics, in particular in meteorology and oceanography. It consists in estimating the control parameters of a direct numerical model by minimizing a cost function which measures the misfit between the forecast values and actual observations. The minimization, which is based on a gradient method, requires the computation of the adjoint model (the product of the transpose Jacobian matrix and the derivative vector of the cost function at the observation points). In order to apply the 4D-Var technique, we have to cope with complex program implementations, in particular concerning the adjoint model, the parallelization of the code and efficient memory management. To address these difficulties and to facilitate the implementation of 4D-Var applications, LOCEAN is developing the YAO framework. YAO represents a direct model with a computation flow graph called a modular graph. Modules depict computation units and edges between modules represent data transfers. Description directives proper to YAO allow a user to describe a direct model and to generate the modular graph associated with it. YAO contains two core algorithms.
The first is a forward propagation algorithm on the graph that computes the output of the numerical model; the second is a back propagation algorithm on the graph that computes the adjoint model. The main advantage of the YAO framework is that the direct and adjoint model codes are generated automatically once the modular graph has been designed by the user. Moreover, YAO makes it possible to handle many scenarios for running different data assimilation sessions. This thesis presents computer science research carried out on the YAO framework. In a first step, we formalized the existing YAO specifications in a more general way. Then, algorithms automating certain tasks were proposed, such as the automatic generation of an “optimal” computation ordering and the automatic parallelization of the generated code on shared-memory architectures using OpenMP directives. This thesis lays the foundations that will, in the medium term, make YAO a general and operational platform for 4D-Var data assimilation, able to process large, real-world applications.
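The two graph sweeps the abstract describes can be sketched in a few lines. This is an illustrative reverse-mode example, not YAO's actual generated code: `Node`, `forward`, `backward` and the example graph are hypothetical names, and each module carries its local vector-Jacobian product so that the backward sweep can accumulate the adjoint.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One "module" of the graph: a computation unit plus its local
// vector-Jacobian product (vjp), which back propagation chains together.
struct Node {
    double value = 0.0;      // forward output of this module
    double adjoint = 0.0;    // dCost/dValue, filled by the backward sweep
    std::vector<int> inputs; // arcs: indices of upstream modules
    std::function<double(const std::vector<double>&)> f; // forward op (empty for inputs)
    std::function<std::vector<double>(const std::vector<double>&, double)> vjp;
};

// Forward propagation: evaluate modules in topological order.
void forward(std::vector<Node>& g) {
    for (auto& n : g) {
        if (!n.f) continue; // leaf: value set by the caller
        std::vector<double> xs;
        for (int i : n.inputs) xs.push_back(g[i].value);
        n.value = n.f(xs);
    }
}

// Back propagation: seed the final module's adjoint with 1 and sweep the
// graph in reverse, accumulating each module's contribution upstream.
void backward(std::vector<Node>& g) {
    for (auto& n : g) n.adjoint = 0.0;
    g.back().adjoint = 1.0;
    for (int k = static_cast<int>(g.size()) - 1; k >= 0; --k) {
        Node& n = g[k];
        if (!n.f) continue;
        std::vector<double> xs;
        for (int i : n.inputs) xs.push_back(g[i].value);
        std::vector<double> grads = n.vjp(xs, n.adjoint);
        for (size_t j = 0; j < n.inputs.size(); ++j)
            g[n.inputs[j]].adjoint += grads[j];
    }
}

// Toy graph for cost = x0 * x1 + x0, a stand-in for a direct model
// followed by a scalar cost function.
std::vector<Node> example_graph(double x0, double x1) {
    std::vector<Node> g(4);
    g[0].value = x0;
    g[1].value = x1;
    g[2].inputs = {0, 1};
    g[2].f = [](const std::vector<double>& x) { return x[0] * x[1]; };
    g[2].vjp = [](const std::vector<double>& x, double u) {
        return std::vector<double>{u * x[1], u * x[0]};
    };
    g[3].inputs = {2, 0};
    g[3].f = [](const std::vector<double>& x) { return x[0] + x[1]; };
    g[3].vjp = [](const std::vector<double>&, double u) {
        return std::vector<double>{u, u};
    };
    return g;
}
```

Applied to the toy cost `x0 * x1 + x0`, the backward sweep recovers the full gradient without any hand-written adjoint code, which is the convenience the modular-graph approach is meant to provide.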
59

Integrating SkePU's algorithmic skeletons with GPI on a cluster / Integrering av SkePUs algoritmiska skelett med GPI på ett cluster

Almqvist, Joel January 2022 (has links)
As processor clock speeds flattened out in the early 2000s, multi-core processors became more prevalent, and so did parallel programming. This programming paradigm introduces additional complexity, however, and the SkePU framework was created to combat it. SkePU offers a single-threaded interface that executes the user's code in parallel according to a chosen computational pattern. Furthermore, it lets the user decide which parallel backend should perform the execution, be it OpenMP, CUDA or OpenCL. This modular approach allows different hardware to be used without changing the code; SkePU currently supports CPUs, GPUs and clusters. This thesis presents a new SkePU backend for clusters, built on the communication library GPI. It demonstrates that the new backend scales better and handles workload imbalance better than the existing SkePU cluster backend, despite performing worse at low node counts, which indicates that it incurs less overhead as it scales. Its weaknesses are also analyzed, partly from a design point of view, and clear solutions are presented, together with a discussion of why they arose in the first place.
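The single-threaded-interface idea can be illustrated with a toy map skeleton. This is not SkePU's real API (`map_skeleton` and `Backend` are made-up names); it only shows how the caller's code stays sequential-looking while a chosen backend, here OpenMP via a pragma, performs the parallel execution:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend selector: the caller picks where the work runs,
// mirroring the way skeleton frameworks decouple pattern from execution.
enum class Backend { Sequential, OpenMP };

// A "map" computational pattern: apply f to every element independently.
// The user writes no threading code; the backend decides how to run it.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F f, Backend b) {
    std::vector<T> out(in.size());
    if (b == Backend::OpenMP) {
        // Compiled with -fopenmp this loop runs across threads; without
        // OpenMP support the pragma is ignored and the loop runs serially.
        #pragma omp parallel for
        for (long i = 0; i < static_cast<long>(in.size()); ++i)
            out[i] = f(in[i]);
    } else {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = f(in[i]);
    }
    return out;
}
```

Because each element is computed independently, both backends produce identical results; swapping `Backend::OpenMP` for a CUDA or cluster backend would, in a real framework, be equally transparent to the caller.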
60

Pthreads and OpenMP: A performance and productivity study

Swahn, Henrik January 2016 (has links)
Today most computers have a multicore processor and depend on parallel execution to keep up with demanding workloads, which forces developers to write software that can take advantage of multicore systems. Multiple programming languages and frameworks make it possible to execute code in parallel on different threads. This study looks at the performance of, and the effort required to work with, two of the frameworks available to the C programming language: POSIX Threads (Pthreads) and OpenMP. Performance is measured by parallelizing three algorithms, matrix multiplication, Quick Sort and calculation of the Mandelbrot set, using both Pthreads and OpenMP, comparing each first against a sequential version and then the parallel versions against each other. The effort required to modify the sequential program using OpenMP and Pthreads is measured by the number of lines in the final source code. The results show that OpenMP performs better than Pthreads on matrix multiplication and the Mandelbrot set calculation but not on Quick Sort, because OpenMP has problems with recursion that Pthreads does not. OpenMP requires less effort on all the tests, but because of the large performance gap on Quick Sort, OpenMP cannot be recommended for parallelizing Quick Sort or other recursive programs.
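The effort gap the study measures is easy to see on its first benchmark. The sketch below, which is not the thesis's actual code, parallelizes the same naive matrix multiplication once with a single OpenMP pragma and once with explicit Pthreads bookkeeping (an argument struct, a thread function, and create/join calls); the matrix size and thread count are arbitrary:

```cpp
#include <pthread.h>
#include <vector>

constexpr int N = 64;                       // arbitrary matrix dimension
using Mat = std::vector<std::vector<double>>;

// OpenMP version: one pragma turns the sequential loop parallel.
void multiply_omp(const Mat& a, const Mat& b, Mat& c) {
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < N; ++k) s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

// Pthreads needs an argument struct, a thread function, and explicit
// create/join bookkeeping to express the same row-partitioned loop.
struct Task { const Mat* a; const Mat* b; Mat* c; int row0, row1; };

void* worker(void* p) {
    Task* t = static_cast<Task*>(p);
    for (int i = t->row0; i < t->row1; ++i)
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < N; ++k) s += (*t->a)[i][k] * (*t->b)[k][j];
            (*t->c)[i][j] = s;
        }
    return nullptr;
}

void multiply_pthreads(const Mat& a, const Mat& b, Mat& c, int nthreads) {
    std::vector<pthread_t> tids(nthreads);
    std::vector<Task> tasks(nthreads);
    int chunk = N / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        int hi = (t == nthreads - 1) ? N : (t + 1) * chunk;
        tasks[t] = Task{&a, &b, &c, t * chunk, hi};
        pthread_create(&tids[t], nullptr, worker, &tasks[t]);
    }
    for (int t = 0; t < nthreads; ++t) pthread_join(tids[t], nullptr);
}
```

Counting lines makes the study's productivity metric concrete: the OpenMP variant adds one directive to the sequential code, while the Pthreads variant roughly triples it, even before error handling.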
