Global ETD Search

371	Comparación del uso de GPGPU y cluster de multicore en problemas con alta demanda computacional Montes de Oca, Erica January 2012 (has links) La presente Tesina de Grado tiene como objetivo la investigación y el estudio de las plataformas de memoria compartida GPU y cluster de Multicore para la resolución de problemas con alta demanda computacional. Se presentan soluciones al problema planteado con el fin de comparar rendimiento en sus versiones secuencial, paralela con memoria compartida, paralela con pasaje de mensajes, paralela híbrida y paralela en GPU. Se analiza la bondad de las soluciones en relación al tiempo de ejecución y aceleración, y se introduce el análisis de consumo energético. programación paralela multicore cluster de multicore GPU GPGPU CUDA problemas de alta demanda computacional N-Body Parallel programming Clustering Information Systems Ciencias Informáticas
372	Sparsity, redundancy and robustness in artificial neural networks for learning and memory / Parcimonie, redondance et robustesse dans les réseaux de neurones artificiels pour l'apprentissage et la mémoire Tigreat, Philippe 16 October 2017 (has links) L'objectif de la recherche en Intelligence Artificielle (IA) est de répliquer les capacités cognitives humaines au moyen des ordinateurs modernes. Les résultats de ces dernières années semblent annoncer une révolution technologique qui pourrait changer profondément la société. Nous focalisons notre intérêt sur deux aspects cognitifs fondamentaux, l'apprentissage et la mémoire. Les mémoires associatives offrent la possibilité de stocker des éléments d'information et de les récupérer à partir d'une partie de leur contenu, et imitent ainsi la mémoire cérébrale. L'apprentissage profond permet de passer d'une perception analogique du monde extérieur à une représentation parcimonieuse et plus compacte. Dans le chapitre 2, nous présentons une mémoire associative inspirée des réseaux de Willshaw, avec une connectivité contrainte. Cela augmente la performance de récupération des messages et l'efficacité du stockage de l'information.Dans le chapitre 3, une architecture convolutive a été appliquée sur une tâche de lecture de mots partiellement affichés dans des conditions similaires à une étude de psychologie sur des sujets humains. Cette expérimentation montre la similarité de comportement du réseau avec les sujets humains concernant différentes caractéristiques de l'affichage des mots.Le chapitre 4 introduit une méthode de représentation des catégories par des assemblées de neurones dans les réseaux profonds. Pour les problèmes à grand nombre de classes, cela permet de réduire significativement les dimensions d'un réseau.Le chapitre 5 décrit une méthode d'interfaçage des réseaux de neurones profonds non supervisés avec les mémoires associatives à cliques. / The objective of research in Artificial Intelligence (AI) is to reproduce human cognitive abilities by means of modern computers. The results of the last few years seem to announce a technological revolution that could profoundly change society. We focus our interest on two fundamental cognitive aspects, learning and memory. Associative memories offer the possibility to store information elements and to retrieve them using a sub-part of their content, thus mimicking human memory. Deep Learning allows to transition from an analog perception of the outside world to a sparse and more compact representation.In Chapter 2, we present a neural associative memory model inspired by Willshaw networks, with constrained connectivity. This brings an performance improvement in message retrieval and a more efficient storage of information.In Chapter 3, a convolutional architecture was applied on a task of reading partially displayed words under similar conditions as in a former psychology study on human subjects. This experiment put inevidence the similarities in behavior of the network with the human subjects regarding various properties of the display of words.Chapter 4 introduces a new method for representing categories usingneuron assemblies in deep networks. For problems with a large number of classes, this allows to reduce significantly the dimensions of a network.Chapter 5 describes a method for interfacing deep unsupervised networks with clique-based associative memories. Apprentissage machine Réseaux de neurones Codage Parcimonie Robustesse Artificial intelligence Neural networks Deep learning Machine learning Associative memory Classification Parallel programming Error-correcting codes 004
373	Débogage des systèmes embarqués multiprocesseur basé sur la ré-exécution déterministe et partielle / Deterministic and partial replay debugging of multiprocessor embedded systems Georgiev, Kiril 04 December 2012 (has links) Les plates-formes MPSoC permettent de satisfaire les contraintes de performance, de flexibilité et de consommation énergétique requises par les systèmes embarqués émergents. Elles intègrent un nombre important de processeurs, des blocs de mémoire et des périphériques, hiérarchiquement organisés par un réseau d'interconnexion. Le développement du logiciel est réputé difficile, notamment dû à la gestion d'un grand nombre d'entités (tâches/threads/processus). L'exécution concurrente de ces entités permet d'exploiter efficacement l'architecture mais complexifie le processus de mise au point et notamment l'analyse des erreurs. D'une part, les exécutions peuvent être non-déterministes notamment dû à la concurrence, c'est à dire qu'elles peuvent se dérouler d'une manière différente à chaque reprise. En conséquence, il n'est pas garanti qu'une erreur se produirait durant la phase de mise au point. D'autre part, la complexité de l'architecture et de l'exécution peut rendre trop important le nombre d'éléments à analyser afin d'identifier une erreur. Il pourrait donc être difficile de se focaliser sur des éléments potentiellement fautifs. Un des défis majeurs du développement logiciel MPSoC est donc de réduire le temps de la mise au point. Dans cette thèse, nous proposons une méthodologie de mise au point qui aide le développeur à identifier les erreurs dans le logiciel MPSoC. Notre premier objectif est de déboguer une même exécution plusieurs fois afin d'analyser des sources potentielles de l'erreur jusqu'à son identification. Nous avons donc identifié les sources de non-déterminisme MPSoC et proposé des mécanismes de ré-exécution déterministe les plus adaptés. Notre deuxième objectif vise à minimiser les ressources pour reproduire une exécution afin de satisfaire la contrainte MPSoC de maîtrise de l'intrusion. Nous avons donc utilisé des mécanismes efficaces de ré-exécution déterministe et considéré qu'une partie du comportement non-déterministe. Le troisième objectif est de permettre le passage à l'échelle, c'est à dire de déboguer des exécutions caractérisées par un nombre d'éléments de plus en plus croissant. Nous avons donc proposé une méthode qui permet de circonscrire et de déboguer qu'une partie de l'exécution. De plus, cette méthode s'applique aux différents types d'architectures et d'applications MPSoC. / MPSoC platforms provide high performance, low power consumption and flexi-bility required by the emerging embedded systems. They incorporate many proces-sing units, memory blocs and peripherals, hierarchically organized by interconnec-tion network. The software development is known to be difficult, namely due to themanagement of multiple entities (tasks/threads/processes). The concurrent execu-tion of these entities allows to exploit efficiently the architecture but complicatesthe refinement process of the software and especially the debugging activity. Onthe one hand, the executions of the software can be non-deterministic, namely dueto the concurrency, i.e. they perform differently each time. Consequently, thereis no guaranties that an error will occur during the debugging activity. On theother hand, the complexity of the architecture and the execution can increase theelements to be analyzed in the debugging process. As a result, it can be difficultto concentrate on the potentially faulty elements. Therefore, one of the most im-portant challenges in the development process of MPSoC software is to reduce thetime of the refinement process.In this thesis, we propose a new methodology to refine the MPSoC softwarewhich helps the developers to do the debugging activity. Our first objective is tobe able to debug the same execution several times in order to analyze potentialsources of the error. To do so, we identified the sources of non-determinism in theMPSoC software executions and propose the most appropriate methods to recordand replay them. Our second objective is to reduce the execution overhead requi-red by the record mechanisms to limit the intrusiveness which is an importantMPSoC constraint. To accomplish this objective, we consider a part of the non-deterministic behaviour and selected efficient record-replay methods. The thirdobjective is to provide a scalable solution, i.e. to be able to debug more and morecomplex executions, characterized by an increasing number of elements. Therefore,we propose a partial replay method which allows to isolate and debug a fraction ofthe execution elements. Moreover, this method applies to different types of archi-tectures and applications MPSoC. Applications embarqués Debogage Traçage Re-Exécution Observation Embeded systems Debugging Tracing Replay Profiling Observation Parallel programming Programming models 004
374	RedBlue: cluster para pesquisa e ensino em Engenharia Pedras, Marcelo Br?ulio 13 November 2017 (has links) Submitted by Jos? Henrique Henrique (jose.neves@ufvjm.edu.br) on 2018-01-31T18:35:38Z No. of bitstreams: 2 license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) marcelo_braulio_pedras.pdf: 2382099 bytes, checksum: 3edc0615e188d815d0a9d1a514edfb8f (MD5) / Approved for entry into archive by Rodrigo Martins Cruz (rodrigo.cruz@ufvjm.edu.br) on 2018-02-03T12:04:59Z (GMT) No. of bitstreams: 2 license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) marcelo_braulio_pedras.pdf: 2382099 bytes, checksum: 3edc0615e188d815d0a9d1a514edfb8f (MD5) / Made available in DSpace on 2018-02-03T12:04:59Z (GMT). No. of bitstreams: 2 license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) marcelo_braulio_pedras.pdf: 2382099 bytes, checksum: 3edc0615e188d815d0a9d1a514edfb8f (MD5) Previous issue date: 2017 / Programas de computadores s?o muito utilizados para resolu??o de problemas complexos em engenharia. Atualmente, espera-se que um engenheiro saiba mais que apenas utiliz?-los, sendo esta habilidade muito valorizada no mercado de trabalho. Tal habilidade possibilita que profissionais consigam utilizar um maior conjunto de ferramentas para solucionar problemas. As simula??es computacionais, por exemplo, podem ser utilizadas como ferramenta de aquisi??o de conhecimento, permitindo que um profissional ou um estudante crie, teste e valide suas hip?teses. As simula??es tamb?m s?o utilizadas em pesquisas cient?ficas como alternativa a experimentos de dif?cil obten??o e na ind?stria para reduzir custos. Por?m, uma simula??o pode consumir mais recursos do que os dispon?veis em um computador, tornando seu tempo de execu??o invi?vel. Uma forma barata de se obter mais desempenho ? utilizando um cluster de computadores comuns. Dessa forma, seria poss?vel utilizar os laborat?rios de inform?tica dispon?veis para execut?-las. Entretanto, isso implicaria em conhecimentos aprofundados em computa??o paralela e/ou distribu?da por parte dos usu?rios, dificultado o desenvolvimento de aplica??es. Com o objetivo de minimizar o tempo de execu??o de simula??es complexas utilizando clusters e permitir que usu?rios com poucos conhecimentos em programa??o paralela e/ou distribu?da possam utiliz?-lo, este trabalho apresenta uma solu??o denominada ?plataforma RedBlue?. Essa plataforma recebe a aplica??o do usu?rio e a executa nos n?s do cluster de forma autom?tica e transparente para o mesmo. Para testar a plataforma desenvolvida foram realizados testes com redes neurais artificiais e com um algoritmo gen?tico simples, ambos buscando descobrir a melhor configura??o de par?metros para determinado problema. Utilizaram-se 60 m?quinas de um laborat?rio de inform?tica para testar a plataforma. Os resultados mostram que houve uma redu??o de at? 98% no tempo de execu??o do experimento com redes neurais e 99,3% para o experimento com o algoritmo gen?tico em compara??o a execu??o sequencial. Esses resultados indicam que a plataforma ? vi?vel para utiliza??o em laborat?rios de inform?tica, possibilitando uma redu??o consider?vel no tempo de execu??o de simula??es complexas. A plataforma ? aplic?vel a um n?mero flex?vel de computadores, ajustando-se ? capacidade dos laborat?rios. Al?m disso, pode ser utilizada como instrumento ?til ao ensino e pesquisa. Ressalta-se que a utiliza??o de simula??es computacionais para ensino e pesquisa contribui n?o apenas para a aprendizagem de conte?dos, mas tamb?m para o surgimento de habilidades necess?rias ao mercado de trabalho do engenheiro. / Disserta??o (Mestrado Profissional) ? Programa de P?s-Gradua??o em Educa??o, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017. / Computer programs are commonly used to solve complex engineering problems, and it is expected from an engineer a more than hands-on experience in using these computer programs with the ability to develop them using a wide range of tools. Computational simulations, for instance, can be used as tools for knowledge acquisition allowing a professional or student to create, test and validate their hypotheses. Such simulations are used at an academic setting as an alternative to expensive experiments. However, a simulation can take more resources than those available in a single computer machine, rendering long execution times. To create a cluster of regular computers, such as the ones already available at computer labs, is a cheaper alternative to improve such execution times. One major drawback of this approach is that the user must be knowledgeable in parallel and distributed programming, which makes software development harder. To overcome such constraints, this work presents a solution named ?RedBlue platform?that receives and runs user?s applications over a computer cluster in an automatic, transparent manner. To test the RedBlue platform, we performed a set of tests via artificial neural networks and a simplified genetic algorithm, whose main purpose was to search for the best-suited parameter configurations for the application problem at hand. To test the platform, the experiments were run using 60 computer machines from a computer lab. This study has identified a reduction in execution times of 98% for neural networks, and a reduction of 99,3% for the genetic algorithm, and also shown that the platform is suited for real-world applications of simulations at computer labs. Furthermore, the platform accepts a variable number of computers, easily adaptable to different academic environments, such as research and training. Lastly, we have noted that computational simulations not only contribute to research and learning, but also to develop the required industry skills. Simula??es computacionais Cluster de computadores Educa??o e pesquisa Programa??o paralela Programa??o distribu?da Computational simulation Computer cluster Education and research Parallel programming Distributed programming
375	Solveur parallèle pour l’équation de Poisson sur mailles superposées et hiérarchiques, dans le cadre du langage Python / Parallel solver for the Poisson equation on a hierarchy of superimposed meshes, under a Python framework Tesser, Federico 11 September 2018 (has links) Les discrétisations adaptatives sont importantes dans les problèmes de fluxcompressible/incompressible puisqu'il est souvent nécessaire de résoudre desdétails sur plusieurs niveaux, en permettant de modéliser de grandes régionsd'espace en utilisant un nombre réduit de degrés de liberté (et en réduisant letemps de calcul).Il existe une grande variété de méthodes de discrétisation adaptative, maisles grilles cartésiennes sont les plus efficaces, grâce à leurs stencilsnumériques simples et précis et à leurs performances parallèles supérieures.Et telles performance et simplicité sont généralement obtenues en appliquant unschéma de différences finies pour la résolution des problèmes, mais cetteapproche de discrétisation ne présente pas, au contraire, un chemin faciled'adaptation.Dans un schéma de volumes finis, en revanche, nous pouvons incorporer différentstypes de maillages, plus appropriées aux raffinements adaptatifs, en augmentantla complexité sur les stencils et en obtenant une plus grande flexibilité.L'opérateur de Laplace est un élément essentiel des équations de Navier-Stokes,un modèle qui gouverne les écoulements de fluides, mais il se produit égalementdans des équations différentielles qui décrivent de nombreux autres phénomènesphysiques, tels que les potentiels électriques et gravitationnels. Il s'agitdonc d'un opérateur différentiel très important, et toutes les études qui ontété effectuées sur celui-ci, prouvent sa pertinence.Dans ce travail seront présentés des approches de différences finies et devolumes finis 2D pour résoudre l'opérateur laplacien, en appliquant des patchsde grilles superposées où un niveau plus fin est nécessaire, en laissant desmaillages plus grossiers dans le reste du domaine de calcul.Ces grilles superposées auront des formes quadrilatérales génériques.Plus précisément, les sujets abordés seront les suivants:1) introduction à la méthode des différences finies, méthode des volumes finis,partitionnement des domaines, approximation de la solution;2) récapitulatif des différents types de maillages pour représenter de façondiscrète la géométrie impliquée dans un problème, avec un focussur la structure de données octree, présentant PABLO et PABLitO. Le premier estune bibliothèque externe utilisée pour gérer la création de chaque grille,l'équilibrage de charge et les communications internes, tandis que la secondeest l'API Python de cette bibliothèque, écrite ad hoc pour le projet en cours;3) la présentation de l'algorithme utilisé pour communiquer les données entreles maillages (en ignorant chacune l'existence de l'autre) en utilisant lesintercommunicateurs MPI et la clarification de l'approche monolithique appliquéeà la construction finale de la matrice pour résoudre le système, en tenantcompte des blocs diagonaux, de restriction et de prolongement;4) la présentation de certains résultats; conclusions, références.Il est important de souligner que tout est fait sous Python comme framework deprogrammation, en utilisant Cython pour l'écriture de PABLitO, MPI4Py pour lescommunications entre grilles, PETSc4py pour les parties assemblage et résolutiondu système d'inconnues, NumPy pour les objets à mémoire continue.Le choix de ce langage de programmation a été fait car Python, facile àapprendre et à comprendre, est aujourd'hui un concurrent significatif pourl'informatique numérique et l'écosystème HPC, grâce à son style épuré, sespackages, ses compilateurs et pourquoi pas ses versions optimisées pour desarchitectures spécifiques. / Adaptive discretizations are important in compressible/incompressible flow problems since it is often necessary to resolve details on multiple levels,allowing large regions of space to be modeled using a reduced number of degrees of freedom (reducing the computational time).There are a wide variety of methods for adaptively discretizing space, but Cartesian grids have often outperformed them even at high resolutions due totheir simple and accurate numerical stencils and their superior parallel performances.Such performance and simplicity are in general obtained applying afinite-difference scheme for the resolution of the problems involved, but this discretization approach does not present, by contrast, an easy adapting path.In a finite-volume scheme, instead, we can incorporate different types of grids,more suitable for adaptive refinements, increasing the complexity on thestencils and getting a greater flexibility.The Laplace operator is an essential building block of the Navier-Stokes equations, a model that governs fluid flows, but it occurs also in differential equations that describe many other physical phenomena, such as electric and gravitational potentials, and quantum mechanics. So, it is a very importantdifferential operator, and all the studies carried out on it, prove itsrelevance.In this work will be presented 2D finite-difference and finite-volume approaches to solve the Laplacian operator, applying patches of overlapping grids where amore fined level is needed, leaving coarser meshes in the rest of the computational domain.These overlapping grids will have generic quadrilateral shapes.Specifically, the topics covered will be:1) introduction to the finite difference method, finite volume method, domainpartitioning, solution approximation;2) overview of different types of meshes to represent in a discrete way thegeometry involved in a problem, with a focuson the octree data structure, presenting PABLO and PABLitO. The first one is anexternal library used to manage each single grid’s creation, load balancing and internal communications, while the second one is the Python API ofthat library written ad hoc for the current project;3) presentation of the algorithm used to communicate data between meshes (beingall of them unaware of each other’s existence) using MPI inter-communicators and clarification of the monolithic approach applied building the finalmatrix for the system to solve, taking into account diagonal, restriction and prolongation blocks;4) presentation of some results; conclusions, references.It is important to underline that everything is done under Python as programmingframework, using Cython for the writing of PABLitO, MPI4Py for the communications between grids, PETSc4py for the assembling and resolution partsof the system of unknowns, NumPy for contiguous memory buffer objects.The choice of this programming language has been made because Python, easy to learn and understand, is today a significant contender for the numerical computing and HPC ecosystem, thanks to its clean style, its packages, its compilers and, why not, its specific architecture optimized versions. Python Différences finies Volumes finis Programmation parallèle Opérateur de Laplace Discrétisations adaptatives Python Finite differences Finite volumes Parallel programming Laplace operator Adaptive discretizations
376	Balanceamento de carga dinâmico em aglomerados de GPUs Sant'Ana, Luis Felipe January 2015 (has links) Orientador: Prof. Dr. Márcio Katsumi Oikawa / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2015. / Este trabalho utiliza conceitos de Séries Históricas e Método dos Mínimos Quadrados para realização de estudo evolutivo da Doença de Alzheimer. Com estas técnicas, foram elaboradas a apresentação do panorama atual de um grupo de pacientes e, posteriormente, a previsão de resultados a partir de dados históricos obtidos do exame neuropsicológico denonimado Mini Exame do Estado Mental. Foram geradas trajetórias representadas pela unidade tempo (em anos) de cada um dos pacientes contidos na base de dados. Os resultados sugerem que a modelagem por meio de Séries Históricas e Método dos Mínimos Quadrados pode ser considerada adequada para o acompanhamento e previsão da progressão/estagnação da Doença de Alzheimer. / This study attempted of the concepts of Time Series and the Least Squares Method for accomplishment of evolutive study of Alzheimer¿s disease. With these techniques were development the presentation of the current situation of a group of patients and subsequently the prediction from historical data results of neuropsychological test called Mini Mental State Examination. For each of the patients in the database, it was generated trajectories represented by unit time (in years). The findings suggests that the modeling using Time Series associated with the Least Squares Method can be considered suitable for monitoring and prediction of the progression/stagnation of the Alzheimer¿s disease. PROGRAMAÇÃO PARALELA PARALLEL PROGRAMMING (COMPUTER SCIENCE) LOAD BALANCING GPGPU
377	HPSM: uma API em linguagem c++ para programas com laços paralelos com suporte a multi-CPUs e Multi-GPUs / HPSM: a c++ API for parallel loops programs Supporting multi-CPUs and multi-GPUs Di Domenico, Daniel 21 December 2016 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Parallel architectures has been ubiquitous for some time now. However, the word ubiquitous can’t be applied to parallel programs, because there is a greater complexity to code them comparing to ordinary programs. This fact is aggravated when the programming also involves accelerators, like GPUs, which demand the use of tools with scpecific resources. Considering this setting, there are programming models that make easier the codification of parallel applications to explore accelerators, nevertheless, we don’t know APIs that allow implementing programs with parallel loops that can be processed simultaneously by multiple CPUs and multiple GPUs. This works presents a high-level C++ API called HPSM aiming to make easier and more efficient the codification of parallel programs intended to explore multi-CPU and multi-GPU architectures. Following this idea, the desire is to improve performance through the sum of resources. HPSM uses parallel loops and reductions implemented by three parallel back-ends, being Serial, OpenMP and StarPU. Our hypothesis estimates that scientific applications can explore heterogeneous processing in multi-CPU and multi-GPU to achieve a better performance than exploring just accelerators. Comparisons with other parallel programming interfaces demonstrated that HPSM can reduce a multi-CPU and multi-GPU code in more than 50%. The use of the new API can introduce impact to program performance, where experiments showed a variable overhead for each application, that can achieve a maximum value of 16,4%. The experimental results confirmed the hypothesis, because the N-Body, Hotspot e CFD applications achieved gains using just CPUs and just GPUs, as well as overcame the performance achieved by just accelerators (GPUs) through the combination of multi-CPU and multi-GPU. / Arquiteturas paralelas são consideradas ubíquas atualmente. No entanto, o mesmo termo não pode ser aplicado aos programas paralelos, pois existe uma complexidade maior para codificálos em relação aos programas convencionais. Este fato é agravado quando a programação envolve também aceleradores, como GPUs, que demandam o uso de ferramentas com recursos muito específicos. Neste cenário, apesar de existirem modelos de programação que facilitam a codificação de aplicações paralelas para explorar aceleradores, desconhece-se a existência de APIs que permitam a construção de programas com laços paralelos que possam ser processados simultaneamente em múltiplas CPUs e múltiplas GPUs. Este trabalho apresenta uma API C++ de alto nível, denominada HPSM, visando facilitar e tornar mais eficiente a codificação de programas paralelos voltados a explorar arquiteturas com multi-CPU e multi-GPU. Seguindo esta ideia, deseja-se ganhar desempenho através da soma dos recursos. A HPSM é baseada em laços e reduções paralelas implementadas por meio de três diferentes back-ends paralelos, sendo Serial, OpenMP e StarPU. A hipótese deste estudo é que aplicações científicas podem valer-se do processamento heterogêneo em multi-CPU e multi-GPU para alcançar um desempenho superior em relação ao uso de apenas aceleradores. Comparações com outras interfaces de programação paralela demonstraram que o uso da HPSM pode reduzir em mais de 50% o tamanho de um programa multi-CPU e multi-GPU. O uso da nova API pode trazer impacto no desempenho do programa, sendo que experimentos demonstraram que seu sobrecusto é variável de acordo com a aplicação, chegando até 16,4%. Os resultados experimentais confirmaram a hipótese, pois as aplicações N-Body, Hotspot e CFD, além de alcançarem ganhos ao utilizar somente CPUs e somente GPUs, também superaram o desempenho obtido por somente aceleradores (GPUs) através da combinação de multi-CPU e multi-GPU. API C++ Programação paralela Laços paralelos Computação heterogênea GPU C++ API Parallel programming Parallel loops Heterogenous computing
378	Algoritmo de evolução diferencial paralelo aplicado ao problema da predição da estrutura de proteínas utilizando o modelo AB em 2D e 3D Kalegari, Diego Humberto 18 October 2010 (has links) O problema da predição da estrutura de proteínas (PPEP) é bastante conhecido na bioinformática. A identificação da conformação nativa de uma proteína permite predizer a sua função no organismo. Este conhecimento também é útil no desenvolvimento de novos fármacos ou na compreensão do mecanismo de várias doenças. Várias técnicas tem sido propostas para resolver este problema. Porém, o alto custo envolvido levou ao surgimento de vários modelos que simplificam, em parte, as estruturas protéicas. No entanto, mesmo com os modelos mais simplificados, a complexidade do problema traz inúmeros desafios computacionais na busca da sua conformação nativa. Este trabalho utiliza o algoritmo evolucionário denominado Evolução Diferenciada (ED) para solucionar o PPEP, representando as proteínas com o modelo AB (toy model), em duas e três dimensões (2D e 3D). O trabalho apresenta a implementação de duas versões da ED, paralelizadas num ambiente de processo em cluster, com Message Passing Interface e arquitetura mestre-escravo. Para a configuração dos operadores do algoritmo de ED, foram realizados vários estudos com diferentes configurações para ambos os modelos, e análises estatísticas determinaram quais os melhores valores. Além disso, foram criados dois operadores especiais: dizimação e mutação espelhada. O primeiro poder ser considerado um operador genérico, que pode ser utilizado em qualquer problema; o segundo é específico para o problema em questão. Além do algoritmo de ED básico, também foi proposta uma versão auto-adaptável, em que alguns de seus parâmetros são atualizados no decorrer da evolução. Os experimentos realizados utilizaram 4 sequências de aminoácidos de benchmark geradas a partir da sequência de Fibonacci, contendo entre 13 e 55 aminoácidos. Os resultados dos algoritmos de ED paralelos foram comparados com os resultados obtidos em outros trabalhos. O algoritmo de ED é capaz de obter resultados excelentes, competitivos com os métodos especializados, apesar de não atingir o ótimo conhecido em algumas instâncias. Os resultados promissores obtidos nesse trabalho mostram que o algoritmo de ED é adequado para o problema. Em trabalhos futuros poderão ser estudados novos operadores especiais ou outras técnicas de inspiração biológica, buscando melhorar os resultados. / Protein structure prediction is a well-known problem in bioinformactis. Identifying protein native conformation makes it possible to predict its function within the organism. Knowing this also helps in the development of new medicines and in comprehending how some illnesses work and act. During the past year some techniques have been proposed to solve this problem, but its high cost made it necessary to build models that simplify the protein structures. However, even with the simplicity of these models identifying the protein native conformation remains a highly complex, computationally challenging problem. This paper uses an evolutionary algorithm known as Differential Evolution (DE) to solve the protein structure prediction problem. The model used to represent the protein structure is the Toy Model (also known as the AB Model) in both 2D and 3D. This work implements two versions of the ED algorithm using a parallel architecture (master-slave) based on Message Passing interface in a cluster. A large number of tests were executed to define the final configuration of the DE operators for both models. A new set of special operators were developed: explosion and mirror mutation. We can consider the first as generic, because it can be used in any problem. The second one is more specific because it requires previous knowledge of the problem. Of the two DE algorithm implemented, one is a basic DE algorithm and the second is a self-adaptive DE. All tests executed in this work used four benchmark amino acid sequences generated from the Fibonacci sequence. Each sequence has 13 to 55 amino acids. The results for both parallel DE algorithms using both 2D and 3D models were compared with other works. The DE algorithm achieved excellent results. It did not achieve the optimal known values for some sequences, but it was competitive with other specialized methods. Overall results encourage further research toward the use of knowledge-based operators and biologically inspired techniques to improve DE algorithm performance. Bioinformática Controle preditivo Programação paralela (Computação) Proteínas Peptídeos Computação evolutiva Engenharia elétrica Bioinformatics Predictive control Parallel programming (Computer science) Proteins Peptides Evolutionary computation Electric engineering
379	Mega busca harmônica: algoritmo de busca harmônica baseado em população e implementado em unidades de processamento gráfico Scalabrin, Marlon Henrique 31 March 2012 (has links) CAPES / Este trabalho propõe uma modificação da meta-heurística Busca Harmônica (HS) a partir de uma nova abordagem baseada em população, empregando, também, algumas estratégias inspiradas em outras meta-heurísticas. Este novo modelo foi implementado utilizando a arquitetura de programação paralela CUDA em uma GPU. O uso de placas de processamento gráficas (GPU) para processamento de propósito geral está crescendo, e estas têm sido utilizadas por muitos pesquisadores para processamento científico. Seu uso se mostra interessante para meta-heurísticas populacionais, podendo realizar muitas operações simultaneamente. A HS é uma meta-heurística inspirada no objetivo de um músico em buscar uma harmonia perfeita. modelo proposto incluiu-se uma população de harmonias temporárias que são geradas a cada nova iteração, permitindo a realização simultânea de diversas avaliações de função. Assim aumenta-se o grau de paralelismo da HS, possibilitando maiores ganhos de velocidade com o uso de arquiteturas paralelas. O novo modelo proposto executado em GPU foi denominado Mega Harmony Search (MHS). Na implementação em GPU cada passo do algoritmo é tratado individualmente em forma de kernels com configurações particulares para cada um. Para demonstrar a eficácia do modelo proposto foram selecionados alguns problemas de benchmark, como a otimização de estruturas de proteínas, a otimização de treliças e problemas matemáticos. Através de experimentos fatoriais foi identificado um conjunto de parâmetros padrão, o qual foi utilizado nos outros experimentos. As análises realizadas sobre resultados experimentais mostram que o MHS apresentou solução de qualidade equivalente à HS e ganhos de velocidade, com a sua execução em GPU, superiores a 60x quando comparado a implementação em CPU. Em trabalhos futuros poderão ser estudadas novas modificações ao algoritmo, como a implementação de nichos e estudos de estratégias de interação entre eles. / This work propose a new approach for the metaheuristic Harmonic Search (HS), by using a population of solutiona and other strategies inspired in another metaheuristics. This new model was implemented using a parallel architecture of a graphical processing unity (GPU). The use of GPU for general-purpose processing is growing, specially for scientific processing. Its use is particularly interesting for populational metaheuristics, where multiple operations are executed simultaneously. The HS is a metaheuristic inspired by the way jazz musicians search for a perfect harmony. In the proposed model a population of temporary harmonies was included. Such population was generated at each iteration, enabling simultaneous evaluation of the objective function being optimized, and thus, increasing the level of parallelism of HS. The new approach implemented in GPU was named Mega Harmony Search (MHS), and each step of the algorithm is handled in the form of kernels with particular configurations for each one. To show the efficiency of MHS some benchmark problems were selected for testing, including mathematical optimization problems, protein structure prediction, and truss structure optimization. Factorial experiments were done so as to find the best set of parameters for the MHS. The analyzes carried out on the experimental results show that the solutions provided by MHS have comparable quality to those of the simple Harmony Search. However, by using GPU, MHS achieved a speedup of 60x, compared with the implementation in regular CPU. Future work will focus other improvements in the algorithm, such as the use of niches and species, as well a study of the interactions between them. Análise harmônica Heurística Algorítmos Otimização combinatória Computação gráfica Arquitetura de computador Programação paralela (Computação) Harmonic analysis Heuristic Algorithms Combinatorial optimization Computer graphics Computer architecture Parallel programming (Computer science)
380	Mega busca harmônica: algoritmo de busca harmônica baseado em população e implementado em unidades de processamento gráfico Scalabrin, Marlon Henrique 31 March 2012 (has links) CAPES / Este trabalho propõe uma modificação da meta-heurística Busca Harmônica (HS) a partir de uma nova abordagem baseada em população, empregando, também, algumas estratégias inspiradas em outras meta-heurísticas. Este novo modelo foi implementado utilizando a arquitetura de programação paralela CUDA em uma GPU. O uso de placas de processamento gráficas (GPU) para processamento de propósito geral está crescendo, e estas têm sido utilizadas por muitos pesquisadores para processamento científico. Seu uso se mostra interessante para meta-heurísticas populacionais, podendo realizar muitas operações simultaneamente. A HS é uma meta-heurística inspirada no objetivo de um músico em buscar uma harmonia perfeita. modelo proposto incluiu-se uma população de harmonias temporárias que são geradas a cada nova iteração, permitindo a realização simultânea de diversas avaliações de função. Assim aumenta-se o grau de paralelismo da HS, possibilitando maiores ganhos de velocidade com o uso de arquiteturas paralelas. O novo modelo proposto executado em GPU foi denominado Mega Harmony Search (MHS). Na implementação em GPU cada passo do algoritmo é tratado individualmente em forma de kernels com configurações particulares para cada um. Para demonstrar a eficácia do modelo proposto foram selecionados alguns problemas de benchmark, como a otimização de estruturas de proteínas, a otimização de treliças e problemas matemáticos. Através de experimentos fatoriais foi identificado um conjunto de parâmetros padrão, o qual foi utilizado nos outros experimentos. As análises realizadas sobre resultados experimentais mostram que o MHS apresentou solução de qualidade equivalente à HS e ganhos de velocidade, com a sua execução em GPU, superiores a 60x quando comparado a implementação em CPU. Em trabalhos futuros poderão ser estudadas novas modificações ao algoritmo, como a implementação de nichos e estudos de estratégias de interação entre eles. / This work propose a new approach for the metaheuristic Harmonic Search (HS), by using a population of solutiona and other strategies inspired in another metaheuristics. This new model was implemented using a parallel architecture of a graphical processing unity (GPU). The use of GPU for general-purpose processing is growing, specially for scientific processing. Its use is particularly interesting for populational metaheuristics, where multiple operations are executed simultaneously. The HS is a metaheuristic inspired by the way jazz musicians search for a perfect harmony. In the proposed model a population of temporary harmonies was included. Such population was generated at each iteration, enabling simultaneous evaluation of the objective function being optimized, and thus, increasing the level of parallelism of HS. The new approach implemented in GPU was named Mega Harmony Search (MHS), and each step of the algorithm is handled in the form of kernels with particular configurations for each one. To show the efficiency of MHS some benchmark problems were selected for testing, including mathematical optimization problems, protein structure prediction, and truss structure optimization. Factorial experiments were done so as to find the best set of parameters for the MHS. The analyzes carried out on the experimental results show that the solutions provided by MHS have comparable quality to those of the simple Harmony Search. However, by using GPU, MHS achieved a speedup of 60x, compared with the implementation in regular CPU. Future work will focus other improvements in the algorithm, such as the use of niches and species, as well a study of the interactions between them. Análise harmônica Heurística Algorítmos Otimização combinatória Computação gráfica Arquitetura de computador Programação paralela (Computação) Harmonic analysis Heuristic Algorithms Combinatorial optimization Computer graphics Computer architecture Parallel programming (Computer science)

Search results