11

Estudio, Modelado e Implementación Paralela de Sistemas Celulares Utilizados en Microfabricación

Ferrando Jódar, Néstor 03 June 2011
This thesis centers on the modeling of dynamic systems by means of Cellular Automata (CAs). CAs allow a system to be modeled by stating its microscopic behavior in order to obtain the correct macroscopic behavior. One of the main fields where this methodology has been applied (and which forms another of the central points of this thesis) is the modeling of Anisotropic Wet Etching (AWE). AWE is a chemical process that produces three-dimensional silicon microstructures, which has made it an important microfabrication technique. AWE is used for the micromachining of Micro-Electro-Mechanical Systems (MEMS). MEMS integrate mechanical elements, sensors, actuators, and electronics on a common silicon substrate through microfabrication technology. MEMS have a strong influence on industry, since devices fabricated with this technology are used intensively in fields such as automotive safety systems, motion sensors in consumer electronics, and injectors in printing systems. AWE is a complex process whose outcome depends heavily on the process parameters (solution, temperature, time), so using a simulator before running an experiment can mean large savings in time and material. Current CA-based AWE simulators have several limitations: very long computation times due to the high computational requirements of CAs, a small set of existing calibrations, and the inability to simulate AWE with new etchants such as TMAH+Triton. These limitations are addressed across the chapters of this thesis. / Ferrando Jódar, N. (2011). 
Estudio, Modelado e Implementación Paralela de Sistemas Celulares Utilizados en Microfabricación [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/10984
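The microscopic-rule-to-macroscopic-behavior idea behind these CA simulators can be sketched in a few lines. This is a hedged toy, not the thesis's calibrated etching model: here a solid cell etches as soon as any of its four neighbors is etched, a crude stand-in for the orientation- and etchant-dependent removal rules a real AWE simulator calibrates.

```python
import numpy as np

def etch_step(etched, threshold=1):
    """One synchronous CA update: a solid cell becomes etched when at
    least `threshold` of its 4 neighbors are already etched. A real
    AWE simulator makes this rule orientation- and etchant-dependent."""
    n = (np.roll(etched, 1, 0).astype(int) + np.roll(etched, -1, 0) +
         np.roll(etched, 1, 1) + np.roll(etched, -1, 1))
    return etched | (n >= threshold)

# A single etched opening in an otherwise solid 7x7 surface patch.
grid = np.zeros((7, 7), dtype=bool)
grid[3, 3] = True
for _ in range(3):
    grid = etch_step(grid)
print(int(grid.sum()))  # the etch front grows one cell per step
```

With this isotropic rule the etched region grows as a diamond; anisotropy enters when the threshold (or etch probability) is made to depend on the local crystallographic configuration.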
12

Investigation of real-time coupled cluster methods for the efficient calculation of optical molecular properties in the time domain

Wang, Zhe 10 October 2023
Optical and spectroscopic molecular properties are key to characterizing the behavior of molecules interacting with an applied electromagnetic field of light. Response theory has long been used to calculate such properties in the frequency domain. Real-time (RT) methods instead solve for the frequency-dependent properties in the time domain by explicitly propagating the time-dependent wave function. Various quantum chemical methods can be combined with the RT formalism, including Hartree-Fock, density functional theory, configuration interaction, and coupled cluster. Among these, coupled cluster (CC) methods provide high accuracy for systems with strong electron correlation, making RT-CC implementations intriguing. All applications of CC methods face a substantial challenge due to their high-order polynomial scaling. For RT-CC methods, two aspects may be explored to improve efficiency: the numerical techniques used for the RT propagation, and reduced-scaling methods for CC itself. In this work, we start with the exploration of the hardware used for the calculations and the numerical integration methods for propagating the wave function parameters. First, a GPU-enabled Python implementation has been developed by conducting the tensor contractions on GPUs with PyTorch, a machine-learning package whose tensor-operation syntax is similar to NumPy's. A speedup of a factor of 14 is obtained for the RT-CCSD/cc-pVDZ absorption spectrum calculation of the water tetramer. Furthermore, to optimize the performance on GPUs, single-precision arithmetic is added to the implementation, yielding an additional speedup of a factor of two. Lastly, a group of integrators for solving differential equations is introduced to the RT framework, including regular explicit integrators, adaptive integrators, and a mixed-step-size approach customized for strong-field simulations. 
The optimal choice of integrator depends on the required accuracy, stability, and efficiency. In addition to being highly accurate, CC methods are also systematically improvable and provide a hierarchy of accuracy. Building on the RT-CCSD implementation, the coupled cluster singles, doubles, and approximate triples (CC3) method, which is favorable for calculating frequency-dependent properties, is tailored to the RT framework for higher excitations and approximate orbital relaxation. The calculation is tested on both CPUs and GPUs, with a significant speedup gained from GPUs for the water cluster test cases. To further expand the range of applications of our RT-CC implementation, dynamic polarizabilities, first hyperpolarizabilities, and the G' tensor are calculated from induced electric and magnetic dipole moments using finite-difference methods. A discussion has also been conducted comparing RT-CC3 with RT-CCSD and with the time-dependent nonorthogonal orbital-optimized coupled cluster doubles (TDNOCCD) method. Additionally, electron dynamics, including Rabi oscillations and excited-state-to-excited-state transitions, have been explored using the RT-CC framework. / Doctor of Philosophy / Theoretical studies aim to match experiments but, more importantly, to provide insights that interpret and predict experimental data. Calculating optical properties related to light-matter interactions is one of the most crucial tasks in characterizing molecular properties. In experiments, electromagnetic radiation in the form of light is applied to the system, and the absorption or emission of light can be measured to identify, for example, the electronic structure of the molecule. In theoretical simulations, this applied radiation is represented by a perturbation operator added to the Hamiltonian in the Schrödinger equation. Quantum chemists are dedicated to developing methods that provide a better description of the spectroscopy. 
In the current work, the frequency, shape and the intensity of the radiation can all be finely-tuned, similar to experimental setups. The framework for extracting optical properties from time-dependent trajectories of induced dipole moments is established for accurate and efficient simulations. To improve efficiency and make the method feasible for real-world applications, a strong understanding of light-matter interactions on a quantum level and proper utilization of computational resources are both necessary. Improvements achieved and presented in this dissertation demonstrate a powerful tool for a better understanding of the nature of the interaction between the system and the electromagnetic radiation.
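The explicit-integrator side of this approach can be illustrated with a classical fourth-order Runge-Kutta step propagating a two-level model system in an oscillating field. This is a hedged toy, not the RT-CC equations of motion; the Hamiltonian and field parameters are invented for the example.

```python
import numpy as np

def rk4_step(y, t, h, f):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Two-level system driven by a cosine field: i dc/dt = H(t) c.
omega, mu_E = 1.0, 0.1               # level spacing and coupling (made up)
def rhs(t, c):
    coup = mu_E * np.cos(omega * t)
    H = np.array([[0.0, coup], [coup, omega]], dtype=complex)
    return -1j * (H @ c)

c = np.array([1.0, 0.0], dtype=complex)   # start in the ground state
t, h = 0.0, 0.01
for _ in range(1000):
    c = rk4_step(c, t, h, rhs)
    t += h
norm = abs(c[0]) ** 2 + abs(c[1]) ** 2    # unitary dynamics keep this near 1
print(round(norm, 6))
```

Monitoring the norm like this is one simple check of integrator accuracy and stability; adaptive integrators adjust `h` on the fly from an error estimate instead of using a fixed step.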
13

Trace-based Performance Analysis for Hardware Accelerators / Leistungsanalyse hardwarebeschleunigter Anwendungen mittels Programmspuren

Juckeland, Guido 14 February 2013
This thesis presents how performance data from hardware accelerators can be included in event logs. It extends the capabilities of trace-based performance analysis to also monitor and record data from this novel parallelization layer. The increasing awareness of the power consumption of computing devices has led to an interest in hybrid computing architectures as well. High-end computers, workstations, and mobile devices have started to employ hardware accelerators to offload computationally intense and parallel tasks, while at the same time retaining a highly efficient scalar compute unit for non-parallel tasks. This execution pattern is typically asynchronous, so that the scalar unit can resume other work while the hardware accelerator is busy. Performance analysis tools provided by the hardware accelerator vendors cover the situation of one host using one device very well, yet they do not address the needs of the high performance computing community. This thesis investigates ways to extend existing methods for recording events from highly parallel applications to also cover scenarios in which hardware accelerators aid these applications. After introducing a generic approach that is suitable for any API-based acceleration paradigm, the thesis derives a suggestion for a generic performance API for hardware accelerators and its implementation with NVIDIA CUPTI. In a next step, the visualization of event logs containing data from execution streams on different levels of parallelism is discussed. In order to overcome the limitations of classic performance profiles and timeline displays, a graph-based visualization using Parallel Performance Flow Graphs (PPFGs) is introduced. This novel approach uses program states to display similarities and differences between the potentially very large number of event streams and thus enables a fast way to spot load imbalances. 
The thesis concludes with an in-depth case-study analysis of PIConGPU, a highly parallel, multi-hybrid plasma physics simulation, which benefited greatly from the developed performance analysis methods.
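The core bookkeeping the thesis extends, merging host and accelerator activity into one time-ordered event log, can be sketched abstractly. This is a hedged illustration only; the real implementation records events through trace infrastructure and the vendor's CUPTI interface, and the stream names and timestamps here are invented.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    start: float                       # events sort by start time only
    end: float = field(compare=False)
    stream: str = field(compare=False)
    name: str = field(compare=False)

log = []

def record(stream, name, start, end):
    """Append one timed event; host and device streams share one log."""
    log.append(Event(start, end, stream, name))

# Host launches a kernel, then keeps working while the device is busy.
record("host",   "launch_kernel", 0.0, 0.1)
record("device", "kernel",        0.1, 2.0)
record("host",   "other_work",    0.1, 1.5)
record("host",   "sync",          1.5, 2.0)

timeline = sorted(log)                   # merged trace across both streams
overlap = min(2.0, 1.5) - max(0.1, 0.1)  # host/device concurrency window
print([e.name for e in timeline], overlap)
```

A single time-ordered log across streams is what makes overlap (and its absence, i.e. load imbalance) visible to timeline and graph visualizations.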
14

Implementação do algoritmo (RTM) para processamento sísmico em arquiteturas não convencionais

Lima, Igo Pedro de 16 June 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / With the growth of energy consumption worldwide, conventional reservoirs, the so-called "easy exploration and production" reservoirs, no longer meet global energy demand. This has led many researchers to develop projects that address these needs, and companies in the oil sector have invested in techniques that help locate and drill wells. One of the techniques employed in oil exploration is Reverse Time Migration (RTM), a seismic imaging method that produces excellent images of the subsurface. It is an algorithm based on solving the wave equation and is considered one of the most advanced seismic imaging techniques. The economic value of the oil reserves that require RTM to be located is very high, which makes the development of these algorithms a competitive differentiator for seismic processing companies. However, RTM requires great computational power, which still somewhat hinders its practical success. The objective of this work is to explore the implementation of this algorithm on unconventional architectures, specifically GPUs using CUDA, analyzing the difficulties of developing it as well as the performance of the sequential and parallel versions of the algorithm.
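The wave-equation computation that RTM repeats millions of times can be sketched in one dimension. This is a hedged NumPy toy: a real RTM code works in 2-D or 3-D, adds absorbing boundaries, and maps this stencil to CUDA threads, and the grid and velocity values here are illustrative.

```python
import numpy as np

def step(u_prev, u_curr, c, dt, dx):
    """Second-order finite-difference update for u_tt = c^2 u_xx,
    interior points only, with fixed (zero) boundaries."""
    u_next = np.zeros_like(u_curr)
    lap = u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]
    u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                    + (c * dt / dx) ** 2 * lap)
    return u_next

n, dx, dt, c = 201, 1.0, 0.2, 2.0     # Courant number c*dt/dx = 0.4 < 1
u_prev = np.zeros(n)
u_curr = np.zeros(n)
u_curr[n // 2] = 1.0                  # impulsive source at the center
for _ in range(100):
    u_prev, u_curr = u_curr, step(u_prev, u_curr, c, dt, dx)
print(round(float(np.abs(u_curr).max()), 4))
```

Each grid point's update depends only on its immediate neighbors at the two previous time levels, which is why the stencil parallelizes so naturally across GPU threads.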
16

Mapping parallel graph algorithms to throughput-oriented architectures

McLaughlin, Adam 07 January 2016
The stagnant performance of single core processors, increasing size of data sets, and variety of structure in information has made the domain of parallel and high-performance computing especially crucial. Graphics Processing Units (GPUs) have recently become an exciting alternative to traditional CPU architectures for applications in this domain. Although GPUs are designed for rendering graphics, research has found that the GPU architecture is well-suited to algorithms that search and analyze unstructured, graph-based data, offering up to an order of magnitude greater memory bandwidth over their CPU counterparts. This thesis focuses on GPU graph analysis from the perspective that algorithms should be efficient on as many classes of graphs as possible, rather than being specialized to a specific class, such as social networks or road networks. Using betweenness centrality, a popular analytic used to find prominent entities of a network, as a motivating example, we show how parallelism, distributed computing, hybrid and on-line algorithms, and dynamic algorithms can all contribute to substantial improvements in the performance and energy-efficiency of these computations. We further generalize this approach and provide an abstraction that can be applied to a whole class of graph algorithms that require many simultaneous breadth-first searches. Finally, to show that our findings can be applied in real-world scenarios, we apply these techniques to the problem of verifying that a multiprocessor complies with its memory consistency model.
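The "many simultaneous breadth-first searches" pattern mentioned above is exactly what Brandes' betweenness-centrality algorithm performs: one BFS per source plus a reverse dependency sweep. Below is a minimal sequential CPU sketch of that algorithm; the thesis's contribution lies in mapping this pattern efficiently to GPUs across graph classes, and the tiny example graph is invented.

```python
from collections import defaultdict, deque

def betweenness(adj):
    """Brandes' algorithm for unweighted graphs: a BFS from each source
    counts shortest paths, then a reverse sweep accumulates dependencies."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = defaultdict(int); sigma[s] = 1   # shortest-path counts
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):            # dependency accumulation
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Path graph 0-1-2: only vertex 1 lies between the other two.
print(betweenness({0: [1], 1: [0, 2], 2: [1]}))
```

Because each source's BFS is independent, the outer loop is embarrassingly parallel, while the inner frontier expansion is where GPU-friendly, level-synchronous formulations differ from this queue-based one.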
17

Task Performance with List-Mode Data

Caucci, Luca January 2012
This dissertation investigates the application of list-mode data to detection, estimation, and image reconstruction problems, with an emphasis on emission tomography in medical imaging. We begin by introducing a theoretical framework for list-mode data and we use it to define two observers that operate on list-mode data. These observers are applied to the problem of detecting a signal (known in shape and location) buried in a random lumpy background. We then consider maximum-likelihood methods for the estimation of numerical parameters from list-mode data, and we characterize the performance of these estimators via the so-called Fisher information matrix. Reconstruction from PET list-mode data is then considered. In a process we called "double maximum-likelihood" reconstruction, we consider a simple PET imaging system and we use maximum-likelihood methods to first estimate a parameter vector for each pair of gamma-ray photons that is detected by the hardware. The collection of these parameter vectors forms a list, which is then fed to another maximum-likelihood algorithm for volumetric reconstruction over a grid of voxels. Efficient parallel implementation of the algorithms discussed above is then presented. In this work, we take advantage of two low-cost, mass-produced computing platforms that have recently appeared on the market, and we provide some details on implementing our algorithms on these devices. We conclude this dissertation work by elaborating on a possible application of list-mode data to X-ray digital mammography. We argue that today's CMOS detectors and computing platforms have become fast enough to make X-ray digital mammography list-mode data acquisition and processing feasible.
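The list-mode estimation idea can be illustrated with a hedged toy: each detected event contributes one entry to a list, the likelihood is a product over entries, and the Fisher information bounds the estimator variance. The 1-D Gaussian measurement model and all numbers below are invented for illustration and are far simpler than the tomographic models in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy list-mode data: each detected photon contributes one list entry,
# here a 1-D position blurred by the detector resolution sigma.
true_pos, sigma, n_events = 2.5, 0.8, 400
events = rng.normal(true_pos, sigma, n_events)   # the "list"

def neg_log_likelihood(theta):
    """Independent Gaussian measurement model for every list entry."""
    return np.sum((events - theta) ** 2) / (2 * sigma ** 2)

# For this model the ML estimate is the sample mean (closed form),
# and the Fisher information for the position is N / sigma^2.
theta_hat = events.mean()
fisher = n_events / sigma ** 2
crlb = 1.0 / fisher                   # Cramer-Rao lower bound on variance
print(round(float(theta_hat), 2), round(crlb, 4))
```

Processing events independently like this is also why list-mode algorithms parallelize well: each list entry's likelihood contribution can be evaluated by its own thread.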
18

A smoothed particle hydrodynamic simulation utilizing the parallel processing capabilities of the GPUs

Lundqvist, Viktor January 2009
Simulating fluid behavior has proven to be a demanding challenge which requires complex computational models and highly efficient data structures. Smoothed Particle Hydrodynamics (SPH) is a particle-based computational model used to simulate fluid behavior that has been found capable of producing convincing results. However, the SPH algorithm is computationally heavy, which makes it cumbersome to work with.

This master thesis describes how the SPH algorithm can be accelerated by utilizing the GPU's computational resources. It describes a model for how to distribute the work load on the GPU and presents a suitable data structure. In addition, it proposes a method to represent and handle moving objects in the fluid's surroundings. Finally, the performance gain due to the GPU is evaluated by comparing processing times with an identical implementation running solely on the CPU.
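The density pass at the core of SPH follows directly from its definition, rho_i = sum_j m_j W(|x_i - x_j|, h). Below is a hedged NumPy toy using the common poly6 kernel; it is not this thesis's implementation, and a GPU version replaces the O(n^2) all-pairs sum with a spatial grid and per-thread neighbor lists. The particle positions are invented.

```python
import numpy as np

def poly6(r, h):
    """Poly6 smoothing kernel (Muller et al. 2003), zero beyond radius h."""
    w = np.zeros_like(r)
    inside = r < h
    w[inside] = 315.0 / (64.0 * np.pi * h ** 9) * (h ** 2 - r[inside] ** 2) ** 3
    return w

def densities(pos, mass, h):
    """SPH density at every particle via all-pairs kernel sums."""
    diff = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return (mass * poly6(r, h)).sum(axis=1)

# Three particles on a line; the middle one has the most nearby mass.
pos = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
rho = densities(pos, mass=1.0, h=0.3)
print(np.round(rho, 1))
```

The same neighbor-sum structure recurs in the pressure and viscosity force passes, which is why one good neighbor-finding data structure pays off across the whole algorithm.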
19

Modèle analytique de performance orienté débit d'évaluation de performance des accélérateurs programmables

Lai, Junjie 15 February 2013
The multi-core era has arrived. Vendors keep adding cores to their chips, and with more cores consumers are encouraged to turn their computers into platforms. However, very few applications are optimized for multi-core systems, and developing parallel applications efficiently and cost-effectively remains difficult. In recent years, more and more researchers in HPC have begun using GPUs (Graphics Processing Units) to accelerate parallel applications. A GPU is composed of many cores that are smaller and simpler than the processor cores of desktop multi-core CPUs. It is not difficult to port a serial application to a GPU platform: although little effort is needed to adapt applications to GPUs functionally, programmers must still spend a great deal of time optimizing them for better performance. To better understand performance results and better optimize GPU applications, the GPGPU community is working on several interesting topics. Analytical performance models are created to help developers understand performance results and locate bottlenecks. Auto-tuning tools are designed to transform data-access patterns and code layout, or to explore the design space automatically. A few simulators for GPU applications have also been released. The obvious difficulty in analyzing the performance of GPGPU applications is that the underlying GPU architecture is very sparsely documented. 
Since most of the approaches developed so far are not good enough for effective optimization of real-world applications, and since GPU architectures evolve very quickly, the community still needs to refine these models and develop new approaches that let developers better optimize GPU applications. In this thesis we worked mainly on two aspects of GPU performance analysis. First, we studied how to better estimate GPU performance through an analytical approach, aiming for one simple enough to be used by developers that also makes performance results easier to visualize. Second, we sought an approach to estimate the upper performance bound of an application on a given GPU architecture and to guide performance optimization.
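The upper-bound estimation described here is in the spirit of a roofline model: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. Below is a hedged sketch with invented peak numbers; the thesis's actual model is considerably more detailed.

```python
def roofline_bound(flops, bytes_moved, peak_gflops, peak_gbps):
    """Attainable GFLOP/s = min(compute roof, bandwidth * intensity),
    where intensity is FLOPs per byte moved to/from memory."""
    intensity = flops / bytes_moved
    return min(peak_gflops, peak_gbps * intensity)

PEAK_GFLOPS, PEAK_GBPS = 1000.0, 150.0   # illustrative GPU peaks

# Streaming kernel doing 2 FLOPs per 12 bytes: bandwidth-bound.
low = roofline_bound(2, 12, PEAK_GFLOPS, PEAK_GBPS)
# Blocked matrix multiply with heavy data reuse: compute-bound.
high = roofline_bound(2048, 64, PEAK_GFLOPS, PEAK_GBPS)
print(round(low, 1), round(high, 1))
```

Comparing a kernel's measured throughput against such a bound tells the developer whether further tuning can pay off and which resource (bandwidth or compute) is the binding constraint.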
20

Estratégia paralela exata para o alinhamento múltiplo de sequências biológicas utilizando Unidades de Processamento Gráfico (GPU)

Lima, Daniel Sundfeld 28 August 2012
Dissertação (mestrado), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2012. / Multiple Sequence Alignment is a very important problem in Molecular Biology, since it is able to detect similarities and differences in a set of sequences. This problem has been proven NP-Hard and, for this reason, heuristic algorithms are usually used to solve it. Nevertheless, obtaining the optimal solution is highly desirable, and there are indeed some exact algorithms that solve this problem for a reduced number of sequences. Carrillo-Lipman is a well-known exact algorithm for the Multiple Sequence Alignment problem that is able to reduce the search space by using lower and upper bounds. Even with this reduction, the Carrillo-Lipman algorithm executes in exponential time. High Performance Computing (HPC) platforms can be used to produce results faster. Among the existing HPC platforms, GPUs (Graphics Processing Units) are receiving a lot of attention due to their massive parallelism and low cost. The goal of this MSc dissertation is to propose and evaluate a parallel strategy to execute the Carrillo-Lipman algorithm on GPUs. Our strategy explores parallelism at fine granularity, where the search space is a tridimensional cube, divided into processing windows with bidimensional diagonals explored by multiple threads. The results obtained when comparing several sets of 3 real and synthetic sequences show that speedups of up to 8.60x can be obtained with our strategy.
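The tridimensional search cube described above can be made concrete with the plain dynamic program it prunes: every cell (i, j, k) takes the cheapest of seven predecessor moves under a sum-of-pairs column cost. This is a hedged sketch without the Carrillo-Lipman bounds, with invented gap and mismatch costs; the dissertation's contribution is traversing this cube in parallel diagonal windows on a GPU.

```python
import itertools

def sp_cost(chars, gap=2, mismatch=1):
    """Sum-of-pairs cost of one alignment column ('-' means a gap)."""
    cost = 0
    for a, b in itertools.combinations(chars, 2):
        if a == '-' and b == '-':
            continue
        if a == '-' or b == '-':
            cost += gap
        elif a != b:
            cost += mismatch
    return cost

def align3(s1, s2, s3):
    """Exact 3-sequence alignment by DP over the (i, j, k) cube.
    Carrillo-Lipman would prune cells that cannot beat pairwise bounds."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    D = {(0, 0, 0): 0}
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = float('inf')
                # 7 predecessor moves: each sequence advances or gaps.
                for di, dj, dk in itertools.product((0, 1), repeat=3):
                    if (di, dj, dk) == (0, 0, 0):
                        continue
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    col = (s1[pi] if di else '-',
                           s2[pj] if dj else '-',
                           s3[pk] if dk else '-')
                    best = min(best, D[(pi, pj, pk)] + sp_cost(col))
                D[(i, j, k)] = best
    return D[(n1, n2, n3)]

print(align3("AC", "AC", "AC"))   # identical sequences align at cost 0
```

Cells on the same anti-diagonal of the cube have no data dependencies among themselves, which is what lets a GPU strategy assign one thread per cell within each diagonal window.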
