About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Massively parallel computing for particle physics

Preston, Ian Christopher January 2010 (has links)
This thesis presents methods to run scientific code safely on a global-scale desktop grid. Current attempts to harness the world's idle desktop computers face obstacles such as donor security, portability of code and privilege requirements. Nereus, a Java-based architecture, is a novel framework that overcomes these obstacles and allows the creation of a globally scalable desktop grid capable of executing Java bytecode. However, most scientific code is written for the x86 architecture. To enable the safe execution of unmodified scientific code, we created JPC, a pure Java x86 PC emulator. The Nereus framework is applied to two tasks: a trivially parallel data-generation task, BlackMax, and a parallelization and fault-tolerance framework, Mycelia. Mycelia is an implementation of the Map-Reduce parallel programming paradigm. BlackMax is a microscopic black hole event generator of direct relevance for the Large Hadron Collider (LHC). The Nereus-based BlackMax adaptation dramatically speeds up the production of data, limited only by the number of desktop machines available.
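Mycelia is characterised above only as an implementation of the Map-Reduce paradigm. As a rough, hypothetical illustration of that programming model (plain single-process Python, not the Mycelia or Nereus APIs), the skeleton of such a computation might look like:

    from collections import defaultdict
    from itertools import chain

    def map_reduce(records, map_fn, reduce_fn):
        """Minimal, single-process illustration of the Map-Reduce model.

        In a desktop-grid setting such as Mycelia, the map and reduce calls
        would be shipped to donor machines; here everything runs locally.
        """
        # Map phase: each record yields (key, value) pairs.
        intermediate = defaultdict(list)
        for key, value in chain.from_iterable(map_fn(r) for r in records):
            intermediate[key].append(value)
        # Reduce phase: combine all values that share a key.
        return {key: reduce_fn(key, values) for key, values in intermediate.items()}

    # Hypothetical example: counting tokens in simulated event logs.
    logs = ["muon muon electron", "electron photon"]
    result = map_reduce(
        logs,
        map_fn=lambda line: [(word, 1) for word in line.split()],
        reduce_fn=lambda word, counts: sum(counts),
    )
    print(result)  # {'muon': 2, 'electron': 2, 'photon': 1}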
112

Logiciel de génération de nombres aléatoires dans OpenCL / Random number generation software in OpenCL

Kemerchou, Nabil 08 1900 (has links)
clRNG and clProbDist are two application programming interfaces (APIs) that we have developed for the generation of uniform and non-uniform random numbers, respectively, on parallel computing devices in the OpenCL environment. The first interface is used to create, on a central computer (the host), objects of type stream that act as parallel virtual generators and can be used both on the host and on parallel devices (graphics processing units, multi-core CPUs, etc.) to generate sequences of random numbers. The second interface can also be used on the host or the devices to generate random variables according to various continuous and discrete probability distributions. In this thesis, we recall the basic concepts of random number generators, describe heterogeneous systems and techniques for parallel random number generation, and present the different models that make up the OpenCL environment. We then detail the structure of the developed APIs: in clRNG we distinguish the functions that create streams, the functions that generate uniform random variables, and the functions that manipulate stream states; clProbDist contains the functions that generate non-uniform random variables by the inversion technique, as well as functions that return various statistics of the implemented distributions. We evaluate these APIs with two simulations: the first implements a simplified inventory model and the second estimates the value of an Asian call option. Finally, we provide experimental results on the performance of the implemented generators.
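The non-uniform generation in clProbDist is described as inversion-based. The following minimal Python sketch illustrates the inversion technique itself for an exponential distribution; it is an illustration of the idea only, not the actual clProbDist OpenCL interface:

    import math
    import random

    def exponential_by_inversion(rate, uniform_source=random.random):
        """Inversion method: if U ~ Uniform(0,1), then F^{-1}(U) follows F.

        For the exponential law F(x) = 1 - exp(-rate * x), the inverse is
        F^{-1}(u) = -ln(1 - u) / rate.  clProbDist applies the same idea on
        OpenCL devices, with the uniforms coming from clRNG streams.
        """
        u = uniform_source()
        return -math.log(1.0 - u) / rate

    samples = [exponential_by_inversion(rate=2.0) for _ in range(5)]
    print(samples)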
113

Scalable data-management systems for Big Data / Sur le passage à l'échelle des systèmes de gestion des grandes masses de données

Tran, Viet-Trung 21 January 2013 (has links)
La problématique "Big Data" peut être caractérisée par trois "V": - "Big Volume" se rapporte à l'augmentation sans précédent du volume des données. - "Big Velocity" se réfère à la croissance de la vitesse à laquelle ces données sont déplacées entre les systèmes qui les gèrent. - "Big Variety" correspond à la diversification des formats de ces données. Ces caractéristiques imposent des changements fondamentaux dans l'architecture des systèmes de gestion de données. Les systèmes de stockage doivent être adaptés à la croissance des données, et se doivent de passer à l'échelle tout en maintenant un accès à hautes performances. Cette thèse se concentre sur la construction des systèmes de gestion de grandes masses de données passant à l'échelle. Les deux premières contributions ont pour objectif de fournir un support efficace des "Big Volumes" pour les applications data-intensives dans les environnements de calcul à hautes performances (HPC). Nous abordons en particulier les limitations des approches existantes dans leur gestion des opérations d'entrées/sorties (E/S) non-contiguës atomiques à large échelle. Un mécanisme basé sur les versions est alors proposé, et qui peut être utilisé pour l'isolation des E/S non-contiguës sans le fardeau de synchronisations coûteuses. Dans le contexte du traitement parallèle de tableaux multi-dimensionels en HPC, nous présentons Pyramid, un système de stockage large-échelle optimisé pour ce type de données. Pyramid revoit l'organisation physique des données dans les systèmes de stockage distribués en vue d'un passage à l'échelle des performances. Pyramid favorise un partitionnement multi-dimensionel de données correspondant le plus possible aux accès générés par les applications. Il se base également sur une gestion distribuée des métadonnées et un mécanisme de versioning pour la résolution des accès concurrents, ce afin d'éliminer tout besoin de synchronisation. Notre troisième contribution aborde le problème "Big Volume" à l'échelle d'un environnement géographiquement distribué. Nous considérons BlobSeer, un service distribué de gestion de données orienté "versioning", et nous proposons BlobSeer-WAN, une extension de BlobSeer optimisée pour un tel environnement. BlobSeer-WAN prend en compte la hiérarchie de latence et favorise les accès aux méta-données locales. BlobSeer-WAN inclut la réplication asynchrone des méta-données et une résolution des collisions basée sur des "vector-clock". Afin de traîter le caractère "Big Velocity" de la problématique "Big Data", notre dernière contribution consiste en DStore, un système de stockage en mémoire orienté "documents" qui passe à l'échelle verticalement en exploitant les capacités mémoires des machines multi-coeurs. Nous montrons l'efficacité de DStore dans le cadre du traitement de requêtes d'écritures atomiques complexes tout en maintenant un haut débit d'accès en lecture. DStore suit un modèle d'exécution mono-thread qui met à jour les transactions séquentiellement, tout en se basant sur une gestion de la concurrence basée sur le versioning afin de permettre un grand nombre d'accès simultanés en lecture. / Big Data can be characterized by 3 V’s. • Big Volume refers to the unprecedented growth in the amount of data. • Big Velocity refers to the growth in the speed of moving data in and out management systems. • Big Variety refers to the growth in the number of different data formats. Managing Big Data requires fundamental changes in the architecture of data management systems. 
Data storage should continue being innovated in order to adapt to the growth of data. They need to be scalable while maintaining high performance regarding data accesses. This thesis focuses on building scalable data management systems for Big Data. Our first and second contributions address the challenge of providing efficient support for Big Volume of data in data-intensive high performance computing (HPC) environments. Particularly, we address the shortcoming of existing approaches to handle atomic, non-contiguous I/O operations in a scalable fashion. We propose and implement a versioning-based mechanism that can be leveraged to offer isolation for non-contiguous I/O without the need to perform expensive synchronizations. In the context of parallel array processing in HPC, we introduce Pyramid, a large-scale, array-oriented storage system. It revisits the physical organization of data in distributed storage systems for scalable performance. Pyramid favors multidimensional-aware data chunking, that closely matches the access patterns generated by applications. Pyramid also favors a distributed metadata management and a versioning concurrency control to eliminate synchronizations in concurrency. Our third contribution addresses Big Volume at the scale of the geographically distributed environments. We consider BlobSeer, a distributed versioning-oriented data management service, and we propose BlobSeer-WAN, an extension of BlobSeer optimized for such geographically distributed environments. BlobSeer-WAN takes into account the latency hierarchy by favoring locally metadata accesses. BlobSeer-WAN features asynchronous metadata replication and a vector-clock implementation for collision resolution. To cope with the Big Velocity characteristic of Big Data, our last contribution feautures DStore, an in-memory document-oriented store that scale vertically by leveraging large memory capability in multicore machines. DStore demonstrates fast and atomic complex transaction processing in data writing, while maintaining high throughput read access. DStore follows a single-threaded execution model to execute update transactions sequentially, while relying on a versioning concurrency control to enable a large number of simultaneous readers.
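The DStore design, a single sequential writer combined with versioning-based concurrency control for readers, can be pictured with a toy copy-on-write store. This is a generic Python sketch of the technique, not DStore's implementation:

    import threading

    class VersionedStore:
        """Toy single-writer, multi-reader document store.

        Updates are applied sequentially and published as new immutable
        versions; readers grab whatever version is current and are never
        blocked, which is the versioning idea the abstract describes.
        """
        def __init__(self):
            self._current = {}                   # latest published snapshot
            self._write_lock = threading.Lock()  # serialises writers only

        def read_snapshot(self):
            # Readers take a reference to an immutable snapshot: no locking.
            return self._current

        def update(self, transaction):
            # Single-threaded update path: copy, mutate, publish atomically.
            with self._write_lock:
                new_version = dict(self._current)
                transaction(new_version)
                self._current = new_version

    store = VersionedStore()
    store.update(lambda doc: doc.update({"user:1": {"name": "alice"}}))
    print(store.read_snapshot()["user:1"])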
114

Improving GEMFsim: a stochastic simulator for the generalized epidemic modeling framework

Fan, Futing January 1900 (has links)
Master of Science / Department of Electrical and Computer Engineering / Caterina M. Scoglio / The generalized epidemic modeling framework simulator (GEMFsim) is a tool designed by Dr. Faryad Sahneh, a former PhD student in the NetSE group. GEMFsim simulates stochastic spreading processes over complex networks. It was first introduced in Dr. Sahneh's doctoral dissertation "Spreading processes over multilayer and interconnected networks" and implemented in Matlab. Limited by the Matlab language, this implementation typically handles only small networks; the slow simulation speed cannot produce enough results in reasonable time for large networks. As a generalized tool, the framework must be able to handle large networks and provide adequate performance. The C language, a low-level language that maps a program efficiently to machine instructions, was selected for this study. After implementing GEMFsim in C, I packaged it into Python and R libraries, allowing users to enjoy the flexibility of these interpreted languages without sacrificing performance. GEMFsim's limitations are not confined to the language, however. With the original algorithm (Gillespie's Direct Method), the simulation speed is inversely proportional to the network size, which is unacceptable for very large networks. This study therefore applied the Next Reaction Method, which makes the performance independent of the network size: as long as the network fits into memory, the cost of each simulated event is proportional to the average node degree, which is not very large for most real-world networks. This study also applied parallel computing in order to use multiple cores for repeated simulations. Although a single simulation cannot be parallelized, being a Markov process, multiple simulations with identical network structures were run simultaneously, sharing one network description in memory.
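The gain from replacing Gillespie's Direct Method with the Next Reaction Method comes from keeping pending events in a priority queue, so the cost of each event depends on a node's degree rather than on the whole network. The sketch below illustrates this event-driven idea for a plain SI spreading process in Python; it is a heavily simplified illustration, not GEMFsim's C implementation or the full GEMF model:

    import heapq
    import math
    import random

    def exponential(rate):
        # Exponentially distributed waiting time with the given rate.
        return -math.log(1.0 - random.random()) / rate

    def si_next_reaction(adjacency, seed, beta, t_max):
        """Event-driven SI spread on a network (illustrative sketch).

        adjacency: dict node -> list of neighbours; seed: initially infected
        node; beta: per-edge infection rate.  Each scheduled event costs
        O(log N); the work per infection is proportional to the node's
        degree, not to the size of the whole network.
        """
        infected = {seed}
        events = []  # heap of (time, source, target) candidate transmissions
        for nb in adjacency[seed]:
            heapq.heappush(events, (exponential(beta), seed, nb))

        while events:
            t, src, dst = heapq.heappop(events)
            if t > t_max:
                break
            if dst in infected:
                continue  # stale event: the target was infected meanwhile
            infected.add(dst)
            for nb in adjacency[dst]:
                if nb not in infected:
                    heapq.heappush(events, (t + exponential(beta), dst, nb))
        return infected

    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(si_next_reaction(graph, seed=0, beta=1.5, t_max=10.0))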
115

Avaliação socioeconômica de uma rede ferroviária regional para o transporte de passageiros / Socioeconomic assessment of a regional railway network for passenger transport

Isler, Cassiano Augusto 15 May 2015 (has links)
The predominant use of the Brazilian railway network for freight transport and its incompatibility with high-speed operation prevent the offer of competitive intercity passenger services. The research question of this thesis is which rolling-stock technology provides better socioeconomic results on a new intercity passenger network operated exclusively by High Performance Trains (HPTs) or High Speed Trains (HSTs), with average speeds of 150 km/h and 300 km/h respectively. The hypothesis is that the difference between the socioeconomic benefits and costs of operating HSTs is positive and greater than that of operating HPTs, given specific socioeconomic evaluation parameters and a hypothetical network in the Southeastern region of Brazil. The main goal of this research is to estimate and compare the major socioeconomic costs and benefits of a hypothetical railway network by first estimating the investments required to build new alignments. In addition, the number of trips among cities in the Southeastern region over a strategic planning horizon and the propensity for mode choice are estimated, and a Cost-Benefit Analysis (CBA) formulation is established and applied to scenarios of exclusive operation of HPTs or HSTs. The solution of the railway alignment optimization problem, using a parallel computing approach applied to a Genetic Algorithm, shows that infrastructure investments vary mainly with topography, whereas expropriation costs are proportionally small and the geometric constraints of the alignments do not significantly affect the results. The number of trips by transport mode over the planning horizon is projected analytically, and the data collected in a stated-preference survey are used to model mode choice. Finally, a formulation for the major cost and benefit items of a socioeconomic assessment of a rail passenger transport project is proposed and applied to scenarios in which the effects of infrastructure construction productivity, variability of the estimated investments, and the ability to attract latent demand are analyzed. The results show that the difference between the strictly economic benefits (operating income and the residual value of the infrastructure investments) and the construction and operating costs is negative for any rail fare, although the results for the HST network are better than those for the HPT network. When social benefits are included, the total benefits of the railway operation outweigh its costs under specific fare scenarios, again with better results for a network operating HSTs; the same trend is observed when only the social benefits are compared with the total costs through a Benefit-Cost Ratio (BCR). Therefore, under the assumptions of this thesis, there is evidence that investing in railway infrastructure for passenger transport is apparently not a promising decision in terms of socioeconomic feasibility, although operating HSTs is more attractive than HPTs under the analyzed conditions.
116

Estudo de técnicas de paralelização de métodos computacionais de fatoração de matrizes esparsas aplicados à redes bayesianas e redes credais / Study of parallelization techniques of computational methods for sparse matrix factorization applied to Bayesian and credal networks

Maranhão, Viviane Teles de Lucca 19 August 2013 (has links)
In this work we continue the study by Colla (2007), who used the framework of linear algebra with sparse matrix factorization techniques applied to inference in Bayesian networks. The resulting computational library has a clear separation between the symbolic and numeric phases of inference, which allows the results obtained in the first phase to be reused while only the numeric values vary. We apply parallelization techniques to improve computational performance, add inference for credal networks and new algorithms for inference in Bayesian networks that are more efficient depending on the structure of the graph underlying the network, and seek to make the symbolic and numeric phases even more independent.
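The separation between symbolic and numeric phases that the library relies on can be shown in miniature: the structure-dependent analysis is performed once and reused while only the numeric values change. The sketch below is a generic, dense stand-in; the ordering heuristic and the Cholesky call are illustrative assumptions, not the library's actual sparse routines:

    import numpy as np

    def symbolic_phase(pattern):
        """Symbolic phase: work that depends only on the sparsity pattern.

        A crude heuristic (order columns by their number of nonzeros)
        stands in here for the real symbolic analysis.
        """
        counts = pattern.sum(axis=0)
        return np.argsort(counts)          # elimination ordering, reused later

    def numeric_phase(values, ordering):
        """Numeric phase: factorize the actual values with the fixed ordering."""
        A = values[np.ix_(ordering, ordering)]
        return np.linalg.cholesky(A)       # stand-in for the sparse kernel

    pattern = np.array([[1, 1, 0],
                        [1, 1, 1],
                        [0, 1, 1]])
    ordering = symbolic_phase(pattern)      # done once per network structure

    # The same ordering is reused while only the numeric entries change,
    # which is what lets the inference library vary numeric values cheaply.
    for scale in (1.0, 2.0):
        values = scale * np.array([[4.0, 1.0, 0.0],
                                   [1.0, 3.0, 1.0],
                                   [0.0, 1.0, 2.0]])
        L = numeric_phase(values, ordering)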
117

Uma arquitetura sistólica para solução de sistemas lineares implementada com circuitos FPGAs. / A systolic architecture for solving linear systems implemented with FPGA devices.

Aragão, Antônio Carlos de Oliveira Souza 17 December 1998 (has links)
This master's thesis presents the design of a dedicated parallel machine for solving systems of linear equations. This problem appears in a wide variety of scientific and engineering applications, and its solution becomes a computationally intensive task as the number of unknowns grows. A one-dimensional systolic architecture, connected in a ring topology, was implemented to map iterative solution methods. This class of parallel architectures exhibits simplicity, regularity and modularity, which facilitate hardware implementation, and it is widely used in computing systems dedicated to specific problems characterized by heavy computational demand and the need for real-time response. Advanced hardware design methodologies and tools were adopted to shorten the development cycle, and the architecture was implemented and verified on FPGAs (Field Programmable Gate Arrays). The performance results are presented and evaluated, indicating the best configuration of the architecture for achieving a speedup over implementations on sequential machines. The advantages and disadvantages of this approach and methodology for solving problems with timing requirements are also discussed.
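The ring-connected systolic arrangement can be emulated in software to make the data movement concrete: each processing element (PE) owns one row of the system, and the current estimate circulates around the ring. The sketch below assumes a Jacobi-style iteration, since the abstract does not name the exact iterative method, and is a behavioural illustration rather than the FPGA design:

    import numpy as np

    def systolic_jacobi(A, b, sweeps=50):
        """Software emulation of a ring of processing elements (PEs).

        PE i stores row i of A and entry b[i]; in every sweep the vector of
        current estimates is passed around the ring so each PE can update
        its own unknown (Jacobi relaxation).
        """
        n = len(b)
        x = np.zeros(n)
        for _ in range(sweeps):
            circulating = x.copy()         # the values travelling on the ring
            new_x = np.empty(n)
            for pe in range(n):            # one pass of the PE pipeline
                row = A[pe]
                sigma = row @ circulating - row[pe] * circulating[pe]
                new_x[pe] = (b[pe] - sigma) / row[pe]
            x = new_x
        return x

    # Diagonally dominant test system, so the iteration converges.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [0.0, 2.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(systolic_jacobi(A, b), np.linalg.solve(A, b))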
118

Técnicas de orientação ao objeto para computação científica paralela / Object-oriented techniques for parallel scientific computing

Rodrigues, Francisco Aparecido 29 April 2004 (has links)
In this work we present the object-oriented methodology used to develop a class library that eases the process of parallel numerical programming. The class methods are implemented on top of the routines of the ScaLAPACK package, and the classes offer methods for basic matrix manipulation and for matrix diagonalization, where the matrices can be real or complex, in single or double precision. This work presents implementation details and a comparative performance analysis in order to show the efficiency and the ease of use of object orientation in the development of parallel scientific programs.
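As a rough picture of the object-oriented layer described above, a wrapper class that hides the numerical backend behind matrix methods might look like the following. NumPy is used here as a stand-in for the distributed ScaLAPACK routines, so the class and method names are purely illustrative:

    import numpy as np

    class ParallelMatrix:
        """Minimal object-oriented facade over a numerical backend.

        In the thesis the methods dispatch to ScaLAPACK routines on a
        distributed block-cyclic layout; here NumPy plays that role so
        the interface idea stays visible.
        """
        def __init__(self, data):
            self._data = np.asarray(data, dtype=complex)

        def multiply(self, other):
            return ParallelMatrix(self._data @ other._data)

        def diagonalize(self):
            # Eigen-decomposition of a Hermitian matrix; the library would
            # call the corresponding parallel ScaLAPACK solver here.
            eigenvalues, eigenvectors = np.linalg.eigh(self._data)
            return eigenvalues, ParallelMatrix(eigenvectors)

    H = ParallelMatrix([[2.0, 1.0j], [-1.0j, 3.0]])
    w, V = H.diagonalize()
    print(w)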
119

Sobre a escolha da relaxação e ordenação das projeções no método de Kaczmarz com ênfase em implementações altamente paralelas e aplicações em reconstrução tomográfica / On the choice of relaxation and ordering of projections in the Kaczmarz method with emphasis on highly parallel implementations and applications in tomographic reconstruction

Estácio, Leonardo Bravo 16 May 2014 (has links)
The Kaczmarz method is an iterative algorithm for finding the solution of a system of linear equations Ax = b by projecting onto hyperplanes; it is widely used in applications involving computerized tomography. It has recently regained attention after the publication of a randomized version presented by Strohmer and Vershynin in 2009, which was proven to have an exponential expected rate of convergence. Later, Eldar and Needell in 2011 suggested a modified version of the Strohmer-Vershynin algorithm, which at each iteration selects the optimal projection from a random set by making use of the Johnson-Lindenstrauss lemma. Neither of these articles presents a technique for choosing the relaxation parameter; however, the proper selection of this parameter can substantially speed up the method. In this work we present a methodology for choosing the relaxation parameter, as well as parallel implementations of the Kaczmarz algorithm using the ideas of Eldar and Needell. Our methodology for parameter selection uses a new generalization of Strohmer and Vershynin's results that takes the relaxation parameter λ into account; from it, we obtain an estimate of the convergence rate as a function of λ and then choose, for use in the algorithm, the value that optimizes this estimate. The parallelization of the methods was implemented on the CUDA platform and appears very promising, since it delivers a substantial gain in convergence speed.
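The relaxed Kaczmarz update at the core of the study is compact enough to state directly. The sketch below uses NumPy rather than the thesis's CUDA implementation, and combines the Strohmer-Vershynin row-sampling rule with a user-chosen relaxation parameter lam; treat it as an illustration of the iteration, not the thesis code:

    import numpy as np

    def randomized_kaczmarz(A, b, lam=1.0, iters=2000, rng=None):
        """Randomized Kaczmarz with relaxation parameter lam.

        Rows are sampled with probability proportional to ||a_i||^2
        (Strohmer-Vershynin); each step moves the iterate toward the
        hyperplane a_i^T x = b_i, scaled by lam:
            x <- x + lam * (b_i - a_i^T x) / ||a_i||^2 * a_i
        """
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        row_norms = np.einsum("ij,ij->i", A, A)
        probs = row_norms / row_norms.sum()
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            residual = b[i] - A[i] @ x
            x += lam * residual / row_norms[i] * A[i]
        return x

    A = np.array([[3.0, 1.0], [1.0, 2.0], [2.0, 5.0]])
    x_true = np.array([1.0, -2.0])
    b = A @ x_true
    print(randomized_kaczmarz(A, b, lam=1.0))   # approaches x_true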
120

FPGA-based programmable embedded platform for image processing applications

Siddiqui, Fahad Manzoor January 2018 (has links)
A vast majority of electronic systems, including medical, surveillance and critical-infrastructure systems, employ image processing to provide intelligent analysis. They use on-board pre-processing to reduce data bandwidth and memory requirements before sending information to the central system. Field Programmable Gate Arrays (FPGAs) are a strong platform because they permit reconfigurability and pipelining for streaming applications. However, rapid advances and changes in these application use cases call for adaptable hardware architectures that can process dynamic data workloads and be easily programmed to achieve efficient solutions in terms of area, time and power. FPGA-based development requires iterative design cycles and hardware synthesis and place-and-route times that are alien to software developers. This work proposes an FPGA-based programmable hardware acceleration approach to reduce design effort and time. It allows developers to use FPGAs to profile, optimise and quickly prototype algorithms using a more familiar software-centric, edit-compile-run design flow, in which the platform is programmed through software rather than high-level synthesis (HLS) engineering principles. Central to the work has been the development of an optimised FPGA-based processor, the Image Processing Processor (IPPro), which uses the underlying resources efficiently and presents a programmable environment to the programmer based on a dataflow design principle, giving superior performance compared to competing alternatives. From this, a three-layered platform has been created which enables the realisation of parallel computing skeletons on FPGA, used to express designs efficiently in high-level programming languages. From the bottom up, these layers represent programming (actor, multiple actors and parallel skeletons) and hardware (IPPro core, multicore IPPro, system infrastructure) abstractions. The platform allows acceleration of parallel and non-parallel dataflow applications. A set of point and area image pre-processing functions was implemented on the Avnet ZedBoard platform to evaluate performance. The point functions achieved 2.53 times better performance than the area functions, and the point and area functions achieved performance improvements of 7.80 and 5.27 times, respectively, over a single-core IPPro by exploiting data parallelism. The pipelined execution of multiple stages showed that a dataflow graph can be decomposed into balanced actors to deliver maximum performance by hiding data-transfer and processing time through task parallelism; otherwise, the maximum achievable performance is limited by the slowest actor, due to the ripple effect caused by unbalanced actors. The platform delivered better performance in terms of fps/Watt/area than an embedded Graphics Processing Unit (GPU), considering that both technologies allow a software-centric design flow.
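The distinction drawn above between point and area functions, and the way a map-style skeleton exposes data parallelism, can be sketched in ordinary software. The Python below is only an illustration of the skeleton concept, not the IPPro instruction set or its toolchain:

    import numpy as np

    def point_skeleton(image, op, cores=4):
        """Point operator: each output pixel depends only on one input pixel,
        so the image splits into independent stripes (one per core)."""
        stripes = np.array_split(image, cores, axis=0)
        return np.vstack([op(s) for s in stripes])  # stripes could run in parallel

    def area_skeleton(image, kernel):
        """Area (window) operator: each output pixel needs a neighbourhood,
        so parallel stripes would have to exchange halo rows as well."""
        padded = np.pad(image, 1, mode="edge")
        out = np.zeros_like(image, dtype=float)
        k = kernel / kernel.sum()
        for dy in range(3):
            for dx in range(3):
                out += k[dy, dx] * padded[dy:dy + image.shape[0],
                                          dx:dx + image.shape[1]]
        return out

    img = np.random.randint(0, 256, (8, 8)).astype(float)
    thresholded = point_skeleton(img, lambda s: (s > 128).astype(float))
    blurred = area_skeleton(img, np.ones((3, 3)))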
