21 |
Hardware paralelo reconfigurável para identificação de alinhamentos de sequências de DNA. / Parallel reconfigurable hardware to identify alignments in DNA sequences. Edgar José Garcia Neto Segundo 09 August 2012 (has links)
Amostras de DNA são encontradas em fragmentos, obtidos em vestígios de uma cena de crime, ou coletados de amostras de cabelo ou sangue, para testes genéticos ou de paternidade. Para identificar se esse fragmento pertence ou não a uma sequência de DNA, é necessário compará-los com uma sequência determinada, que pode estar armazenada em um banco de dados para, por exemplo, apontar um suspeito. Para tal, é preciso uma ferramenta eficiente para realizar o alinhamento da sequência de DNA encontrada com a armazenada no banco de dados. O alinhamento de sequências de DNA, em inglês DNA matching, é o campo da bioinformática que tenta entender a relação entre as sequências genéticas e suas relações funcionais e parentais. Essa tarefa é frequentemente realizada através de softwares que varrem clusters de base de dados, demandando alto poder computacional, o que encarece o custo de um projeto de alinhamento de sequências de DNA. Esta dissertação apresenta uma arquitetura de hardware paralela, para o algoritmo BLAST, que permite o alinhamento de um par de sequências de DNA. O algoritmo BLAST é um método heurístico e atualmente é o mais rápido. A estratégia do BLAST é dividir as sequências originais em subsequências menores de tamanho w. Após realizar as comparações nessas pequenas subsequências, as etapas do BLAST analisam apenas as subsequências que forem idênticas. Com isso, o algoritmo diminui o número de testes e combinações necessárias para realizar o alinhamento. Para cada sequência idêntica há três etapas, a serem realizadas pelo algoritmo: semeadura, extensão e avaliação. A solução proposta se inspira nas características do algoritmo para implementar um hardware totalmente paralelo e com pipeline entre as etapas básicas do BLAST. A arquitetura de hardware proposta foi implementada em FPGA e os resultados obtidos mostram a comparação entre área ocupada, número de ciclos e máxima frequência de operação permitida, em função dos parâmetros de alinhamento. O resultado é uma arquitetura de hardware em lógica reconfigurável, escalável, eficiente e de baixo custo, capaz de alinhar pares de sequências utilizando o algoritmo BLAST. / DNA samples are found as fragments, obtained from traces at a crime scene or collected from hair or blood samples for genetic or paternity tests. To identify whether such a fragment belongs to a given DNA sequence, it is necessary to compare it with a reference sequence, which usually comes from a database, for instance in order to point to a suspect. To this end, an efficient tool is needed to align the DNA sequence found with the ones stored in the database. The alignment of DNA sequences, known as DNA matching, is the field of bioinformatics that tries to understand the relationship between genetic sequences and their functional and parental relationships. This task is often performed by software that scans database clusters, which demands high computing power and thus increases the cost of DNA sequence alignment projects. This work presents a parallel hardware architecture, for the BLAST algorithm, for pairwise DNA alignment. The architecture implements the original version of the BLAST algorithm, from which several other versions were derived. The BLAST algorithm is a heuristic method and is currently the fastest algorithm for sequence alignment. The strategy of BLAST is to divide the original sequences into smaller subsequences of size w. After comparing these small subsequences, the subsequent BLAST steps analyze only the subsequences that are identical. The algorithm thus reduces the number of tests and combinations needed to perform the alignment. For each identical subsequence found, three steps are carried out by the algorithm: seeding, extension and evaluation. The proposed solution draws on the characteristics of the algorithm to implement fully parallel hardware in which the basic steps of BLAST are pipelined. The proposed architecture was implemented in FPGA and the results show a comparison between the occupied area, number of cycles and maximum allowed operating frequency, as a function of the alignment parameters. The result is a hardware architecture in reconfigurable logic that is scalable, efficient and low-cost, capable of aligning pairs of sequences using the BLAST algorithm.
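As a purely illustrative aside (not part of the abstract above), the seeding and extension steps of BLAST can be sketched in a few lines of Python; the word size w, the scoring values and the drop-off threshold below are assumptions chosen for demonstration, not the parameters used in this dissertation.

```python
# Illustrative sketch of BLAST-style seeding and ungapped extension.
# Word size w, match/mismatch scores and the drop-off threshold are
# arbitrary demonstration values, not the dissertation's parameters.

def find_seeds(query, subject, w=4):
    """Return (query_pos, subject_pos) pairs where identical w-mers occur."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds

def extend_seed(query, subject, qi, sj, w=4, match=1, mismatch=-1, drop=3):
    """Ungapped extension to the right of a seed, stopping when the score
    falls too far below the best score seen so far (X-drop style rule)."""
    score = best = w * match
    best_len = w
    k = 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        score += match if query[qi + w + k] == subject[sj + w + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, w + k
        elif best - score > drop:
            break
    return best, best_len

if __name__ == "__main__":
    q, s = "ACGTACGTGGA", "TTACGTACGTGGC"
    for qi, sj in find_seeds(q, s):
        print((qi, sj), extend_seed(q, s, qi, sj))
```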
|
22 |
Development of Parallel Architectures for Radar/Video Signal Processing Applications Jarrah, Amin January 2014 (has links)
No description available.
|
23 |
Adéquation algorithme-architecture pour les réseaux de neurones à convolution : application à l'analyse de visages embarquée / Algorithm-architecture matching for convolutional neural network : application to embedded facial analysis Mamalet, Franck 06 July 2011 (has links)
La prolifération des capteurs d'images dans de nombreux appareils électroniques, et l'évolution des capacités de traitements à proximité de ces capteurs ouvrent un champ d'exploration pour l'implantation et l'optimisation d'algorithmes complexes de traitement d'images afin de proposer des systèmes de vision artificielle embarquée. Ces travaux s'inscrivent dans la problématique dite d'adéquation algorithme-architecture (A3). Ils portent sur une classe d'algorithmes appelée réseau de neurones à convolutions (ConvNet) et ses applications en analyse de visages embarquée. La chaîne d'analyse de visages, introduite par Garcia et al., a été choisie d'une part pour ses performances en taux de détection/reconnaissance au niveau de l'état de l'art, et d'autre part pour son caractère homogène reposant sur des ConvNets. La première contribution de ces travaux porte sur une étude d'adéquation de cette chaîne d'analyse de visages aux processeurs embarqués. Nous proposons plusieurs adaptations algorithmiques des ConvNets, et montrons que celles-ci permettent d'obtenir des facteurs d'accélération importants (jusqu'à 700) sur un processeur embarqué pour mobile, sans dégradation des performances en taux de détection/reconnaissance. Nous présentons ensuite une étude des capacités de parallélisation des ConvNets, au travers des travaux de thèse de N. Farrugia. Une exploration "gros-grain" du parallélisme des ConvNets, suivie d'une étude de l'ordonnancement interne des processeurs élémentaires, conduisent à une architecture parallèle paramétrable, capable de détecter des visages à plus de 10 images VGA par seconde sur FPGA. Nous proposons enfin une extension de ces études à la phase d'apprentissage de ces réseaux de neurones. Nous étudions des restrictions de l'espace des hypothèses d'apprentissage, et montrons, sur un cas d'application, que les capacités d'apprentissage des ConvNets ne sont pas dégradées, et que le temps d'apprentissage peut être réduit jusqu'à un facteur cinq. / The proliferation of image sensors in many electronic devices, and the increasing processing capabilities near such sensors, open a field of exploration for the implementation and optimization of complex image processing algorithms in order to provide embedded vision systems. This work is a contribution to the research domain of algorithm-architecture matching. It focuses on a class of algorithms called convolutional neural networks (ConvNets) and their applications in embedded facial analysis. The facial analysis framework, introduced by Garcia et al., was chosen for its state-of-the-art detection/recognition performance, and also for its homogeneity, being based on ConvNets. The first contribution of this work deals with an adequacy study of this facial analysis framework with embedded processors. We propose several algorithmic adaptations of ConvNets, and show that they can lead to significant speedup factors (up to 700) on an embedded processor for mobile phones, without performance degradation. We then present a study of the parallelization capabilities of ConvNets, through N. Farrugia's PhD work. A coarse-grain exploration of ConvNet parallelism, followed by a study of the internal scheduling of the elementary processors, leads to a parameterized parallel architecture on FPGA, able to detect faces at more than 10 VGA frames per second. Finally, we propose an extension of these studies to the learning phase of these neural networks. We analyze several restrictions of the learning hypothesis space and show, on a case study, that the classification performance of ConvNets is almost unchanged while the training time is reduced by up to a factor of five.
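For readers unfamiliar with the computation being parallelized, the following minimal sketch (plain NumPy, arbitrary kernel size and activation) shows the convolution loop nest whose iterations a coarse-grain ConvNet parallelization can distribute over elementary processors; it is only an illustration, not the implementation studied in this thesis.

```python
# Minimal 2D convolution sketch (single input map, single kernel, "valid"
# padding). Kernel size, activation and data are illustrative assumptions;
# the two output loops are the kind of iterations that a coarse-grain
# parallelization distributes over elementary processors.
import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):          # natural parallelization axis
        for x in range(out_w):      # natural parallelization axis
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return np.tanh(out)             # typical ConvNet squashing non-linearity

if __name__ == "__main__":
    img = np.random.rand(8, 8)
    k = np.random.rand(3, 3)
    print(conv2d_valid(img, k).shape)   # -> (6, 6)
```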
|
24 |
Modelagem e otimização de um robô de arquitetura paralela para aplicações industriais. / Modeling and optimization of a parallel architecture robot for industrial applications. Tartari Filho, Sylvio Celso 07 April 2006 (has links)
Este trabalho trata do estudo de robôs de arquitetura paralela, focando na modelagem e otimização dos mesmos. Não foi construído nenhum tipo de protótipo físico, contudo os modelos virtuais poderão, no futuro, habilitar tal façanha. Após uma busca por uma aplicação que se beneficie do uso de um robô de arquitetura paralela, fez-se uma pesquisa por arquiteturas viáveis já existentes ou relatadas na literatura. Escolheu-se a mais apta e prosseguiu-se com os estudos e modelagem cinemática e dinâmica, dando uma maior ênfase na cinemática e dinâmica inversa, esta última utilizando a formulação de Newton-Euler. Foi construído um simulador virtual em ambiente MATLAB 6.5, dotado de várias capacidades como interpolação linear e circular, avanço e uso de múltiplos eixos coordenados. Seu propósito principal é o de demonstrar a funcionalidade e eficácia dos métodos utilizados. Depois foi incorporado ao simulador um algoritmo de cálculo do volume de trabalho da máquina que utiliza alguns dados do usuário para calcular o volume, que pode ser aquele atrelado a uma postura em particular ou o volume de trabalho de orientação total. Algoritmos para medir o desempenho da máquina quanto à uniformidade e utilização da força dos atuadores foram construídos e também incorporados ao simulador, que consegue mostrar o elipsóide de forças ao longo de quaisquer movimentos executados pela plataforma móvel. Quanto à otimização, parte do ferramental previamente construído foi utilizado para que se pudesse chegar a um modelo de uma máquina que respeitasse restrições mínimas quanto ao tamanho e forma de seu volume de trabalho, mas ainda mantendo o melhor desempenho possível dentro deste volume. / This work is about the study of parallel architecture robots, focusing on their modeling and optimization. No physical prototypes were built, although the virtual models can help those willing to do so. After searching for an application that could benefit from the use of a parallel robot, another search was made, this time for the right architecture type. After selecting the architecture, the next step was the kinematics and dynamics analysis. The dynamics model is developed using the Newton-Euler method. A virtual simulator was also developed in the MATLAB 6.5 environment. The simulator's main purpose was to demonstrate that the methods applied were correct and efficient, so it has several features such as linear and circular interpolation, the capacity to use multiple coordinate systems, and others. After finishing the simulator, an algorithm to calculate the machine workspace was added. The algorithm receives as input some desired requirements regarding the manipulator pose and then calculates the workspace, taking into consideration the imposed constraints. Lastly, algorithms capable of measuring the manipulator's performance regarding its actuator and end-effector force relationship were also incorporated into the simulator, which calculates the machine's force ellipsoid during any movement, for each desired workspace point. For the optimization procedures, some previously developed tools were used, so that the resulting model was capable of respecting some workspace constraints regarding size and shape, while also maintaining the best performance possible inside this volume.
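As an illustration of the force-ellipsoid measure mentioned in the abstract, the sketch below computes the ellipsoid axes from a manipulator Jacobian via its singular value decomposition; the Jacobian is an arbitrary numerical example, not the model of the robot studied here.

```python
# Illustrative force-ellipsoid computation from a manipulator Jacobian.
# The 2x2 Jacobian below is an arbitrary example; a real parallel robot
# would use the Jacobian derived from its kinematic model.
import numpy as np

def force_ellipsoid(J):
    """Axes of the force ellipsoid for unit-norm actuator effort.
    With tau = J^T f and ||tau|| <= 1, the reachable end-effector forces
    satisfy f^T (J J^T) f <= 1, so the axis directions are the left
    singular vectors of J and the semi-axis lengths are 1/sigma_i."""
    U, s, _ = np.linalg.svd(J)
    return U, 1.0 / s

if __name__ == "__main__":
    J = np.array([[1.0, 0.2],
                  [0.1, 0.5]])
    directions, lengths = force_ellipsoid(J)
    print("axis directions:\n", directions)
    print("axis lengths:", lengths)
```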
|
25 |
Area and energy efficient VLSI architectures for low-density parity-check decoders using an on-the-fly computation Gunnam, Kiran Kumar 15 May 2009 (has links)
The VLSI implementation complexity of a low density parity check (LDPC) decoder is largely influenced by the interconnect and the storage requirements. This dissertation presents decoder architectures for regular and irregular LDPC codes that provide substantial gains over existing academic and commercial implementations. Several structured properties of LDPC codes and decoding algorithms are observed and used to construct hardware implementations with reduced processing complexity. The proposed architectures utilize an on-the-fly computation paradigm which permits scheduling of the computations in a way that reduces the memory requirements and re-computations. Using this paradigm, run-time configurable and multi-rate VLSI architectures for rate-compatible array LDPC codes and irregular block LDPC codes are designed. Rate-compatible array codes are considered for DSL applications. Irregular block LDPC codes are proposed for IEEE 802.16e, IEEE 802.11n, and IEEE 802.20. When compared with a recent implementation of an 802.11n LDPC decoder, the proposed decoder reduces the logic complexity by 6.45x and the memory complexity by 2x for a given data throughput. When compared to the latest reported multi-rate decoders, this decoder design has an area efficiency of around 5.5x and an energy efficiency of 2.6x for a given data throughput. The numbers are normalized for a 180nm CMOS process. Properly designed array codes have low error floors and meet the requirements of magnetic channels and other applications which need several Gbps of data throughput. A high-throughput, fixed-code architecture for array LDPC codes has been designed. No modification to the code is performed, as this can result in high error floors. This parallel decoder architecture has no routing congestion and is scalable to longer block lengths. When compared to the latest fixed-code parallel decoders in the literature, this design has an area efficiency of around 36x and an energy efficiency of 3x for a given data throughput. Again, the numbers are normalized for a 180nm CMOS process. In summary, the design and analysis details of the proposed architectures are described in this dissertation. The results from extensive simulation and VHDL verification on FPGA and ASIC design platforms are also presented.
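For context only: a common building block in LDPC decoder hardware is the min-sum check-node update sketched below. The abstract does not state which decoding algorithm is used, so this is a generic illustration of the kind of computation such architectures schedule, not the dissertation's specific algorithm. In hardware, this update is typically implemented by tracking only the minimum, the second minimum and the overall sign of the incoming messages, which is the kind of structured property that reduces storage.

```python
# Generic min-sum check-node update for LDPC decoding, shown only to
# illustrate the kind of computation an LDPC decoder schedules; it is not
# necessarily the exact algorithm or scheduling of this dissertation.

def check_node_update(incoming):
    """Min-sum: each outgoing message takes the sign product and the
    minimum magnitude over all *other* incoming variable-to-check messages."""
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            sign *= -1 if m < 0 else 1
        outgoing.append(sign * min(abs(m) for m in others))
    return outgoing

if __name__ == "__main__":
    # Four variable-to-check LLR messages arriving at one check node.
    print(check_node_update([2.5, -0.7, 1.3, -3.1]))
    # Each output magnitude is the minimum over the other three inputs.
```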
|
26 |
Desenvolvimento de uma cadeira de rodas robótica para transporte de portador de necessidades especiais / Development of a robotic wheelchair for transporting people with special needs Oliveira Neto, Ivo Alves de 31 January 2013 (has links)
The objective of this dissertation was the kinematic modeling of a robotic wheelchair using virtual chains, which allowed the wheelchair to be modeled as a set of cooperative robotic manipulator arms forming a parallel kinematic chain. This document presents the development of a robotic wheelchair for transporting people with special needs that overcomes obstacles such as street curbs and other accessibility barriers found in streets and avenues. The work includes a study of assistive technology, parallel architecture, kinematic modeling, and the construction and assembly of the robot prototype, together with a checklist of accessibility problems and barriers along several routes, based on existing standards, decrees and laws. As a result, simulations of the chair were performed in various operating states to accomplish the task of going up and down curbs of different heights, using proportional control based on the kinematics. To verify the simulated results, a prototype of the robotic wheelchair was built. This project was developed to provide a better quality of life for people with disabilities. / O objetivo da dissertação foi a realização da modelagem cinemática de uma cadeira de rodas robótica usando cadeias virtuais, que permitiu modelar a cadeira como um conjunto de braços manipuladores cooperativos formando uma cadeia cinemática paralela. Foi desenvolvida uma cadeira de rodas robótica para transporte de portador de necessidades especiais que supera obstáculos como desníveis e barreiras existentes à acessibilidade em ruas e avenidas, incluindo o estudo sobre tecnologia assistiva, arquitetura paralela, modelagem cinemática, construção e montagem do protótipo do robô com a realização de uma lista de verificação de problemas e barreiras à acessibilidade em diversos percursos, tomando como base normas, decretos e leis existentes. Como resultado, foram realizadas simulações da cadeira em vários estados de operação para realizar a tarefa de subir e descer desníveis com diferentes alturas, realizando o controle proporcional baseado na cinemática. Para comprovar os resultados simulados foi desenvolvido um protótipo do robô. Este projeto foi desenvolvido visando proporcionar uma melhor qualidade de vida às pessoas portadoras de necessidades especiais.
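The "proportional control based on the kinematics" mentioned above can be illustrated, in general terms, by a resolved-rate law that maps a pose error to joint-velocity commands through a Jacobian; the Jacobian, gain and poses below are arbitrary assumptions, not the wheelchair model developed in the dissertation.

```python
# Generic proportional kinematic (resolved-rate) control sketch.
# The Jacobian, gain and poses are arbitrary illustrative values, not the
# wheelchair model of this dissertation.
import numpy as np

def proportional_kinematic_control(J, x_current, x_desired, kp=1.0):
    """Joint-velocity command qdot = pinv(J) * Kp * (x_desired - x_current)."""
    error = x_desired - x_current
    return np.linalg.pinv(J) @ (kp * error)

if __name__ == "__main__":
    J = np.array([[1.0, 0.0, 0.3],
                  [0.0, 1.0, 0.5]])      # maps 3 joint rates to 2 task rates
    x_now = np.array([0.0, 0.0])
    x_goal = np.array([0.1, 0.05])       # e.g. raise the platform by 5 cm
    print(proportional_kinematic_control(J, x_now, x_goal, kp=2.0))
```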
|
28 |
Conception et développement d'un circuit multiprocesseurs en ASIC dédié à une caméra intelligente / Design of a multiprocessor ASIC dedicated to smart camera Boussadi, Mohamed Amine 25 February 2015 (has links)
Les capteurs intelligents d'aujourd'hui nécessitent des composants de traitement d'une puissance suffisante pour exécuter les algorithmes à la cadence de ces capteurs d'images performants, tout en gardant une faible consommation d'énergie. Les systèmes monoprocesseur n'arrivent plus à satisfaire les exigences de ce domaine. Ainsi, grâce aux avancées technologiques et en s'appuyant sur de précédents travaux sur les machines parallèles, les systèmes multiprocesseurs sur puce (MPSoC) représentent une solution intéressante et prometteuse. Dans de précédents travaux à cette thèse, la cible technologique pour développer de tels systèmes était les FPGA. Or les résultats ont montré les limites de cette cible en terme de ressource matérielles et en terme de performance (vitesse notamment). Ce constat nous amène à changer de cible c'est-à-dire à passer sur cible ASIC nécessitant ainsi de retravailler profondément l'architecture et les IPs qui existaient autour de la méthode existante (appelée HNCP, pour Homogeneous Network of Communicating Processors). Afin de bénéficier de la performance offerte par la cible ASIC, les systèmes multiprocesseurs proposés s'appuient sur la flexibilité de son architecture. Combinés à des squelettes de parallélisation facilitant la programmabilité de l'architecture, les circuits proposés permettent d'offrir des systèmes supportant le portage en temps réels de différentes classes d'algorithme de traitement d'images. Le résultat de ce travail a abouti à la fabrication d'un circuit intégré à base d'un seul processeur et de ses périphériques en technologie ST CMOS 65nm dont la surface est d'environ 1 mm² et à la définition de 2 architectures multiprocesseurs flexibles basées sur le concept des squelettes de parallélisation (une architecture de 16 coeurs de processeur en technologie ST CMOS 65 nm et une deuxième architecture de 64 coeurs de processeur en technologie ST CMOS FD-SOI 28 nm). / Smart sensors today require processing components with sufficient power to run algorithms at the rate of these high-performance image sensors, while maintaining low power consumption. Monoprocessor systems are no longer able to meet the requirements of this field. Thus, thanks to technological advances and based on previous works on parallel computers, multiprocessor systems on chip (MPSoC) represent an interesting and promising solution. Previous works around this thesis used FPGAs as the technological target. However, the results showed the limits of this target in terms of hardware resources and in terms of performance (speed in particular). This observation leads us to change the target from FPGA to ASIC. This migration requires deep rework at the architecture level. In particular, the existing IPs around the method (called HNCP, for Homogeneous Network of Communicating Processors) have to be revisited. To take advantage of the performance offered by the ASIC target, the proposed multiprocessor systems are based on the flexibility of its architecture. Combined with parallel skeletons that ease the programmability of the architecture, the proposed circuits make it possible to offer systems that support various real-time image processing algorithms. This work has led to the fabrication of an integrated circuit based on a single processor and its peripherals using ST 65 nm CMOS technology, with an area of around 1 mm². Moreover, two flexible multiprocessor architectures based on the concept of parallel skeletons have been proposed (a 16-core multiprocessor in ST 65 nm CMOS technology and a 64-core multiprocessor in ST 28 nm FD-SOI CMOS technology).
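As a loose software analogy only (the thesis targets an ASIC multiprocessor, not Python), a "farm" parallel skeleton distributes independent work items over identical workers and gathers the results; the worker function below is a placeholder, not an actual kernel from this work.

```python
# Loose software analogy of a "farm" parallel skeleton: a coordinator
# distributes independent work items over identical workers and gathers
# the results. This only illustrates the programming model; the thesis
# maps such skeletons onto a homogeneous network of hardware processors.
from multiprocessing import Pool

def worker(pixel_block):
    # Placeholder computation standing in for an image-processing kernel.
    return sum(pixel_block) / len(pixel_block)

def farm(blocks, n_workers=4):
    with Pool(n_workers) as pool:
        return pool.map(worker, blocks)

if __name__ == "__main__":
    image_blocks = [list(range(i, i + 8)) for i in range(0, 64, 8)]
    print(farm(image_blocks))
```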
|
29 |
Compilation pour machines à mémoire répartie : une approche multipasse / Compilation for distributed memory machines : a multipass approach Lossing, Nelson 03 April 2017 (has links)
Les grilles de calculs sont des architectures distribuées couramment utilisées pour l'exécution de programmes scientifiques ou de simulation. Les programmeurs doivent ainsi acquérir de nouvelles compétences pour pouvoir tirer partie au mieux de toutes les ressources offertes. Ils doivent apprendre à écrire un code parallèle, et, éventuellement, à gérer une mémoire distribuée. L'ambition de cette thèse est de proposer une chaîne de compilation permettant de générer automatiquement un code parallèle distribué en tâches à partir d'un code séquentiel. Pour cela, le compilateur source-à-source PIPS est utilisé. Notre approche a deux atouts majeurs : 1) une succession de transformations simples et modulaires est appliquée, permettant à l'utilisateur de comprendre les différentes transformations appliquées, de les modifier, de les réutiliser dans d'autres contextes, et d'en ajouter de nouvelles; 2) une preuve de correction de chacune des transformations est donnée, permettant de garantir que le code généré est équivalent au code initial. Cette génération automatique de code parallèle distribué de tâches offre également une interface de programmation simple pour les utilisateurs. Une version parallèle du code est automatiquement générée à partir d'un code séquentiel annoté. Les expériences effectuées sur deux machines parallèles, sur des noyaux de Polybench, montrent une accélération moyenne linéaire voire super-linéaire sur des exemples de petites tailles et une accélération moyenne égale à la moitié du nombre de processus sur des exemples de grandes tailles. / Scientific and simulation programs often use clusters for their execution. Programmers need new programming skills to take full advantage of all the available resources. They have to learn how to write parallel code and how to manage the potentially distributed memory. This thesis aims at automatically generating distributed task-parallel code from a sequential code. A source-to-source compiler, PIPS, is used to achieve this goal. Our approach has two main advantages: 1) a chain of simple and modular transformations is applied, which is visible and intelligible to the users, editable and reusable, and makes new optimisations possible; 2) a proof of correctness is given for each transformation, ensuring that the generated code is correct and produces the same result as the sequential one. This automatic generation of distributed-task programs for distributed-memory machines also provides a simple programming interface for the users to write task-oriented code. A parallel code can thus be generated automatically with our compilation process from an annotated sequential code. The experimental results obtained on two parallel machines, using Polybench kernels, show a linear to super-linear average speedup on small data sizes. For large ones, the average speedup is equal to half the number of processes.
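To give a concrete flavour of a sequential-to-distributed-task transformation, the sketch below splits a sequential loop into tasks run by separate processes and checks that the result matches the sequential version; it is only an analogy and does not reflect the actual PIPS transformation chain or its proof of correctness.

```python
# Illustrative sequential-to-task-parallel rewriting, loosely in the spirit
# of the approach described above; it is NOT the PIPS transformation chain.
from multiprocessing import Pool

def sequential_kernel(data):
    # Original sequential version: one loop over all elements.
    return [x * x for x in data]

def task(chunk):
    # One distributed task works on its own chunk of the data.
    return [x * x for x in chunk]

def parallel_kernel(data, n_tasks=4):
    # The "transformation": distribute the data, run tasks, gather results.
    chunks = [data[i::n_tasks] for i in range(n_tasks)]
    with Pool(n_tasks) as pool:
        partial = pool.map(task, chunks)
    # Reassemble in the original order (round-robin distribution).
    result = [0] * len(data)
    for t, part in enumerate(partial):
        result[t::n_tasks] = part
    return result

if __name__ == "__main__":
    data = list(range(16))
    # The parallel version must produce the same result as the sequential one.
    assert parallel_kernel(data) == sequential_kernel(data)
    print(parallel_kernel(data))
```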
|
30 |
Implementace obrazových klasifikátorů v FPGA / Implementation of Image Classifiers in FPGAs Kadlček, Filip January 2010 (has links)
The thesis deals with image classifiers and their implementation using FPGA technology. Weak and strong classifiers are discussed in the work. As an example of a strong classifier, the AdaBoost algorithm is described. In the case of weak classifiers, basic types of feature classifiers are shown, including Haar and Gabor wavelets. The rest of the work is primarily focused on LBP, LRP and LR classifiers, which are well suited for efficient implementation in FPGAs. A pseudo-parallel architecture is designed with these classifiers. The classification process is divided into software and hardware parts, and the thesis deals with the hardware part. The designed classifier is very fast and produces a classification result every clock cycle.
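For context, the LBP (local binary pattern) feature named above assigns each pixel an 8-bit code by thresholding its eight neighbours against the centre value; the sketch below is the textbook software definition, while the thesis implements hardware variants (LBP, LRP, LR) of this kind of feature.

```python
# Straightforward local binary pattern (LBP) code for one pixel:
# threshold the 8 neighbours against the centre and pack the bits.
# This is the textbook software definition, not the thesis's hardware design.

def lbp_code(img, y, x):
    center = img[y][x]
    # Neighbours in clockwise order starting from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

if __name__ == "__main__":
    patch = [[10, 20, 30],
             [40, 50, 60],
             [70, 80, 90]]
    print(lbp_code(patch, 1, 1))   # bits set where the neighbour >= 50
```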
|