Global ETD Search

41	Geração automática de módulos VHDL para localização de padrões invariante a escala e rotação em FPGA. / Automatic VHDL generation for solving rotation and scale-invariant template matching in FPGA.
Henrique Pires Almeida Nobre, 26 March 2009
Template matching is a classical problem in computer vision: detecting the presence of a given template in a digital image. The task becomes considerably more complex when the search must be invariant to rotation, scale, translation, brightness and contrast (RSTBC). A robust RSTBC-invariant template matching algorithm named Ciratefi was recently proposed. However, its execution on a conventional computer takes several seconds. Moreover, implementing its general version in hardware is difficult because there are many adjustable parameters. This work proposes software that automatically generates compilable VHDL (VHSIC Hardware Description Language) modules that implement the circular filter of the Ciratefi template matching algorithm in Field Programmable Gate Array (FPGA) devices. The proposed solution reduces the time to process a frame from 7 s (on a 3 GHz PC) to 1.367 ms (on an Altera Stratix III device). This performance, which exceeds what a real-time system requires, may lead to cost-effective, high-performance coprocessing computer vision systems.
Keywords: FPGAs; image processing; template matching; VHDL; computer vision; real time; RSTBC-invariant
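The circular filter named in this abstract can be pictured with a small sketch. It assumes, as an illustration only, that the filter averages gray levels on concentric circles around a pixel, a quantity that does not change when the image rotates about that pixel. The function names, the sample count, and the nearest-pixel sampling are hypothetical and are not taken from the thesis or from the generated VHDL.

```python
import math


def circular_mean(image, cx, cy, radius, samples=64):
    """Mean gray level on a circle of the given radius centred at (cx, cy).

    `image` is a list of rows; points on the circle are snapped to the
    nearest pixel (an assumption made to keep the sketch short).
    """
    if radius == 0:
        return float(image[cy][cx])
    total = 0.0
    for k in range(samples):
        angle = 2.0 * math.pi * k / samples
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        total += image[y][x]
    return total / samples


def circular_signature(image, cx, cy, radii):
    """One circular mean per radius: a rotation-tolerant descriptor of (cx, cy)."""
    return [circular_mean(image, cx, cy, r) for r in radii]
```

A horizontal gray ramp and the same ramp rotated by 90 degrees produce the same signature at the image centre, which is the property a rotation-invariant matcher relies on. In hardware, each radius could be evaluated by its own parallel adder tree, which is one reason such a filter suits an FPGA.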
42	LALPC: uma ferramenta para compilação de programas em C para exploração do paralelismo de loops em FPGAs [LALPC: a tool for compiling C programs to exploit loop parallelism in FPGAs]
Porto, Lucas Faria, 04 February 2015
The physical limitations of silicon have forced the industry to develop solutions that exploit the combined processing power of several general-purpose processors. Even complex supercomputers with multiple processors are still considered inefficient for workloads that require large numbers of floating-point arithmetic operations. Reconfigurable computing is gaining ground because it delivers performance close to that of application-specific devices (ASICs) while keeping the flexibility offered by general-purpose processor architectures. However, the complexity of hardware description languages is often a barrier to the development of new projects. High-level synthesis tools are becoming more popular because they transform high-level code into hardware simply and quickly. However, current tools generate simple hardware that does not exploit techniques for improving pipelining in hardware. This work presents techniques for exploiting the parallelism of reconfigurable devices through programs written in C. These techniques identify loops and improve their performance in hardware. The result is an improved high-level synthesis process that generates optimized hardware.
Keywords: compilers (computers); FPGAs; loop pipeline
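The benefit of pipelining a loop in hardware, which this abstract refers to, can be estimated with a simple cycle count. The sketch below assumes every pipeline stage takes one clock cycle and that a new iteration can start every `initiation_interval` cycles; both function names and the parameters are illustrative and do not come from LALPC.

```python
def sequential_cycles(iterations, stage_count):
    """Cycles when each iteration finishes all stages before the next starts."""
    return iterations * stage_count


def pipelined_cycles(iterations, stage_count, initiation_interval=1):
    """Cycles when iterations overlap in a pipeline.

    The first iteration needs `stage_count` cycles to fill the pipeline;
    every later iteration completes `initiation_interval` cycles after
    the previous one.
    """
    if iterations == 0:
        return 0
    return stage_count + (iterations - 1) * initiation_interval


# 100 iterations of a 4-stage loop body:
print(sequential_cycles(100, 4))  # 400
print(pipelined_cycles(100, 4))   # 103
```

Under these assumptions the pipelined loop approaches one result per cycle once the pipeline is full, so the gain grows with the iteration count. A loop-carried dependency would raise the initiation interval and reduce the gain.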
43	Uma metodologia para esclarecimento de tarefas de tempo real em arquiteturas dinamicamente reconfiguráveis [A methodology for scheduling real-time tasks in dynamically reconfigurable architectures]
Eskinazi Sant'Anna, Remy, January 2006
Programmable digital electronic devices have brought major advances in the prototyping and implementation of embedded systems, especially those that include some form of embedded computation. These devices are widely used in areas such as telecommunications, computer networks, signal processing, and control. In particular, programmable logic devices such as FPGAs (Field Programmable Gate Arrays) offer special advantages because their computational resources, implemented in hardware, can be reconfigured. This characteristic allows the device to be reprogrammed fully or partially according to the demands of the design, using hardware cores that have been synthesized beforehand and stored in component libraries. The main problem addressed in this work is the need for adequate tools for the analysis and synthesis of applications with real-time characteristics on reconfigurable hardware. These applications, which in the context of this work are implemented in hardware, need their tasks to be properly scheduled over time according to the timing requirements imposed by the application. The way tasks are distributed can degrade system performance or make it impossible to guarantee that all application requirements are met. The objective of this thesis is therefore to propose a new task scheduling method for real-time applications on partially and dynamically reconfigurable FPGA-based architectures. The proposed methodology uses timed Petri nets as its internal language for system representation and modeling. To this end, it first considers the timing specifications of the application as a whole and of each task that composes it, the data dependencies among these tasks, and the architecture on which the application will be implemented. The thesis presents the state of the art in FPGA design, together with a review of the task scheduling methods that can be implemented in systems based on these devices. Its main contributions are the generation of a set of schedules that satisfy the precedence and timing specifications of the application, and the selection of the particular schedule from that set with the best timing performance for implementation in FPGA. Based on this survey and the results obtained, the thesis concludes that the methodology is an effective contribution to the design of dynamically reconfigurable systems. Examples are discussed to demonstrate the suggested methodology and its advantages and limitations.
Keywords: FPGAs; partial and dynamic reconfiguration; real-time applications; task scheduling; Petri nets
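The two contributions listed in this abstract, generating the schedules that satisfy precedence and timing constraints and then selecting the one with the best timing, can be illustrated with a brute-force sketch. This is not the timed Petri net method of the thesis. It assumes a single reconfigurable region, a fixed reconfiguration cost whenever the next task needs a different hardware core, and a made-up task set; every name and number below is hypothetical.

```python
from itertools import permutations

# Hypothetical task set: name -> (duration, deadline, predecessors, hardware core)
TASKS = {
    "A": (2, 4, (), "fir"),
    "B": (3, 12, ("A",), "fft"),
    "C": (1, 12, ("A",), "fir"),
    "D": (2, 16, ("B", "C"), "fft"),
}
RECONFIG_TIME = 2  # assumed cost of loading a different core into the region


def evaluate(order, tasks=TASKS, reconfig_time=RECONFIG_TIME):
    """Return the makespan of `order`, or None if it violates a constraint."""
    clock, loaded, done = 0, None, set()
    for name in order:
        duration, deadline, predecessors, core = tasks[name]
        if any(p not in done for p in predecessors):
            return None  # precedence violated
        if core != loaded:
            clock += reconfig_time  # swap the core before running the task
            loaded = core
        clock += duration
        if clock > deadline:
            return None  # deadline missed
        done.add(name)
    return clock


def feasible_schedules(tasks=TASKS, reconfig_time=RECONFIG_TIME):
    """All task orders that meet every constraint, sorted by makespan."""
    found = []
    for order in permutations(tasks):
        makespan = evaluate(order, tasks, reconfig_time)
        if makespan is not None:
            found.append((makespan, order))
    return sorted(found)


def best_schedule(tasks=TASKS, reconfig_time=RECONFIG_TIME):
    """The feasible schedule with the smallest makespan, or None."""
    schedules = feasible_schedules(tasks, reconfig_time)
    return schedules[0] if schedules else None
```

With this task set two orders are feasible, and running C directly after A wins because both use the same core and one reconfiguration is avoided. Exhaustive enumeration grows factorially with the number of tasks, so it only serves to make the idea concrete.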
44	Aquarius Uma plataforma para desenvolvimento de sistemas digitais dinamicamente reconfiguráveis [Aquarius: a platform for developing dynamically reconfigurable digital systems]
Leandro Seixas, Jordana, January 2007
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Researchers have shown great interest in the self-reconfiguration and self-adaptation capabilities of modern hardware platforms based on dynamically reconfigurable logic devices, FPGAs (Field Programmable Gate Arrays). Some of these devices offer even more specific features, allowing partial and dynamic reconfiguration, in which part of the logic can be modified while the rest of the circuit remains in operation. The objective of this dissertation is to develop an FPGA-based dynamic reconfiguration platform that supports the execution of applications using virtual hardware methods, allowing modifications to partial hardware configurations, massive data processing, and so on. The platform is a case study in dynamic reconfiguration for the real implementation of research work on task scheduling and temporal partitioning. This hybrid platform, named Aquarius, is composed of Altera and Xilinx platforms, based on Stratix-II and Virtex-II FPGA devices, respectively. The Altera platform provides all the support for reconfiguring the Xilinx device. It is controlled by an Altera Nios soft-core processor, which runs the uClinux operating system along with device drivers developed specifically for reconfiguring the Xilinx device. A special reconfiguration module, IP-SelectMAP, was developed to program the dynamically and partially reconfigurable hardware. Through the device drivers, this module receives from the Altera platform the bitstreams, the files responsible for programming the Xilinx device. All configuration bitstreams are scheduled in advance according to the user's application. Developing dynamically reconfigurable systems is still a challenge, because their implementation is complex and few hardware and software platforms exist for designing them. Nevertheless, design methodologies such as those proposed here should make it easier in the future to use new classes of virtual hardware, as well as real solutions for massive data processing on multi-FPGA platforms.
45	Reconfigurable Computing Systems for Robotics using a Component-Oriented Approach
Podlubne, Ariel, 18 December 2023
Robotic platforms are becoming more complex due to the wide range of modern applications, including multiple heterogeneous sensors and actuators. In order to comply with real-time and power-consumption constraints, these systems need to process a large amount of heterogeneous data from multiple sensors and take action (via actuators), which represents a problem as the resources of these systems have limitations in memory storage, bandwidth, and computational power. Field Programmable Gate Arrays (FPGAs) are programmable logic devices that offer high-speed parallel processing. FPGAs are particularly well-suited for applications that require real-time processing, high bandwidth, and low latency. One of the fundamental advantages of FPGAs is their flexibility in designing hardware tailored to specific needs, making them adaptable to a wide range of applications. They can be programmed to pre-process data close to sensors, which reduces the amount of data that needs to be transferred to other computing resources, improving overall system efficiency. Additionally, the reprogrammability of FPGAs enables them to be repurposed for different applications, providing a cost-effective solution for systems that need to adapt quickly to changing demands. FPGAs' performance per watt is close to that of Application-Specific Integrated Circuits (ASICs), with the added advantage of being reprogrammable. Despite all the advantages of FPGAs (e.g., energy efficiency, computing capabilities), the robotics community has not fully included them so far as part of their systems for several reasons. First, designing FPGA-based solutions requires hardware knowledge and longer development times, as FPGAs are more challenging to program than Central Processing Units (CPUs) or Graphics Processing Units (GPUs). Second, porting a robotics application (or parts of it) from software to an accelerator requires adequate interfaces between software and FPGAs. Third, the robotics workflow is already complex on its own, combining several fields such as mechanics, electronics, and software. There have been partial contributions in the state of the art for FPGAs as part of robotics systems. However, a study of FPGAs as a whole for robotics systems is missing in the literature, which is the primary goal of this dissertation. Three main objectives have been established to accomplish this. (1) Define all components required for an FPGA-based system for robotics applications as a whole. (2) Establish how all the defined components are related. (3) With the help of Model-Driven Engineering (MDE) techniques, generate these components, deploy them, and integrate them into existing solutions. The component-oriented approach proposed in this dissertation provides a proper solution for designing and implementing FPGA-based designs for robotics applications. The modular architecture, the tool 'FPGA Interfaces for Robotics Middlewares' (FIRM), and the toolchain 'FPGA Architectures for Robotics' (FAR) provide a set of tools and a comprehensive design process that enables the development of complex FPGA-based designs more straightforwardly and efficiently. The component-oriented approach contributes significantly to the state of the art in FPGA-based designs for robotics applications and helps to promote their wider adoption and use by specialists with little FPGA knowledge.
Keywords: FPGAs; Robotics; MDE; Embedded Systems; ddc:006
46	iPACE-V1: A PORTABLE ADAPTIVE COMPUTING ENGINE
KHAN, JAWAD BASIT, 11 October 2002
No description available.
Keywords: Computer Science; reconfigurable computing; adaptive computing; FPGAs; mobile computing; image processing
47	An Analysis of NoCs in FPGAs
Binesh, Marvasti Mohammadreza, 10 1900
Accurate analytic models for the area, delay and power of NoC routers realized in FPGA technology are presented. Several router designs are explored, including the demultiplexer-multiplexer design, the broadcast-and-select design, a RAM-based design, and pipelined designs with arbitrary amounts of buffering. The analytic models are compared with extensive experimental results, and shown to be very accurate. Using these router models, accurate analytic models for the area, delay and power of graph-based and hypergraph-based NoC topologies realized in FPGAs are presented, including 2D Mesh, Torus, Binary Hypercube (BHC), Generalized Hypercube (GHC), and Hypermesh. Three traffic patterns are considered: (a) random-uniform traffic, (b) the traffic of the bitonic sorting algorithm, and (c) the traffic of the parallel FFT algorithm.
The analytic models for NoCs are compared to extensive experimental results and shown to be very accurate, typically within 10%. Using these analytical models, architectural choices such as NoC topology, buffer sizing, crossbar switch design, and degree of pipelining can be explored analytically early in the design-space exploration process. It has been observed that an efficient and accurate early design process results in lower system costs, and in order to come up with feasible designs, early design-space exploration tools are essential.
Early design-space exploration tools using analytic models are ideal, as they do not require the generation of detailed logic design in a hardware description language such as VHDL or Verilog. However, to date there are no analytic models for NoCs in FPGAs. This thesis addresses this problem. According to our analytic power models, in an FPGA environment with equal bisection bandwidth the 2D BHC outperforms the 2D Mesh and Torus significantly. For example, under equivalent bisection bandwidth, when performing FFT computations in an FPGA environment the 2D BHC consumes 8% of the power of a 2D Mesh, and 15% of the power of a 2D Torus.
Hypermeshes are based on the concept of hypergraphs, which consist of a set of nodes and a set of hyperedges, where the hyperedges represent low-latency switches. Under equivalent bisection bandwidth, 2D Hypermesh NoCs outperform the 2D Mesh and Torus significantly. To improve the performance of the Hypermesh, two new hyperedge designs are proposed. We propose the energy-area product as a design metric to compare the NoCs. The energy-area product reflects both the cost and performance design metrics. Our analysis indicates that the 2D Hypermesh NoCs generally have considerably lower area, energy, and energy-area product compared to the 2D Hypercubes. Under equal bisection bandwidth, the 2D Hypermesh using the broadcast-and-select designs as the hyperedges uses 30% of the area of the GHC and 42% of the area of the BHC. The energy-area product of the 2D Hypermesh under the FFT algorithm is 9% of that of the GHC, and 29% of that of the BHC.
Doctor of Philosophy (PhD)
Keywords: Network-on-Chip; Analytic Models; FPGAs
48	Computational Acceleration for Next Generation Chemical Standoff Sensors Using FPGAs
Ruddy, John, January 2012
This research provides the real-time computational resource for three-dimensional tomographic chemical threat mapping using mobile hyperspectral sensors from sparse input data. The crucial calculation limiting real-time execution of the algorithm is the determination of the projection matrix using the algebraic reconstruction technique (ART). The computation utilizes the inherent parallel nature of ART with an implementation of the algorithm on a field programmable gate array. The MATLAB Fixed-Point Toolbox is used to determine the optimal fixed-point data types in the conversion from the original floating-point algorithm. The computation is then implemented using the Xilinx System Generator, which generates a hardware description language representation from a block diagram design.
Electrical and Computer Engineering
Keywords: Electrical Engineering; Algorithm Acceleration; FPGAs; High Performance Computing; Programmable Gate Array
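The algebraic reconstruction technique (ART) named in this abstract is commonly formulated as a Kaczmarz iteration, sketched below in floating point. The sketch assumes the textbook form, in which each measurement is the dot product of one weight row with the unknown vector. The function name, the relaxation parameter, and the tiny test system are illustrative and are not taken from the thesis, which targets fixed-point hardware.

```python
def art_reconstruct(rows, measurements, sweeps=50, relaxation=1.0):
    """Estimate x such that dot(rows[i], x) is close to measurements[i].

    Each step projects the current estimate onto the hyperplane defined
    by one row (the Kaczmarz update); one sweep visits every row once.
    """
    estimate = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for row, measured in zip(rows, measurements):
            norm_sq = sum(w * w for w in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            predicted = sum(w * x for w, x in zip(row, estimate))
            step = relaxation * (measured - predicted) / norm_sq
            estimate = [x + step * w for x, w in zip(estimate, row)]
    return estimate


# Two rays through a two-cell "image" whose true values are (1, 2).
print(art_reconstruct([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0]))
```

The dot products and the scaled row addition in each update are independent across vector elements, which is the kind of parallelism an FPGA implementation can exploit.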
49	Energy Efficient Loop Unrolling for Low-Cost FPGAs
Dumpala, Naveen Kumar, 27 October 2017
Many embedded applications implement block ciphers and sorting and searching algorithms which use multiple loop iterations for computation. These applications often demand low power operation. The power consumption of designs varies with the implementation choices made by designers. The sequential implementation of loop operations consumes minimal area, but latency and clock power are high. Alternatively, loop unrolling causes high glitch power. In this work, we propose a low area overhead approach for unrolling loop iterations that exhibits reduced glitch power. A latch-based glitch filter is introduced that reduces the propagation of glitches from one iteration to the next. We explore the optimal number of filters to be inserted for different applications that give a good balance between area and power. We also implement partial unrolling with glitch filters. This approach consumes less area while still giving energy savings comparable to the fully unrolled implementation. Our approach is targeted to Xilinx and Altera FPGAs. We simulate different implementation choices and compare energy results to evaluate the savings. We demonstrate our approach on SIMON-128 and AES-256 block ciphers and a sorting algorithm. We prototype our design on Xilinx Artix-7 and Altera Cyclone-IV-GX FPGA development boards and measure the actual power savings. Results show up to 90% dynamic energy reduction in Xilinx designs, and 97% reduction in Altera designs with our glitch filtering approach due to glitch power reduction.
Keywords: Field-Programmable Gate Array; Loop Unrolling; Glitch Filters; Power in FPGAs
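The trade-off between a rolled loop and a partially unrolled one, as discussed in this abstract, can be made concrete with a software model. The sketch assumes a toy round function and treats one pass of the outer loop as one clock cycle in which `factor` copies of the round logic are chained combinationally. The round function and all names are hypothetical, and the glitch filters proposed in the thesis are not modeled.

```python
def round_function(state, round_index):
    """Toy 16-bit round, a stand-in for one iteration of a cipher."""
    return (state * 5 + round_index) & 0xFFFF


def run_rolled(state, rounds):
    """One round per clock cycle: least logic, most cycles."""
    cycles = 0
    for index in range(rounds):
        state = round_function(state, index)
        cycles += 1
    return state, cycles


def run_partially_unrolled(state, rounds, factor):
    """`factor` chained rounds per clock cycle: more logic, fewer cycles."""
    if rounds % factor != 0:
        raise ValueError("rounds must be a multiple of the unroll factor")
    cycles = 0
    for base in range(0, rounds, factor):
        # In hardware these would be `factor` physical copies of the round.
        for offset in range(factor):
            state = round_function(state, base + offset)
        cycles += 1
    return state, cycles
```

Both versions compute the same result; only the number of cycles and the amount of replicated logic differ. The long combinational chain inside each cycle is where glitches would propagate in real hardware.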
50	High-Level CSP Model Compiler for FPGAs
Asthana, Rohit Mohan, 19 January 2011
The ever-growing competition in the current electronics industry has resulted in stringent time-to-market goals and reduced design time available to engineers. Shorter design time has in turn created a need for high-level synthesis design methodologies that raise the design to a higher level of abstraction. A higher level of abstraction helps increase the predictability and productivity of the design and reduces the number of bugs due to human error. It also enables the designer to try out different optimization strategies early in the design stage. In spite of all these advantages, high-level synthesis design methodologies have not gained much popularity in the mainstream design flow, mainly because of the lack of readability and reliability of the generated register transfer level (RTL) code. The compiler framework presented in this thesis allows the user to draw high-level graphical models of the system. The compiler translates these models into synthesizable RTL Verilog designs that exhibit their desired functionality following the communicating sequential processes (CSP) model of computation. The CSP model of computation introduces a good handshaking mechanism between different components in the design, which makes designs less prone to timing violations during implementation and to bottlenecks in actual operation.
Master of Science
Keywords: High-Level Synthesis; FPGAs; Models of Computation (MoC); Communicating Sequential Processes (CSP); Autocode Generation
