  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Characterization of Sparsity-aware Optimization Paths for Graph Traversal on FPGA

Gondhalekar, Atharva 25 May 2023
Breadth-first search (BFS) is a fundamental building block in many graph-based applications, but it is difficult to optimize for a field-programmable gate array (FPGA) due to its irregular memory-access patterns. Prior work, based on hardware description languages (HDLs) and high-level synthesis (HLS), addresses the memory-access bottleneck of BFS by using techniques such as data alignment and compute-unit replication on FPGAs. The efficacy of such optimizations depends on factors such as the sparsity of target graph datasets. Optimizations intended for sparse graphs may not work as effectively for dense graphs on an FPGA and vice versa. This thesis presents two sets of FPGA optimization strategies for BFS, one for near-hypersparse graphs and the other designed for sparse to moderately dense graphs. For near-hypersparse graphs, a queue-based kernel with maximal use of local memory on FPGA is implemented. For denser graphs, an array-based kernel with compute-unit replication is implemented. Across a diverse collection of graphs, our OpenCL optimization strategies for near-hypersparse graphs deliver a 5.7x to 22.3x speedup over a state-of-the-art OpenCL implementation, when evaluated on an Intel Stratix 10 FPGA. The optimization strategies for sparse to moderately dense graphs deliver a 1.1x to 2.3x speedup over a state-of-the-art OpenCL implementation on the same FPGA. Finally, this work uses graph metrics such as average degree and Gini coefficient to observe the impact of graph properties on the performance of the proposed optimization strategies. / M.S. / A graph is a data structure that typically consists of two sets -- a set of vertices and a set of edges representing connections between the vertices. Graphs are used in a broad set of application domains such as the testing and verification of digital circuits, data mining of social networks, and analysis of road networks. In such application areas, breadth-first search (BFS) is a fundamental building block. 
BFS is used to identify the minimum number of edges needed to be traversed from a source vertex to one or many destination vertices. In recent years, several attempts have been made to optimize the performance of BFS on reconfigurable architectures such as field-programmable gate arrays (FPGAs). However, the optimization strategies for BFS are not necessarily applicable to all types of graphs. Moreover, the efficacy of such optimizations oftentimes depends on the sparsity of input graphs. To that end, this work presents optimization strategies for graphs with varying levels of sparsity. Furthermore, this work shows that by tailoring the BFS design based on the sparsity of the input graph, significant performance improvements are obtained over the state-of-the-art BFS implementations on an FPGA.
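As a rough illustration of the two kernel styles named in this abstract, the following Python sketch contrasts a queue-based BFS with an array-based (level-synchronous) BFS and computes the two graph metrics mentioned, average degree and the Gini coefficient of the degree distribution. It is a software sketch only, not the thesis's OpenCL kernels; the adjacency-list representation and all function names are assumptions made for the example.

```python
from collections import deque

def bfs_queue(adj, source):
    """Queue-based BFS: only vertices on the frontier are touched."""
    level = [-1] * len(adj)
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if level[v] == -1:          # first visit fixes the BFS level
                level[v] = level[u] + 1
                queue.append(v)
    return level

def bfs_array(adj, source):
    """Array-based BFS: sweep the whole level array once per depth."""
    level = [-1] * len(adj)
    level[source] = 0
    depth = 0
    changed = True
    while changed:
        changed = False
        for u in range(len(adj)):       # every vertex is inspected each sweep
            if level[u] == depth:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth + 1
                        changed = True
        depth += 1
    return level

def average_degree(adj):
    """Mean number of adjacency entries per vertex."""
    return sum(len(nbrs) for nbrs in adj) / len(adj)

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n
```

Both functions return the same levels; they differ in how much work each step does. The queue version touches only frontier vertices, which suits very sparse graphs, while the array version scans every vertex per level but has a regular access pattern that is easier to replicate across compute units.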
82

Design and Implementation of a Soft Radio Architecture for Reconfigurable Platforms

Srikanteswara, Srikathyayani 31 July 2001
Software radios have evolved as multimode, programmable digital radios that perform radio functions using digital signal processing algorithms. They have been designed as software programmable radios using a combination of various hardware elements and structures. In this dissertation, a soft radio refers to a completely configurable radio that can be programmed through software to change the radio behavior, including the hardware functionality. Conventional software radios achieve flexibility through software with the use of static hardware. While these radios have the flexibility to operate in multiple modes, the hardware is not used efficiently. This inefficient utilization of hardware frequently limits the flexibility of software radios and the number of modes the radio can support. Soft radios, however, attempt to gain flexibility through the use of reconfigurable hardware. The same piece of hardware can be configured to perform different functions based on the mode the radio is operating in. While many soft/software radio architectures have been suggested and implemented, there remains a lack of a formal design methodology that can be used to design and implement reconfigurable soft radios. Most designs are based on ad hoc approaches which are appropriate only for the problem at hand. After examining the design issues of a soft radio, an architecture called the Layered Radio Architecture is developed with the use of stream-based processing and run-time reconfigurable hardware. These choices aid in maximizing performance with minimum hardware while keeping the architecture robust, simple, and scalable. The reconfigurable platform enables hardware paging through hardware reusability. The stream-based approach gives a uniform modular structure to the processing modules and defines the protocol for interaction between various modules. 
The architecture describes a formal yet open design methodology and makes it possible to incorporate all of the features of a software radio while minimizing complexity issues. The layered architecture also defines the methodology for incorporating changes and updates into the system. The layered radio architecture assumes run-time reconfigurability of the hardware. This feature is not supported by existing commercial reconfigurable hardware, like FPGAs. A Custom Computing Machine (CCM) called Stallion, which supports fast run-time reconfiguration, has been developed at Virginia Tech. This dissertation describes the deficiencies of existing commercial reconfigurable hardware and shows how the Stallion is capable of supporting the layered radio architecture. The dissertation presents algorithms and procedures that can be used to implement the layered radio architecture using existing hardware. The architecture is validated with the implementation of two receivers: a single-user CDMA receiver based on complex adaptive filtering and a W-CDMA downlink rake receiver with channel estimation. Performance analysis of these receivers shows that it is important to keep the paging ratio high while maximizing utilization of the processing elements. The layered radio architecture with the use of Stallion can support existing high data rate systems. / Ph. D.
83

Register Transfer Level Simulation Acceleration via Hardware/Software Process Migration

Blumer, Aric David 16 November 2007
The run-time reconfiguration of Field Programmable Gate Arrays (FPGAs) opens new avenues to hardware reuse. Through the use of process migration between hardware and software, an FPGA provides a parallel execution cache. Busy processes can be migrated into hardware-based, parallel processors, and idle processes can be migrated out, increasing the utilization of the hardware. The application of hardware/software process migration to the acceleration of Register Transfer Level (RTL) circuit simulation is developed and analyzed. RTL code can exhibit a form of locality of reference such that executing processes tend to be executed again. This property is termed executive temporal locality, and it can be exploited by migration systems to accelerate RTL simulation. In this dissertation, process migration is first formally modeled using Finite State Machines (FSMs). Programs, processes, migration realms, and the migration of process state within a realm are built upon FSMs. From this model, a taxonomy of migration realms is developed. Second, process migration is applied to the RTL simulation of digital circuits. The canonical form of an RTL process is defined, and transformations of HDL code are justified and demonstrated. These transformations allow a simulator to identify basic active units within the simulation and combine them to balance the load across a set of processors. Through the use of input monitors, executive locality of reference is identified and demonstrated on a set of six RTL designs. Finally, the implementation of a migration system is described which utilizes Virtual Machines (VMs) and Real Machines (RMs) in existing FPGAs. Empirical and algorithmic models are developed from the data collected from the implementation to evaluate the effect of optimizations and migration algorithms. / Ph. D.
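The "parallel execution cache" idea in this abstract can be pictured with a small simulation: a fixed number of hardware slots hold recently active processes, and a process that runs again while still resident is a hit. The Python sketch below is only an illustration. The least-recently-used replacement policy, the trace format, and the function name `simulate_migration` are assumptions for the example and are not taken from the dissertation.

```python
from collections import OrderedDict

def simulate_migration(trace, hw_slots):
    """Replay a trace of process activations against `hw_slots` hardware slots.

    Returns (hits, migrations): activations that found the process already in
    hardware, and the number of migrations from software into hardware.
    Assumed policy: evict the least recently activated process when full.
    """
    in_hw = OrderedDict()   # process id -> None, least recently used first
    hits = 0
    migrations = 0
    for pid in trace:
        if pid in in_hw:
            hits += 1
            in_hw.move_to_end(pid)          # mark as most recently active
        else:
            if len(in_hw) >= hw_slots:
                in_hw.popitem(last=False)   # migrate the idlest process out
            in_hw[pid] = None               # migrate the busy process in
            migrations += 1
    return hits, migrations
```

For the trace `["a", "b", "a", "a", "c", "b", "a"]`, two slots give 2 hits and 5 migrations, while three slots give 4 hits and 3 migrations. The more often executing processes are executed again soon afterwards, the fewer migrations a given number of slots requires.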
84

Improved Abstractions and Turnaround Time for FPGA Design Validation and Debug

Iskander, Yousef Shafik 11 September 2012
Design validation is the most time-consuming task in the FPGA design cycle. Although manufacturers and third-party vendors offer a range of tools that provide different perspectives of a design, many require that the design be fully re-implemented for even simple parameter modifications or do not allow the design to be run at full speed. Designs are typically first modeled using a high-level language, then later rewritten in a hardware description language, first for simulation and then later modified for synthesis. IP and third-party cores may differ during these final two stages, complicating development and validation. The developed approach provides two means of directly validating synthesized hardware designs. The first allows the original high-level model written in C or C++ to be directly coupled to the synthesized hardware, abstracting away the traditional gate-level view of designs. A high-level programmatic interface allows the synthesized design to be validated with the same arbitrary test data on the same framework as the hardware. The second approach provides an alternative view to FPGAs within the scope of a traditional software debugger. This debug framework leverages partially reconfigurable regions to accelerate the modification of dynamic, software-like breakpoints for low-level analysis and provides an automatable, scriptable, command-line interface directly to a running design on an FPGA. / Ph. D.
85

Using an FPGA-Based Processing Platform in an Industrial Machine Vision System

King, William E. 28 April 1999
This thesis describes the development of a commercial machine vision system as a case study for utilizing the Modular Reprogrammable Real-time Processing Hardware (MORRPH) board. The commercial system described in this thesis is based on a prototype system that was developed as a test-bed for developing the necessary concepts and algorithms. The prototype system utilized color linescan cameras, custom framegrabbers, and standard PCs to color-sort red oak parts (staves). Furniture manufacturers very often build panels from edge-glued panel parts. These are panels formed by gluing several smaller staves together along their edges to form a larger panel. The value of the panel is very much dependent upon the "match" of the individual staves, i.e., how well they create the illusion that the panel came from a single board as opposed to several staves. The prototype system was able to accurately classify staves based on color into classes defined through a training process. Based on Trichromatic Color Theory, the system developed a probability density function in 3-D color space for each class based on the parts assigned to that class during training. While sorting, the probability density function was generated for each scanned piece and compared with each of the class probability density functions. The piece was labeled with the name of the class whose probability density function it most closely matched. A "best-face" algorithm was also developed to arbitrate between pieces whose top and bottom faces did not fall into the same classes. [1] describes the prototype system in much greater detail. In developing a commercial-quality machine vision system based on the prototype, the primary goal was to improve throughput. A Field Programmable Gate Array (FPGA)-based Custom Computing Machine (FCCM) called the MORRPH was selected to assume most of the computational burden and increase throughput in the commercial system. 
The MORRPH was implemented as an ISA-bus interface card, with a 3 x 2 array of Processing Elements (PE). Each PE consists of an open socket which can be populated with a Xilinx 4000 series FPGA, and an open support socket which can be populated with support chips such as external RAM, math processors, etc. In implementing the prototype algorithms for the commercial system, a partition was created between those algorithms that would be implemented on the MORRPH board, and those that would be left as implemented on the host PC. It was decided to implement such algorithms as Field-Of-View operators, Shade Correction, Background Extraction, Gray-Scale Channel Generation, and Histogram Generation on the MORRPH board, and to leave the remainder of the classification algorithms on the host. By utilizing the MORRPH board, an industrial machine vision system was developed that has exceeded customer expectations for both accuracy and throughput. Additionally, the color-sorter received the International Woodworking Fair's Challengers Award for outstanding innovation. / Master of Science
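The classification step described in this abstract, building a probability density function in 3-D color space per class and labeling a piece by the closest match, can be sketched in a few lines of Python. This is a hedged illustration, not the thesis's algorithm: the normalized-histogram estimate, the 4-bins-per-channel quantization, the L1 distance used for "most closely matched", and the names `color_pdf`, `l1_distance`, and `classify` are all assumptions.

```python
def color_pdf(pixels, bins=4):
    """Estimate a PDF over 3-D color space as a normalized histogram.

    `pixels` is a list of (r, g, b) tuples with 8-bit channel values.
    """
    width = 256 // bins
    hist = {}
    for rgb in pixels:
        # Quantize each channel; clamp so 255 always falls in the last bin.
        cell = tuple(min(ch // width, bins - 1) for ch in rgb)
        hist[cell] = hist.get(cell, 0) + 1
    n = len(pixels)
    return {cell: count / n for cell, count in hist.items()}

def l1_distance(p, q):
    """Sum of absolute differences between two sparse histograms."""
    cells = set(p) | set(q)
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)

def classify(piece_pixels, class_pdfs, bins=4):
    """Return the name of the class whose PDF is closest to the piece's PDF."""
    piece = color_pdf(piece_pixels, bins)
    return min(class_pdfs, key=lambda name: l1_distance(piece, class_pdfs[name]))
```

Training corresponds to calling `color_pdf` on the pixels of the parts assigned to each class and storing the results in `class_pdfs`. Sorting then calls `classify` once per scanned face.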
86

A Model-Based Approach to Reconfigurable Computing

Taylor, Daniel Kyle 06 January 2009
Throughout the history of software development, advances have been made that improve the ability of developers to create systems by enabling them to work closer to their application domain. These advances have given programmers higher level abstractions with which to reason about problems. A separation of concerns between logic and implementation allows for reuse of components, portability between implementation platforms, and higher productivity. Parallels can be drawn between the challenges that the field of reconfigurable computing (RC) is facing today and what the field of software engineering has gone through in the past. Most RC work is done in low level hardware description languages (HDLs) at the circuit level. A large productivity gap exists between the ability of RC developers and the potential of the technology. The small number of RC experts is not enough to meet the demands for RC applications. Model-based engineering principles provide a way to reason about RC devices at a higher level, allowing for greater productivity, reuse, and portability. Higher level abstractions allow developers to deal with larger and more complex systems. A modeling environment has been developed to aid users in creating, storing, and reusing models and in generating hardware implementation code for their system. This environment serves as a starting point to apply model-based techniques to the field of RC to narrow the productivity gap. Future work can build on this model-based framework to take advantage of the unique features of reconfigurable devices, optimize their performance, and further open the field to a wider audience. / Master of Science
87

A Self-Reconfiguring Platform For Embedded Systems

Leon, Santiago Andres 24 August 2001
The JBits Application Programming Interface has significantly shortened FPGA reconfiguration times by manipulating the configurable resources of the FPGAs directly under software control. The execution of JBits programs, however, requires a Java Virtual Machine to be implemented on the platform where the configurations will be modified. This presents a problem for embedded systems where a microprocessor to run a Java Virtual Machine may not be available or desirable. This thesis discusses the implementation of an FPGA platform that allows the execution of JBits programs, effectively changing the configuration of an FPGA within an FPGA. This thesis also presents a four-step development and testing strategy for JBits programs that are intended to run on this FPGA platform. / Master of Science
88

Projeto de um sistema para monitoramento de hardware/software on-chip baseado em computação reconfigurável / A on-chip hardware/software monitoring system based on reconfigurable computing

Ravagnani, Guilherme Stella 25 April 2007
The trend of integrating several components on a single chip has increased the complexity of computing systems. Both industry and academia are in search of techniques that reduce the time and effort spent on verification in the hardware development process, in order to guarantee the quality, robustness, and reliability of these devices. In order to contribute to several applications involving system verification, such as the search for design errors, performance evaluation, algorithm optimization, and data extraction from the system, this work proposes a monitoring system based on reconfigurable computing that is able to observe the behaviour of a System-on-Chip (SoC) non-intrusively at run time. The system is formed by a monitoring core responsible for capturing software execution information from an embedded processor and an analysis tool, called ACAD, that interprets the data. Experiments showed that the implemented system was able to provide accurate data about the number of accesses to memory and other peripherals, the execution times of portions (or the whole) of the code, and the number of times each instruction was executed. These results make it possible to trace accurately the behaviour of software executed on the Nios II soft-core processor, thus helping to ease the verification process of systems based on reconfigurable computing.
89

Projeto de um processador open source em Bluespec baseado no processador soft-core Nios II da Altera / Design of an open source processor in Bluespec based on Altera Nios II soft-core processor

Pereira, Erinaldo da Silva 09 June 2014
This work presents the development of an open source processor based on the Nios II processor from Altera. The developed processor allows custom instructions and the inclusion of components that enable a detailed study of the cache memory, such as a cache monitor and a configurable cache size, among other features. In addition, the processor is based on the Nios II architecture and implements 90% of the Nios II ISA. It is integrated into the Qsys and SOPC Builder environments of the Altera Quartus II tool, so the entire set of IP (Intellectual Property) and tools available from Altera can be used. 
This work aims to contribute to the development of hardware architectures with a processing unit that is configurable and easily customizable by the user, since its source code in Bluespec SystemVerilog is open to all users, unlike the Nios II from Altera, whose code is encrypted, making it impossible to change the processor at the RTL (Register Transfer Level). The processor was developed in the hardware description language Bluespec SystemVerilog, because it is an ESL (Electronic System Level) language that speeds up hardware development.
90

ChipCflow - em hardware dinamicamente reconfigurável / ChipCflow - in dynamically reconfigurable hardware

Astolfi, Vitor Fiorotto 04 December 2009
In recent years, reconfigurable computing has advanced considerably, especially in hardware that uses Field-Programmable Gate Arrays (FPGAs). However, this increase in capacity and performance has widened the gap between design capability and the technology available for developing the design. Imperative high-level programming languages such as C are more appropriate for the development of complex applications than hardware description languages (HDLs). For this reason, many tools for developing hardware from C code have emerged. The ChipCflow tool, of which this project is a part, is one of them. 
The execution of programs through this tool will be based entirely on their data flow, following the dynamic model found in dataflow computer architectures and taking full advantage of the parallelism considered natural to that model and of the characteristics of partially reconfigurable hardware. In this project in particular, the objective is a proof of concept for the creation of instances, in the form of operators, of a ChipCflow algorithm on partially reconfigurable hardware, based on the Xilinx Virtex platform.
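To make the dynamic dataflow model mentioned in this abstract more concrete, the Python sketch below shows operators that fire as soon as tokens carrying the same tag are present on both inputs, so several instances of the same expression can be in flight at once. It illustrates the general tagged-token idea only. It is not ChipCflow's implementation, and the `Operator` class, the `run` function, the token format, and the example expression `(a + b) * (c - d)` are all assumptions.

```python
import operator

class Operator:
    """Two-input dataflow operator that fires when both inputs share a tag."""

    def __init__(self, fn):
        self.fn = fn
        self.waiting = {}               # tag -> {port: value}

    def receive(self, tag, port, value):
        slot = self.waiting.setdefault(tag, {})
        slot[port] = value
        if len(slot) == 2:              # both operands present: fire
            del self.waiting[tag]
            return tag, self.fn(slot[0], slot[1])
        return None                     # still waiting for the partner token

def run(tokens):
    """Evaluate (a + b) * (c - d) for every tag in an interleaved token stream.

    `tokens` is an iterable of (tag, name, value) with name in "abcd".
    Returns a dict mapping each tag to its result.
    """
    add = Operator(operator.add)
    sub = Operator(operator.sub)
    mul = Operator(operator.mul)
    routes = {"a": (add, 0), "b": (add, 1), "c": (sub, 0), "d": (sub, 1)}
    results = {}
    for tag, name, value in tokens:
        op, port = routes[name]
        fired = op.receive(tag, port, value)
        if fired is not None:
            # The adder feeds port 0 of the multiplier, the subtractor port 1.
            out = mul.receive(fired[0], 0 if op is add else 1, fired[1])
            if out is not None:
                results[out[0]] = out[1]
    return results
```

No program counter orders the operations here: the arrival of data alone decides when each operator fires, which is what exposes the parallelism the abstract refers to.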
