291

A Device-Level FPGA Simulator

Hunter, Jesse Everett III 03 August 2004 (has links)
In the realm of FPGAs, many tool vendors offer behaviorally-based simulators aimed at easing the complexity of large FPGA designs. At times, a behaviorally-modeled design does not work in hardware as expected or intended. VTsim, a Virtex-II device simulator, was designed to resolve this and many other design problems by providing a window into the FPGA fabric via a virtual device. VTsim is an event-driven device simulator modeled at the CLB level with multiple clock domain support. Utilizing JBits3 and ADB, VTsim enables simulation and examination of all resources within the FPGA. The only input required by VTsim is a bitstream, which can be generated from any tool suite. The simulator is part of the JHDLBits open-source project, and was designed for rapid response, low memory usage, and ease of interaction. / Master of Science
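For context, the event-driven simulation style that this abstract describes can be sketched in a few lines: timestamped events sit in a priority queue and are processed in time order, with each event free to schedule further events (for example, one per clock domain). This is a generic illustration with hypothetical names, not VTsim's actual code.

```python
import heapq
import itertools

class EventSimulator:
    """Minimal event-driven simulation loop (illustrative only)."""

    def __init__(self):
        self._queue = []                     # min-heap ordered by event time
        self._counter = itertools.count()    # tie-breaker for events at the same time
        self.now = 0

    def schedule(self, time, action):
        """Schedule `action` (a callable taking the simulator) to run at `time`."""
        heapq.heappush(self._queue, (time, next(self._counter), action))

    def run(self, until):
        """Process events in time order until the queue empties or `until` is passed."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)                     # an action may schedule further events

# Example: two clock domains ticking at different periods
def make_clock(name, period):
    def tick(sim):
        print(f"t={sim.now}: {name} edge")
        sim.schedule(sim.now + period, tick)
    return tick

sim = EventSimulator()
sim.schedule(0, make_clock("clkA", 10))
sim.schedule(0, make_clock("clkB", 15))
sim.run(until=60)
```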
292

Runtime Intellectual Property Protection on Programmable Platforms

Simpson, Eric 18 July 2007 (has links)
Modern Field-Programmable Gate Arrays (FPGAs) can accommodate complex system-on-chip designs and require extensive intellectual-property (IP) support. However, current IP protection mechanisms in FPGAs are limited and do not reach beyond whole-design bitstream encryption. This work presents an architecture and protocol for securing IP-based designs in programmable platforms. The architecture is represented by the Secure Authentication Module (SAM), an enabler for next-generation intellectual-property exchange in complex FPGAs. SAM protects hardware, software, and application data, and also provides mutual assurances for the end-user and the intellectual-property developer. Further, this work demonstrates the use of SAM in a secure video messaging device built on top of a Virtex-II Pro development system. / Master of Science
293

Framework for a Context-Switching Run-Time Reconfigurable System

Lehn, David Ilan 10 May 2002 (has links)
The reprogrammable nature of configurable computing machines has led to a wealth of research in run-time reconfigurable systems and applications. A limitation often encountered in this research is the slow configuration time with respect to the system clock speed. One technique to deal with these configuration delays has been to develop devices that can hold multiple rapidly interchangeable configurations. This technique is known as context-switching. This thesis discusses the development of a framework to support applications which execute on a run-time reconfigurable system containing context-switching devices. The framework is divided into a number of layers: hardware, middleware, software, and applications. The design, implementation, and details of each layer are presented. / Master of Science
294

A Scalable Approach to Multi-core Prototyping

Newcomb, Jamie David 22 April 2008 (has links)
In recent years, multi-core processors and multi-processor networks have grown in popularity as a solution to the limits on increasing clock speed, rising power consumption, and the constraints of nanometer manufacturing processes. Multi-core processors and multi-processor networks are seen as the next step in the advancement of computational capabilities by way of concurrent processing. However, parallel software design is difficult due to the immaturity of scalable architectures and software development environments for multi-core hardware. How should processors pass information effectively and quickly, with as little overhead as possible? What kind of communication architecture is best suited for parallelism? How can large-scale architectures be quickly produced, verified, and properly utilized by software? Using commercially available FPGA development boards and Xilinx tools and components, this thesis offers a light-weight solution to these questions for effective, low-overhead, low-latency multi-core communication and fast prototyping of multi-processor networks for scalable processor arrays. / Master of Science
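The kind of low-overhead point-to-point communication this abstract alludes to is commonly built from bounded FIFO channels with back-pressure. The sketch below is a generic software illustration of that primitive (hypothetical class and method names); the thesis's actual solution is built from FPGA hardware, not Python.

```python
from collections import deque

class FifoChannel:
    """Bounded single-producer/single-consumer FIFO, the kind of primitive
    often used for low-overhead point-to-point links between cores
    (software illustration only; a hardware FIFO would be RTL, not Python)."""

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()

    def send(self, word):
        """Non-blocking send: returns False if the FIFO is full (back-pressure)."""
        if len(self.buf) >= self.depth:
            return False
        self.buf.append(word)
        return True

    def recv(self):
        """Non-blocking receive: returns None if the FIFO is empty."""
        return self.buf.popleft() if self.buf else None

# Core A pushes words; core B drains them.
link = FifoChannel(depth=4)
for w in range(6):
    if not link.send(w):
        print(f"back-pressure at word {w}")
while (w := link.recv()) is not None:
    print("received", w)
```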
295

Searching Biological Sequence Databases Using Distributed Adaptive Computing

Pappas, Nicholas Peter 06 February 2003 (has links)
Genetic research projects currently can require enormous computing power to process the vast quantities of data available. Further, DNA sequencing projects are generating data at an exponential rate that outpaces the development of microprocessor technology; thus, new, faster methods and techniques of processing this data are needed. One common type of processing involves searching a sequence database for the most similar sequences. Here we present a distributed database search system that utilizes adaptive computing technologies. The search is performed using the Smith-Waterman algorithm, a common sequence comparison algorithm. To reduce the total search time, an initial search is performed using a version of the algorithm implemented in adaptive computing hardware and designed to perform this initial pass efficiently. A final search is performed using a complete version of the algorithm. This two-stage search, employing adaptive and distributed hardware, achieves a performance increase of several orders of magnitude over similar processor-based systems. / Master of Science
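For readers unfamiliar with the Smith-Waterman algorithm named above, a minimal software sketch of its local-alignment recurrence follows. The scoring values are illustrative defaults, not those used in the thesis, and the hardware version described above restructures this computation rather than running code like this.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                        # local alignment may restart anywhere
                H[i - 1][j - 1] + score,  # match/mismatch along the diagonal
                H[i - 1][j] + gap,        # gap in b
                H[i][j - 1] + gap,        # gap in a
            )
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))
```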
296

Context Switching Strategies in a Run-Time Reconfigurable System

Puttegowda, Kiran 30 April 2002 (has links)
A distinctive feature of run-time reconfigurable systems is the ability to change the configuration of programmable resources during execution. This opens a number of possibilities such as virtualisation of computational resources, simplified routing and, in certain applications, lower power consumption. Seamless run-time reconfiguration requires rapid configuration. Commodity programmable devices have relatively long configuration times, which makes them poor candidates for run-time reconfigurable systems. Reducing this reconfiguration time to the order of nanoseconds will enable rapid run-time reconfiguration. Having multiple configuration planes and switching between them while processing data is one approach towards achieving rapid reconfiguration. An experimental context switching programmable device, called the Context Switching Reconfigurable Computer (CSRC), has been created by BAE Systems, which provided opportunities to explore context-switching strategies for run-time reconfigurable systems. The work presented here studies this approach to run-time reconfiguration by applying the concepts to develop applications on a context switching reconfigurable system. The work also discusses the advantages and disadvantages of such an approach and ways of leveraging the concept for efficient computing. / Master of Science
297

Using an FPGA-Based Processing Platform in an Industrial Machine Vision System

King, William E. 28 April 1999 (has links)
This thesis describes the development of a commercial machine vision system as a case study for utilizing the Modular Reprogrammable Real-time Processing Hardware (MORRPH) board. The commercial system described in this thesis is based on a prototype system that was developed as a test-bed for developing the necessary concepts and algorithms. The prototype system utilized color linescan cameras, custom framegrabbers, and standard PCs to color-sort red oak parts (staves). When a furniture manufacturer builds a panel, it is very often an edge-glued panel: a larger panel formed by gluing several smaller staves together along their edges. The value of the panel is very much dependent upon the "match" of the individual staves—i.e., how well they create the illusion that the panel came from a single board as opposed to several staves. The prototype system was able to accurately classify staves based on color into classes defined through a training process. Based on Trichromatic Color Theory, the system developed a probability density function in 3-D color space for each class based on the parts assigned to that class during training. While sorting, a probability density function was generated for each scanned piece and compared with each of the class probability density functions. The piece was labeled with the name of the class whose probability density function it most closely matched. A "best-face" algorithm was also developed to arbitrate between pieces whose top and bottom faces did not fall into the same classes. [1] describes the prototype system in much greater detail. In developing a commercial-quality machine vision system based on the prototype, the primary goal was to improve throughput. A Field Programmable Gate Array (FPGA)-based Custom Computing Machine (FCCM) called the MORRPH was selected to assume most of the computational burden and increase throughput in the commercial system. The MORRPH was implemented as an ISA-bus interface card with a 3 x 2 array of Processing Elements (PEs). Each PE consists of an open socket which can be populated with a Xilinx 4000 series FPGA, and an open support socket which can be populated with support chips such as external RAM, math processors, etc. In implementing the prototype algorithms for the commercial system, a partition was created between those algorithms that would be implemented on the MORRPH board and those that would be left as implemented on the host PC. It was decided to implement algorithms such as Field-Of-View operators, Shade Correction, Background Extraction, Gray-Scale Channel Generation, and Histogram Generation on the MORRPH board, and to leave the remainder of the classification algorithms on the host. By utilizing the MORRPH board, an industrial machine vision system was developed that has exceeded customer expectations for both accuracy and throughput. Additionally, the color-sorter received the International Woodworking Fair's Challengers Award for outstanding innovation. / Master of Science
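The color-classification idea described above (estimate a probability density function in 3-D color space per class during training, then label each scanned piece with the best-matching class) can be sketched as follows. The 8x8x8 binning and the absolute-difference distance are assumptions made for illustration, not necessarily the choices of the prototype or commercial system.

```python
import numpy as np

BINS = 8  # coarse 8x8x8 quantization of RGB space (illustrative choice)

def color_histogram(pixels):
    """Estimate a normalized 3-D color histogram (empirical PDF) from an Nx3 RGB array."""
    hist, _ = np.histogramdd(pixels, bins=(BINS, BINS, BINS),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist / hist.sum()

def train(classes):
    """classes: dict mapping class name -> list of pixel arrays from training staves."""
    return {name: color_histogram(np.vstack(samples))
            for name, samples in classes.items()}

def classify(piece_pixels, class_pdfs):
    """Label a scanned piece with the class whose PDF it most closely matches
    (sum of absolute bin differences used here as a simple distance)."""
    pdf = color_histogram(piece_pixels)
    return min(class_pdfs,
               key=lambda name: np.abs(class_pdfs[name] - pdf).sum())

# Hypothetical usage with random training data
rng = np.random.default_rng(0)
training = {"light": [rng.integers(150, 256, (500, 3))],
            "dark":  [rng.integers(0, 100, (500, 3))]}
pdfs = train(training)
print(classify(rng.integers(160, 230, (200, 3)), pdfs))   # -> "light"
```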
298

Dynamic Module Library Generation for FPGA-based Run-Time Reconfigurable Systems

Bowen, John Kipp 25 February 2008 (has links)
Modern Field Programmable Gate Arrays (FPGAs) can implement entire run-time reconfigurable systems using partial reconfiguration. Module-based run-time reconfiguration permits the construction of custom applications at run-time using pre-compiled Intellectual Property (IP) from a module library. The need for both flexible module placement and custom inter-module communication is mostly ignored by existing modular run-time reconfiguration approaches, and few existing tool flows for module generation address the need for automation. This thesis introduces an automated compile-time tool flow for generating dynamic modules that allow flexible run-time placement and communication synthesis. / Master of Science
299

Characterization of Sparsity-aware Optimization Paths for Graph Traversal on FPGA

Gondhalekar, Atharva 25 May 2023 (has links)
Breadth-first search (BFS) is a fundamental building block in many graph-based applications, but it is difficult to optimize for a field-programmable gate array (FPGA) due to its irregular memory-access patterns. Prior work, based on hardware description languages (HDLs) and high-level synthesis (HLS), addresses the memory-access bottleneck of BFS by using techniques such as data alignment and compute-unit replication on FPGAs. The efficacy of such optimizations depends on factors such as the sparsity of target graph datasets. Optimizations intended for sparse graphs may not work as effectively for dense graphs on an FPGA, and vice versa. This thesis presents two sets of FPGA optimization strategies for BFS, one for near-hypersparse graphs and the other designed for sparse to moderately dense graphs. For near-hypersparse graphs, a queue-based kernel with maximal use of local memory on the FPGA is implemented. For denser graphs, an array-based kernel with compute-unit replication is implemented. Across a diverse collection of graphs, our OpenCL optimization strategies for near-hypersparse graphs deliver a 5.7x to 22.3x speedup over a state-of-the-art OpenCL implementation when evaluated on an Intel Stratix 10 FPGA. The optimization strategies for sparse to moderately dense graphs deliver a 1.1x to 2.3x speedup over a state-of-the-art OpenCL implementation on the same FPGA. Finally, this work uses graph metrics such as average degree and Gini coefficient to observe the impact of graph properties on the performance of the proposed optimization strategies. / M.S. / A graph is a data structure that typically consists of two sets -- a set of vertices and a set of edges representing connections between the vertices. Graphs are used in a broad set of application domains such as the testing and verification of digital circuits, data mining of social networks, and analysis of road networks. In such application areas, breadth-first search (BFS) is a fundamental building block. BFS is used to identify the minimum number of edges that need to be traversed from a source vertex to one or many destination vertices. In recent years, several attempts have been made to optimize the performance of BFS on reconfigurable architectures such as field-programmable gate arrays (FPGAs). However, the optimization strategies for BFS are not necessarily applicable to all types of graphs. Moreover, the efficacy of such optimizations oftentimes depends on the sparsity of input graphs. To that end, this work presents optimization strategies for graphs with varying levels of sparsity. Furthermore, this work shows that by tailoring the BFS design based on the sparsity of the input graph, significant performance improvements are obtained over the state-of-the-art BFS implementations on an FPGA.
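As background for the abstract above, a plain level-synchronous BFS over an adjacency structure looks like the sketch below. The thesis's FPGA kernels reorganize this traversal (queue-based for near-hypersparse graphs, frontier-array-based for denser ones) rather than running software like this; the sketch only shows the computation being optimized.

```python
from collections import deque

def bfs_levels(adj, source):
    """Compute hop distances from `source` in an unweighted graph.

    adj: dict (or list) mapping each vertex to an iterable of its neighbors.
    Returns a dict vertex -> level (minimum number of edges from source).
    """
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in level:            # visit each vertex only once
                level[v] = level[u] + 1
                frontier.append(v)
    return level

# Tiny example graph
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```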
300

f-DSM: An FPGA-Accelerated Distributed Shared Memory for Heterogeneous Instruction-Set-Architecture Hardware

VSathish, Naarayanan Rao 03 March 2022 (has links)
Due to the diminishing relevance of Moore's Law, traditional multi-core systems are increasingly struggling to meet the computational demands of many emerging workloads. Heterogeneous computing, which involves exploiting higher degrees of parallelism (e.g., GPUs) and application-specific specialization (e.g., FPGAs), is increasingly used to meet this demand. An important architectural trend in this space involves using instruction-set-architecture (ISA) heterogeneity. An exemplar case is emerging I/O devices that include CPU cores with ISAs (e.g., ARM, RISC-V) that differ from that of host CPUs (e.g., x86) and have physically discrete memory. Shared-memory programming of such systems requires the Distributed Shared Memory (DSM) abstraction. Software DSM incurs significant OS overhead for maintaining memory coherency. Despite outperforming software predecessors, hardware DSM and cache-coherent interfaces require custom chips and lack the flexibility to experiment with different DSM consistency protocols. This thesis presents fDSM, an FPGA-accelerated DSM framework for ISA-heterogeneous hardware. fDSM implements a high-speed messaging layer to enable inter-node communication across ISA-different CPU cores and a DSM protocol processor that maintains virtual memory coherency using a multiple-reader single-writer DSM algorithm. Experimental studies reveal that fDSM outperforms prior art, including Popcorn Linux's software DSM abstraction, which uses TCP-IP and state-of-the-art InfiniBand RDMA messaging layers, by 2.8X and 7%, respectively. fDSM also provides reconfigurability and thereby allows implementation and experimentation of different memory consistency models. / Master of Science / Moore's Law predicts that the number of transistors in a chip will double approximately every two years. Chip vendors are increasingly observing that this law is nearing its limit as transistor sizes shrink to 5nm and 3nm due to power consumption and heat dissipation issues. As a result, innovation in new computing architectures has increasingly focused on heterogeneity, i.e., the use of hardware performance accelerators like graphics processors and reconfigurable logic in confluence with a computer's CPU (host). To improve the programmability of these architectures, which usually have physically separate memory, the shared-memory programming model is usually used to provide coherent virtual memory. The shared-memory model, when applied to such distributed systems, is called distributed shared memory (or DSM) and has previously been developed in software as well as in hardware. The former usually suffers from high latency overheads, while the latter often requires custom chips and lacks programmability for implementing new memory consistency protocols. This thesis presents fDSM, a reconfigurable distributed shared memory framework that provides coherent shared memory between a host and a smart I/O device such as a SmartNIC. fDSM is implemented in FPGAs, which are increasingly available in hosts and smart I/O devices at the commodity scale. Our prototype implementation uses ISA-heterogeneous hosts to emulate such an environment. Our experimental evaluation using applications from High-Performance Computing benchmark suites reveals that fDSM yields performance benefits over a state-of-the-art software DSM.
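The multiple-reader/single-writer rule mentioned in the abstract can be summarized by a small per-page directory: a page may be cached read-only by many nodes or writable by exactly one. The sketch below is a generic illustration of that invariant with hypothetical names; it is not fDSM's protocol processor, which is implemented in FPGA logic.

```python
class PageDirectory:
    """Tracks, per shared page, which nodes hold read copies and which node
    (if any) holds exclusive write access.
    Invariant: many readers OR one writer, never both (generic MRSW rule)."""

    def __init__(self):
        self.readers = {}   # page -> set of reader node ids
        self.writer = {}    # page -> node id with exclusive write access, or None

    def acquire_read(self, page, node):
        # A writer must give up exclusive access before new readers are admitted.
        if self.writer.get(page) is not None and self.writer[page] != node:
            self._downgrade(page)
        self.readers.setdefault(page, set()).add(node)

    def acquire_write(self, page, node):
        # Invalidate all other copies so exactly one writer remains.
        for reader in self.readers.get(page, set()) - {node}:
            self._invalidate(page, reader)
        self.readers[page] = {node}
        self.writer[page] = node

    def _downgrade(self, page):
        """Writer flushes its dirty copy; page becomes read-shared (message send omitted)."""
        self.writer[page] = None

    def _invalidate(self, page, node):
        """Tell `node` to drop its copy of `page` (message send omitted)."""
        pass

# Hypothetical usage: node 1 writes, then nodes 2 and 3 read the same page.
d = PageDirectory()
d.acquire_write("page0", 1)
d.acquire_read("page0", 2)
d.acquire_read("page0", 3)
print(d.readers["page0"], d.writer["page0"])   # {1, 2, 3} None
```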
